Arxiv Day: Article

Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy

In contemporary radiotherapy planning (RTP), a key module leaf sequencing is predominantly addressed by optimization-based approaches. In this paper, we propose a novel deep reinforcement learning (DRL) model termed as Reinforced Leaf Sequencer (RLS) in a multi-agent framework for leaf sequencing. The RLS model offers improvements to time-consuming iterative optimization steps via large-scale training and can control movement patterns through the design of reward mechanisms. We have conducted experiments on four datasets with four metrics and compared our model with a leading optimization sequencer. Our findings reveal that the proposed RLS model can achieve reduced fluence reconstruction errors, and potential faster convergence when integrated in an optimization planner. Additionally, RLS has shown promising results in a full artificial intelligence RTP pipeline. We hope this pioneer multi-agent RL leaf sequencer can foster future research on machine learning for RTP.

Updated: 2024-06-03 23:55:20

标题: 多智能体强化学习在放射治疗中的叶片顺序问题中的应用

摘要: 在当代放射治疗规划（RTP）中，关键模块叶片排序主要通过基于优化的方法来解决。在本文中，我们提出了一种新颖的深度强化学习（DRL）模型，称为Reinforced Leaf Sequencer（RLS），在多智能体框架中用于叶片排序。RLS模型通过大规模训练提供了对耗时的迭代优化步骤的改进，并可以通过设计奖励机制来控制运动模式。我们对四个数据集进行了实验，使用四个指标比较了我们的模型与一种领先的优化排序器。研究结果表明，提出的RLS模型在集成到优化规划器中时可以实现减少辐射重建错误和潜在更快的收敛。此外，RLS在完整的人工智能RTP流水线中展现了有希望的结果。我们希望这种先驱的多智能体RL叶片排序器能促进将来关于放射治疗规划的机器学习研究。

更新时间: 2024-06-03 23:55:20

领域: cs.LG,cs.AI,cs.MA

下载: http://arxiv.org/abs/2406.01853v1

Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO

With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations. Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90\%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.

Updated: 2024-06-03 23:54:48

标题: 非均匀性就足够了：具有ECHO的高效及时加密流量分类

摘要: 随着95%的互联网流量现在加密，对于对这些流量进行分类的有效方法对于网络安全和管理至关重要。本文介绍了ECHO——一种基于ML/DL的加密流量分类的新型优化过程。ECHO旨在同时优化分类时间和内存利用，并融合了两种创新技术。第一个组件，HO（超参数优化的分箱），旨在创建高效的流量表示。尽管先前的研究通常使用将数据包大小和数据包到达时间映射到固定大小箱中的表示，我们表明非均匀的分箱要显著更加高效。这些非均匀的分箱是通过在训练阶段使用超参数优化算法得出的。HO显著提高了在给定所需表示大小的情况下的准确性，或者等效地使用更小的表示实现可比较的准确性。然后，我们介绍了EC（流量的早期分类），它通过一系列适应不同退出时间的分类器的级联来实现更快的分类，其中分类是基于置信水平的。EC将平均分类延迟降低了高达90%。值得注意的是，这种方法不仅保持了分类准确性，而且在某些情况下还提高了准确性。通过使用三个公开可用数据集，我们证明了结合了早期分类和超参数优化的方法（ECHO）显著提高了分类效率。

更新时间: 2024-06-03 23:54:48

领域: cs.NI,cs.CR,cs.LG

下载: http://arxiv.org/abs/2406.01852v1

REBUS: A Robust Evaluation Benchmark of Understanding Symbols

We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with hypothesis testing, multi-step reasoning, and an understanding of human cognition, making for a complex, multimodal evaluation of capabilities. We find that GPT-4o significantly outperforms all other models, followed by proprietary models outperforming all other evaluated models. However, even the best model has a final accuracy of only 42\%, which goes down to just 7\% on hard puzzles, highlighting the need for substantial improvements in reasoning. Further, models rarely understand all parts of a puzzle, and are almost always incapable of retroactively explaining the correct answer. Our benchmark can therefore be used to identify major shortcomings in the knowledge and reasoning of multimodal large language models.

Updated: 2024-06-03 23:49:45

标题: REBUS：理解符号的强大评估基准

摘要: 我们提出了一个新的基准，用于评估多模式大型语言模型在谜题中的表现。该数据集涵盖了333个原始的基于图像的文字游戏示例，涵盖了电影、作曲家、主要城市和食物等13个类别。为了在识别提示的单词或短语的基准测试中取得良好表现，模型必须结合图像识别和字符串操作，与假设测试、多步推理和对人类认知的理解相结合，从而进行复杂的多模式评估。我们发现，GPT-4o在所有其他模型中表现明显优异，其次是专有模型优于所有其他评估过的模型。然而，即使是最好的模型最终准确率也仅为42\%，在难题上仅为7\%，突显了在推理方面需要实质性改进的必要性。此外，模型很少理解谜题的所有部分，并且几乎总是无法事后解释正确答案。因此，我们的基准可以用来识别多模式大型语言模型的知识和推理方面的主要缺陷。

更新时间: 2024-06-03 23:49:45

领域: cs.CL,cs.AI,cs.CV,cs.CY

下载: http://arxiv.org/abs/2401.05604v2

Resampling methods for private statistical inference

We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data and give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\epsilon$, our methods enjoy the same error rates as that of the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($\gtrsim 10$ times) confidence intervals than previous approaches.

Updated: 2024-06-03 23:47:44

标题: 私人统计推断的重采样方法

摘要: 我们考虑使用差分隐私构建置信区间的任务。我们提出了两种私有变体的非参数自举法，这些方法通过在数据分区上运行多个“小”自举法来私下计算结果的中位数，并对得到的置信区间的覆盖误差给出渐近界限。对于固定的差分隐私参数$\epsilon$，我们的方法在样本量$n$的对数因子内享有与非私有自举法相同的误差率。我们通过实验证明了我们的方法在均值估计、中位数估计以及逻辑回归中的性能，使用真实和合成数据。我们的方法在提供与现有方法（和非私有基线）相似的覆盖精度的同时，提供明显更短（大约10倍）的置信区间，比以往的方法更好。

更新时间: 2024-06-03 23:47:44

领域: stat.ML,cs.CR,cs.LG,stat.ME

下载: http://arxiv.org/abs/2402.07131v3

Challenges in Training PINNs: A Loss Landscape Perspective

This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing the superiority of Adam+L-BFGS, and introduce a novel second-order optimizer, NysNewton-CG (NNCG), which significantly improves PINN performance. Theoretically, our work elucidates the connection between ill-conditioned differential operators and ill-conditioning in the PINN loss and shows the benefits of combining first- and second-order optimization methods. Our work presents valuable insights and more powerful optimization strategies for training PINNs, which could improve the utility of PINNs for solving difficult partial differential equations.

Updated: 2024-06-03 23:35:42

标题: 培训PINNs中的挑战：损失景观视角

摘要: 本文探讨了训练物理启发神经网络（PINNs）中的挑战，强调了损失景观在训练过程中的作用。我们研究了最小化PINN损失函数的困难，特别是由残差项中的微分算子引起的病态条件。我们比较了基于梯度的优化器Adam、L-BFGS以及它们的组合Adam+L-BFGS，展示了Adam+L-BFGS的优越性，并引入了一种新颖的二阶优化器NysNewton-CG（NNCG），显著提高了PINN的性能。从理论上讲，我们的工作阐明了病态微分算子与PINN损失中的病态条件之间的联系，并展示了结合一阶和二阶优化方法的好处。我们的工作提供了有价值的见解和更强大的优化策略，用于训练PINNs，这可以提高PINNs解决困难的偏微分方程的效用。

更新时间: 2024-06-03 23:35:42

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2402.01868v2

Compositional Generative Modeling: A Single Model is Not All You Need

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.

Updated: 2024-06-03 23:30:33

标题: 构成生成建模：一个模型并不是你所需要的全部

摘要: 大规模的单片生成模型在大量数据训练下已经成为人工智能研究中越来越主导的方法。在本文中，我们认为我们应该通过将较小的生成模型组合在一起来构建大型生成系统。我们展示了这种组合生成方法如何使我们能够以更节省数据的方式学习分布，从而实现对训练时未见过的数据分布的泛化。我们进一步展示了这如何使我们能够为训练时完全未见过的任务编程和构建新的生成模型。最后，我们展示在许多情况下，我们可以从数据中发现单独的组成部分。

更新时间: 2024-06-03 23:30:33

领域: cs.LG,cs.AI,cs.CV,cs.RO

下载: http://arxiv.org/abs/2402.01103v3

GraphWeaver: Billion-Scale Cybersecurity Incident Correlation

In the dynamic landscape of large enterprise cybersecurity, accurately and efficiently correlating billions of security alerts into comprehensive incidents is a substantial challenge. Traditional correlation techniques often struggle with maintenance, scaling, and adapting to emerging threats and novel sources of telemetry. We introduce GraphWeaver, an industry-scale framework that shifts the traditional incident correlation process to a data-optimized, geo-distributed graph based approach. GraphWeaver introduces a suite of innovations tailored to handle the complexities of correlating billions of shared evidence alerts across hundreds of thousands of enterprises. Key among these innovations are a geo-distributed database and PySpark analytics engine for large-scale data processing, a minimum spanning tree algorithm to optimize correlation storage, integration of security domain knowledge and threat intelligence, and a human-in-the-loop feedback system to continuously refine key correlation processes and parameters. GraphWeaver is integrated into the Microsoft Defender XDR product and deployed worldwide, handling billions of correlations with a 99% accuracy rate, as confirmed by customer feedback and extensive investigations by security experts. This integration has not only maintained high correlation accuracy but reduces traditional correlation storage requirements by 7.4x. We provide an in-depth overview of the key design and operational features of GraphWeaver, setting a precedent as the first cybersecurity company to openly discuss these critical capabilities at this level of depth.

Updated: 2024-06-03 23:28:05

标题: GraphWeaver：十亿级网络安全事件关联

摘要: 在大型企业网络安全的动态风景中，将数十亿个安全警报准确、高效地相关到全面的事件中是一个重大挑战。传统的相关技术经常难以应对维护、扩展和适应新兴威胁和新型遥测来源的问题。我们介绍了GraphWeaver，这是一个面向行业规模的框架，将传统的事件相关过程转变为基于数据优化、地理分布图的方法。GraphWeaver引入了一套创新，旨在处理数十亿个共享证据警报在数十万个企业之间的相关复杂性。其中关键的创新包括一个地理分布式数据库和PySpark分析引擎用于大规模数据处理，一个最小生成树算法来优化相关存储，整合安全领域知识和威胁情报，以及一个人机协同反馈系统，持续细化关键相关过程和参数。GraphWeaver已集成到Microsoft Defender XDR产品中，并在全球范围内部署，处理数十亿次相关操作，准确率达到99%，得到客户反馈和安全专家的广泛调查证实。这种集成不仅保持了高相关准确性，还将传统相关存储需求降低了7.4倍。我们提供了GraphWeaver的关键设计和运营特点的深入概述，并设定了一个先例，成为第一家公开讨论这一深度关键能力的网络安全公司。

更新时间: 2024-06-03 23:28:05

领域: cs.CR,cs.SI

下载: http://arxiv.org/abs/2406.01842v1

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback

Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments.

Updated: 2024-06-03 23:23:54

标题: 一个统一的线性规划框架，用于从人类演示和反馈中学习离线奖励

摘要: 逆强化学习（IRL）和从人类反馈中学习强化学习（RLHF）是奖励学习中至关重要的方法，涉及根据观察到的人类演示和反馈推断和塑造顺序决策问题的基础奖励函数。大多数先前的奖励学习工作依赖于对决策或偏好模型的先验知识或假设，可能导致鲁棒性问题。为此，本文介绍了一个专为离线奖励学习量身定制的新型线性规划（LP）框架。利用事先收集的轨迹，而无需在线探索，该框架从适当设计的LP的原始-对偶最优性条件中估计一个可行的奖励集，并提供具有可证明的样本效率的最优性保证。我们的LP框架还使得将奖励函数与人类反馈（例如成对轨迹比较数据）保持一致，同时保持计算可行性和样本效率。我们通过分析例子和数值实验证明，我们的框架可能比传统的最大似然估计（MLE）方法实现更好的性能。

更新时间: 2024-06-03 23:23:54

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.12421v2

Learning the Target Network in Function Space

We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.

Updated: 2024-06-03 23:10:35

标题: 在函数空间中学习目标网络

摘要: 我们专注于在强化学习（RL）环境中学习值函数的任务。这个任务通常通过更新一对在线网络和目标网络来解决，同时确保这两个网络的参数是等价的。我们提出了一种新的值函数逼近算法Lookahead-Replicate（LR），它对参数空间的等价性不加以考虑。相反，LR算法旨在在函数空间中保持两个网络的等价性。这种基于值的等价性是通过引入一种新的目标网络更新来实现的。我们展示了LR导致学习值函数的收敛行为。我们还提供了实证结果，证明基于LR的目标网络更新显著改进了Atari基准上的深度RL。

更新时间: 2024-06-03 23:10:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01838v1

Iterated INLA for State and Parameter Estimation in Nonlinear Dynamical Systems

Data assimilation (DA) methods use priors arising from differential equations to robustly interpolate and extrapolate data. Popular techniques such as ensemble methods that handle high-dimensional, nonlinear PDE priors focus mostly on state estimation, however can have difficulty learning the parameters accurately. On the other hand, machine learning based approaches can naturally learn the state and parameters, but their applicability can be limited, or produce uncertainties that are hard to interpret. Inspired by the Integrated Nested Laplace Approximation (INLA) method in spatial statistics, we propose an alternative approach to DA based on iteratively linearising the dynamical model. This produces a Gaussian Markov random field at each iteration, enabling one to use INLA to infer the state and parameters. Our approach can be used for arbitrary nonlinear systems, while retaining interpretability, and is furthermore demonstrated to outperform existing methods on the DA task. By providing a more nuanced approach to handling nonlinear PDE priors, our methodology offers improved accuracy and robustness in predictions, especially where data sparsity is prevalent.

Updated: 2024-06-03 23:08:30

标题: 在非线性动力系统中的状态和参数估计的迭代INLA

摘要: 数据同化（DA）方法使用源自微分方程的先验来强有力地插值和外推数据。流行的技术，如处理高维、非线性PDE先验的集合方法，主要集中在状态估计上，但可能难以准确学习参数。另一方面，基于机器学习的方法可以自然地学习状态和参数，但它们的适用性可能受到限制，或者产生难以解释的不确定性。受空间统计中整合嵌套拉普拉斯近似（INLA）方法的启发，我们提出了一种基于迭代线性化动力模型的DA替代方法。这在每次迭代产生一个高斯马尔可夫随机场，使人们能够使用INLA推断状态和参数。我们的方法可用于任意非线性系统，同时保持可解释性，并且进一步证明在DA任务上优于现有方法。通过提供一个更细致的处理非线性PDE先验的方法，我们的方法论在预测中提供了更好的准确性和鲁棒性，特别是在数据稀缺的情况下。

更新时间: 2024-06-03 23:08:30

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.17036v2

An Open Multilingual System for Scoring Readability of Wikipedia

With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview on the state of readability in Wikipedia beyond English.

Updated: 2024-06-03 23:07:18

标题: 一个用于评分维基百科可读性的开放式多语言系统

摘要: 随着超过6000万篇文章，维基百科已成为最大的开放和自由获取知识的平台。虽然它每月拥有超过150亿次访问量，但由于其文本缺乏可读性，许多读者认为其内容难以理解。然而，先前对维基百科可读性的调查仅限于英语，目前还没有支持对维基百科中300多种语言进行自动可读性评估的系统。为了弥补这一差距，我们开发了一个多语言模型来评分维基百科文章的可读性。为了训练和评估这个模型，我们创建了一个跨14种语言的新颖多语言数据集，通过将维基百科文章与简化维基百科和在线儿童百科全书进行匹配。我们展示了我们的模型在零-shot场景下表现良好，在14种语言中的排名准确率超过80%，并改进了先前的基准。这些结果证明了该模型在规模上适用于没有可用于模型微调的基准数据的语言。此外，我们首次提供维基百科除了英语之外的可读性概述。

更新时间: 2024-06-03 23:07:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01835v1

CAFO: Feature-Centric Explanation on Time Series Classification

In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time periods but less effective in identifying key features. This limitation underscores the pressing need for a feature-centric approach, a vital yet often overlooked perspective that complements time-centric analysis. To bridge this gap, our study introduces a novel feature-centric explanation and evaluation framework for MTS, named CAFO (Channel Attention and Feature Orthgonalization). CAFO employs a convolution-based approach with channel attention mechanisms, incorporating a depth-wise separable channel attention module (DepCA) and a QR decomposition-based loss for promoting feature-wise orthogonality. We demonstrate that this orthogonalization enhances the separability of attention distributions, thereby refining and stabilizing the ranking of feature importance. This improvement in feature-wise ranking enhances our understanding of feature explainability in MTS. Furthermore, we develop metrics to evaluate global and class-specific feature importance. Our framework's efficacy is validated through extensive empirical analyses on two major public benchmarks and real-world datasets, both synthetic and self-collected, specifically designed to highlight class-wise discriminative features. The results confirm CAFO's robustness and informative capacity in assessing feature importance in MTS classification tasks. This study not only advances the understanding of feature-centric explanations in MTS but also sets a foundation for future explorations in feature-centric explanations.

Updated: 2024-06-03 23:06:45

标题: CAFO：基于特征的时间序列分类解释

摘要: 在多元时间序列（MTS）分类中，找到模型性能的重要特征（例如传感器）是至关重要的，但由于MTS数据的复杂、高维性质、复杂的时间动态和对领域特定解释的必要性，这一挑战是困难的。目前，MTS的解释方法主要集中在时间为中心的解释上，适用于确定重要的时间段，但在识别关键特征方面效果较差。这种局限突显了对特征为中心方法的迫切需求，这是一个重要但常被忽视的视角，可以补充时间为中心的分析。为了弥合这一差距，我们的研究引入了一种新的MTS特征为中心的解释和评估框架，命名为CAFO（通道注意和特征正交）。CAFO采用基于卷积的方法，结合通道注意机制，包括一个基于深度可分通道注意模块（DepCA）和基于QR分解的损失，以促进特征间的正交性。我们证明，这种正交化增强了注意分布的可分离性，从而改进和稳定了特征重要性的排名。特征间排名的改进增强了我们对MTS中特征可解释性的理解。此外，我们开发了评估全局和类别特征重要性的指标。通过对两个主要公共基准和真实世界数据集进行广泛的实证分析，包括合成数据和自行收集数据，专门设计用于突出类别间差异特征，验证了我们框架的有效性。结果证实了CAFO在评估MTS分类任务中的特征重要性方面的鲁棒性和信息能力。这项研究不仅推进了MTS中特征为中心解释的理解，也为未来对特征为中心解释的探索奠定了基础。

更新时间: 2024-06-03 23:06:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01833v1

Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

Certification for machine learning is proving that no adversarial sample can evade a model within a range under certain conditions, a necessity for safety-critical domains. Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty across many classes. We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components classic methods would abstain from, effectively lowering the abstain rate whilst providing more certified semantically meaningful information. We mathematically formulate the problem setup, introduce an adaptive hierarchical certification algorithm and prove the correctness of its guarantees. Since certified accuracy does not take the loss of information into account for coarser classes, we introduce the Certified Information Gain ($\mathrm{CIG}$) metric, which is proportional to the class granularity level. Our extensive experiments on the datasets Cityscapes, PASCAL-Context, ACDC and COCO-Stuff demonstrate that our adaptive algorithm achieves a higher $\mathrm{CIG}$ and lower abstain rate compared to the current state-of-the-art certification method. Our code can be found here: https://github.com/AlaaAnani/adaptive-certify.

Updated: 2024-06-03 23:02:26

标题: 使用随机平滑进行分割的自适应分层认证

摘要: 机器学习的认证证明了在特定条件下，没有对抗性样本可以在一定范围内规避模型，这对于安全关键领域至关重要。常见的分割认证方法使用一组细粒度类别，由于模型在许多类别上的不确定性，导致高弃权率。我们提出了一种新颖且更实用的设置，该设置在多层次层次结构中对像素进行认证，并针对传统方法会放弃的不稳定组件自适应地放宽认证至较粗糙的级别，有效降低放弃率同时提供更多认证的语义有意义信息。我们在数学上阐述了问题设置，并介绍了一种自适应层次认证算法，并证明了其保证的正确性。由于认证准确性不考虑较粗糙类别的信息损失，我们引入了认证信息增益（CIG）指标，该指标与类别的粒度级别成正比。我们在Cityscapes、PASCAL-Context、ACDC和COCO-Stuff数据集上进行了大量实验，结果表明我们的自适应算法相比当前最先进的认证方法实现了更高的CIG和更低的放弃率。我们的代码可以在此找到：https://github.com/AlaaAnani/adaptive-certify。

更新时间: 2024-06-03 23:02:26

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2402.08400v2

OASIS: Offsetting Active Reconstruction Attacks in Federated Learning

Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. For that reason, FL has found its use in various domains, from healthcare to industrial engineering, especially where data cannot be easily exchanged due to sensitive information or privacy laws. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose a defense mechanism, namely OASIS, based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions by which the defense can be robust regardless of the attack strategies. We then construct our defense with image augmentation showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of the defense mechanism highlighting its feasibility as a solution.

Updated: 2024-06-03 23:02:08

标题: 翻译：OASIS：在联邦学习中抵消主动重构攻击

摘要: 联邦学习（FL）因其在增强模型训练效率的同时保护用户隐私的潜力而受到广泛关注。因此，FL已被广泛应用于各个领域，从医疗保健到工业工程，尤其是在数据由于敏感信息或隐私法律而不能轻易交换的情况下。然而，最近的研究表明，FL协议很容易受到不诚实服务器执行的主动重建攻击的威胁。这些攻击涉及对全局模型参数的恶意修改，使服务器可以通过反转用户的梯度更新来获得用户私人数据的逐字副本。由于强大的威胁模型，解决这类攻击仍然是一个关键挑战。在本文中，我们提出了一种名为OASIS的基于图像增强的防御机制，有效地抵御主动重建攻击，同时保持模型性能。我们首先揭示了使这些攻击成为可能的梯度反演的核心原理，并在理论上确定了无论攻击策略如何，防御都可以具有鲁棒性的主要条件。然后，我们通过图像增强构建了我们的防御机制，显示它可以破坏攻击原理。全面的评估表明了防御机制的功效，突显了其作为解决方案的可行性。

更新时间: 2024-06-03 23:02:08

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2311.13739v2

A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios

Pursuing natural and marker-less human-robot interaction (HRI) has been a long-standing robotics research focus, driven by the vision of seamless collaboration without physical markers. Marker-less approaches promise an improved user experience, but state-of-the-art struggles with the challenges posed by intrinsic errors in human pose estimation (HPE) and depth cameras. These errors can lead to issues such as robot jittering, which can significantly impact the trust users have in collaborative systems. We propose a filtering pipeline that refines incomplete 3D human poses from an HPE backbone and a single RGB-D camera to address these challenges, solving for occlusions that can degrade the interaction. Experimental results show that using the proposed filter leads to more consistent and noise-free motion representation, reducing unexpected robot movements and enabling smoother interaction.

Updated: 2024-06-03 22:59:53

标题: 一个用于无标记多人跟踪的强大滤波器在人机交互场景中

摘要: 追求自然且无标记的人机交互（HRI）一直是长期以来的机器人研究重点，其愿景是实现无需物理标记的无缝协作。无标记方法承诺提供更好的用户体验，但目前的技术面临着人体姿势估计（HPE）和深度摄像机固有误差带来的挑战。这些误差可能导致机器人抖动等问题，严重影响用户对协作系统的信任。我们提出了一个过滤管道，从HPE骨干和单个RGB-D摄像机中细化不完整的3D人体姿势，以解决这些挑战，解决可能降低交互效果的遮挡问题。实验结果表明，使用所提出的过滤器可以实现更一致且无噪音的动作表示，减少意外的机器人运动，实现更流畅的交互。

更新时间: 2024-06-03 22:59:53

领域: cs.RO,cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.01832v1

FacAID: A Transformer Model for Neuro-Symbolic Facade Reconstruction

We introduce a neuro-symbolic transformer-based model that converts flat, segmented facade structures into procedural definitions using a custom-designed split grammar. To facilitate this, we first develop a semi-complex split grammar tailored for architectural facades and then generate a dataset comprising of facades alongside their corresponding procedural representations. This dataset is used to train our transformer model to convert segmented, flat facades into the procedural language of our grammar. During inference, the model applies this learned transformation to new facade segmentations, providing a procedural representation that users can adjust to generate varied facade designs. This method not only automates the conversion of static facade images into dynamic, editable procedural formats but also enhances the design flexibility, allowing for easy modifications and variations by architects and designers. Our approach sets a new standard in facade design by combining the precision of procedural generation with the adaptability of neuro-symbolic learning.

Updated: 2024-06-03 22:56:40

标题: FacAID：一种用于神经符号化外观重建的Transformer模型

摘要: 我们介绍了一种基于神经符号变换器的模型，将平面分割的立面结构转换为使用定制设计的分割语法的过程化定义。为了实现这一目标，我们首先开发了一个专为建筑立面定制的半复杂分割语法，然后生成了一个数据集，包括立面和相应的过程化表示。该数据集用于训练我们的变换器模型，将分割的平面立面转换为我们的语法的过程化语言。在推理过程中，模型将这种学习到的转换应用于新的立面分割，提供一个用户可以调整的过程化表示，以生成各种不同的立面设计。这种方法不仅可以自动将静态立面图像转换为动态、可编辑的过程化格式，还可以增强设计的灵活性，使建筑师和设计师可以轻松进行修改和变化。我们的方法通过将过程化生成的精确性与神经符号学习的适应性相结合，树立了立面设计的新标准。

更新时间: 2024-06-03 22:56:40

领域: cs.NE,cs.AI,cs.CV,cs.GR,cs.LG,I.3.5; I.2.2; I.4.5

下载: http://arxiv.org/abs/2406.01829v1

Scaling Down Deep Learning with MNIST-1D

Although deep learning models have taken on commercial and political relevance, key aspects of their training and operation remain poorly understood. This has sparked interest in science of deep learning projects, many of which require large amounts of time, money, and electricity. But how much of this research really needs to occur at scale? In this paper, we introduce MNIST-1D: a minimalist, procedurally generated, low-memory, and low-compute alternative to classic deep learning benchmarks. Although the dimensionality of MNIST-1D is only 40 and its default training set size only 4000, MNIST-1D can be used to study inductive biases of different deep architectures, find lottery tickets, observe deep double descent, metalearn an activation function, and demonstrate guillotine regularization in self-supervised learning. All these experiments can be conducted on a GPU or often even on a CPU within minutes, allowing for fast prototyping, educational use cases, and cutting-edge research on a low budget.

Updated: 2024-06-03 22:41:04

标题: 使用MNIST-1D对深度学习进行缩减

摘要: 尽管深度学习模型在商业和政治上具有重要意义，但它们的训练和运行的关键方面仍然知之甚少。这引发了对深度学习科学项目的兴趣，其中许多需要大量的时间、金钱和电力。但这项研究真的需要在规模上进行吗？在本文中，我们介绍了MNIST-1D：一种极简、程序生成、低内存和低计算的经典深度学习基准的替代方案。尽管MNIST-1D的维度仅为40，其默认训练集大小仅为4000，但MNIST-1D可用于研究不同深度架构的归纳偏差，发现中奖彩票，观察深度双下降，元学习激活函数，并演示自监督学习中的断头台正则化。所有这些实验都可以在GPU上进行，甚至通常可以在几分钟内在CPU上进行，从而实现快速原型设计、教育用例和低成本的前沿研究。

更新时间: 2024-06-03 22:41:04

领域: cs.LG,cs.NE,stat.ML

下载: http://arxiv.org/abs/2011.14439v5

API Pack: A Massive Multi-Programming Language Dataset for API Call Generation

We introduce API Pack, a massive multi-programming language dataset containing more than 1 million instruction-API call pairs to improve the API call generation capabilities of large language models. By fine-tuning CodeLlama-13B on 20,000 Python instances from API Pack, we enable it to outperform GPT-3.5 and GPT-4 in generating unseen API calls. Fine-tuning on API Pack also facilitates cross-programming language generalization by leveraging a large amount of data in one language and small amounts of data from other languages. Scaling the training data to 1 million instances further improves the model's ability to generalize to new APIs not used in training. To facilitate further research, we open-source the API Pack dataset, trained model, and associated source code at https://github.com/zguo0525/API-Pack.

Updated: 2024-06-03 22:38:04

标题: API Pack：一个用于API调用生成的大规模多编程语言数据集

摘要: 我们介绍了API Pack，这是一个包含超过100万个指令-API调用对的大规模多编程语言数据集，旨在提高大型语言模型的API调用生成能力。通过在来自API Pack的20,000个Python实例上对CodeLlama-13B进行微调，我们使其能够在生成未见API调用方面胜过GPT-3.5和GPT-4。在API Pack上进行微调还通过利用一个语言中的大量数据和其他语言中的少量数据，促进了跨编程语言泛化。将训练数据扩展到100万个实例进一步改善了模型对未在训练中使用的新API的泛化能力。为了促进进一步的研究，我们在https://github.com/zguo0525/API-Pack 上开放了API Pack数据集、训练模型和相关源代码。

更新时间: 2024-06-03 22:38:04

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.09615v4

EMOE: Expansive Matching of Experts for Robust Uncertainty Based Rejection

Expansive Matching of Experts (EMOE) is a novel method that utilizes support-expanding, extrapolatory pseudo-labeling to improve prediction and uncertainty based rejection on out-of-distribution (OOD) points. We propose an expansive data augmentation technique that generates OOD instances in a latent space, and an empirical trial based approach to filter out augmented expansive points for pseudo-labeling. EMOE utilizes a diverse set of multiple base experts as pseudo-labelers on the augmented data to improve OOD performance through a shared MLP with multiple heads (one per expert). We demonstrate that EMOE achieves superior performance compared to state-of-the-art methods on tabular data.

Updated: 2024-06-03 22:37:45

标题: EMOE: 专家的广泛匹配用于鲁棒的基于不确定性的拒绝

摘要: Expansive Matching of Experts (EMOE)是一种新颖的方法，利用支持扩展、外推伪标记来提高在分布之外（OOD）点上的预测和基于不确定性的拒绝。我们提出了一种扩展数据增强技术，通过在潜在空间中生成OOD实例，并基于经验试验的方法来过滤出扩展点进行伪标记。EMOE利用多个基础专家作为伪标记者在扩展数据上使用共享的MLP来提高OOD性能（每个专家一个头）。我们证明，与最先进的方法相比，EMOE在表格数据上实现了卓越的性能。

更新时间: 2024-06-03 22:37:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01825v1

VQPy: An Object-Oriented Approach to Modern Video Analytics

Video analytics is widely used in contemporary systems and services. At the forefront of video analytics are video queries that users develop to find objects of particular interest. Building upon the insight that video objects (e.g., human, animals, cars, etc.), the center of video analytics, are similar in spirit to objects modeled by traditional object-oriented languages, we propose to develop an object-oriented approach to video analytics. This approach, named VQPy, consists of a frontend$\unicode{x2015}$a Python variant with constructs that make it easy for users to express video objects and their interactions$\unicode{x2015}$as well as an extensible backend that can automatically construct and optimize pipelines based on video objects. We have implemented and open-sourced VQPy, which has been productized in Cisco as part of its DeepVision framework.

Updated: 2024-06-03 22:36:36

标题: VQPy：现代视频分析的面向对象方法

摘要: 视频分析在当代系统和服务中被广泛使用。视频分析的前沿是用户开发的视频查询，以查找特定感兴趣的对象。基于视频对象（例如人类、动物、汽车等）与传统面向对象语言建模的对象在精神上相似的洞察，我们提出了一种面向对象的视频分析方法。这种方法被命名为VQPy，包括一个前端——一种带有构造的Python变体，使用户能够轻松表达视频对象及其交互——以及一个可扩展的后端，可以根据视频对象自动构建和优化管道。我们已经实现并开源了VQPy，它已经作为Cisco DeepVision框架的一部分在Cisco中进行了产品化。

更新时间: 2024-06-03 22:36:36

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2311.01623v4

OpenAPI Specification Extended Security Scheme: A method to reduce the prevalence of Broken Object Level Authorization

APIs have become the prominent technology of choice for achieving inter-service communications. The growth of API deployments has driven the urgency in addressing its lack of security standards. API Security is a topic for concern given the absence of standardized authorization in the OpenAPI standard, improper authorization opens the possibility for known and unknown vulnerabilities, which in the past years have been exploited by malicious actors resulting in data loss. This paper examines the number one vulnerability in API Security: Broken Object Level Authorization(BOLA), and proposes methods and tools to reduce the prevalence of this vulnerability. BOLA affects various API frameworks, our scope is fixated on the OpenAPI Specification(OAS). The OAS is a standard for describing and implementing APIs; popular OAS Implementations are FastAPI, Connexion (Flask), and many more. These implementations carry the pros and cons that are associated with the OASs knowledge of API properties. The Open API Specifications security properties do not address object authorization and provide no standardized approach to define such object properties. This leaves object-level security at the mercy of developers, which presents an increased risk of unintentionally creating attack vectors. Our aim is to tackle this void by introducing 1) the OAS ESS (OpenAPI Specification Extended Security Scheme) which includes declarative security controls for objects in OAS (design-based approach), and 2) an authorization module that can be imported to API services (Flask/FastAPI) to enforce authorization checks at the object level (development-based approach). When building an API service, a developer can start with the API design (specification) or its code. In both cases, a set of mechanisms are introduced to help developers mitigate and reduce the prevalence of BOLA.

Updated: 2024-06-03 22:36:18

标题: 《OpenAPI规范扩展安全方案：减少破损对象级别授权的方法》

摘要: APIs已经成为实现服务之间通信的首选技术。API部署的增长推动了解决其缺乏安全标准的迫切性。API安全是一个令人担忧的话题，因为OpenAPI标准中缺乏标准化的授权，不正确的授权打开了已知和未知漏洞的可能性，这些漏洞在过去几年中被恶意行为者利用，导致数据丢失。本文研究了API安全中的头号漏洞：破损的对象级授权（BOLA），并提出了减少这种漏洞普遍性的方法和工具。BOLA影响各种API框架，我们的范围是集中在OpenAPI规范（OAS）上。OAS是一种描述和实现API的标准；流行的OAS实现包括FastAPI、Connexion（Flask）等。这些实现带有与API属性的OAS知识相关的优缺点。开放API规范的安全属性不涉及对象授权，并没有提供定义此类对象属性的标准化方法。这使得对象级安全取决于开发人员，这增加了无意中创建攻击向量的风险。我们的目标是通过引入1）OAS ESS（OpenAPI规范扩展安全方案），其中包括OAS中对象的声明性安全控制（基于设计的方法），以及2）一个可以导入到API服务（Flask/FastAPI）中的授权模块，以在对象级别执行授权检查（基于开发的方法）。在构建API服务时，开发人员可以从API设计（规范）或其代码开始。在这两种情况下，介绍了一套机制，帮助开发人员减轻和减少BOLA的普遍性。

更新时间: 2024-06-03 22:36:18

领域: cs.CR,cs.PL

下载: http://arxiv.org/abs/2212.06606v3

Towards Causal Foundation Model: on Duality between Causal Inference and Attention

Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset methodologies. These results provide compelling evidence that our method has the potential to serve as a stepping stone for the development of causal foundation models.

Updated: 2024-06-03 22:32:38

标题: 朝向因果基础模型：因果推断与注意力之间的二元性

摘要: 基础模型已经改变了机器学习的格局，展示了在各种任务中体现人类水平智能的闪光点。然而，在复杂任务（如因果推断）中仍存在差距，主要是由于复杂推理步骤和高数值精度要求所带来的挑战。在这项工作中，我们迈出了向建立考虑因果关系的基础模型以进行治疗效果估计的第一步。我们提出了一种新颖的、理论上合理的方法，称为注意力因果推断（CInA），它利用多个未标记数据集进行自监督因果学习，随后使得在新数据上进行零样本因果推断成为可能。这基于我们的理论结果，展示了最优协变量平衡与自注意力之间的原始-对偶连接，通过训练好的变压器类型架构的最终层实现零样本因果推断。我们实证地证明，CInA有效地推广到分布之外的数据集和各种真实世界数据集，与甚至超过传统的每个数据集方法相匹敌。这些结果提供了有力证据，表明我们的方法有潜力成为发展因果基础模型的垫脚石。

更新时间: 2024-06-03 22:32:38

领域: cs.LG,cs.AI,stat.ME,stat.ML

下载: http://arxiv.org/abs/2310.00809v3

Model Assessment and Selection under Temporal Distribution Shift

We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.

Updated: 2024-06-03 22:30:38

标题: 模型评估和选择在时间分布转移下

摘要: 我们研究了在不断变化的环境中的模型评估和选择，通过合成来自当前时间段和历史时期的数据集。为了解决未知和潜在的任意时间分布变化，我们开发了一种自适应滚动窗口方法来估计给定模型的泛化误差。这种策略还通过估计两个候选模型的泛化误差之间的差异，促进了任意两个候选模型之间的比较。我们进一步将两两比较整合到一场单败淘汰锦标赛中，从候选模型集合中实现近乎最优的模型选择。理论分析和数值实验证明了我们提出的方法对数据的非稳态性的适应性。

更新时间: 2024-06-03 22:30:38

领域: cs.LG,cs.AI,stat.ME,62G05 (Primary), 62J02 (Secondary)

下载: http://arxiv.org/abs/2402.08672v2

Causal Discovery with Fewer Conditional Independence Tests

Many questions in science center around the fundamental problem of understanding causal relationships. However, most constraint-based causal discovery algorithms, including the well-celebrated PC algorithm, often incur an exponential number of conditional independence (CI) tests, posing limitations in various applications. Addressing this, our work focuses on characterizing what can be learned about the underlying causal graph with a reduced number of CI tests. We show that it is possible to a learn a coarser representation of the hidden causal graph with a polynomial number of tests. This coarser representation, named Causal Consistent Partition Graph (CCPG), comprises of a partition of the vertices and a directed graph defined over its components. CCPG satisfies consistency of orientations and additional constraints which favor finer partitions. Furthermore, it reduces to the underlying causal graph when the causal graph is identifiable. As a consequence, our results offer the first efficient algorithm for recovering the true causal graph with a polynomial number of tests, in special cases where the causal graph is fully identifiable through observational data and potentially additional interventions.

Updated: 2024-06-03 22:27:09

标题: 用更少的条件独立性测试进行因果发现

摘要: 许多科学问题围绕着理解因果关系的基本问题。然而，大多数基于约束的因果发现算法，包括著名的PC算法，通常需要进行指数数量的条件独立性（CI）测试，在各种应用中存在限制。针对这一问题，我们的工作重点在于通过减少CI测试次数来揭示潜在因果图的特征。我们展示了可以通过多项式数量的测试学习到隐藏因果图的更粗糙表示。这种更粗糙的表示被称为因果一致分割图（CCPG），包括顶点的一个分割以及在其组件上定义的有向图。CCPG满足定向的一致性和有利于更精细分割的附加约束。此外，当因果图可识别时，它会缩减为潜在因果图。因此，我们的结果为在特定情况下通过观测数据和潜在的额外干预使因果图完全可识别时，提供了首个使用多项式数量的测试恢复真实因果图的高效算法。

更新时间: 2024-06-03 22:27:09

领域: cs.LG,cs.AI,stat.ME,stat.ML

下载: http://arxiv.org/abs/2406.01823v1

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a {\em single} task or (ii) they are {\em linear}, very little is known about the closer-to-practice case of {\em nonlinear} NNs trained on {\em multiple} tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an $r$-dimensional subspace within the $d\gg r$-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of $d$. In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all $r$ ground-truth features.

Updated: 2024-06-03 22:24:24

标题: 可以翻译为：通过两层ReLU神经网络实现可证明的多任务表示学习

摘要: 一个越来越受欢迎的机器学习范式是在许多离线任务上预训练神经网络（NN），然后通过重新训练网络的最后一个线性层来适应下游任务。这种方法在各种情境下产生了强大的下游性能，表明多任务预训练导致了有效的特征学习。尽管一些最近的理论研究已经表明，当浅层神经网络学习有意义的特征时，要么是在一个单一任务上训练，要么是线性的，但关于非线性神经网络在多个任务上训练的更接近实践的情况，我们知之甚少。在这项工作中，我们提出了第一个证明特征学习发生在用非线性模型在多个任务上训练过程中的结果。我们的关键洞察是，多任务预训练引入了一种伪对比损失，有利于在任务之间通常具有相同标签的点之间对齐表示。借助这一观察，我们展示了当任务是具有标签依赖于数据投影到一个$r$维子空间内的$d\gg r$维输入空间上时的二元分类任务时，基于梯度的简单多任务学习算法在一个两层ReLU NN上恢复了这个投影，从而实现对下游任务的泛化，而样本和神经元复杂度与$d$无关。相反，我们展示了在从单个任务的选择中高概率地训练时，训练这个单个任务不能保证学习所有$r$个地面真实特征。

更新时间: 2024-06-03 22:24:24

领域: cs.LG

下载: http://arxiv.org/abs/2307.06887v4

Transductive Sample Complexities Are Compact

We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $H$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$. We prove that this exact form of compactness holds for realizable and agnostic learning with respect to any proper metric loss function (e.g., any norm on $\mathbb{R}^d$) and any continuous loss on a compact space (e.g., cross-entropy, squared loss). For realizable learning with improper metric losses, we show that exact compactness of sample complexity can fail, and provide matching upper and lower bounds of a factor of 2 on the extent to which such sample complexities can differ. We conjecture that larger gaps are possible for the agnostic case. Furthermore, invoking the equivalence between sample complexities in the PAC and transductive models (up to lower order factors, in the realizable case) permits us to directly port our results to the PAC model, revealing an almost-exact form of compactness holding broadly in PAC learning.

Updated: 2024-06-03 22:23:32

标题: 跨导样本复杂性紧凑

摘要: 我们展示了一个普遍适用于一般损失函数的监督学习的紧致性结果：任何假设类$H$在感知式样本复杂度$m$下是可学习的，当且仅当其所有有限投影在样本复杂度$m$下是可学习的。我们证明了这种确切的紧致性形式适用于可实现和不可知学习，对于任何适当的度量损失函数（例如$\mathbb{R}^d$上的任何范数）和在紧致空间上的任何连续损失（例如交叉熵、平方损失）。对于使用不当的度量损失进行可实现学习，我们展示了样本复杂度的确切紧致性可能失败，并提供了这种样本复杂度可以有多大程度不同的上下界因子为2。我们推测在不可知情况下可能存在更大的差距。此外，利用PAC模型和感知模型之间样本复杂度的等价性（在可实现情况下除去低阶因子），使我们能够直接将结果迁移到PAC模型，揭示了广泛适用于PAC学习的近乎确切的紧致性形式。

更新时间: 2024-06-03 22:23:32

领域: cs.LG,cs.CC,cs.DS,cs.LO,stat.ML

下载: http://arxiv.org/abs/2402.10360v2

Repeat After Me: Transformers are Better than State Space Models at Copying

Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). In this paper we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context. We start with a theoretical analysis of the simple task of string copying and prove that a two layer transformer can copy strings of exponential length while GSSMs are fundamentally limited by their fixed-size latent state. Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context. Finally, we evaluate pretrained large language models and find that transformer models dramatically outperform state space models at copying and retrieving information from context. Taken together, these results suggest a fundamental gap between transformers and GSSMs on tasks of practical interest.

Updated: 2024-06-03 22:22:15

标题: 请您跟着我重复：变压器比状态空间模型更擅长复制

摘要: 变压器是序列建模的主导架构，但越来越多的人对使用一个不依赖于序列长度的固定大小潜在状态的模型表现出兴趣，我们将其称为“广义状态空间模型”（GSSMs）。在本文中，我们展示了虽然在推理时间效率方面GSSMs很有前途，但与需要从输入上下文中进行复制的任务相比，它们受到限制。我们从理论上分析了简单的字符串复制任务，并证明了两层变压器可以复制指数长度的字符串，而GSSMs基本上受到其固定大小潜在状态的限制。在实证方面，我们发现在需要复制上下文的合成任务中，变压器在效率和泛化方面优于GSSMs。最后，我们评估了预训练的大型语言模型，并发现变压器模型在从上下文中复制和检索信息方面远远优于状态空间模型。综合这些结果，这些结果表明在实际兴趣任务中变压器和GSSMs之间存在根本差距。

更新时间: 2024-06-03 22:22:15

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.01032v2

Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning

Recent advances in neural network pruning have shown how it is possible to reduce the computational costs and memory demands of deep learning models before training. We focus on this framework and propose a new pruning at initialization algorithm that leverages the Neural Tangent Kernel (NTK) theory to align the training dynamics of the sparse network with that of the dense one. Specifically, we show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account by providing an analytical upper bound to the NTK's trace obtained by decomposing neural networks into individual paths. This leads to our Path eXclusion (PX), a foresight pruning method designed to preserve the parameters that mostly influence the NTK's trace. PX is able to find lottery tickets (i.e. good paths) even at high sparsity levels and largely reduces the need for additional training. When applied to pre-trained models it extracts subnetworks directly usable for several downstream tasks, resulting in performance comparable to those of the dense counterpart but with substantial cost and computational savings. Code available at: https://github.com/iurada/px-ntk-pruning

Updated: 2024-06-03 22:19:42

标题: 通过数据驱动的频谱预测修剪在视觉模型中寻找彩票票据

摘要: 最近神经网络修剪的进展表明在训练之前如何降低深度学习模型的计算成本和内存需求是可能的。我们专注于这一框架，并提出一种新的修剪初始化算法，利用神经切线核（NTK）理论来使稀疏网络的训练动态与密集网络的训练动态相一致。具体来说，我们展示了如何通过将神经网络分解为单个路径来考虑NTK频谱中通常被忽略的数据相关组件，并提供了一个分解神经网络的解析上界，从而得到NTK的迹。这导致了我们的路径排除（PX），一种预见性修剪方法，旨在保留主要影响NTK迹的参数。PX能够在高稀疏水平下找到幸运票（即好路径），并大大减少了额外训练的需求。当应用于预训练模型时，它提取出可直接用于多个下游任务的子网络，其性能与密集对应物相当，但节省了大量成本和计算资源。代码可在此处获得：https://github.com/iurada/px-ntk-pruning

更新时间: 2024-06-03 22:19:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01820v1

Diffusion Boosted Trees

Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorithm that combines the weak learners into a strong learner of conditional distributions without making explicit parametric assumptions on their density forms. We demonstrate through experiments the advantages of DBT over deep neural network-based diffusion models as well as the competence of DBT on real-world regression tasks, and present a business application (fraud detection) of DBT for classification on tabular data with the ability of learning to defer.

Updated: 2024-06-03 22:11:38

标题: 扩散增强树

摘要: 结合去噪扩散概率模型和梯度提升的优点，引入了扩散增强范式来解决监督学习问题。我们开发了扩散增强树（DBT），可以看作是一种由决策树参数化的新的去噪扩散生成模型（每个扩散时间步骤一个单一树），以及一种新的提升算法，将弱学习器组合成强条件分布学习器，而不对其密度形式作出明确的参数假设。我们通过实验证明了DBT相对于基于深度神经网络的扩散模型的优势，以及DBT在真实回归任务上的竞争力，并展示了DBT在表格数据分类的商业应用（欺诈检测）中具有学习延迟的能力。

更新时间: 2024-06-03 22:11:38

领域: stat.ML,cs.AI,cs.LG,stat.AP,stat.ME

下载: http://arxiv.org/abs/2406.01813v1

Memory Capacity Analysis of Time-delay Reservoir Computing Based on Silicon Microring Resonator Nonlinearities

Silicon microring resonators (MRRs) have shown strong potential in acting as the nonlinear nodes of photonic reservoir computing (RC) schemes. By using nonlinearities within a silicon MRR, such as the ones caused by free-carrier dispersion (FCD) and thermo-optic (TO) effects, it is possible to map the input data of the RC to a higher dimensional space. Furthermore, by adding an external waveguide between the through and add ports of the MRR, it is possible to implement a time-delay RC (TDRC) with enhanced memory. The input from the through port is fed back into the add port of the ring with the delay applied by the external waveguide effectively adding memory. In a TDRC, the nodes are multiplexed in time, and their respective time evolutions are detected at the drop port. The performance of MRR-based TDRC is highly dependent on the amount of nonlinearity in the MRR. The nonlinear effects, in turn, are dependent on the physical properties of the MRR as they determine the lifetime of the effects. Another factor to take into account is the stability of the MRR response, as strong time-domain discontinuities at the drop port are known to emerge from FCD nonlinearities due to self-pulsing (high nonlinear behaviour). However, quantifying the right amount of nonlinearity that RC needs for a certain task in order to achieve optimum performance is challenging. Therefore, further analysis is required to fully understand the nonlinear dynamics of this TDRC setup. Here, we quantify the nonlinear and linear memory capacity of the previously described microring-based TDRC scheme, as a function of the time constants of the generated carriers and the thermal of the TO effects. We analyze the properties of the TDRC dynamics that generate the parameter space, in terms of input signal power and frequency detuning range, over which conventional RC tasks can be satisfactorily performed by the TDRC scheme.

Updated: 2024-06-03 22:10:25

标题: 基于硅微环谐振器非线性的时滞蓄积器计算的内存容量分析

摘要: 硅微环谐振器（MRR）在光子谐振器计算（RC）方案中表现出强大潜力，可作为非线性节点。通过利用硅MRR内部的非线性，例如自由载流子色散（FCD）和热光（TO）效应引起的非线性，可以将RC的输入数据映射到更高的维度空间。此外，通过在MRR的透射口和加口之间添加外部波导，可以实现具有增强记忆功能的时延RC（TDRC）。来自透射口的输入被反馈到环的加口，并通过外部波导施加延迟有效地增加记忆。在TDRC中，节点在时间上进行复用，它们各自的时间演变在抽出口被检测。基于MRR的TDRC的性能高度依赖于MRR中的非线性程度。非线性效应反过来又取决于MRR的物理特性，因为它们决定了效应的寿命。另一个需要考虑的因素是MRR响应的稳定性，因为由于自脉冲（高非线性行为）引起的FCD非线性会导致抽出口出现强烈的时间域不连续性。然而，量化RC需要的正确非线性程度以实现最佳性能是具有挑战性的。因此，需要进一步分析以充分了解这个TDRC设置的非线性动态。在这里，我们量化了先前描述的基于微环的TDRC方案的非线性和线性记忆容量，作为所生成载流子的时间常数和TO效应的热的函数。我们分析了产生参数空间的TDRC动态特性，用于输入信号功率和频率失谐范围，通过这些参数空间，传统的RC任务可以由TDRC方案令人满意地执行。

更新时间: 2024-06-03 22:10:25

领域: cs.NE,cs.AI,physics.optics

下载: http://arxiv.org/abs/2406.01812v1

A Game-Theoretic Approach to Privacy-Utility Tradeoff in Sharing Genomic Summary Statistics

The advent of online genomic data-sharing services has sought to enhance the accessibility of large genomic datasets by allowing queries about genetic variants, such as summary statistics, aiding care providers in distinguishing between spurious genomic variations and those with clinical significance. However, numerous studies have demonstrated that even sharing summary genomic information exposes individual members of such datasets to a significant privacy risk due to membership inference attacks. While several approaches have emerged that reduce privacy risks by adding noise or reducing the amount of information shared, these typically assume non-adaptive attacks that use likelihood ratio test (LRT) statistics. We propose a Bayesian game-theoretic framework for optimal privacy-utility tradeoff in the sharing of genomic summary statistics. Our first contribution is to prove that a very general Bayesian attacker model that anchors our game-theoretic approach is more powerful than the conventional LRT-based threat models in that it induces worse privacy loss for the defender who is modeled as a von Neumann-Morgenstern (vNM) decision-maker. We show this to be true even when the attacker uses a non-informative subjective prior. Next, we present an analytically tractable approach to compare the Bayesian attacks with arbitrary subjective priors and the Neyman-Pearson optimal LRT attacks under the Gaussian mechanism common in differential privacy frameworks. Finally, we propose an approach for approximating Bayes-Nash equilibria of the game using deep neural network generators to implicitly represent player mixed strategies. Our experiments demonstrate that the proposed game-theoretic framework yields both stronger attacks and stronger defense strategies than the state of the art.

Updated: 2024-06-03 22:09:47

标题: 一个博弈论方法翻译隐私-效用权衡在分享基因组摘要统计中

摘要: 在线基因组数据共享服务的出现旨在通过允许查询有关遗传变异的摘要统计信息，帮助医护人员区分伪造的基因组变异和具有临床意义的变异，从而提高大型基因组数据集的可访问性。然而，许多研究表明，即使共享摘要基因组信息，也会使此类数据集的个体成员面临重大的隐私风险，因为可能会发生成员推断攻击。虽然出现了几种减少隐私风险的方法，例如添加噪音或减少共享信息量，但这些方法通常假设使用可能性比率检验（LRT）统计的非适应性攻击。我们提出了一个贝叶斯博弈理论框架，用于在共享基因组摘要统计信息中实现隐私与效用的最佳权衡。我们的第一个贡献是证明，我们的博弈理论方法所根基的非常普遍的贝叶斯攻击者模型比传统的基于LRT的威胁模型更强大，因为它会导致被建模为冯·诺伊曼-莫根斯特恩（vNM）决策者的防御者遭受更严重的隐私损失。我们证明，即使攻击者使用非信息性主观先验，这也是正确的。接下来，我们提出了一种可以分析的方法，用于比较具有任意主观先验的贝叶斯攻击和在常见的差分隐私框架中使用的高斯机制下的纳曼-皮尔逊最优LRT攻击。最后，我们提出了一种使用深度神经网络生成器来隐式表示玩家混合策略的方法，以近似计算游戏的贝叶斯-纳什均衡。我们的实验表明，所提出的博弈理论框架比现有技术水平具有更强的攻击和防御策略。

更新时间: 2024-06-03 22:09:47

领域: cs.CR

下载: http://arxiv.org/abs/2406.01811v1

In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs

Large language models manifest the ability of few-shot adaptation to a sequence of provided examples. This behavior, known as in-context learning, allows for performing nontrivial machine learning tasks during inference only. In this work, we address the question: can we leverage in-context learning to predict out-of-distribution materials properties? However, this would not be possible for structure property prediction tasks unless an effective method is found to pass atomic-level geometric features to the transformer model. To address this problem, we employ a compound model in which GPT-2 acts on the output of geometry-aware graph neural networks to adapt in-context information. To demonstrate our model's capabilities, we partition the QM9 dataset into sequences of molecules that share a common substructure and use them for in-context learning. This approach significantly improves the performance of the model on out-of-distribution examples, surpassing the one of general graph neural network models.

Updated: 2024-06-03 21:59:21

标题: 上下文学习物理性质：少样本适应分布之外的分子图

摘要: 大型语言模型表现出少样本自适应到一系列提供的示例的能力。这种行为被称为上下文学习，允许仅在推理过程中执行非平凡的机器学习任务。在这项工作中，我们探讨了一个问题：我们能否利用上下文学习来预测超出分布材料性质？然而，除非找到一种有效的方法将原子级几何特征传递给变压器模型，否则这对结构性质预测任务是不可能的。为了解决这个问题，我们采用了一个复合模型，其中GPT-2作用于具有几何感知的图神经网络的输出，以适应上下文信息。为了展示我们模型的能力，我们将QM9数据集分成共享相同次结构的分子序列，并将它们用于上下文学习。这种方法显著提高了模型在超出分布示例上的性能，超过了一般图神经网络模型的性能。

更新时间: 2024-06-03 21:59:21

领域: cs.LG,cond-mat.mtrl-sci

下载: http://arxiv.org/abs/2406.01808v1

Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation

The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence. Currently, the most commonly used confidence score function is the likelihood of the generated sequence, which, however, conflates semantic and syntactic components. For instance, in question-answering (QA) tasks, an awkward phrasing of the correct answer might result in a lower probability prediction. Additionally, different tokens should be weighted differently depending on the context. In this work, we propose enhancing the predicted sequence probability by assigning different weights to various tokens using attention values elicited from the base LLM. By employing a validation set, we can identify the relevant attention heads, thereby significantly improving the reliability of the vanilla sequence probability confidence measure. We refer to this new score as the Contextualized Sequence Likelihood (CSL). CSL is easy to implement, fast to compute, and offers considerable potential for further improvement with task-specific prompts. Across several QA datasets and a diverse array of LLMs, CSL has demonstrated significantly higher reliability than state-of-the-art baselines in predicting generation quality, as measured by the AUROC or AUARC.

Updated: 2024-06-03 21:55:07

标题: 情境化序列可能性：增强自然语言生成的置信度得分

摘要: 大型语言模型（LLM）的出现显著推进了许多自然语言生成任务的最新技术。要可靠地应用LLM，必须准确测量其置信度。目前，最常用的置信度评分函数是生成序列的可能性，但这种方法混淆了语义和句法组成部分。例如，在问答（QA）任务中，正确答案的笨拙措辞可能导致概率预测较低。此外，根据上下文，不同的标记应该获得不同的权重。在这项工作中，我们提出通过使用从基础LLM中激发的注意力值为不同标记分配不同权重来增强预测的序列概率。通过使用验证集，我们可以识别相关的注意力头部，从而显著提高原始序列概率置信度测量的可靠性。我们将这种新分数称为上下文化序列可能性（CSL）。CSL易于实现，计算速度快，并且在任务特定提示下具有进一步改进的潜力。在几个QA数据集和各种LLM中，CSL在预测生成质量方面表现出比AUROC或AUARC衡量的最新基线更高的可靠性。

更新时间: 2024-06-03 21:55:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01806v1

TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting

Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the lack of explicit symmetries in the input space. To overcome this challenge, we introduce TabMDA, a novel method for manifold data augmentation on tabular data. This method utilises a pre-trained in-context model, such as TabPFN, to map the data into a manifold space. TabMDA performs label-invariant transformations by encoding the data multiple times with varied contexts. This process explores the manifold of the underlying in-context models, thereby enlarging the training dataset. TabMDA is a training-free method, making it applicable to any classifier. We evaluate TabMDA on five standard classifiers and observe significant performance improvements across various tabular datasets. Our results demonstrate that TabMDA provides an effective way to leverage information from pre-trained in-context models to enhance the performance of downstream classifiers.

Updated: 2024-06-03 21:51:13

标题: TabMDA：使用具有上下文子集的变换器对任何分类器进行表面流形数据增强

摘要: 表格数据在许多关键领域中很常见，但往往很难大量获取。这种稀缺性通常导致机器学习模型在这些数据上表现不佳。数据增强是一种在视觉和语言任务中提升性能的常见策略，但通常在表格数据上表现不佳，原因是输入空间中缺乏明确的对称性。为了克服这一挑战，我们引入了TabMDA，一种用于表格数据中流形数据增强的新方法。该方法利用预训练的上下文模型，如TabPFN，将数据映射到流形空间中。TabMDA通过多次使用不同上下文对数据进行编码，执行与标签无关的转换。这个过程探索了基础上下文模型的流形，从而扩大了训练数据集。TabMDA是一种无需训练的方法，适用于任何分类器。我们在五个标准分类器上评估了TabMDA，并观察到在各种表格数据集上显著的性能改善。我们的结果表明，TabMDA提供了一种有效的方式，利用预训练的上下文模型信息来增强下游分类器的性能。

更新时间: 2024-06-03 21:51:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01805v1

Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

This paper proposes Progressive Inference - a framework to compute input attributions to explain the predictions of decoder-only sequence classification models. Our work is based on the insight that the classification head of a decoder-only Transformer model can be used to make intermediate predictions by evaluating them at different points in the input sequence. Due to the causal attention mechanism, these intermediate predictions only depend on the tokens seen before the inference point, allowing us to obtain the model's prediction on a masked input sub-sequence, with negligible computational overheads. We develop two methods to provide sub-sequence level attributions using this insight. First, we propose Single Pass-Progressive Inference (SP-PI), which computes attributions by taking the difference between consecutive intermediate predictions. Second, we exploit a connection with Kernel SHAP to develop Multi Pass-Progressive Inference (MP-PI). MP-PI uses intermediate predictions from multiple masked versions of the input to compute higher quality attributions. Our studies on a diverse set of models trained on text classification tasks show that SP-PI and MP-PI provide significantly better attributions compared to prior work.

Updated: 2024-06-03 21:48:57

标题: 渐进推理：利用中间预测解释仅使用解码器的序列分类模型

摘要: 本文提出了渐进推理（Progressive Inference）- 一个计算输入归因以解释仅使用解码器的序列分类模型预测的框架。我们的工作基于这样的洞察，即解码器模型的分类头可以通过在输入序列的不同点评估它们来进行中间预测。由于因果关注机制，这些中间预测仅依赖于推理点之前看到的标记，使我们能够获得模型对掩码输入子序列的预测，计算开销微乎其微。我们开发了两种方法，利用这一洞察提供子序列级别的归因。首先，我们提出了单次传递渐进推理（SP-PI），通过计算连续中间预测之间的差异来计算归因。其次，我们利用与核SHAP的连接开发了多次传递渐进推理（MP-PI）。MP-PI使用来自输入的多个掩码版本的中间预测来计算更高质量的归因。我们对在文本分类任务上训练的各种模型进行的研究表明，与先前工作相比，SP-PI和MP-PI提供了显着更好的归因。

更新时间: 2024-06-03 21:48:57

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.02625v1

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents, called scientific LLM agents, also introduce novel vulnerabilities that demand careful consideration for safety. However, there exists a notable gap in the literature, as there has been no comprehensive exploration of these vulnerabilities. This perspective paper fills this gap by conducting a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures. We begin by providing a comprehensive overview of the potential risks inherent to scientific LLM agents, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we delve into the origins of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding scientific agents and advocate for the development of improved models, robust benchmarks, and comprehensive regulations to address these issues effectively.

Updated: 2024-06-03 21:45:53

标题: 优先考虑保护而不是自主权：LLM代理对科学的风险

摘要: 由大型语言模型（LLMs）驱动的智能代理已经展示出在自主进行实验和促进跨学科科学发现方面具有巨大潜力。虽然它们的能力令人期待，但这些被称为科学LLM代理的代理也引入了需要仔细考虑安全性的新型漏洞。然而，文献中存在一个显著的空白，因为对这些漏洞还没有进行全面探讨。本文通过对科学领域中基于LLM的代理的漏洞进行彻底检查，揭示了与其误用相关的潜在风险，并强调了对安全措施的需求来填补这一空白。我们首先全面概述了科学LLM代理固有的潜在风险，考虑了用户意图、特定科学领域以及它们对外部环境的潜在影响。然后，我们深入探讨了这些漏洞的根源，并对现有有限的研究进行了范围审查。根据我们的分析，我们提出了一个包括人类监管、代理对齐和对环境反馈的理解（代理监管）的三元框架，以减轻这些确定的风险。此外，我们强调了保护科学代理所面临的限制和挑战，并倡导开发改进的模型、强大的基准测试和全面的法规，以有效解决这些问题。

更新时间: 2024-06-03 21:45:53

领域: cs.CY,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.04247v3

Fearless Stochasticity in Expectation Propagation

Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments -- expectations of certain functions -- which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation; they remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on the use of debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.

Updated: 2024-06-03 21:42:06

标题: Expectation Propagation中的无畏随机性

摘要: 期望传播（EP）是一类用于在概率模型中进行近似推断的算法族。EP的更新涉及矩的评估——对某些函数的期望——这些可以从蒙特卡罗（MC）样本中估计得到。然而，当朴素地执行时，更新对MC噪声不稳健，各种先前的工作已尝试以不同方式解决这个问题。在这项工作中，我们提出了对EP的矩匹配更新的新视角；即，它们执行基于自然梯度的变分目标优化。我们利用这一见解来激发两种新的EP变体，其更新特别适合于MC估计；它们保持稳定，并且在仅使用单个样本估计时最节约。这些新变体结合了前身的优点并解决了关键弱点。特别是，它们更容易调整，提供了改进的速度-精度权衡，并且不依赖于去偏估计器的使用。我们展示了它们在各种概率推断任务上的有效性。

更新时间: 2024-06-03 21:42:06

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.01801v1

Online Control in Population Dynamics

The study of population dynamics originated with early sociological works (Malthus, 1872) but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for population control are often restricted to specific, noise-free dynamics, while real-world population changes can be complex and adversarial. To address this gap, we propose a new framework based on the paradigm of online control. We first characterize a set of linear dynamical systems that can naturally model evolving populations. We then give an efficient gradient-based controller for these systems, with near-optimal regret bounds with respect to a broad class of linear policies. Our empirical evaluations demonstrate the effectiveness of the proposed algorithm for population control even in non-linear models such as SIR and replicator dynamics.

Updated: 2024-06-03 21:40:59

标题: 人口动态中的在线控制

摘要: 人口动态的研究起源于早期的社会学作品（马尔萨斯，1872年），但现在已经扩展到许多领域，包括生物学、流行病学、进化博弈论和经济学。大多数关于人口动态的研究集中在预测问题上，而不是控制问题。现有的人口控制数学模型通常局限于特定的、无噪声的动态，而现实世界中的人口变化可能是复杂的和对抗性的。为了弥补这一差距，我们提出了一个基于在线控制范式的新框架。我们首先表征了一组线性动态系统，可以自然地模拟不断演变的人口。然后，我们为这些系统提供了一个高效的基于梯度的控制器，对于广泛类别的线性策略，具有接近最优的后悔界限。我们的实证评估展示了所提出的算法在人口控制方面的有效性，即使在SIR和增殖动力学等非线性模型中也是如此。

更新时间: 2024-06-03 21:40:59

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2406.01799v1

The Power of Sampling: Dimension-free Risk Bounds in Private ERM

Differentially private empirical risk minimization (DP-ERM) is a fundamental problem in private optimization. While the theory of DP-ERM is well-studied, as large-scale models become prevalent, traditional DP-ERM methods face new challenges, including (1) the prohibitive dependence on the ambient dimension, (2) the highly non-smooth objective functions, (3) costly first-order gradient oracles. Such challenges demand rethinking existing DP-ERM methodologies. In this work, we show that the regularized exponential mechanism combined with existing samplers can address these challenges altogether: under the standard unconstrained domain and low-rank gradients assumptions, our algorithm can achieve rank-dependent risk bounds for non-smooth convex objectives using only zeroth order oracles, which was not accomplished by prior methods. This highlights the power of sampling in differential privacy. We further construct lower bounds, demonstrating that when gradients are full-rank, there is no separation between the constrained and unconstrained settings. Our lower bound is derived from a general black-box reduction from unconstrained to the constrained domain and an improved lower bound in the constrained setting, which might be of independent interest.

Updated: 2024-06-03 21:31:18

标题: 采样的威力：私人ERM中的无维风险界限

摘要: 差分隐私经验风险最小化（DP-ERM）是隐私优化中的一个基本问题。虽然DP-ERM的理论已经得到了深入研究，但随着大规模模型的普及，传统的DP-ERM方法面临着新的挑战，包括（1）对环境维度的严重依赖，（2）高度非光滑的目标函数，（3）昂贵的一阶梯度预言。这些挑战要求重新思考现有的DP-ERM方法。在这项工作中，我们展示了正则化指数机制结合现有的采样器可以全面解决这些挑战：在标准的无约束域和低秩梯度假设下，我们的算法可以实现对非光滑凸目标的秩相关风险界，只使用零阶预言，这是以前的方法所未能实现的。这突显了采样在差分隐私中的作用。我们进一步构建了下界，证明当梯度是全秩时，受限和无限制设置之间没有分离。我们的下界是从无限制到受限领域的一般黑盒约简和受限设置中改进的下界推导而来，这可能是独立有趣的。

更新时间: 2024-06-03 21:31:18

领域: cs.LG,cs.CR,math.OC

下载: http://arxiv.org/abs/2105.13637v4

Player-Driven Emergence in LLM-Driven Game Narrative

We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.

Updated: 2024-06-03 21:27:14

标题: 玩家驱动的LLM驱动的游戏叙事中的涌现

摘要: 我们探讨了与大型语言模型（LLMs）的互动如何导致新兴行为，使玩家能够参与游戏叙事的演变。我们的实验平台是一个文字冒险游戏，玩家试图在固定叙事前提下解决一个谜团，但可以自由地与由GPT-4生成的非玩家角色进行互动。我们招募了28名玩家来玩这个游戏，并使用GPT-4自动将游戏日志转换为代表玩家游戏中叙事的节点图。我们发现，通过与LLM的非确定性行为互动，玩家能够发现有趣的新兴节点，这些节点并不是原始叙事的一部分，但具有潜力成为有趣和引人入胜的内容。创造出最多新兴节点的玩家往往是那些喜欢促进发现、探索和实验的游戏的玩家。

更新时间: 2024-06-03 21:27:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.17027v3

It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma

The security of blockchain systems is fundamentally based on the decentralized consensus in which the majority of parties behave honestly, and the process of content verification is essential to keep the robustness of blockchain systems. However, the phenomenon that a secure blockchain system with few or no cheaters could not provide sufficient incentive for verifiers to honestly perform the costly verification, referred to as the Verifier's Dilemma, could severely undermine the fundamental security of blockchain systems. While existing works have attempted to insert deliberate errors to disincentivize lazy verification, the decentralized environment makes it impossible to judge the correctness of verification or detect malicious verifiers directly. In this paper, we initiate the research that leverages the peer prediction approach towards the design of Bayesian truthful mechanisms for the decentralized verification game among multiple verifiers, incentivizing all verifiers to perform honest verification without access to the ground truth even in the presence of noisy observations in the verification process. With theoretically guaranteed truthfulness of our mechanism for the verification game, our work provides a framework of verification mechanisms that enhances the security and robustness of the blockchain and potentially other decentralized systems.

Updated: 2024-06-03 21:21:17

标题: 需要两个人：区块链验证者困境的对等预测解决方案

摘要: 区块链系统的安全性基本上是建立在去中心化共识之上的，其中大多数参与方行为诚实，内容验证的过程对于保持区块链系统的稳固性至关重要。然而，安全的区块链系统即使没有或只有少量作弊者也可能无法为验证者提供足够的激励来进行昂贵的验证，这被称为验证者困境，可能严重破坏区块链系统的基本安全性。尽管现有的研究已经尝试通过插入故意错误来阻止懒惰的验证，但去中心化环境使得无法直接判断验证的正确性或检测恶意验证者。在本文中，我们启动了利用对等预测方法研究去中心化验证游戏设计的研究，激励所有验证者进行诚实验证，即使在验证过程中存在噪声观测，也无需访问真相。通过我们机制在验证游戏中的理论保证的真实性，我们提供了一个验证机制框架，增强了区块链和潜在其他去中心化系统的安全性和稳固性。

更新时间: 2024-06-03 21:21:17

领域: cs.CR,cs.GT

下载: http://arxiv.org/abs/2406.01794v1

Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) aims to infer a reward from expert demonstrations, motivated by the idea that the reward, rather than the policy, is the most succinct and transferable description of a task [Ng et al., 2000]. However, the reward corresponding to an optimal policy is not unique, making it unclear if an IRL-learned reward is transferable to new transition laws in the sense that its optimal policy aligns with the optimal policy corresponding to the expert's true reward. Past work has addressed this problem only under the assumption of full access to the expert's policy, guaranteeing transferability when learning from two experts with the same reward but different transition laws that satisfy a specific rank condition [Rolland et al., 2022]. In this work, we show that the conditions developed under full access to the expert's policy cannot guarantee transferability in the more practical scenario where we have access only to demonstrations of the expert. Instead of a binary rank condition, we propose principal angles as a more refined measure of similarity and dissimilarity between transition laws. Based on this, we then establish two key results: 1) a sufficient condition for transferability to any transition laws when learning from at least two experts with sufficiently different transition laws, and 2) a sufficient condition for transferability to local changes in the transition law when learning from a single expert. Furthermore, we also provide a probably approximately correct (PAC) algorithm and an end-to-end analysis for learning transferable rewards from demonstrations of multiple experts.

Updated: 2024-06-03 21:18:08

标题: 朝向通过正则化逆强化学习恢复的奖励的可转移性

摘要: 反向强化学习（IRL）旨在从专家演示中推断出奖励，其动机是奖励而不是策略是一项任务的最简洁和可转移的描述[Ng等，2000]。然而，对应于最优策略的奖励并不是唯一的，这使得不清楚通过IRL学习得到的奖励是否可以转移到新的转移规律中，其最优策略与专家真实奖励对应的最优策略一致。过去的研究只在完全访问专家策略的假设下解决了这个问题，保证了从满足特定秩条件的两个具有相同奖励但不同转移规律的专家学习时的可转移性[Rolland等，2022]。在这项工作中，我们展示了在仅有专家演示的更实际的情况下，开发在完全访问专家策略的条件不能保证可转移性。我们提出主角度作为转移规律之间的相似性和差异性的更精细度量，而不是二元秩条件。基于此，我们建立了两个关键结果：1）在从至少两个具有足够不同转移规律的专家学习时对任何转移规律的可转移性的充分条件，2）在从单个专家学习时对转移规律的局部变化的可转移性的充分条件。此外，我们还提供了一个可能近似正确（PAC）算法和一个端到端的分析，用于从多个专家的演示中学习可转移的奖励。

更新时间: 2024-06-03 21:18:08

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.01793v1

SciMON: Scientific Inspiration Machines Optimized for Novelty

We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction--severely limiting the expressivity of hypotheses. This line of work also does not focus on optimizing novelty. We take a dramatic departure with a novel setting in which models use as input background contexts (e.g., problems, experimental settings, goals), and output natural language ideas grounded in literature. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature

Updated: 2024-06-03 21:15:28

标题: SciMON：为新颖性优化的科学灵感机器

摘要: 我们探索并增强神经语言模型生成基于文献的新科学方向的能力。文献基础假设生成的工作传统上集中在二元链接预测上，严重限制了假设的表达能力。这一领域的工作也没有专注于优化新颖性。我们采取了一个全新的设置，模型使用背景上下文（例如问题、实验设置、目标）作为输入，输出基于文献的自然语言理念。我们提出了SciMON，一个建模框架，利用检索过去科学论文中的“灵感”，并通过迭代比较以前的论文并更新理念建议，直到达到足够的新颖性。全面评估表明，GPT-4倾向于生成技术深度和新颖性较低的理念，而我们的方法在一定程度上缓解了这个问题。我们的工作代表了评估和开发语言模型的第一步，这些模型能够生成源自科学文献的新理念。

更新时间: 2024-06-03 21:15:28

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2305.14259v7

AI-based Classification of Customer Support Tickets: State of the Art and Implementation with AutoML

Automation of support ticket classification is crucial to improve customer support performance and shortening resolution time for customer inquiries. This research aims to test the applicability of automated machine learning (AutoML) as a technology to train a machine learning model (ML model) that can classify support tickets. The model evaluation conducted in this research shows that AutoML can be used to train ML models with good classification performance. Moreover, this paper fills a research gap by providing new insights into developing AI solutions without a dedicated professional by utilizing AutoML, which makes this technology more accessible for companies without specialized AI departments and staff.

Updated: 2024-06-03 21:13:02

标题: 基于人工智能的客户支持工单分类：最新技术及AutoML实现

摘要: 支持票分类的自动化对于提高客户支持绩效和缩短客户查询的解决时间至关重要。本研究旨在测试自动化机器学习（AutoML）作为一种技术来训练能够分类支持票的机器学习模型（ML模型）的适用性。本研究中进行的模型评估显示，AutoML可以用于训练具有良好分类性能的ML模型。此外，本文通过利用AutoML提供新的洞见，填补了研究空白，使得这项技术更容易为没有专门人员的公司提供AI解决方案。

更新时间: 2024-06-03 21:13:02

领域: cs.LG,cs.AI,cs.CL,cs.HC,I.2; I.2.7; K.6

下载: http://arxiv.org/abs/2406.01789v1

Recent Advances in Data-Driven Business Process Management

The rapid development of cutting-edge technologies, the increasing volume of data and also the availability and processability of new types of data sources has led to a paradigm shift in data-based management and decision-making. Since business processes are at the core of organizational work, these developments heavily impact BPM as a crucial success factor for organizations. In view of this emerging potential, data-driven business process management has become a relevant and vibrant research area. Given the complexity and interdisciplinarity of the research field, this position paper therefore presents research insights regarding data-driven BPM.

Updated: 2024-06-03 21:05:59

标题: 数据驱动业务流程管理的最新进展

摘要: 先进技术的迅速发展、数据量的不断增加以及新型数据源的可用性和可处理性，导致了基于数据的管理和决策发生了范式转变。由于业务流程是组织工作的核心，这些发展对业务流程管理（BPM）产生了重大影响，作为组织的关键成功因素。鉴于这一新兴潜力，数据驱动的业务流程管理已成为一个相关且充满活力的研究领域。鉴于研究领域的复杂性和跨学科性，这篇立场论文因此提出了关于数据驱动的BPM的研究见解。

更新时间: 2024-06-03 21:05:59

领域: cs.DB,cs.AI,68U35 68T07 68T07, 68U35, 68T01,H.4.1; I.2.1; I.2.6; I.2.7; H.2.8; K.6.1

下载: http://arxiv.org/abs/2406.01786v1

A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians

We interviewed twenty professional comedians who perform live shows in front of audiences and who use artificial intelligence in their artistic process as part of 3-hour workshops on ``AI x Comedy'' conducted at the Edinburgh Festival Fringe in August 2023 and online. The workshop consisted of a comedy writing session with large language models (LLMs), a human-computer interaction questionnaire to assess the Creativity Support Index of AI as a writing tool, and a focus group interrogating the comedians' motivations for and processes of using AI, as well as their ethical concerns about bias, censorship and copyright. Participants noted that existing moderation strategies used in safety filtering and instruction-tuned LLMs reinforced hegemonic viewpoints by erasing minority groups and their perspectives, and qualified this as a form of censorship. At the same time, most participants felt the LLMs did not succeed as a creativity support tool, by producing bland and biased comedy tropes, akin to ``cruise ship comedy material from the 1950s, but a bit less racist''. Our work extends scholarship about the subtle difference between, one the one hand, harmful speech, and on the other hand, ``offensive'' language as a practice of resistance, satire and ``punching up''. We also interrogate the global value alignment behind such language models, and discuss the importance of community-based value alignment and data ownership to build AI tools that better suit artists' needs.

Updated: 2024-06-03 21:01:50

标题: 一个机器人走进酒吧：语言模型能够作为喜剧创作的创意支持工具吗？对语言模型与喜剧演员幽默风格的评估

摘要: 我们对二十位专业喜剧演员进行了访谈，他们在观众面前进行现场表演，并在2023年8月在爱丁堡艺术节和在线上举办的“AI x Comedy”3小时研讨会中使用人工智能作为艺术创作过程的一部分。研讨会包括一个与大型语言模型（LLMs）一起进行喜剧创作的会话，一个人机交互问卷，用于评估AI作为写作工具的创造力支持指数，以及一个焦点小组，询问喜剧演员使用AI的动机和过程，以及他们对偏见、审查和版权的伦理关切。参与者指出，现有的安全过滤和指导调整的LLMs使用的调节策略通过消除少数群体及其观点来强化霸权观点，并将其定性为一种审查形式。同时，大多数参与者认为LLMs并不成功作为一种创造力支持工具，因为它们产生了平淡和有偏见的喜剧模式，类似于“来自1950年代的游轮喜剧素材，但稍微不那么种族主义”。我们的工作扩展了有关有害言论与“冒犯性”语言之间微妙差异的学术研究，一方面是有害言论，另一方面是作为抵抗、讽刺和“打压”的语言实践。我们还审视了这种语言模型背后的全球价值观调整，并讨论了基于社区的价值观调整和数据所有权的重要性，以构建更适合艺术家需求的AI工具。

更新时间: 2024-06-03 21:01:50

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.20956v2

Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation

Recently, a novel method known as Page Spray emerges, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the \sys model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux Kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and making improvements to the community.

Updated: 2024-06-03 20:59:36

标题: 深入探讨：理解Linux内核利用中的页面喷射

摘要: 最近，一种名为Page Spray的新颖方法出现，专注于内核漏洞的页面级利用。尽管它在利用性、稳定性和兼容性方面提供了优势，但对Page Spray的全面研究仍然很少。关于其根本原因、利用模型、与其他利用技术相比的比较优势以及可能的缓解策略的问题大多没有得到答复。在本文中，我们对Page Spray进行了系统调查，深入了解这种利用技术。我们引入了一个称为\sys模型的全面利用模型，阐明其基本原则。此外，我们对Linux内核中Page Spray发生的根本原因进行了彻底分析。我们设计了一个基于Page Spray分析模型的分析器，用于识别Page Spray调用点。随后，我们通过精心设计的实验评估了Page Spray的稳定性、可利用性和兼容性。最后，我们提出了解决Page Spray的缓解原则，并介绍了我们自己的轻量级缓解方法。这项研究旨在帮助安全研究人员和开发人员深入了解Page Spray，最终增强我们对这种新兴利用技术的集体理解，并改进社区。

更新时间: 2024-06-03 20:59:36

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2406.02624v1

Correcting Underrepresentation and Intersectional Bias for Classification

We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using these estimates, we construct a reweighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. From this, we present an algorithm encapsulating this learning and reweighting process along with a thorough empirical investigation. Finally, we define a bespoke notion of PAC learnability for the underrepresentation and intersectional bias setting and show that our algorithm permits efficient learning for model classes of finite VC dimension.

Updated: 2024-06-03 20:57:56

标题: 纠正分类中的表现不足和交叉偏见

摘要: 我们考虑从数据中学习的问题，数据受到不均衡偏差的影响，其中正例在一定数量的敏感群体中以不同且未知的比率被过滤。我们展示，通过少量无偏数据，我们可以有效地估计群体间的丢失率，即使在交叉群体成员使得学习每个交叉率在计算上不可行的情况下。利用这些估计，我们构建了一种重新加权方案，使我们能够在仅观察到偏倚样本上的经验误差的情况下，近似任何假设在真实分布上的损失。基于此，我们提出了一个包含学习和重新加权过程的算法，并进行了详尽的实证研究。最后，我们为不均衡和交叉偏差设置定义了一种定制的PAC可学习性概念，并展示了我们的算法允许对具有有限VC维度的模型类进行高效学习。

更新时间: 2024-06-03 20:57:56

领域: cs.LG,cs.CY,cs.DS,stat.ML

下载: http://arxiv.org/abs/2306.11112v4

Multi-agent assignment via state augmented reinforcement learning

We address the conflicting requirements of a multi-agent assignment problem through constrained reinforcement learning, emphasizing the inadequacy of standard regularization techniques for this purpose. Instead, we recur to a state augmentation approach in which the oscillation of dual variables is exploited by agents to alternate between tasks. In addition, we coordinate the actions of the multiple agents acting on their local states through these multipliers, which are gossiped through a communication network, eliminating the need to access other agent states. By these means, we propose a distributed multi-agent assignment protocol with theoretical feasibility guarantees that we corroborate in a monitoring numerical experiment.

Updated: 2024-06-03 20:56:12

标题: 多智能体通过状态增强强化学习进行分配

摘要: 我们通过受限强化学习解决多智能体分配问题的矛盾要求，强调标准正则化技术在此目的上的不足。相反，我们采用状态增强方法，在其中代理利用对偶变量的振荡来在任务之间交替。此外，我们通过这些乘数协调多个代理在其本地状态上的行动，这些乘数通过通信网络传播，消除了访问其他代理状态的需要。通过这些方法，我们提出了一个具有理论可行性保证的分布式多智能体分配协议，并在监测数值实验中加以验证。

更新时间: 2024-06-03 20:56:12

领域: eess.SY,cs.AI,cs.LG,cs.MA,cs.SY,93E35

下载: http://arxiv.org/abs/2406.01782v1

DEFT: Efficient Finetuning of Conditional Diffusion Models by Learning the Generalised $h$-transform

Generative modelling paradigms based on denoising diffusion processes have emerged as a leading candidate for conditional sampling in inverse problems. In many real-world applications, we often have access to large, expensively trained unconditional diffusion models, which we aim to exploit for improving conditional sampling. Most recent approaches are motivated heuristically and lack a unifying framework, obscuring connections between them. Further, they often suffer from issues such as being very sensitive to hyperparameters, being expensive to train or needing access to weights hidden behind a closed API. In this work, we unify conditional training and sampling using the mathematically well-understood Doob's h-transform. This new perspective allows us to unify many existing methods under a common umbrella. Under this framework, we propose DEFT (Doob's h-transform Efficient FineTuning), a new approach for conditional generation that simply fine-tunes a very small network to quickly learn the conditional $h$-transform, while keeping the larger unconditional network unchanged. DEFT is much faster than existing baselines while achieving state-of-the-art performance across a variety of linear and non-linear benchmarks. On image reconstruction tasks, we achieve speedups of up to 1.6$\times$, while having the best perceptual quality on natural images and reconstruction performance on medical images.

Updated: 2024-06-03 20:52:34

标题: DEFT: 通过学习广义$h$-变换高效微调条件扩散模型

摘要: 基于去噪扩散过程的生成建模范式已成为反问题中条件抽样的主要候选者。在许多实际应用中，我们经常可以访问大型、昂贵训练的无条件扩散模型，我们的目标是利用它们来改进条件抽样。最近的方法大多是启发式的，缺乏统一的框架，使它们之间的联系变得模糊。此外，它们通常存在一些问题，比如对超参数非常敏感、训练成本高或需要访问隐藏在封闭API后面的权重。在这项工作中，我们使用数学上理解良好的Doob's h-transform来统一条件训练和抽样。这种新视角使我们能够将许多现有方法统一在一个共同的框架下。在这个框架下，我们提出了DEFT（Doob's h-transform Efficient FineTuning），这是一种新的条件生成方法，简单地微调一个非常小的网络来快速学习条件$h$-transform，同时保持较大的无条件网络不变。DEFT比现有基线快得多，同时在各种线性和非线性基准测试中实现了最先进的性能。在图像重建任务中，我们实现了高达1.6倍的加速，同时在自然图像的感知质量和医学图像的重建性能方面表现最佳。

更新时间: 2024-06-03 20:52:34

领域: cs.LG

下载: http://arxiv.org/abs/2406.01781v1

Controlled Decoding from Language Models

KL-regularized reinforcement learning (RL) is a popular alignment framework to control the language model responses towards high reward outcomes. We pose a tokenwise RL objective and propose a modular solver for it, called controlled decoding (CD). CD exerts control through a separate prefix scorer module, which is trained to learn a value function for the reward. The prefix scorer is used at inference time to control the generation from a frozen base model, provably sampling from a solution to the RL objective. We empirically demonstrate that CD is effective as a control mechanism on popular benchmarks. We also show that prefix scorers for multiple rewards may be combined at inference time, effectively solving a multi-objective RL problem with no additional training. We show that the benefits of applying CD transfer to an unseen base model with no further tuning as well. Finally, we show that CD can be applied in a blockwise decoding fashion at inference-time, essentially bridging the gap between the popular best-of-K strategy and tokenwise control through reinforcement learning. This makes CD a promising approach for alignment of language models.

Updated: 2024-06-03 20:50:26

标题: Language Models的受控解码

摘要: KL正则化强化学习（RL）是一种流行的对齐框架，用于控制语言模型响应以获得高回报结果。我们提出了一种基于标记的RL目标，并提出了一个名为受控解码（CD）的模块化求解器。CD通过一个单独的前缀评分器模块进行控制，该评分器被训练为学习奖励的价值函数。前缀评分器在推断时用于控制从冻结的基础模型生成，从而保证从RL目标的解决方案中采样。我们在实证上证明CD作为一个流行基准上的控制机制是有效的。我们还展示了在推断时可以结合多个奖励的前缀评分器，有效解决多目标RL问题而无需额外训练。我们展示了应用CD的好处可以转移到一个未见的基础模型，而无需进一步调整。最后，我们展示了CD可以以分块解码方式在推断时应用，从而实质上弥合了流行的最佳K策略和基于标记的控制之间的差距，使CD成为一种有望用于对齐语言模型的方法。

更新时间: 2024-06-03 20:50:26

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2310.17022v3

Calo-VQ: Vector-Quantized Two-Stage Generative Model in Calorimeter Simulation

We introduce a novel machine learning method developed for the fast simulation of calorimeter detector response, adapting vector-quantized variational autoencoder (VQ-VAE). Our model adopts a two-stage generation strategy: initially compressing geometry-aware calorimeter data into a discrete latent space, followed by the application of a sequence model to learn and generate the latent tokens. Extensive experimentation on the Calo-challenge dataset underscores the efficiency of our approach, showcasing a remarkable improvement in the generation speed compared with conventional method by a factor of 2000. Remarkably, our model achieves the generation of calorimeter showers within milliseconds. Furthermore, comprehensive quantitative evaluations across various metrics are performed to validate physics performance of generation.

Updated: 2024-06-03 20:38:35

标题: Calo-VQ：在量能器模拟中的矢量量化两阶段生成模型

摘要: 我们介绍了一种新颖的机器学习方法，该方法专门用于快速模拟量热计探测器响应，适应向量量化变分自动编码器（VQ-VAE）。我们的模型采用了两阶段生成策略：首先将具有几何感知的量热计数据压缩成离散的潜在空间，然后应用序列模型学习和生成潜在令牌。在Calo-challenge数据集上进行的大量实验突显了我们方法的效率，与传统方法相比，生成速度提高了2000倍。值得注意的是，我们的模型在毫秒内实现了量热器淋浴的生成。此外，通过对各种度量标准进行全面的定量评估，以验证生成的物理性能。

更新时间: 2024-06-03 20:38:35

领域: physics.ins-det,cs.LG,hep-ph

下载: http://arxiv.org/abs/2405.06605v2

Efficient Data Distribution Estimation for Accelerated Federated Learning

Federated Learning(FL) is a privacy-preserving machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices. These systems are often comprised of millions of user devices and only a subset of available devices can be used for training in each epoch. Designing a device selection strategy is challenging, given that devices are highly heterogeneous in both their system resources and training data. This heterogeneity makes device selection very crucial for timely model convergence and sufficient model accuracy. To tackle the FL client heterogeneity problem, various client selection algorithms have been developed, showing promising performance improvement in terms of model coverage and accuracy. In this work, we study the overhead of client selection algorithms in a large scale FL environment. Then we propose an efficient data distribution summary calculation algorithm to reduce the overhead in a real-world large scale FL environment. The evaluation shows that our proposed solution could achieve up to 30x reduction in data summary time, and up to 360x reduction in clustering time.

Updated: 2024-06-03 20:33:17

标题: 高效数据分布估计以加速联邦学习

摘要: 联邦学习（FL）是一种隐私保护的机器学习范式，其中全局模型在大量分布式边缘设备上进行原地训练。这些系统通常由数百万用户设备组成，每个时期只能使用部分可用设备进行训练。设计设备选择策略具有挑战性，因为设备在系统资源和训练数据方面高度异质化。这种异质性使得设备选择对及时模型收敛和足够的模型准确性非常关键。为了解决FL客户端异质性问题，已经开发了各种客户端选择算法，显示出在模型覆盖率和准确性方面有着令人期待的性能改进。在这项工作中，我们研究了大规模FL环境中客户端选择算法的开销。然后我们提出了一种有效的数据分布摘要计算算法，以减少在实际的大规模FL环境中的开销。评估结果显示，我们提出的解决方案可以实现数据摘要时间的高达30倍降低，聚类时间的高达360倍降低。

更新时间: 2024-06-03 20:33:17

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.01774v1

How to select an objective function using information theory

In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.

Updated: 2024-06-03 20:28:06

标题: 如何使用信息论选择客观函数

摘要: 在机器学习或科学计算中，模型性能是用客观函数来衡量的。但为什么要选择一个目标而不是另一个呢？信息理论给出了一个答案：为了最大化模型中的信息量，选择代表误差最少的客观函数。要评估不同的目标，将它们转化为似然函数。作为似然函数，它们的相对大小表示我们应该多么强烈地偏好一个目标而不是另一个，而这种关系的对数表示了它们的比特长度以及它们的不确定性之间的差异。换句话说，更偏好能最小化不确定性的目标。在信息理论范式下，最终目标是最大化信息（和最小化不确定性），而不是任何特定的效用。我们认为这种范式非常适合那些有许多用途但没有明确效用的模型，比如用来理解气候变化影响的大型地球系统模型。

更新时间: 2024-06-03 20:28:06

领域: cs.LG

下载: http://arxiv.org/abs/2212.06566v4

Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.

Updated: 2024-06-03 20:23:49

标题: 适应性视野行动者-评论家在接触丰富的可微分仿真中的策略学习

摘要: 无模型强化学习（MFRL），利用政策梯度定理，在连续控制任务中取得了相当大的成功。然而，这些方法由于零阶梯度估计而受到高梯度变化的困扰，导致子优化政策。相反，使用可微分模拟提供梯度的一阶模型强化学习（FO-MBRL）方法具有降低方差的梯度，但在涉及刚性动力学的场景中容易受到采样误差的影响，如物理接触。本文调查了这种错误的来源，并引入了自适应视野演员-评论家（AHAC），这是一种FO-MBRL算法，通过调整基于模型的视野来避免刚性动力学，从而减少梯度误差。实证发现显示，AHAC优于MFRL基线，在一系列动作任务中获得了更多的奖励，同时在高维控制环境中有效地提高了墙钟时间效率。

更新时间: 2024-06-03 20:23:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17784v2

Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.

Updated: 2024-06-03 20:19:45

标题: 超越集合平均值：利用气候模型集合进行次季节预测

摘要: 生产关键气候变量的高质量预测，例如温度和降水，对亚季节时间尺度长期以来一直是操作预测中的一个差距。本研究探讨了机器学习（ML）模型作为亚季节预测后处理工具的应用。滞后数字集合预测（即，成员具有不同初始化日期的集合）和观测数据，包括相对湿度、海平面压力和位势高度，被纳入各种ML方法中，以预测美国大陆未来两周的月平均降水和两米温度。对于回归、分位数回归和tercile分类任务，我们考虑使用线性模型、随机森林、卷积神经网络和堆叠模型（基于各个ML模型的预测的多模型方法）。与先前经常仅使用集合平均值的ML方法不同，我们利用集合预测中嵌入的信息来增强预测准确性。此外，我们调查了对规划和减灾工作至关重要的极端事件预测。考虑集合成员作为空间预测的集合，我们探讨了使用空间信息的不同方法。不同方法之间的权衡可能通过模型堆叠来缓解。我们提出的模型优于标准基线，如气候预测和集合平均值。此外，我们调查了特征重要性、使用完整集合还是仅使用集合平均值以及不同的考虑空间变异性的模式之间的权衡。

更新时间: 2024-06-03 20:19:45

领域: cs.LG,physics.ao-ph

下载: http://arxiv.org/abs/2211.15856v4

CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification

In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings; and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.

Updated: 2024-06-03 20:19:01

标题: CompanyKG：用于公司相似度量化的大规模异构图

摘要: 在投资行业中，经常需要进行精细化的公司相似性量化，以用于市场映射、竞争对手分析和并购等多种目的。我们提出并发布了一个名为CompanyKG的知识图，用于表示和学习不同的公司特征和关系。具体地，117万家公司被表示为节点，并丰富了公司描述嵌入；15种不同的公司间关系导致了5106万个加权边。为了全面评估公司相似性量化方法，我们设计并编制了三个评估任务，并附有注释的测试集：相似性预测、竞争对手检索和相似性排名。我们对11种可重复预测方法进行了广泛的基准测试结果，分为节点、边和节点+边三组。据我们所知，CompanyKG是第一个大规模异构图数据集，源自于实际投资平台，专门用于量化公司间的相似性。

更新时间: 2024-06-03 20:19:01

领域: cs.AI,cs.CE,cs.DB,cs.LG,05C85, 05C12, 68T07, 68T50, 05C90,E.0; I.2.1; I.2.6; H.4.0; J.0; I.2.8; I.2.7

下载: http://arxiv.org/abs/2306.10649v3

Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

In the rapidly changing healthcare landscape, the implementation of offline reinforcement learning (RL) in dynamic treatment regimes (DTRs) presents a mix of unprecedented opportunities and challenges. This position paper offers a critical examination of the current status of offline RL in the context of DTRs. We argue for a reassessment of applying RL in DTRs, citing concerns such as inconsistent and potentially inconclusive evaluation metrics, the absence of naive and supervised learning baselines, and the diverse choice of RL formulation in existing research. Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can significantly vary with changes in evaluation metrics and Markov Decision Process (MDP) formulations. Surprisingly, it is observed that in some instances, RL algorithms can be surpassed by random baselines subjected to policy evaluation methods and reward design. This calls for more careful policy evaluation and algorithm development in future DTR works. Additionally, we discussed potential enhancements toward more reliable development of RL-based dynamic treatment regimes and invited further discussion within the community. Code is available at https://github.com/GilesLuo/ReassessDTR.

Updated: 2024-06-03 20:16:11

标题: 强化学习在动态治疗方案中的需求需要进行关键性再审议

摘要: 在快速变化的医疗保健领域，离线强化学习（RL）在动态治疗方案（DTRs）中的实施带来了前所未有的机遇和挑战。这篇立场论文对当前离线RL在DTRs背景下的现状进行了批判性审视。我们认为需要重新评估在DTRs中应用RL，引用了一些担忧，比如评估指标不一致且可能无法得出结论，缺乏朴素和监督学习基准线，以及现有研究中RL公式的多样选择。通过对一个公开可用的败血症数据集进行超过17,000次评估实验的案例研究，我们展示了RL算法的性能在评估指标和马尔可夫决策过程（MDP）公式变化时可能会显著不同。令人惊讶的是，在某些情况下观察到，RL算法在受到政策评估方法和奖励设计的影响下，可能会被随机基线超越。这需要未来DTR工作中更加谨慎的政策评估和算法开发。此外，我们讨论了朝着更可靠发展基于RL的动态治疗方案的潜在增强措施，并邀请社区内进一步讨论。代码可在https://github.com/GilesLuo/ReassessDTR找到。

更新时间: 2024-06-03 20:16:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18556v2

Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate our learning framework achieves up to a 16% point prediction performance improvement over alternative learning methods.

Updated: 2024-06-03 20:15:37

标题: 凸优化和双层优化用于神经符号推理和学习

摘要: 我们利用凸优化和双层优化技术，为神经符号系统（NeSy）开发了一个基于梯度的参数学习框架。我们通过NeuPSL演示了我们的框架，这是一种最先进的NeSy架构。为了实现这一目标，我们提出了NeuPSL推理的平滑原始和对偶公式，并展示学习梯度是最优对偶变量的函数。此外，我们为新公式开发了一个对偶块坐标下降算法，自然地利用了热启动。这导致学习运行时间比当前最佳NeuPSL推理方法提高了100倍以上。最后，我们在涵盖一系列任务的8个数据集上进行了广泛的实证评估，并展示我们的学习框架相比替代学习方法可以实现高达16%的点预测性能改进。

更新时间: 2024-06-03 20:15:37

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2401.09651v2

How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

The ability of learning useful features is one of the major advantages of neural networks. Although recent works show that neural network can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond NTK regime and perform feature learning. Recently, a line of work highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. Our results demonstrate that feature learning not only happens at the initial gradient steps, but can also occur towards the end of training.

Updated: 2024-06-03 20:15:28

标题: 梯度下降如何学习特征--正则化两层神经网络的局部分析

摘要: 神经网络学习有用特征的能力是其主要优势之一。尽管最近的研究表明神经网络可以在不允许特征学习的神经切线核（NTK）范围内运行，但许多研究也展示了神经网络超越NTK范围并进行特征学习的潜力。最近的一系列工作突出了基于梯度训练的早期阶段的特征学习能力。在本文中，我们考虑通过局部收敛分析通过梯度下降实现特征学习的另一种机制。我们展示了一旦损失低于一定阈值，带有仔细正则化目标的梯度下降将捕获地面真实方向。我们的结果表明，特征学习不仅发生在初始梯度步骤中，而且也可能发生在训练结束时。

更新时间: 2024-06-03 20:15:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01766v1

Scalable Diffusion for Materials Generation

Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Function Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystals structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

Updated: 2024-06-03 20:07:00

标题: 可扩展的材料生成扩散

摘要: 在互联网规模数据上训练的生成模型能够生成新颖和逼真的文本、图片和视频。一个自然的下一个问题是这些模型是否可以推动科学发展，例如通过生成新型稳定材料。传统上，在科学数据（如晶体中的原子和键）中建模结构关系时，通常使用具有显式结构（例如图形）的模型，但生成结构可能难以扩展到大型和复杂系统。在生成材料方面的另一个挑战是标准生成建模指标与下游应用之间的不匹配。例如，常见的指标如重建误差与发现稳定材料的下游目标之间的相关性不强。在这项工作中，我们通过开发一个能够表示任何晶体结构的统一晶体表示（UniMat），然后在这些UniMat表示上训练扩散概率模型，来解决可扩展性挑战。我们的实证结果表明，尽管缺乏显式结构建模，UniMat可以从更大更复杂的化学系统中生成高保真度的晶体结构，在各种生成建模指标下优于先前的基于图形的方法。为了更好地将材料的生成质量与下游应用（如发现新型稳定材料）连接起来，我们提出了用于评估材料生成模型的额外指标，包括相对于密度泛函理论（DFT）的分解能量的每组成形成能量和稳定性。最后，我们展示了使用UniMat的条件生成可以扩展到具有数百万晶体结构的先前建立的晶体数据集，优于随机结构搜索（当前领先的结构发现方法）在发现新型稳定材料方面。

更新时间: 2024-06-03 20:07:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2311.09235v2

Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation

Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning, where the critic uses algorithms, e.g., temporal difference (TD) learning with function approximation, to evaluate the current policy and the actor updates the policy along an approximate gradient direction using information from the critic. This paper provides the \textit{tightest} non-asymptotic convergence bounds for both the AC and natural AC (NAC) algorithms. Specifically, existing studies show that AC converges to an $\epsilon+\varepsilon_{\text{critic}}$ neighborhood of stationary points with the best known sample complexity of $\mathcal{O}(\epsilon^{-2})$ (up to a log factor), and NAC converges to an $\epsilon+\varepsilon_{\text{critic}}+\sqrt{\varepsilon_{\text{actor}}}$ neighborhood of the global optimum with the best known sample complexity of $\mathcal{O}(\epsilon^{-3})$, where $\varepsilon_{\text{critic}}$ is the approximation error of the critic and $\varepsilon_{\text{actor}}$ is the approximation error induced by the insufficient expressive power of the parameterized policy class. This paper analyzes the convergence of both AC and NAC algorithms with compatible function approximation. Our analysis eliminates the term $\varepsilon_{\text{critic}}$ from the error bounds while still achieving the best known sample complexities. Moreover, we focus on the challenging single-loop setting with a single Markovian sample trajectory. Our major technical novelty lies in analyzing the stochastic bias due to policy-dependent and time-varying compatible function approximation in the critic, and handling the non-ergodicity of the MDP due to the single Markovian sample trajectory. Numerical results are also provided in the appendix.

Updated: 2024-06-03 20:05:04

标题: 单回路（自然）演员-评论家与兼容函数逼近的非渐近分析

摘要: 演员-评论家（AC）是强化学习中学习最优策略的强大方法，其中评论家使用算法（例如，带有函数逼近的时间差异（TD）学习）评估当前策略，而演员使用评论家的信息沿近似梯度方向更新策略。本文提供了AC和自然AC（NAC）算法的最紧密的非渐近收敛界限。具体而言，现有研究显示，AC收敛于距离稳定点的$\epsilon+\varepsilon_{\text{critic}}$邻域，具有已知的最佳样本复杂度为$\mathcal{O}(\epsilon^{-2})$（最多有一个对数因子），而NAC收敛于全局最优解的$\epsilon+\varepsilon_{\text{critic}}+\sqrt{\varepsilon_{\text{actor}}}$邻域，具有已知的最佳样本复杂度为$\mathcal{O}(\epsilon^{-3})$，其中$\varepsilon_{\text{critic}}$是评论家的逼近误差，$\varepsilon_{\text{actor}}$是由参数化策略类的表达能力不足引起的逼近误差。本文分析了具有兼容函数逼近的AC和NAC算法的收敛性。我们的分析消除了误差界限中的$\varepsilon_{\text{critic}}$项，同时仍实现了已知的最佳样本复杂度。此外，我们专注于具有单个马尔可夫样本轨迹的挑战性单循环设置。我们的主要技术创新在于分析评论家中由策略相关和时变兼容函数逼近引起的随机偏差，并处理由于单个马尔可夫样本轨迹而导致的MDP的非遍历性。附录中还提供了数值结果。

更新时间: 2024-06-03 20:05:04

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.01762v1

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.

Updated: 2024-06-03 20:02:33

标题: 提炼形态条件化超网络以实现高效通用形态控制

摘要: 学习跨不同机器人形态的通用策略可以显著提高学习效率，并实现对未见形态的零-shot泛化。然而，学习一个高性能的通用策略需要像transformers（TF）这样具有比较大内存和计算成本的复杂架构，而不是简单的多层感知器（MLP）。为了在推断时实现像TF一样的良好性能和像MLP一样的高效率，我们提出了HyperDistill，其中包括：（1）生成机器人特定MLP策略的形态条件化超网络（HN），以及（2）对成功训练至关重要的策略蒸馏方法。我们展示了在具有数百种不同形态的基准UNIMAL上，HyperDistill在训练和未见测试机器人上的表现与通用TF教师策略一样好，但在不同环境中将模型大小减少了6-14倍，计算成本减少了67-160倍。我们的分析将HyperDistill在推断时的效率优势归因于知识解耦，即解耦任务间和任务内知识的能力，这是一个通用原则，也可以应用于其他领域以提高推断效率。

更新时间: 2024-06-03 20:02:33

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2402.06570v2

Robust Data-driven Prescriptiveness Optimization

The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.

Updated: 2024-06-03 19:55:35

标题: 强大的数据驱动处方优化

摘要: 数据的丰富性导致了各种优化技术的出现，这些技术试图利用可用的辅助信息来提供更具预见性的决策。各种方法和应用背景的广泛性促使设计一种称为规范性系数的性能的通用无单位度量。该系数旨在量化与参考决策相比上下文决策的质量以及辅助信息的规范性能力。为了在数据驱动的背景下识别最大化前者的策略，本文引入了一个分布鲁棒的上下文优化模型，其中规范性系数替代了经典的经验风险最小化目标。我们提出了一个二分算法来解决这个模型，该算法依赖于解决一系列线性规划问题，当分布模糊集具有适当的嵌套形式和多面体结构时。通过研究一个上下文最短路径问题，我们评估了结果策略相对于其他方法在样本之外数据集受到不同程度分布偏移时的稳健性。

更新时间: 2024-06-03 19:55:35

领域: math.OC,cs.LG,stat.ME

下载: http://arxiv.org/abs/2306.05937v2

Online Algorithms with Uncertainty-Quantified Predictions

The burgeoning field of algorithms with predictions studies the problem of using possibly imperfect machine learning predictions to improve online algorithm performance. While nearly all existing algorithms in this framework make no assumptions on prediction quality, a number of methods providing uncertainty quantification (UQ) on machine learning models have been developed in recent years, which could enable additional information about prediction quality at decision time. In this work, we investigate the problem of optimally utilizing uncertainty-quantified predictions in the design of online algorithms. In particular, we study two classic online problems, ski rental and online search, where the decision-maker is provided predictions augmented with UQ describing the likelihood of the ground truth falling within a particular range of values. We demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the UQ predictions. Moreover, we consider how to utilize more general forms of UQ, proposing an online learning framework that learns to exploit UQ to make decisions in multi-instance settings.

Updated: 2024-06-03 19:55:09

标题: 具有不确定性量化预测的在线算法

摘要: 算法预测领域的蓬勃发展研究了利用可能不完美的机器学习预测来改善在线算法性能的问题。尽管该框架中几乎所有现有的算法都不对预测质量做出任何假设，但近年来已开发出一些提供机器学习模型不确定性量化（UQ）的方法，这些方法可以在决策时提供有关预测质量的额外信息。在这项工作中，我们研究了在设计在线算法时如何最优地利用不确定性量化预测的问题。具体而言，我们研究了两个经典的在线问题，滑雪租赁和在线搜索，决策者提供了附加了描述地面真相可能落在特定值范围内的UQ的预测。我们证明需要对算法设计进行不平凡的修改才能充分利用UQ预测。此外，我们考虑如何利用更一般形式的UQ，提出了一个在线学习框架，该框架学习利用UQ在多实例设置中做出决策。

更新时间: 2024-06-03 19:55:09

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2310.11558v2

From Latent to Lucid: Transforming Knowledge Graph Embeddings into Interpretable Structures

This paper introduces a post-hoc explainable AI method tailored for Knowledge Graph Embedding models. These models are essential to Knowledge Graph Completion yet criticized for their opaque, black-box nature. Despite their significant success in capturing the semantics of knowledge graphs through high-dimensional latent representations, their inherent complexity poses substantial challenges to explainability. Unlike existing methods, our approach directly decodes the latent representations encoded by Knowledge Graph Embedding models, leveraging the principle that similar embeddings reflect similar behaviors within the Knowledge Graph. By identifying distinct structures within the subgraph neighborhoods of similarly embedded entities, our method identifies the statistical regularities on which the models rely and translates these insights into human-understandable symbolic rules and facts. This bridges the gap between the abstract representations of Knowledge Graph Embedding models and their predictive outputs, offering clear, interpretable insights. Key contributions include a novel post-hoc explainable AI method for Knowledge Graph Embedding models that provides immediate, faithful explanations without retraining, facilitating real-time application even on large-scale knowledge graphs. The method's flexibility enables the generation of rule-based, instance-based, and analogy-based explanations, meeting diverse user needs. Extensive evaluations show our approach's effectiveness in delivering faithful and well-localized explanations, enhancing the transparency and trustworthiness of Knowledge Graph Embedding models.

Updated: 2024-06-03 19:54:11

标题: 从潜在到明晰：将知识图嵌入转化为可解释的结构

摘要: 这篇论文介绍了一种后续可解释的人工智能方法，专为知识图嵌入模型而设计。这些模型对于知识图完成至关重要，但由于其不透明的黑盒特性而受到批评。尽管它们在通过高维潜在表示捕捉知识图语义方面取得了显著成功，但其固有复杂性给解释性带来了重大挑战。与现有方法不同，我们的方法直接解码了知识图嵌入模型编码的潜在表示，利用了相似嵌入反映知识图内相似行为的原则。通过识别类似嵌入实体的子图邻域中的不同结构，我们的方法确定了模型依赖的统计规律，并将这些见解转化为人类可理解的符号规则和事实。这架起了知识图嵌入模型的抽象表示和其预测输出之间的桥梁，提供清晰、可解释的见解。主要贡献包括一种新颖的后续可解释的人工智能方法，适用于知识图嵌入模型，提供即时、忠实的解释，无需重新训练，甚至可以在大规模知识图上实时应用。该方法的灵活性使其能够生成基于规则、基于实例和基于类比的解释，满足多样化的用户需求。广泛的评估显示，我们的方法在提供忠实和良好局部化的解释方面的有效性，增强了知识图嵌入模型的透明度和可信度。

更新时间: 2024-06-03 19:54:11

领域: cs.AI

下载: http://arxiv.org/abs/2406.01759v1

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

The rise of foundation models holds immense promise for advancing AI, but this progress may amplify existing risks and inequalities, leaving marginalized communities behind. In this position paper, we discuss that disparities towards marginalized communities - performance, representation, privacy, robustness, interpretability and safety - are not isolated concerns but rather interconnected elements of a cascading disparity phenomenon. We contrast foundation models with traditional models and highlight the potential for exacerbated disparity against marginalized communities. Moreover, we emphasize the unique threat of cascading impacts in foundation models, where interconnected disparities can trigger long-lasting negative consequences, specifically to the people on the margin. We define marginalized communities within the machine learning context and explore the multifaceted nature of disparities. We analyze the sources of these disparities, tracing them from data creation, training and deployment procedures to highlight the complex technical and socio-technical landscape. To mitigate the pressing crisis, we conclude with a set of calls to action to mitigate disparity at its source.

Updated: 2024-06-03 19:52:41

标题: 职位：揭示对边缘化社区的级联不平等现象

摘要: 基金会模型的兴起为推动人工智能的发展带来了巨大的希望，但这一进展可能加剧现有的风险和不平等现象，使边缘化社区被落下。在这篇立场论文中，我们讨论了针对边缘化社区的差距 - 性能、表现、隐私、稳健性、可解释性和安全性 - 并非孤立的问题，而是一个连续性差距现象的相互关联要素。我们对基金会模型与传统模型进行了对比，并强调了针对边缘化社区加剧差距的潜力。此外，我们强调了基金会模型中连锁影响的独特威胁，其中相互关联的差距可能引发长期的负面后果，特别是对边缘人群。我们在机器学习背景下定义了边缘化社区，并探讨了差距的多方面性质。我们分析了这些差距的来源，追溯它们从数据创建、训练和部署程序中的复杂技术和社会技术景观。为了缓解迫在眉睫的危机，我们最后提出了一系列行动呼吁，以在根源处缓解差距。

更新时间: 2024-06-03 19:52:41

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2406.01757v1

Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization

Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years. A key design choice is given by the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask's potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps in stabilizing the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the superior effectiveness and efficiency of EOI through experiments, consistently outperforming common sparse initialization techniques. Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection. The code is available at https://github.com/woocash2/sparser-better-deeper-stronger

Updated: 2024-06-03 19:44:47

标题: 更稀疏，更好，更深，更强：通过精确正交初始化改善稀疏训练

摘要: 静态稀疏训练旨在从头开始训练稀疏模型，在近年来取得了显著的成果。一个关键的设计选择是通过稀疏初始化来确定可训练的子网络，这通过一个二进制掩模来实现。现有方法主要基于预定义的密集初始化来选择这种掩模。这种方法可能无法有效地利用掩模对优化的潜在影响。受到动态等谱研究的启发，另一个方向是在稀疏子网络中引入正交性，这有助于稳定梯度信号。在这项工作中，我们提出了Exact Orthogonal Initialization（EOI），这是一种基于组合随机Givens旋转的新型稀疏正交初始化方案。与其他现有方法不同，我们的方法提供了确切（而不是近似）的正交性，并且能够创建具有任意密度的层。我们通过实验证明了EOI的卓越有效性和效率，始终优于常见的稀疏初始化技术。我们的方法使得无需残差连接或标准化技术即可训练高度稀疏的1000层MLP和CNN网络，强调了在静态稀疏训练中权重初始化和稀疏掩模选择的关键作用。该代码可在https://github.com/woocash2/sparser-better-deeper-stronger 获取。

更新时间: 2024-06-03 19:44:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01755v1

Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification

While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases. Recent work on non-interactive algorithms shows that approximate solutions for linear models can be obtained efficiently with only a single round of communication among machines. However, this approximation often degenerates as the number of machines increases. In this paper, building on the recent optimal weighted average method, we introduce a new technique, ACOWA, that allows an extra round of communication to achieve noticeably better approximation quality with minor runtime increases. Results show that for sparse distributed logistic regression, ACOWA obtains solutions that are more faithful to the empirical risk minimizer and attain substantially higher accuracy than other distributed algorithms.

Updated: 2024-06-03 19:43:06

标题: 优化最佳加权平均值：高效的分布式稀疏分类

摘要: 尽管分布式训练通常被视为优化线性模型在越来越大的数据集上的解决方案，但随着数据维度的增加，流行的分布式方法的机器间通信成本可能占主导地位。最近关于非交互式算法的研究表明，线性模型的近似解可以通过仅一轮机器间通信高效地获得。然而，随着机器数量的增加，这种近似通常会退化。在本文中，基于最近的最优加权平均方法，我们引入了一种新技术ACOWA，允许进行额外的一轮通信以在轻微运行时间增加的情况下实现明显更好的近似质量。结果表明，对于稀疏分布式逻辑回归，ACOWA获得的解更忠实于经验风险最小化器，并且比其他分布式算法实现了显著更高的准确性。

更新时间: 2024-06-03 19:43:06

领域: cs.LG,cs.DC,stat.ML

下载: http://arxiv.org/abs/2406.01753v1

Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs

Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs) by enabling faster operation and compatibility with more accessible hardware through reduced memory usage, at the cost of small performance drops. We explore the role of calibration sets in PTQ, specifically their effect on hidden activations in various notable open-source LLMs. Calibration sets are crucial for evaluating activation magnitudes and identifying outliers, which can distort the quantization range and negatively impact performance. Our analysis reveals a marked contrast in quantization effectiveness across models. The older OPT model, upon which much of the quantization literature is based, shows significant performance deterioration and high susceptibility to outliers with varying calibration sets. In contrast, newer models like Llama-2 7B, Llama-3 8B, Command-R 35B, and Mistral 7B demonstrate strong robustness, with Mistral 7B showing near-immunity to outliers and stable activations. These findings suggest a shift in PTQ strategies might be needed. As advancements in pre-training methods reduce the relevance of outliers, there is an emerging need to reassess the fundamentals of current quantization literature. The emphasis should pivot towards optimizing inference speed, rather than primarily focusing on outlier preservation, to align with the evolving characteristics of state-of-the-art LLMs.

Updated: 2024-06-03 19:35:36

标题: 离群值和校准集对现代LLM的量化效果逐渐减弱

摘要: 训练后量化（PTQ）通过减少内存使用，从而实现更快的操作和与更多可访问硬件兼容，提高了大型语言模型（LLMs）的效率，但代价是性能略微下降。我们探讨了PTQ中校准集的作用，特别是它们对各种知名开源LLMs中隐藏激活的影响。校准集对评估激活幅度和识别异常值至关重要，异常值可能扭曲量化范围并对性能产生负面影响。我们的分析揭示了不同模型之间在量化效果上的显著对比。基于OPT模型的许多量化文献显示出显着的性能下降和对不同校准集的异常值高度敏感。相比之下，像Llama-27B、Llama-38B、Command-R35B和Mistral7B这样的新模型展现出很强的稳健性，Mistral7B几乎对异常值免疫且激活稳定。这些发现表明可能需要调整PTQ策略。随着预训练方法的进步降低了异常值的相关性，当前量化文献的基础需要重新评估。重点应该转向优化推理速度，而不是主要关注异常值保留，以使其与最先进LLMs的演变特性相一致。

更新时间: 2024-06-03 19:35:36

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.20835v2

MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion

As a prominent instance of vandalism edits, Wiki search poisoning for illicit promotion is a cybercrime in which the adversary aims at editing Wiki articles to promote illicit businesses through Wiki search results of relevant queries. In this paper, we report a study that, for the first time, shows that such stealthy blackhat SEO on Wiki can be automated. Our technique, called MAWSEO, employs adversarial revisions to achieve real-world cybercriminal objectives, including rank boosting, vandalism detection evasion, topic relevancy, semantic consistency, user awareness (but not alarming) of promotional content, etc. Our evaluation and user study demonstrate that MAWSEO is capable of effectively and efficiently generating adversarial vandalism edits, which can bypass state-of-the-art built-in Wiki vandalism detectors, and also get promotional content through to Wiki users without triggering their alarms. In addition, we investigated potential defense, including coherence based detection and adversarial training of vandalism detection, against our attack in the Wiki ecosystem.

Updated: 2024-06-03 19:35:25

标题: MAWSEO：用于非法在线推广的对抗性维基搜索污染

摘要: 作为恶意编辑的一个突出例子，维基搜索毒化是一种网络犯罪，对手旨在通过编辑维基文章来促进非法业务，通过相关查询的维基搜索结果。本文报告了一项研究，首次显示这种在维基上的隐蔽黑帽SEO可以自动化。我们的技术，称为MAWSEO，利用对抗性修订来实现现实世界的网络犯罪目标，包括排名提升、破坏检测规避、主题相关性、语义一致性、用户意识（但不引起警觉）的推广内容等。我们的评估和用户研究表明，MAWSEO能够有效且高效地生成对抗性破坏性编辑，可以绕过最先进的内置维基破坏检测器，并将推广内容传递给维基用户，而不触发他们的警报。此外，我们调查了潜在的防御措施，包括基于连贯性的检测和破坏检测的对抗性训练，针对维基生态系统中的我们的攻击。

更新时间: 2024-06-03 19:35:25

领域: cs.CR,cs.AI,cs.IR

下载: http://arxiv.org/abs/2304.11300v3

Safeguarding Large Language Models: A Survey

In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, privacy, and so on. Based on them, we review techniques to circumvent these controls (i.e., attacks), to defend the attacks, and to reinforce the guardrails. While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by the methods and present our vision on how to implement a comprehensive guardrail through the full consideration of multi-disciplinary approach, neural-symbolic method, and systems development lifecycle.

Updated: 2024-06-03 19:27:46

标题: 保护大型语言模型：一项调查

摘要: 在蓬勃发展的大语言模型（LLMs）领域，开发一个强大的安全机制，俗称为“保障措施”或“防护栏”，已经成为确保LLMs在规定范围内得到道德使用的必要条件。本文对这一关键机制的当前状况进行了系统文献综述。文章讨论了其主要挑战以及如何将其提升为一个处理各种情境中的伦理问题的综合机制。首先，论文阐明了主要LLM服务提供商和开源社区采用的保障机制的当前景观。接着介绍了评估、分析和增强一些（不）希望实施的属性的技术，比如幻觉、公平性、隐私等。基于这些技术，我们回顾了规避这些控制（即攻击）、防御攻击和加强防护栏的技术。虽然上述技术代表了当前状况和活跃的研究趋势，我们还讨论了几个不能轻易通过方法解决的挑战，并提出了我们对如何通过全面考虑跨学科方法、神经符号方法和系统开发生命周期来实施综合防护栏的愿景。

更新时间: 2024-06-03 19:27:46

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.02622v1

Crisis Communication in the Face of Data Breaches

Data breaches refer to unauthorized accesses to data. Typically but not always, data breaches are about cyber crime. An organization facing such a crime is often also in a crisis situation. Therefore, organizations should prepare also for data breaches in their crisis management procedures. These procedures should include also crisis communication plans. To this end, this paper examines data breach crisis communication strategies and their practical executions. The background comes from the vibrant crisis communication research domain. According to a few qualitative case studies from Finland, the conventional wisdom holds well; the successful cases indicate communicating early, taking responsibility, offering an apology, and notifying public authorities. The unsuccessful cases show varying degrees of the reverse, including shifting of blame, positioning of an organization as a victim, and failing to notify public authorities. With these qualitative insights, the paper contributes to the research domain by focusing specifically on data breach crises, their peculiarities, and their management, including with respect to European regulations that have been neglected in existing crisis communication research.

Updated: 2024-06-03 19:21:04

标题: 数据泄露危机中的危机传播

摘要: 数据泄露是指未经授权访问数据。通常情况下，数据泄露涉及网络犯罪，但并非总是如此。面临此类犯罪的组织通常也处于危机中。因此，组织应在其危机管理程序中做好数据泄露的准备。这些程序还应包括危机沟通计划。为此，本文研究了数据泄露危机沟通策略及其实际执行情况。背景源自充满活力的危机沟通研究领域。根据芬兰的一些定性案例研究，传统智慧得到了很好的验证；成功案例表明需要及早沟通，承担责任，道歉并通知公共机构。不成功的案例显示出不同程度的相反情况，包括推卸责任，将组织定位为受害者，以及未能通知公共机构。通过这些定性见解，本文专注于数据泄露危机，其特殊性及其管理，包括欧洲规定，在现有危机沟通研究中被忽视。

更新时间: 2024-06-03 19:21:04

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2406.01744v1

CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats

Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus

Updated: 2024-06-03 19:14:12

标题: CheXpert Plus：使用放射学报告、患者人口统计和额外图像格式增强大规模胸部X射线数据集

摘要: 自从五年前发布了原始CheXpert论文以来，CheXpert已成为最广泛使用和引用的临床人工智能数据集之一。视觉语言模型的出现引发了对与CheXpert图像相关的报告分享的需求增加，同时也引起了人工智能公平性研究人员对获取人口统计数据的兴趣。为了解决这个问题，CheXpert Plus作为一种新的放射学数据源集合，已公开提供，以增强放射学领域所有后续机器学习任务的扩展性、性能、鲁棒性和公平性。CheXpert Plus是公开发布的放射学中最大的文本数据集，总共包含3600万个文本标记，其中包括1300万个印象标记。据我们所知，这代表了放射学中最大规模的文本去识别努力，几乎有100万个PHI跨度被匿名化。这是放射学中第二次发布大规模英文配对数据集，从而首次实现了跨机构规模的培训。所有报告都与DICOM格式的高质量图像配对，还包括涵盖各种临床和社会经济群体的多个图像和患者元数据，以及许多病理标签和RadGraph注释。我们希望这个数据集将推动AI模型的研究，进一步帮助放射科医生，并改善医疗保健。数据可在以下网址获取：https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 模型可在以下网址获取：https://github.com/Stanford-AIMI/chexpert-plus

更新时间: 2024-06-03 19:14:12

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.19538v2

Good Vibes! Towards Phone-to-User Authentication Through Wristwatch Vibrations

While mobile devices frequently require users to authenticate to prevent unauthorized access, mobile devices typically do not authenticate to their users. This leaves room for users to unwittingly interact with different mobile devices. We present GoodVibes authentication, a variant of mobile device-to-user authentication, where the user's phone authenticates to the user through their wristwatch vibrating in their pre-selected authentication vibration pattern. We implement GoodVibes authentication as an Android prototype, evaluate different authentication scenarios with 30 participants, and find users to be able to well recognize and distinguish their authentication vibration pattern from different patters, from unrelated vibrations, and from the pattern being absent.

Updated: 2024-06-03 18:59:52

标题: 好的氛围！通过手表振动实现手机对用户的认证

摘要: 移动设备通常要求用户进行身份验证，以防止未经授权的访问，但是移动设备通常不会对其用户进行身份验证。这给用户留下了与不同移动设备无意地互动的空间。我们提出了GoodVibes身份验证，这是一种移动设备对用户进行身份验证的变体，用户的手机通过他们的手表以预先选择的身份验证振动模式来对用户进行身份验证。我们将GoodVibes身份验证作为Android原型进行实施，通过30名参与者评估不同的身份验证场景，发现用户能够很好地识别和区分他们的身份验证振动模式与不同模式、不相关振动以及缺少模式。

更新时间: 2024-06-03 18:59:52

领域: cs.HC,cs.CR

下载: http://arxiv.org/abs/2406.01738v1

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer model with a large scale of parameters. In this study, we make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through introducing a caching mechanism, can be readily removed even without updating the model parameters. In the case of U-ViT-H/2, for example, we may remove up to 93.68% of the computation in the cache steps (46.84% for all steps), with less than 0.01 drop in FID. To achieve this, we introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Specifically, by leveraging the identical structure of layers in transformers and the sequential nature of diffusion, we explore redundant computations between timesteps by treating each layer as the fundamental unit for caching. To address the challenge of the exponential search space in deep models for identifying layers to cache and remove, we propose a novel differentiable optimization objective. An input-invariant yet timestep-variant router is then optimized, which can finally produce a static computation graph. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.

Updated: 2024-06-03 18:49:57

标题: 学习缓存：通过层缓存加速扩散变压器

摘要: 最近，扩散变压器在各种任务中展示了前所未有的生成能力。然而，令人鼓舞的结果伴随着推理速度缓慢的代价，因为每个去噪步骤都需要对一个参数规模庞大的变压器模型进行推理。在本研究中，我们做出了一个有趣且有些令人惊讶的观察：通过引入缓存机制，扩散变压器中大部分层的计算，甚至在不更新模型参数的情况下也可以轻松移除。例如，在U-ViT-H/2的情况下，我们可以在缓存步骤中移除高达93.68%的计算（所有步骤中的46.84%），而FID下降不到0.01。为实现这一目标，我们引入了一种名为Learning-to-Cache（L2C）的新方案，它学习以动态方式进行扩散变压器的缓存。具体来说，通过利用变压器中各层的相同结构和扩散的顺序性质，我们通过将每一层视为缓存的基本单元来探索各时间步之间的冗余计算。为了解决深度模型中寻找要缓存和移除的层的指数搜索空间挑战，我们提出了一种新颖的可微优化目标。然后，我们优化了一个输入不变但时间步变化的路由器，最终可以生成一个静态计算图。实验结果表明，L2C在推理速度相同时大大优于类似DDIM和DPM-Solver的采样器，以及先前基于缓存的方法。

更新时间: 2024-06-03 18:49:57

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.01733v1

Inference of Utilities and Time Preference in Sequential Decision-Making

This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and the establishment of the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, facilitating the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples, including Merton's problem and an investment problem with unhedgeable risks. Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.

Updated: 2024-06-03 18:40:20

标题: 在顺序决策中推断效用和时间偏好

摘要: 本文介绍了一个新颖的随机控制框架，以提升自动投资管理者或机器顾问（robo-advisors）的能力，通过准确推断客户过去活动中的投资偏好。我们的方法利用了一个连续时间模型，其中包括效用函数和一个根据每个客户的风险容忍度、每日消费估值和重要生活目标定制的时间变化率的一般折现方案。我们通过状态增强和建立动态规划原理和验证定理来解决由此产生的时间不一致性问题。此外，我们为客户投资偏好的可识别性提供了充分条件。为了补充我们的理论发展，我们提出了一种基于最大似然估计的学习算法，该算法在离散时间马尔可夫决策过程框架中进行了增强，并增加了熵正则化。我们证明对数似然函数是局部凹的，有助于我们提出的算法快速收敛。通过包括默顿问题和一个具有不可对冲风险的投资问题在内的两个数值示例展示了我们提出的框架的实际有效性和效率。我们提出的框架不仅通过改善个性化投资建议推进了金融技术，还广泛贡献于其他领域，如医疗保健、经济学和人工智能，在这些领域，理解个人偏好至关重要。

更新时间: 2024-06-03 18:40:20

领域: math.OC,cs.LG,q-fin.CP

下载: http://arxiv.org/abs/2405.15975v2

Federated Learning-based Collaborative Wideband Spectrum Sensing and Scheduling for UAVs in UTM Systems

In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users (SUs) to opportunistically utilize detected "spectrum holes". Our overall framework consists of three main stages. Firstly, in the model training stage, we explore dataset generation in a multi-cell environment and training a machine learning (ML) model using the federated learning (FL) architecture. Unlike the existing studies on FL for wireless that presume datasets are readily available for training, we propose a novel architecture that directly integrates wireless dataset generation, which involves capturing I/Q samples from over-the-air signals in a multi-cell environment, into the FL training process. Secondly, in the collaborative spectrum inference stage, we propose a collaborative spectrum fusion strategy that is compatible with the unmanned aircraft system traffic management (UTM) ecosystem. Finally, in the spectrum scheduling stage, we leverage reinforcement learning (RL) solutions to dynamically allocate the detected spectrum holes to the secondary users. To evaluate the proposed methods, we establish a comprehensive simulation framework that generates a near-realistic synthetic dataset using MATLAB LTE toolbox by incorporating base-station~(BS) locations in a chosen area of interest, performing ray-tracing, and emulating the primary users channel usage in terms of I/Q samples. This evaluation methodology provides a flexible framework to generate large spectrum datasets that could be used for developing ML/AI-based spectrum management solutions for aerial devices.

Updated: 2024-06-03 18:39:27

标题: 基于联邦学习的UTM系统中用于无人机的协作式宽带频谱感知和调度

摘要: 在本文中，我们提出了一个基于数据驱动的框架，用于协作宽带频谱感知和调度，应用于组网无人机（UAV）网络，这些无人机作为次要用户（SUs）机会性地利用检测到的“频谱空洞”。我们的整体框架包括三个主要阶段。首先，在模型训练阶段，我们探索多小区环境中的数据集生成，并使用联邦学习（FL）架构训练机器学习（ML）模型。与现有的针对无线通信的FL研究不同，这些研究假设数据集已经准备好进行训练，我们提出了一种新颖的架构，直接将无线数据集生成整合到FL训练过程中，包括在多小区环境中捕获空中信号的I/Q样本。其次，在协作频谱推断阶段，我们提出了一种与无人机系统交通管理（UTM）生态系统兼容的协作频谱融合策略。最后，在频谱调度阶段，我们利用强化学习（RL）解决方案动态分配检测到的频谱空洞给次要用户。为了评估所提出的方法，我们建立了一个全面的仿真框架，使用MATLAB LTE工具箱生成一个接近真实的合成数据集，通过在感兴趣区域内加入基站（BS）位置，进行射线跟踪，并模拟主用户的I/Q样本通道使用。这种评估方法提供了一个灵活的框架，可用于生成大型频谱数据集，用于开发基于ML/AI的空中设备频谱管理解决方案。

更新时间: 2024-06-03 18:39:27

领域: cs.LG,cs.MA,eess.SP

下载: http://arxiv.org/abs/2406.01727v1

Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust

We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erd\H{o}s-R\'enyi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretical lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).

Updated: 2024-06-03 18:36:15

标题: 随机图的私密边密度估计：最佳、高效和健壮

摘要: 我们提出了第一个多项式时间、差异节点私密性和鲁棒性算法，用于估计Erd\H{o}s-R\'enyi随机图及其泛化版本、不均匀随机图的边密度。我们进一步证明了信息论下界，表明我们算法的误差率在对数因子上是最优的。先前的算法要么需要指数运行时间，要么产生次优的误差率。我们算法的两个关键要素是：（1）用于鲁棒边密度估计的新的平方和算法，以及（2）基于Hopkins等人的平方和指数机制的隐私到鲁棒性转化（STOC 2023）。

更新时间: 2024-06-03 18:36:15

领域: cs.DS,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.16663v2

Learn To be Efficient: Build Structured Sparsity in Large Language Models

Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. However, existing methods only focus on utilizing this naturally formed activation sparsity in a post-training setting, overlooking the potential for further amplifying this inherent sparsity. In this paper, we hypothesize that LLMs can learn to be efficient by achieving more structured activation sparsity. To achieve this, we introduce a novel training algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs to learn to activate fewer neurons and achieve a better trade-off between sparsity and performance. Furthermore, unlike SOTA MoEfication methods, which mainly focus on ReLU-based models, LTE can also be applied to LLMs like LLaMA using non-ReLU activations. Extensive evaluation on language understanding, language generation, and instruction tuning tasks show that LTE consistently outperforms SOTA baselines. Along with our hardware-aware custom kernel implementation, LTE reduces LLaMA2-7B inference latency by 25% at 50% sparsity.

Updated: 2024-06-03 18:28:58

标题: 学习高效：在大型语言模型中建立结构化稀疏性

摘要: 大型语言模型（LLMs）以其十亿级参数取得了显著的成功，但它们带来了很高的推理开销。LLMs中激活稀疏性的出现为通过仅涉及部分参数进行推理从而降低成本提供了一种自然方法。然而，现有方法只关注在训练后设置中利用自然形成的激活稀疏性，忽视了进一步增强这种固有稀疏性的潜力。本文假设LLMs可以通过实现更结构化的激活稀疏性来学会高效。为了实现这一目标，我们引入了一种新的训练算法，名为Learn-To-be-Efficient（LTE），旨在训练具有高效意识的LLMs学会激活更少的神经元，并在稀疏性和性能之间取得更好的平衡。此外，与主流的MoEfication方法主要关注基于ReLU的模型不同，LTE也可以应用于使用非ReLU激活函数的LLMs，如LLaMA。对语言理解、语言生成和指令调优任务的广泛评估表明，LTE始终优于主流基准。结合我们的硬件感知的自定义内核实现，LTE将LLaMA2-7B的推理延迟在50%稀疏性下减少了25%。

更新时间: 2024-06-03 18:28:58

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.06126v3

Quantum Inception Score

Motivated by the great success of classical generative models in machine learning, enthusiastic exploration of their quantum version has recently started. To depart on this journey, it is important to develop a relevant metric to evaluate the quality of quantum generative models; in the classical case, one such example is the (classical) inception score (cIS). In this paper, as a natural extension of cIS, we propose the quantum inception score (qIS) for quantum generators. Importantly, qIS relates the quality to the Holevo information of the quantum channel that classifies a given dataset. In this context, we show several properties of qIS. First, qIS is greater than or equal to the corresponding cIS, which is defined through projection measurements on the system output. Second, the difference between qIS and cIS arises from the presence of quantum coherence, as characterized by the resource theory of asymmetry. Third, when a set of entangled generators is prepared, there exists a classifying process leading to the further enhancement of qIS. Fourth, we harness the quantum fluctuation theorem to characterize the physical limitation of qIS. Finally, we apply qIS to assess the quality of the one-dimensional spin chain model as a quantum generative model, with the quantum convolutional neural network as a quantum classifier, for the phase classification problem in the quantum many-body physics.

Updated: 2024-06-03 18:26:26

标题: 量子创世分数

摘要: 受经典生成模型在机器学习中取得巨大成功的启发，人们最近开始热情地探索它们的量子版本。为了踏上这一旅程，开发一个相关的度量标准来评估量子生成模型的质量非常重要；在经典情况下，一个例子就是（经典）启发分数（cIS）。在本文中，作为对cIS的自然延伸，我们提出了量子启发分数（qIS）用于量子生成器。重要的是，qIS将质量与用于分类给定数据集的量子通道的霍列沃信息联系起来。在这种情况下，我们展示了qIS的几个特性。首先，qIS大于或等于对应的cIS，后者通过对系统输出进行投影测量来定义。其次，qIS和cIS之间的差异来自于量子相干的存在，这由不对称性的资源理论所表征。第三，当一组纠缠的生成器被准备好时，存在一个分类过程，可以进一步提高qIS。第四，我们利用量子波动定理来表征qIS的物理限制。最后，我们将qIS应用于评估一维自旋链模型作为量子生成模型的质量，以及将量子卷积神经网络作为量子分类器，用于解决量子多体物理中的相分类问题。

更新时间: 2024-06-03 18:26:26

领域: quant-ph,cond-mat.stat-mech,cs.LG,stat.ML

下载: http://arxiv.org/abs/2311.12163v3

Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis

Transformers have revolutionized image modeling tasks with adaptations like DeIT, Swin, SVT, Biformer, STVit, and FDVIT. However, these models often face challenges with inductive bias and high quadratic complexity, making them less efficient for high-resolution images. State space models (SSMs) such as Mamba, V-Mamba, ViM, and SiMBA offer an alternative to handle high resolution images in computer vision tasks. These SSMs encounter two major issues. First, they become unstable when scaled to large network sizes. Second, although they efficiently capture global information in images, they inherently struggle with handling local information. To address these challenges, we introduce Heracles, a novel SSM that integrates a local SSM, a global SSM, and an attention-based token interaction module. Heracles leverages a Hartely kernel-based state space model for global image information, a localized convolutional network for local details, and attention mechanisms in deeper layers for token interactions. Our extensive experiments demonstrate that Heracles-C-small achieves state-of-the-art performance on the ImageNet dataset with 84.5\% top-1 accuracy. Heracles-C-Large and Heracles-C-Huge further improve accuracy to 85.9\% and 86.4\%, respectively. Additionally, Heracles excels in transfer learning tasks on datasets such as CIFAR-10, CIFAR-100, Oxford Flowers, and Stanford Cars, and in instance segmentation on the MSCOCO dataset. Heracles also proves its versatility by achieving state-of-the-art results on seven time-series datasets, showcasing its ability to generalize across domains with spectral data, capturing both local and global information. The project page is available at this link.\url{https://github.com/badripatro/heracles}

Updated: 2024-06-03 18:22:30

标题: 赫拉克勒斯：用于高分辨率图像和时间序列分析的混合SSM-Transformer模型

摘要: Transformers已经通过DeIT、Swin、SVT、Biformer、STVit和FDVIT等改进改变了图像建模任务。然而，这些模型通常面临归纳偏差和高二次复杂性的挑战，使它们在高分辨率图像上的效率较低。状态空间模型（SSMs）如Mamba、V-Mamba、ViM和SiMBA提供了一种替代方案来处理计算机视觉任务中的高分辨率图像。这些SSMs遇到了两个主要问题。首先，当规模扩大到较大的网络尺寸时，它们变得不稳定。其次，尽管它们有效地捕捉图像中的全局信息，但它们本质上在处理局部信息时很困难。为了解决这些挑战，我们引入了Heracles，这是一种创新的SSM，它集成了一个局部SSM、一个全局SSM和一个基于注意力的令牌交互模块。Heracles利用Hartely核基于状态空间模型来捕捉全局图像信息，使用局部卷积网络来处理局部细节，并在更深层次上使用注意机制进行令牌交互。我们的大量实验表明，Heracles-C-small在ImageNet数据集上取得了84.5%的top-1精度，达到了最先进的性能。Heracles-C-Large和Heracles-C-Huge进一步将精度提高到85.9%和86.4%。此外，Heracles在诸如CIFAR-10、CIFAR-100、Oxford Flowers和Stanford Cars等数据集上的迁移学习任务以及在MSCOCO数据集上的实例分割中表现出色。Heracles还通过在七个时间序列数据集上取得最先进的结果展示了其在具有谱数据的不同领域中泛化的能力，捕捉到局部和全局信息。该项目页面可在以下链接找到：https://github.com/badripatro/heracles。

更新时间: 2024-06-03 18:22:30

领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM

下载: http://arxiv.org/abs/2403.18063v2

Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off

Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key leverage of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.

Updated: 2024-06-03 18:18:44

标题: 混合分类器以减轻准确性-鲁棒性权衡

摘要: 最近，深度神经网络分类器在数据驱动控制系统中取得了巨大成功。然而，现有模型存在精度和对抗鲁棒性之间的权衡问题。在需要高性能和严格鲁棒性保证的安全关键系统控制中必须克服这一限制。在这项工作中，我们开发了同时从鲁棒模型和标准模型继承高鲁棒性和高准确性的分类器。具体而言，我们提出了一个在标准神经网络和鲁棒神经网络的输出概率之间混合的理论动机的公式。两个基础分类器都是预训练的，因此我们的方法不需要额外的训练。我们的数值实验验证了混合分类器明显改善了准确性和鲁棒性之间的权衡，并确定了鲁棒基础分类器的置信属性作为这种更温和权衡的关键杠杆。我们的理论结果证明，在温和假设下，当鲁棒基础模型的鲁棒性可证实时，在输入上的封闭形式$\ell_p$半径内的任何修改或攻击都不会导致混合分类器的误分类。

更新时间: 2024-06-03 18:18:44

领域: cs.LG,cs.CV,68T07

下载: http://arxiv.org/abs/2311.15165v2

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks

Softmax attention is the principle backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. For instance, we compare linear attention and selective SSMs, detailing their differences and conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.

Updated: 2024-06-03 18:18:33

标题: 理解基础模型的差异：注意力机制、状态空间模型和循环神经网络

摘要: Softmax attention是各种人工智能应用基础模型的主要支柱，然而在序列长度方面的二次复杂性可能会限制其在长上下文设置中的推断吞吐量。为了解决这一挑战，已经考虑了替代架构，如线性注意力、状态空间模型（SSMs）和循环神经网络（RNNs）作为更有效的替代方案。虽然这些方法之间存在联系，但这些模型通常是在孤立环境中开发的，并且缺乏理解支持这些架构及其微妙差异的共享原则的理论基础，这极大地影响了性能和可扩展性。在本文中，我们介绍了动态系统框架（DSF），它允许在一个共同的表示中对所有这些架构进行原则性的调查。我们的框架促进了严格的比较，提供了关于每个模型类别独特特征的新见解。例如，我们比较了线性注意力和选择性SSMs，详细说明了它们的差异以及在哪些条件下它们是等价的。我们还对softmax attention和其他模型类别进行了原则性比较，讨论了softmax attention可以近似的理论条件。此外，我们用实证验证和数学论证来证实这些新见解。这显示了DSF引导未来更高效和可扩展基础模型系统性发展的潜力。

更新时间: 2024-06-03 18:18:33

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.15731v2

Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications

In this paper, conditional denoising diffusion probabilistic models (DDPMs) are proposed to enhance the data transmission and reconstruction over wireless channels. The underlying mechanism of DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, the key idea is to leverage the generative prior of diffusion models in learning a "noisy-to-clean" transformation of the information signal to help enhance data reconstruction. The proposed scheme could be beneficial for communication scenarios in which a prior knowledge of the information content is available, e.g., in multimedia transmission. Hence, instead of employing complicated channel codes that reduce the information rate, one can exploit diffusion priors for reliable data reconstruction, especially under extreme channel conditions due to low signal-to-noise ratio (SNR), or hardware-impaired communications. The proposed DDPM-assisted receiver is tailored for the scenario of wireless image transmission using MNIST dataset. Our numerical results highlight the reconstruction performance of our scheme compared to the conventional digital communication, as well as the deep neural network (DNN)-based benchmark. It is also shown that more than 10 dB improvement in the reconstruction could be achieved in low SNR regimes, without the need to reduce the information rate for error correction.

Updated: 2024-06-03 18:17:01

标题: 无线通信中用于数据重建增强的条件去噪扩散概率模型

摘要: 本文提出了一种条件去噪扩散概率模型（DDPMs），用于增强无线信道上的数据传输和重建。DDPM的基本机制是将数据生成过程分解为所谓的“去噪”步骤。受此启发，关键思想是利用扩散模型的生成先验，在学习信息信号的“嘈杂到清晰”转换中帮助增强数据重建。所提出的方案对于通信场景可能是有益的，在这些场景中信息内容的先验知识是可用的，例如在多媒体传输中。因此，与使用降低信息速率的复杂信道编码不同，人们可以利用扩散先验进行可靠的数据重建，特别是在由于低信噪比（SNR）或硬件损坏导致的极端信道条件下。所提出的DDPM辅助接收器专为使用MNIST数据集进行无线图像传输的情景而设计。我们的数值结果突出了我们方案相对于传统数字通信以及基于深度神经网络（DNN）的基准的重建性能。还表明，在低SNR区域可以实现超过10 dB的重建改进，无需降低信息速率进行错误纠正。

更新时间: 2024-06-03 18:17:01

领域: cs.IT,cs.AI,cs.LG,math.IT

下载: http://arxiv.org/abs/2310.19460v2

Model for Peanuts: Hijacking ML Models without Training Access is Possible

The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, where the adversary aims to hijack a victim model to execute a different task than its original one. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offering illegal or unethical services. Prior state-of-the-art works consider model hijacking as a training time attack, whereby an adversary requires access to the ML model training to execute their attack. In this paper, we consider a stronger threat model where the attacker has no access to the training phase of the victim model. Our intuition is that ML models, typically over-parameterized, might (unintentionally) learn more than the intended task for they are trained. We propose a simple approach for model hijacking at inference time named SnatchML to classify unknown input samples using distance measures in the latent space of the victim model to previously known samples associated with the hijacking task classes. SnatchML empirically shows that benign pre-trained models can execute tasks that are semantically related to the initial task. Surprisingly, this can be true even for hijacking tasks unrelated to the original task. We also explore different methods to mitigate this risk. We first propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original task dataset. We also provide insights on over-parameterization as one possible inherent factor that makes model hijacking easier, and we accordingly propose a compression-based countermeasure against this attack.

Updated: 2024-06-03 18:04:37

标题: "花生模型：即使没有训练访问权限也可以劫持机器学习模型"

摘要: 机器学习模型的大规模部署伴随着几种威胁其可信度的攻击的出现，引发了伦理和社会问题，如侵犯隐私、歧视风险和缺乏问责制。模型劫持就是其中一种攻击，攻击者旨在劫持受害模型以执行不同于其原始任务的任务。模型劫持可能导致问责和安全风险，因为被劫持的模型所有者可能被指责为提供非法或不道德的服务。先前的最先进作品将模型劫持视为一种训练时攻击，即攻击者需要访问ML模型的训练阶段才能执行他们的攻击。本文考虑了一个更强的威胁模型，即攻击者无法访问受害模型的训练阶段。我们的直觉是，通常过度参数化的ML模型可能会（无意中）学习超出其训练目的的内容。我们提出了一种简单的在推断时进行模型劫持的方法，命名为SnatchML，该方法使用潜在空间中与劫持任务类别相关的先前已知样本的距离度量来对未知输入样本进行分类。SnatchML在实证中显示，良性预训练模型可以执行与初始任务语义相关的任务。令人惊讶的是，即使是与原始任务无关的劫持任务，这也是可能的。我们还探讨了不同方法来减轻这种风险。我们首先提出了一种我们称之为元遗忘的新方法，旨在帮助模型在原始任务数据集上训练的同时遗忘潜在的恶意任务。我们还提供了有关过度参数化作为使模型劫持变得更容易的可能固有因素的见解，并相应提出了一种基于压缩的对抗措施。

更新时间: 2024-06-03 18:04:37

领域: cs.CR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.01708v1

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements to meet SLOs remains an open research question. In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. Our analysis provides insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLMs models like LLaMA and GPT-4 under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications. The source code is available at https://github.com/abhibambhaniya/GenZ-LLM-Analyzer .

Updated: 2024-06-03 18:00:50

标题: 揭秘各种LLM推理用例的平台要求

摘要: 大型语言模型（LLMs）在各种应用中表现出色，通常优于人类专家。然而，为了有效地部署这些参数庞大的模型以应对各种推理用例，需要设计精心的硬件平台，具备充足的计算、内存和网络资源。随着LLM部署场景和模型飞速发展，满足SLOs的硬件要求仍然是一个开放的研究问题。在这项工作中，我们提出了一种分析工具GenZ，用于研究LLM推理性能与各种平台设计参数之间的关系。我们的分析为配置不同LLM工作负载和用例的平台提供了洞察。我们量化了支持像LLaMA和GPT-4这样的SOTA LLMs模型在不同服务设置下的平台要求。此外，我们预测了未来可能超过数百万亿参数的LLMs所需的硬件能力。从GenZ得出的趋势和见解可以指导AI工程师部署LLMs，同时也可以指导计算机架构师设计下一代硬件加速器和平台。最终，这项工作阐明了解锁大型语言模型在各种应用中充分潜力的平台设计考虑。源代码可在https://github.com/abhibambhaniya/GenZ-LLM-Analyzer 上找到。

更新时间: 2024-06-03 18:00:50

领域: cs.AR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.01698v1

An efficient solution to Hidden Markov Models on trees with coupled branches

Hidden Markov Models (HMMs) are powerful tools for modeling sequential data, where the underlying states evolve in a stochastic manner and are only indirectly observable. Traditional HMM approaches are well-established for linear sequences, and have been extended to other structures such as trees. In this paper, we extend the framework of HMMs on trees to address scenarios where the tree-like structure of the data includes coupled branches -- a common feature in biological systems where entities within the same lineage exhibit dependent characteristics. We develop a dynamic programming algorithm that efficiently solves the likelihood, decoding, and parameter learning problems for tree-based HMMs with coupled branches. Our approach scales polynomially with the number of states and nodes, making it computationally feasible for a wide range of applications and does not suffer from the underflow problem. We demonstrate our algorithm by applying it to simulated data and propose self-consistency checks for validating the assumptions of the model used for inference. This work not only advances the theoretical understanding of HMMs on trees but also provides a practical tool for analyzing complex biological data where dependencies between branches cannot be ignored.

Updated: 2024-06-03 18:00:00

标题: 一种有效的解决方案：在具有耦合分支的树上的隐马尔可夫模型

摘要: 隐马尔可夫模型（HMMs）是用于建模序列数据的强大工具，其中潜在状态以随机方式演变，并且仅间接可观察到。传统的HMM方法已经在线性序列上得到很好的应用，并已扩展到其他结构，如树形结构。在本文中，我们将HMMs在树形结构上的框架扩展到解决数据中包含耦合分支的情形 -- 这是生物系统中的常见特征，其中同一谱系内的实体表现出相关特征。我们开发了一个动态规划算法，有效解决了具有耦合分支的基于树的HMMs的可能性、解码和参数学习问题。我们的方法与状态和节点的数量呈多项式比例，使其在广泛的应用中具有计算可行性，并且不受下溢问题的影响。我们通过将算法应用于模拟数据来展示我们的算法，并提出自洽检查来验证用于推断的模型的假设。这项工作不仅推进了对树形HMMs的理论理解，还为分析复杂的生物数据提供了实用工具，在这些数据中，分支之间的依赖关系是不可忽视的。

更新时间: 2024-06-03 18:00:00

领域: stat.ML,cs.LG,eess.SP,q-bio.QM,stat.ME

下载: http://arxiv.org/abs/2406.01663v1

Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)

Understanding Activities of Daily Living (ADLs) is a crucial step for different applications including assistive robots, smart homes, and healthcare. However, to date, few benchmarks and methods have focused on complex ADLs, especially those involving multi-person interactions in home environments. In this paper, we propose a new dataset and benchmark, InteractADL, for understanding complex ADLs that involve interaction between humans (and objects). Furthermore, complex ADLs occurring in home environments comprise a challenging long-tailed distribution due to the rarity of multi-person interactions, and pose fine-grained visual recognition tasks due to the presence of semantically and visually similar classes. To address these issues, we propose a novel method for fine-grained few-shot video classification called Name Tuning that enables greater semantic separability by learning optimal class name vectors. We show that Name Tuning can be combined with existing prompt tuning strategies to learn the entire input text (rather than only learning the prompt or class names) and demonstrate improved performance for few-shot classification on InteractADL and 4 other fine-grained visual classification benchmarks. For transparency and reproducibility, we release our code at https://github.com/zanedurante/vlm_benchmark.

Updated: 2024-06-03 17:59:55

标题: 少样本分类的日常生活互动活动（InteractADL）

摘要: 理解日常生活活动（ADLs）是包括辅助机器人、智能家居和医疗在内的各种应用的关键步骤。然而，迄今为止，很少有基准和方法专注于复杂的ADLs，特别是涉及家庭环境中多人互动的情况。在本文中，我们提出了一个新的数据集和基准，InteractADL，用于理解涉及人类（和物体）互动的复杂ADLs。此外，发生在家庭环境中的复杂ADLs由于多人互动的罕见性构成具有挑战性的长尾分布，并且由于语义上和视觉上相似类别的存在而提出了细粒度的视觉识别任务。为了解决这些问题，我们提出了一种新颖的细粒度少样本视频分类方法，称为Name Tuning，通过学习最佳类名向量实现更大的语义可分离性。我们展示了Name Tuning可以与现有的提示调整策略相结合，学习整个输入文本（而不仅仅学习提示或类名），并展示了在InteractADL和其他4个细粒度视觉分类基准上进行少样本分类时的性能改进。为了透明和可重现性，我们在https://github.com/zanedurante/vlm_benchmark上发布了我们的代码。

更新时间: 2024-06-03 17:59:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01662v1

DiffUHaul: A Training-Free Method for Object Dragging in Images

Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to lacking spatial reasoning. In this work, we propose a training-free method, dubbed DiffUHaul, that harnesses the spatial understanding of a localized text-to-image model, for the object dragging task. Blindly manipulating layout inputs of the localized model tends to cause low editing performance due to the intrinsic entanglement of object representation in the model. To this end, we first apply attention masking in each denoising step to make the generation more disentangled across different objects and adopt the self-attention sharing mechanism to preserve the high-level object appearance. Furthermore, we propose a new diffusion anchoring technique: in the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance; in the later denoising steps, we pass the localized features from the source images to the interpolated images to retain fine-grained object details. To adapt DiffUHaul to real-image editing, we apply a DDPM self-attention bucketing that can better reconstruct real images with the localized model. Finally, we introduce an automated evaluation pipeline for this task and showcase the efficacy of our method. Our results are reinforced through a user preference study.

Updated: 2024-06-03 17:59:53

标题: DiffUHaul：一种用于图像中对象拖动的无需训练的方法

摘要: 文本到图像扩散模型已被证明在解决许多图像编辑任务中非常有效。然而，看似简单的场景内物体无缝重定位任务却仍然具有挑战性。现有方法解决这一问题通常由于缺乏空间推理能力而难以在现实世界场景中可靠运行。在这项工作中，我们提出一种名为DiffUHaul的无需训练的方法，利用局部文本到图像模型的空间理解能力来进行物体拖动任务。盲目操作局部模型的布局输入往往会导致低编辑性能，这是因为模型中物体表示的固有纠缠。为此，我们首先在每个去噪步骤中应用注意力掩蔽，使生成在不同物体之间更加解耦，并采用自注意力共享机制来保留高级物体外观。此外，我们提出一种新的扩散锚定技术：在早期去噪步骤中，我们在源图像和目标图像之间插值注意力特征，以平滑地融合新的布局和原始外观；在后续的去噪步骤中，我们将来自源图像的局部特征传递给插值图像，以保留细粒度的物体细节。为了将DiffUHaul适应于真实图像编辑，我们应用了一种DDPM自注意力分桶，可以更好地用局部模型重建真实图像。最后，我们为这一任务引入了一个自动化评估流程，并展示了我们方法的有效性。我们的结果通过用户偏好研究得到了验证。

更新时间: 2024-06-03 17:59:53

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2406.01594v1

Text-guided Controllable Mesh Refinement for Interactive 3D Modeling

We propose a novel technique for adding geometric details to an input coarse 3D mesh guided by a text prompt. Our method is composed of three stages. First, we generate a single-view RGB image conditioned on the input coarse geometry and the input text prompt. This single-view image generation step allows the user to pre-visualize the result and offers stronger conditioning for subsequent multi-view generation. Second, we use our novel multi-view normal generation architecture to jointly generate six different views of the normal images. The joint view generation reduces inconsistencies and leads to sharper details. Third, we optimize our mesh with respect to all views and generate a fine, detailed geometry as output. The resulting method produces an output within seconds and offers explicit user control over the coarse structure, pose, and desired details of the resulting 3D mesh. Project page: https://text-mesh-refinement.github.io.

Updated: 2024-06-03 17:59:43

标题: 文本引导的可控网格细化用于交互式3D建模

摘要: 我们提出了一种新颖的技术，通过文本提示指导，在输入的粗略3D网格中添加几何细节。我们的方法由三个阶段组成。首先，我们生成一个单视图RGB图像，该图像以输入的粗略几何和输入的文本提示为条件。这一单视图图像生成步骤允许用户预览结果，并为后续的多视图生成提供更强的条件。其次，我们使用我们的新颖的多视图法线生成架构来联合生成六个不同视图的法线图像。联合视图生成减少了不一致性，并产生更清晰的细节。第三，我们针对所有视图优化我们的网格，并生成一个精细、详细的几何作为输出。由此产生的方法在几秒钟内产生输出，并为用户提供对结果3D网格的粗略结构、姿态和所需细节的明确控制。项目页面：https://text-mesh-refinement.github.io.

更新时间: 2024-06-03 17:59:43

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2406.01592v1

Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

A wide range of empirical and theoretical works have shown that overparameterisation can amplify the performance of neural networks. According to the lottery ticket hypothesis, overparameterised networks have an increased chance of containing a sub-network that is well-initialised to solve the task at hand. A more parsimonious approach, inspired by animal learning, consists in guiding the learner towards solving the task by curating the order of the examples, i.e. providing a curriculum. However, this learning strategy seems to be hardly beneficial in deep learning applications. In this work, we undertake an analytical study that connects curriculum learning and overparameterisation. In particular, we investigate their interplay in the online learning setting for a 2-layer network in the XOR-like Gaussian Mixture problem. Our results show that a high degree of overparameterisation -- while simplifying the problem -- can limit the benefit from curricula, providing a theoretical account of the ineffectiveness of curricula in deep learning.

Updated: 2024-06-03 17:59:33

标题: 将抽奖机会倾斜：神经网络中超参数化和课程设置的相互作用

摘要: 许多经验和理论作品表明，过度参数化可以增强神经网络的性能。根据“中奖彩票”假设，过度参数化的网络有更高的机会包含一个经过良好初始化的子网络，以解决手头的任务。受动物学习启发，更简洁的方法是通过策划示例的顺序，即提供一个课程，来引导学习者解决任务。然而，这种学习策略在深度学习应用中似乎很少有益。在这项工作中，我们进行了一项分析研究，将课程学习和过度参数化联系起来。具体而言，我们研究了它们在XOR样式的高斯混合问题的在线学习设置中的相互作用。我们的结果表明，高度过度参数化虽然简化了问题，但可以限制来自课程的好处，从而提供了一个解释课程在深度学习中无效的理论解释。

更新时间: 2024-06-03 17:59:33

领域: stat.ML,cond-mat.dis-nn,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2406.01589v1

nn2poly: An R Package for Converting Neural Networks into Interpretable Polynomials

The nn2poly package provides the implementation in R of the NN2Poly method to explain and interpret feed-forward neural networks by means of polynomial representations that predict in an equivalent manner as the original network.Through the obtained polynomial coefficients, the effect and importance of each variable and their interactions on the output can be represented. This capabiltiy of capturing interactions is a key aspect usually missing from most Explainable Artificial Intelligence (XAI) methods, specially if they rely on expensive computations that can be amplified when used on large neural networks. The package provides integration with the main deep learning framework packages in R (tensorflow and torch), allowing an user-friendly application of the NN2Poly algorithm. Furthermore, nn2poly provides implementation of the required weight constraints to be used during the network training in those same frameworks. Other neural networks packages can also be used by including their weights in list format. Polynomials obtained with nn2poly can also be used to predict with new data or be visualized through its own plot method. Simulations are provided exemplifying the usage of the package alongside with a comparison with other approaches available in R to interpret neural networks.

Updated: 2024-06-03 17:59:30

标题: nn2poly：一个将神经网络转换为可解释多项式的R包

摘要: nn2poly软件包提供了在R中实施NN2Poly方法的方法，通过多项式表示来解释和解释前馈神经网络，该方法与原始网络以等效方式进行预测。通过获得的多项式系数，可以表示每个变量及其相互作用对输出的影响和重要性。捕获相互作用的能力通常是大多数可解释人工智能（XAI）方法中缺失的关键方面，特别是如果它们依赖于昂贵的计算，并且在大型神经网络上使用时可能会被放大。该软件包与R中的主要深度学习框架软件包（tensorflow和torch）集成，允许用户友好地应用NN2Poly算法。此外，nn2poly提供了在这些相同框架中训练网络时使用的必要权重约束的实现。还可以通过包含其权重的列表格式来使用其他神经网络软件包。通过nn2poly获得的多项式也可以用于预测新数据或通过其自己的绘图方法进行可视化。提供模拟示例，展示了该软件包的使用方式，并与R中其他可用方法进行比较，以解释神经网络。

更新时间: 2024-06-03 17:59:30

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.01588v1

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, whereas they suffer from severe runtime inefficiency due to multiple denoising steps, especially with high-dimensional observations. To this end, we propose a real-time robotic manipulation model named ManiCM that imposes the consistency constraint to the diffusion process, so that the model can generate robot actions in only one-step inference. Specifically, we formulate a consistent diffusion process in the robot action space conditioned on the point cloud input, where the original action is required to be directly denoised from any point along the ODE trajectory. To model this process, we design a consistency distillation technique to predict the action sample directly instead of predicting the noise within the vision community for fast convergence in the low-dimensional action manifold. We evaluate ManiCM on 31 robotic manipulation tasks from Adroit and Metaworld, and the results demonstrate that our approach accelerates the state-of-the-art method by 10 times in average inference speed while maintaining competitive average success rate.

Updated: 2024-06-03 17:59:23

标题: ManiCM：基于一致性模型的机器人操作的实时3D扩散策略

摘要: 扩散模型已被验证为从自然图像到运动轨迹生成复杂分布的有效方法。最近基于扩散的方法在3D机器人操作任务中表现出了令人印象深刻的性能，然而由于多个去噪步骤，特别是在高维观测中，它们遭受严重的运行时效率低下的问题。为此，我们提出了一个名为ManiCM的实时机器人操作模型，该模型对扩散过程施加了一致性约束，使得模型可以在仅一步推断中生成机器人动作。具体来说，我们在点云输入条件下在机器人动作空间中制定了一致的扩散过程，其中原始动作需要直接从ODE轨迹上的任何点去噪。为了建模这一过程，我们设计了一种一致性蒸馏技术，用于在视觉社区中预测动作样本而不是预测噪音，以便在低维动作流形中实现快速收敛。我们在Adroit和Metaworld的31个机器人操作任务上评估了ManiCM，结果表明我们的方法在平均推断速度上将现有技术方法加速了10倍，同时保持了竞争性的平均成功率。

更新时间: 2024-06-03 17:59:23

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.01586v1

Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP

Recent works have explored how individual components of the CLIP-ViT model contribute to the final representation by leveraging the shared image-text representation space of CLIP. These components, such as attention heads and MLPs, have been shown to capture distinct image features like shape, color or texture. However, understanding the role of these components in arbitrary vision transformers (ViTs) is challenging. To this end, we introduce a general framework which can identify the roles of various components in ViTs beyond CLIP. Specifically, we (a) automate the decomposition of the final representation into contributions from different model components, and (b) linearly map these contributions to CLIP space to interpret them via text. Additionally, we introduce a novel scoring function to rank components by their importance with respect to specific features. Applying our framework to various ViT variants (e.g. DeiT, DINO, DINOv2, Swin, MaxViT), we gain insights into the roles of different components concerning particular image features.These insights facilitate applications such as image retrieval using text descriptions or reference images, visualizing token importance heatmaps, and mitigating spurious correlations.

Updated: 2024-06-03 17:58:43

标题: 通过文本在超越CLIP的ViTs中分解和解释图像表示

摘要: 最近的研究探讨了CLIP-ViT模型的个别组件如何通过利用CLIP的共享图像-文本表示空间对最终表示产生影响。这些组件，如注意力头和MLP，已被证明能够捕捉到不同的图像特征，如形状、颜色或纹理。然而，理解这些组件在任意视觉Transformer（ViTs）中的作用是具有挑战性的。为此，我们引入了一个通用框架，可以在CLIP之外的ViTs中识别各种组件的作用。具体地，我们（a）自动将最终表示分解为来自不同模型组件的贡献，并（b）线性映射这些贡献到CLIP空间，以通过文本来解释它们。此外，我们引入了一种新颖的评分函数，可以根据特定特征的重要性对组件进行排名。将我们的框架应用于各种ViT变体（如DeiT、DINO、DINOv2、Swin、MaxViT），我们深入了解了不同组件在特定图像特征方面的作用。这些见解有助于应用，比如使用文本描述或参考图像进行图像检索，可视化令牌重要性热图，以及减少虚假相关性。

更新时间: 2024-06-03 17:58:43

领域: cs.CV,cs.LG,I.5.1

下载: http://arxiv.org/abs/2406.01583v1

GIFT: Generative Interpretable Fine-Tuning

We present Generative Interpretable Fine-Tuning (GIFT) for parameter-efficient fine-tuning of pretrained Transformer backbones, which can be formulated as a simple factorized matrix multiplication in the parameter space or equivalently in the activation space, and thus embraces built-in interpretability. For a pretrained layer with weights $\omega\in \mathbb{R}^{d_{out}\times d_{in}}$, our proposed GIFT learns the fine-tuned weights $\hat{\omega}$ directly from $\omega$ as $\hat{\omega}=\omega \cdot (\mathbb{I}+\phi_{d_{in}\times r}\cdot \psi_{r\times d_{in}})$ where $\mathbb{I}$ is an identity matrix. $\Theta=(\phi, \psi)$ are the learnable parameters of the two linear layers of GIFT with $r$ being a hyper-parameter. $\Theta$ is shared by all the layers selected for fine-tuning, resulting in significantly fewer trainable parameters compared to Low-Rank Adaptation (LoRA). We perform comprehensive evaluations on natural language tasks (commonsense reasoning and sequence classification) and computer vision tasks (visual fine-grained classification). We obtain the best accuracy and parameter efficiency among baselines both on the Commonsense170k reasoning benchmark using LLaMA-1 (7B) and Llama-2 (7B)/-3 (8B) and on the FGVC and VTAB visual recognition benchmarks using ImageNet-21k pretrained Vision Transformer (ViT-B/16). Notably, we obtain 5.9% absolute increase in average accuracy with 53.8 times reduction of parameters on Commonsense170k using Llama-3 (8B) compared to LoRA. We obtain performance comparable to LoRA on the GLUE benchmark but with significantly fewer parameters using RoBERTa-Base/Large. We show the output of the first linear layer (i.e., $\omega\cdot \phi$) is surprisingly interpretable, which can play the role of a token-clustering head as a by-product to localize meaningful objects/parts in images for computer vision tasks. Our code is publicly available.

Updated: 2024-06-03 17:57:39

标题: GIFT：生成可解释的微调

摘要: 我们提出了用于预训练Transformer骨干参数高效微调的生成可解释微调（GIFT）方法，可以在参数空间或激活空间中形式化为简单的分解矩阵乘法，因此具有内置可解释性。对于具有权重$\omega\in \mathbb{R}^{d_{out}\times d_{in}}$的预训练层，我们提出的GIFT直接从$\omega$中学习微调后的权重$\hat{\omega}$，即$\hat{\omega}=\omega \cdot (\mathbb{I}+\phi_{d_{in}\times r}\cdot \psi_{r\times d_{in}})$，其中$\mathbb{I}$是单位矩阵。$\Theta=(\phi, \psi)$是GIFT的两个线性层的可学习参数，其中$r$是一个超参数。$\Theta$由所有选择进行微调的层共享，与Low-Rank Adaptation（LoRA）相比，可训练参数显著减少。我们对自然语言任务（常识推理和序列分类）和计算机视觉任务（视觉细粒度分类）进行了全面评估。在Commonsense170k推理基准测试中，使用LLaMA-1（7B）和LLaMA-2（7B）/LLaMA-3（8B），以及使用ImageNet-21k预训练Vision Transformer（ViT-B/16）在FGVC和VTAB视觉识别基准测试中，我们在基线中获得了最佳的准确性和参数效率。值得注意的是，在Commonsense170k上，与LoRA相比，使用Llama-3（8B）获得了5.9%绝对准确度增加，并且参数减少了53.8倍。在GLUE基准测试中，我们使用RoBERTa-Base/Large获得了与LoRA相媲美但具有显著更少参数的性能。我们展示了第一个线性层的输出（即$\omega\cdot \phi$）是令人惊讶的可解释性，可以作为一个副产品，用于定位图像中有意义的对象/部分的令牌聚类头，用于计算机视觉任务。我们的代码是公开可用的。

更新时间: 2024-06-03 17:57:39

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.00700v2

Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series

Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the light-weight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by (4-40\%), while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. Model weights for our initial variant (TTM-Q) are available at https://huggingface.co/ibm-granite/granite-timeseries-ttm-v1. Model weights for more sophisticated variants (TTM-B, TTM-E, and TTM-A) will be shared soon. The source code for TTM can be accessed at https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer.

Updated: 2024-06-03 17:57:22

标题: 微小时间混合器（TTMs）：用于增强多变量时间序列零/少样本预测的快速预训练模型

摘要: 大型预训练模型在语言和视觉任务的零/少样本学习中表现出色，但在多变量时间序列（TS）预测中面临挑战，因为数据特征多样。因此，最近的研究工作集中在开发预训练TS预测模型上。这些模型，无论是从头开始构建还是从大型语言模型（LLMs）进行调整，都在零/少样本预测任务中表现出色。然而，它们受限于性能较慢、计算需求高和忽略跨通道和外生相关性。为了解决这个问题，我们引入了Tiny Time Mixers（TTM），这是一个紧凑的模型（从1M参数开始），具有有效的迁移学习能力，专门在公共TS数据集上进行训练。TTM基于轻量级TSMixer架构，结合了自适应补丁、不同分辨率采样和分辨率前缀调整等创新，以处理在最小模型容量下对各种数据集分辨率进行预训练。此外，它采用多级建模来捕捉通道相关性，并在微调过程中注入外生信号。TTM在零/少样本预测中优于现有流行基准（4-40\%），同时显著减少了计算需求。此外，TTM轻巧且甚至可以在仅CPU机器上执行，增强了可用性，并促进了在资源受限环境中更广泛的采用。我们初始变体（TTM-Q）的模型权重可在https://huggingface.co/ibm-granite/granite-timeseries-ttm-v1获取。更复杂的变体（TTM-B、TTM-E和TTM-A）的模型权重将很快分享。TTM的源代码可在https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer上获得。

更新时间: 2024-06-03 17:57:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2401.03955v6

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree $q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\mathrm{polylog} d$, where constant $C(q)$ only depends on the degree of $\sigma_*$, regardless of information exponent; this dimension dependence matches the information theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.

Updated: 2024-06-03 17:56:58

标题: 神经网络利用SGD学习接近信息论极限的低维多项式

摘要: 我们研究了在$\mathbb{R}^d$中各向同性高斯数据下，针对单指数目标函数$f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$的梯度下降学习问题，其中链接函数$\sigma_*:\mathbb{R}\to\mathbb{R}$是一个未知的次数为$q$的多项式，具有信息指数$p$（定义为Hermite展开中的最低次数）。先前的研究表明，基于梯度的神经网络训练可以用$n\gtrsim d^{\Theta(p)}$个样本学习这个目标函数，并且这样的统计复杂性被相关统计查询下界预测为必要的。令人惊讶的是，我们证明了通过基于SGD算法优化的两层神经网络可以学习任意多项式链接函数的$f_*$，其样本和运行时复杂度为$n \asymp T \asymp C(q) \cdot d\mathrm{polylog} d$，其中常数$C(q)$仅取决于$\sigma_*$的次数，而不考虑信息指数；这个维度相关性匹配了信息论极限，直到对数多项式因子。我们分析的核心是在梯度计算中重复使用小批量数据，这产生了超越相关查询的高阶信息。

更新时间: 2024-06-03 17:56:58

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01581v1

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature. Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment even if given full white-box access and using automated techniques, such as red-teaming or certain formal verification methods. We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties. We confirm these properties in empirical investigations, and provide evidence that our backdoors can withstand state-of-the-art mitigation strategies. Additionally, we expand on previous work by showing that our universal backdoors, while not completely undetectable in white-box settings, can be harder to detect than some existing designs. By demonstrating the feasibility of seamlessly integrating backdoors into transformer models, this paper fundamentally questions the efficacy of pre-deployment detection strategies. This offers new insights into the offence-defence balance in AI safety and security.

Updated: 2024-06-03 17:55:41

标题: 通过密码变换器电路在语言模型中无法引诱的后门

摘要: 开源语言模型的快速增长显著增加了下游后门攻击的风险。这些后门可以在模型部署过程中引入危险行为，并且可以逃避传统网络安全监控系统的检测。本文介绍了一类新型的自回归变压器模型中的后门，与先前的技术相比，这些后门是不可引诱的。不可引诱性阻止了防御者触发后门，使得即使拥有完全的白盒访问权限并使用自动化技术（如红队测试或某些正式验证方法），也无法在部署前进行评估或检测。我们展示了我们的新型构造不仅通过使用加密技术具有不可引诱性，而且具有有利的鲁棒性属性。我们在实证调查中证实了这些属性，并提供证据表明我们的后门可以抵御最先进的缓解策略。此外，我们通过展示我们的通用后门，虽然在白盒设置中不完全不可检测，但比一些现有设计更难以检测。通过展示将后门无缝集成到变压器模型中的可行性，本文从根本上质疑了预部署检测策略的有效性。这为AI安全和安全性中的攻防平衡提供了新的见解。

更新时间: 2024-06-03 17:55:41

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2406.02619v1

A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization

Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. Our approach is conceptually based on a loss that upper bounds the reverse Kullback-Leibler divergence and evades the requirement of exact sample likelihoods. We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.

Updated: 2024-06-03 17:55:02

标题: 一个无监督神经组合优化的扩散模型框架

摘要: 学习从不可解分布中对离散集进行采样，而无需依赖相应的训练数据，是许多领域的一个核心问题，包括组合优化。目前，流行的基于深度学习的方法主要依赖于产生精确样本似然的生成模型。本研究介绍了一种方法，可以解除这一限制，开启了利用高度表现力的潜变量模型（如扩散模型）的可能性。我们的方法在概念上基于一个上界约束逆Kullback-Leibler散度的损失，并且避免了精确样本似然的要求。我们在无数据的组合优化中实验验证了我们的方法，并展示了我们的方法在各种基准问题上取得了新的最先进水平。

更新时间: 2024-06-03 17:55:02

领域: cs.LG,cs.AI,cs.DM,stat.ML

下载: http://arxiv.org/abs/2406.01661v1

An Equivalence Between Static and Dynamic Regret Minimization

We study the problem of dynamic regret minimization in online convex optimization, in which the objective is to minimize the difference between the cumulative loss of an algorithm and that of an arbitrary sequence of comparators. While the literature on this topic is very rich, a unifying framework for the analysis and design of these algorithms is still missing. In this paper, \emph{we show that dynamic regret minimization is equivalent to static regret minimization in an extended decision space}. Using this simple observation, we show that there is a frontier of lower bounds trading off penalties due to the variance of the losses and penalties due to variability of the comparator sequence, and provide a framework for achieving any of the guarantees along this frontier. As a result, we prove for the first time that adapting to the squared path-length of an arbitrary sequence of comparators to achieve regret $R_{T}(u_{1},\dots,u_{T})\le O(\sqrt{T\sum_{t} \|u_{t}-u_{t+1}\|^{2}})$ is impossible. However, we prove that it is possible to adapt to a new notion of variability based on the locally-smoothed squared path-length of the comparator sequence, and provide an algorithm guaranteeing dynamic regret of the form $R_{T}(u_{1},\dots,u_{T})\le \tilde O(\sqrt{T\sum_{i}\|\bar u_{i}-\bar u_{i+1}\|^{2}})$. Up to polylogarithmic terms, the new notion of variability is never worse than the classic one involving the path-length.

Updated: 2024-06-03 17:54:58

标题: 静态和动态遗憾最小化之间的等价性

摘要: 我们研究了在线凸优化中动态遗憾最小化的问题，其目标是最小化算法的累积损失与任意一系列比较器的累积损失之间的差异。尽管关于这个主题的文献非常丰富，但仍然缺乏一个统一的分析和设计这些算法的框架。在本文中，我们展示了动态遗憾最小化等价于在扩展决策空间中的静态遗憾最小化。利用这一简单观察，我们展示了一种权衡损失方差和比较器序列变异性所带来的惩罚的下界，同时提供了一个实现这些下界中任意保证的框架。结果，我们首次证明了为了达到遗憾$R_{T}(u_{1},\dots,u_{T})\le O(\sqrt{T\sum_{t} \|u_{t}-u_{t+1}\|^{2}})$，需要适应任意比较器序列的平方路径长度是不可能的。然而，我们证明了可以适应基于比较器序列局部平滑的平方路径长度的新变异性概念，并提供一个算法保证动态遗憾的形式为$R_{T}(u_{1},\dots,u_{T})\le \tilde O(\sqrt{T\sum_{i}\|\bar u_{i}-\bar u_{i+1}\|^{2}})$。除了对数项外，新的变异性概念从未比涉及路径长度的经典概念更糟。

更新时间: 2024-06-03 17:54:58

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01577v1

Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes

In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of (many) MDPs that (potentially multiple) followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as model design for MDPs, tax design, reward shaping and dynamic mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP, and demonstrate its convergence. Notably, HPGD only utilizes observations of the followers' trajectories. Therefore, it allows followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. We further consider the setting when the leader can influence the training of followers and propose an accelerated algorithm. We empirically demonstrate the performance of our algorithm.

Updated: 2024-06-03 17:54:39

标题: 随机双层优化中的下层上下文马尔可夫决策过程

摘要: 在各种应用中，战略决策问题中的最佳策略取决于环境配置和外生事件。对于这些设置，我们引入了具有上下文马尔可夫决策过程（CMDP）的双层优化（BO-CMDP），这是一个随机双层决策模型，其中较低层包括解决上下文马尔可夫决策过程（CMDP）。BO-CMDP可以被视为一种斯塔克贝格博弈，领导者和领导者无法控制的随机上下文共同决定（许多）MDPs的设置，（潜在的多个）跟随者最好地响应。这个框架超出了传统的双层优化，在诸如MDP模型设计、税收设计、奖励塑造和动态机制设计等各个领域都具有相关性。我们提出了一种随机超策略梯度下降（HPGD）算法来解决BO-CMDP，并展示其收敛性。值得注意的是，HPGD仅利用跟随者轨迹的观察。因此，它允许跟随者使用任何训练过程，并且领导者对使用的特定算法是无知的，这与各种现实场景一致。我们进一步考虑了领导者可以影响跟随者训练的设置，并提出了一种加速算法。我们在实证中展示了我们算法的性能。

更新时间: 2024-06-03 17:54:39

领域: math.OC,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01575v1

Self-Improving Robust Preference Optimization

Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem that their optimal solution is highly task-dependent (i.e., not robust to out-of-distribution (OOD) tasks). Here we address this challenge by proposing Self-Improving Robust Preference Optimization SRPO, a practical and mathematically principled offline RLHF framework that is completely robust to the changes in the task. The key idea of SRPO is to cast the problem of learning from human preferences as a self-improvement process, which can be mathematically expressed in terms of a min-max objective that aims at joint optimization of self-improvement policy and the generative policy in an adversarial fashion. The solution for this optimization problem is independent of the training task and thus it is robust to its changes. We then show that this objective can be re-expressed in the form of a non-adversarial offline loss which can be optimized using standard supervised optimization techniques at scale without any need for reward model and online inference. We show the effectiveness of SRPO in terms of AI Win-Rate (WR) against human (GOLD) completions. In particular, when SRPO is evaluated on the OOD XSUM dataset, it outperforms the celebrated DPO by a clear margin of 15% after 5 self-revisions, achieving WR of 90%.

Updated: 2024-06-03 17:53:25

标题: 自我改进的健壮偏好优化

摘要: 在线和离线RLHF方法，如PPO和DPO，在将人工智能与人类偏好对齐方面取得了极大的成功。尽管它们取得了成功，但现有方法存在一个基本问题，即它们的最优解高度依赖于任务（即对于分布外任务不具有鲁棒性）。在这里，我们通过提出自我改进的鲁棒偏好优化SRPO来解决这一挑战，这是一个实用且基于数学原理的离线RLHF框架，完全能够适应任务变化。SRPO的关键思想是将从人类偏好学习的问题视为一个自我改进的过程，可以用最小最大目标来数学表达，该目标旨在以对抗方式联合优化自我改进策略和生成策略。这种优化问题的解决方案独立于训练任务，因此对任务的变化具有鲁棒性。然后，我们展示了这个目标可以重新表达为一个非对抗的离线损失形式，可以在规模上使用标准监督优化技术进行优化，而无需奖励模型和在线推断。我们展示了SRPO在AI胜率（WR）方面对人类（GOLD）完成的效果。特别是，当在OOD XSUM数据集上评估SRPO时，它在经过5次自我修订后，以90%的WR明显领先于著名的DPO，超过15%。

更新时间: 2024-06-03 17:53:25

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.01660v1

MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. MIM-Refiner is motivated by the insight that strong representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to different intermediate layers. In each head, a modified nearest neighbor objective constructs semantic clusters that capture semantic information which improves performance on downstream tasks, including off-the-shelf and fine-tuning settings. The refinement process is short and simple - yet highly effective. Within a few epochs, we refine the features of MIM models from subpar to state-of-the-art, off-the-shelf features. Refining a ViT-H, pre-trained with data2vec 2.0 on ImageNet-1K, sets a new state-of-the-art in linear probing (84.7%) and low-shot classification among models that are pre-trained on ImageNet-1K. At ImageNet-1K 1-shot classification, MIM-Refiner advances the state-of-the-art to 64.2%, outperforming larger models that were trained on up to 2000 times more data such as DINOv2-g, OpenCLIP-G and MAWS-6.5B.

Updated: 2024-06-03 17:51:58

标题: MIM-Refiner：中间预训练表示的对比学习增强

摘要: 我们介绍了MIM（Masked Image Modeling）-Refiner，这是对预训练MIM模型的对比学习增强。MIM-Refiner的动机在于认识到MIM模型中强大的表示通常存在于中间层。因此，MIM-Refiner利用连接到不同中间层的多个对比头。在每个头中，修改后的最近邻目标构建语义集群，捕获语义信息，从而提高下游任务的性能，包括现成和微调设置。这个精炼过程短小简单，但非常有效。在几个时代内，我们将MIM模型的特征从次优改进到现成的特征。在ImageNet-1K上用data2vec 2.0预训练的ViT-H的精炼，为线性探测（84.7%）和低样本分类设立了一个新的现成标准，超越了在ImageNet-1K上预训练的模型。在ImageNet-1K 1-shot分类中，MIM-Refiner将现成标准提升到64.2%，胜过了使用多达2000倍数据进行训练的更大模型，如DINOv2-g、OpenCLIP-G和MAWS-6.5B。

更新时间: 2024-06-03 17:51:58

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.10093v2

Unlocking Guidance for Discrete State-Space Diffusion and Flow Models

Generative models on discrete state-spaces have a wide range of potential applications, particularly in the domain of natural sciences. In continuous state-spaces, controllable and flexible generation of samples with desired properties has been realized using guidance on diffusion and flow models. However, these guidance approaches are not readily amenable to discrete state-space models. Consequently, we introduce a general and principled method for applying guidance on such models. Our method depends on leveraging continuous-time Markov processes on discrete state-spaces, which unlocks computational tractability for sampling from a desired guided distribution. We demonstrate the utility of our approach, Discrete Guidance, on a range of applications including guided generation of images, small-molecules, DNA sequences and protein sequences.

Updated: 2024-06-03 17:51:54

标题: 解锁离散状态空间扩散和流模型的指导原则

摘要: Generative models based on discrete state-spaces have a wide range of potential applications, especially in the natural sciences. While controllable and flexible generation of samples with desired properties has been achieved in continuous state-spaces using guidance on diffusion and flow models, these approaches are not easily applicable to discrete state-space models. Therefore, we propose a general and principled method for applying guidance to such models. Our method relies on leveraging continuous-time Markov processes on discrete state-spaces, which enables efficient sampling from a desired guided distribution. We demonstrate the effectiveness of our approach, Discrete Guidance, in various applications such as guided generation of images, small molecules, DNA sequences, and protein sequences.

更新时间: 2024-06-03 17:51:54

领域: cs.LG

下载: http://arxiv.org/abs/2406.01572v1

Single Trajectory Conformal Prediction

We study the performance of risk-controlling prediction sets (RCPS), an empirical risk minimization-based formulation of conformal prediction, with a single trajectory of temporally correlated data from an unknown stochastic dynamical system. First, we use the blocking technique to show that RCPS attains performance guarantees similar to those enjoyed in the iid setting whenever data is generated by asymptotically stationary and contractive dynamics. Next, we use the decoupling technique to characterize the graceful degradation in RCPS guarantees when the data generating process deviates from stationarity and contractivity. We conclude by discussing how these tools could be used toward a unified analysis of online and offline conformal prediction algorithms, which are currently treated with very different tools.

Updated: 2024-06-03 17:51:33

标题: 单轨道符合性预测

摘要: 我们研究了风险控制预测集（RCPS）在未知随机动态系统中具有时间相关数据的单一轨迹的性能，这是一种基于经验风险最小化的符合预测形式。首先，我们使用阻塞技术表明，当数据由渐近稳定和收敛动态生成时，RCPS实现了类似于iid设置中享有的性能保证。接下来，我们使用解耦技术来表征当数据生成过程偏离稳定性和收敛性时，RCPS保证的渐进退化。最后，我们讨论了这些工具如何可以用于统一分析在线和离线符合预测算法，目前这些算法使用非常不同的工具进行处理。

更新时间: 2024-06-03 17:51:33

领域: cs.LG,cs.SY,eess.SY,stat.ML

下载: http://arxiv.org/abs/2406.01570v1

Loss Symmetry and Noise Equilibrium of Stochastic Gradient Descent

Symmetries exist abundantly in the loss function of neural networks. We characterize the learning dynamics of stochastic gradient descent (SGD) when exponential symmetries, a broad subclass of continuous symmetries, exist in the loss function. We establish that when gradient noises do not balance, SGD has the tendency to move the model parameters toward a point where noises from different directions are balanced. Here, a special type of fixed point in the constant directions of the loss function emerges as a candidate for solutions for SGD. As the main theoretical result, we prove that every parameter $\theta$ connects without loss function barrier to a unique noise-balanced fixed point $\theta^*$. The theory implies that the balancing of gradient noise can serve as a novel alternative mechanism for relevant phenomena such as progressive sharpening and flattening and can be applied to understand common practical problems such as representation normalization, matrix factorization, warmup, and formation of latent representations.

Updated: 2024-06-03 17:49:41

标题: 随机梯度下降的损失对称性和噪声平衡

摘要: 对神经网络的损失函数存在丰富的对称性。我们对随机梯度下降（SGD）的学习动态进行了表征，当损失函数中存在指数对称性时，即连续对称性的一个广泛子类。我们确定，当梯度噪声不平衡时，SGD会倾向于将模型参数移向一个使不同方向的噪声平衡的点。在这里，损失函数的常数方向上出现了一种特殊类型的固定点，成为SGD的解决方案的候选者。作为主要的理论结果，我们证明每个参数 $\theta$ 都连接到一个唯一的噪声平衡固定点 $\theta^*$，而没有损失函数障碍。该理论暗示了梯度噪声的平衡可以作为相关现象（如逐渐变尖和变平）的一种新的替代机制，并可应用于理解常见的实际问题，如表示规范化、矩阵分解、预热和潜在表示的形成。

更新时间: 2024-06-03 17:49:41

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2402.07193v2

Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving on heterogeneous GPU clusters. A key idea behind Helix is to formulate inference computation of LLMs over heterogeneous GPUs and network connections as a max-flow problem for a directed, weighted graph, whose nodes represent GPU instances and edges capture both GPU and network heterogeneity through their capacities. Helix then uses a mixed integer linear programming (MILP) algorithm to discover highly optimized strategies to serve LLMs. This approach allows Helix to jointly optimize model placement and request scheduling, two highly entangled tasks in heterogeneous LLM serving. Our evaluation on several heterogeneous cluster settings ranging from 24 to 42 GPU nodes shows that Helix improves serving throughput by up to 2.7$\times$ and reduces prompting and decoding latency by up to 2.8$\times$ and 1.3$\times$, respectively, compared to best existing approaches.

Updated: 2024-06-03 17:47:53

标题: 螺旋：通过异构GPU上的最大流实现大型语言模型的分布式服务

摘要: 本文介绍了Helix，这是一个用于在异构GPU集群上进行高吞吐量、低延迟的大型语言模型（LLM）服务的分布式系统。Helix背后的关键思想是将LLMs的推断计算在异构GPU和网络连接上构建为一个有向加权图的最大流问题，其中节点代表GPU实例，边捕捉了GPU和网络异构性通过它们的容量。然后，Helix使用混合整数线性规划（MILP）算法来发现高度优化的策略来服务LLMs。这种方法使Helix能够共同优化模型放置和请求调度，这是异构LLM服务中高度纠缠的两个任务。我们在从24到42个GPU节点的几个异构集群设置上进行评估表明，与最佳现有方法相比，Helix可以将服务吞吐量提高高达2.7倍，并将提示和解码延迟分别降低高达2.8倍和1.3倍。

更新时间: 2024-06-03 17:47:53

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.01566v1

A Survey on Self-Evolution of Large Language Models

Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. However, current LLMs that learn from human or external model supervision are costly and may face performance ceilings as task complexity and diversity increase. To address this issue, self-evolution approaches that enable LLM to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing. This new training paradigm inspired by the human experiential learning process offers the potential to scale LLMs towards superintelligence. In this work, we present a comprehensive survey of self-evolution approaches in LLMs. We first propose a conceptual framework for self-evolution and outline the evolving process as iterative cycles composed of four phases: experience acquisition, experience refinement, updating, and evaluation. Second, we categorize the evolution objectives of LLMs and LLM-based agents; then, we summarize the literature and provide taxonomy and insights for each module. Lastly, we pinpoint existing challenges and propose future directions to improve self-evolution frameworks, equipping researchers with critical insights to fast-track the development of self-evolving LLMs. Our corresponding GitHub repository is available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/Awesome-Self-Evolution-of-LLM

Updated: 2024-06-03 17:47:30

标题: 大型语言模型自我演化调查

摘要: 大型语言模型（LLMs）在各个领域和智能代理应用中取得了显著进展。然而，目前从人类或外部模型监督学习的LLMs成本高昂，并且随着任务复杂性和多样性的增加，可能面临性能上限的挑战。为了解决这个问题，使LLM能够自主获取、完善和学习模型本身生成的经验的自我进化方法正在迅速发展。这种受人类经验性学习过程启发的新训练范式为将LLMs推向超级智能提供了潜力。在这项工作中，我们提出了LLMs中自我进化方法的全面调查。首先，我们提出了一个自我进化的概念框架，并概述了演变过程，这个过程由四个阶段的迭代循环组成：经验获取、经验完善、更新和评估。其次，我们对LLMs和基于LLMs的代理的进化目标进行了分类；然后，我们总结了文献，并为每个模块提供了分类法和见解。最后，我们指出了现有的挑战，并提出了未来的方向，以改进自我进化框架，为研究人员提供了关键见解，以加快自我进化LLMs的发展。我们对应的GitHub存储库位于https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/Awesome-Self-Evolution-of-LLM。

更新时间: 2024-06-03 17:47:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.14387v2

A New View on Planning in Online Reinforcement Learning

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.

Updated: 2024-06-03 17:45:19

标题: 在线强化学习中规划的新视角

摘要: 本文研究了一种基于模型的强化学习的新方法，使用背景规划：混合（近似）动态规划更新和无模型更新，类似于Dyna架构。使用学习模型的背景规划通常比无模型替代方法更差，如Double DQN，尽管前者使用了更多的内存和计算资源。根本问题在于学习的模型可能不准确，并且在迭代多步时经常生成无效状态。在本文中，我们通过将背景规划限制在一组（抽象的）子目标上，并仅学习本地的、基于子目标的模型，避免了这种限制。这种目标空间规划（GSP）方法在计算上更有效率，自然地结合了时间抽象以实现更快速的长期规划，并且避免了完全学习转移动态。我们展示了我们的GSP算法可以以一种方式从抽象空间传播价值，从而帮助各种基础学习者在不同领域中学习得更快。

更新时间: 2024-06-03 17:45:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01562v1

Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and a FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. We will make our PyTorch implementation and distilled Stable Diffusion one-step generators available at https://github.com/mingyuanzhou/SiD-LSG

Updated: 2024-06-03 17:44:11

标题: 一步文本到图像生成中的分数身份蒸馏中的长短引导

摘要: 基于扩展的文本-图像对训练的基于扩散的文本到图像生成模型已经显示出生成与文字描述一致的照片逼真的图像的能力。然而，这些模型的一个显著限制是它们生成样本速度慢，需要通过相同网络进行迭代改进。本文通过开发长短分类器无关引导（LSG）来增强Score identity Distillation（SiD），以有效地提炼预训练的稳定扩散模型，而无需使用真实训练数据。SiD的目标是优化基于模型的显式分数匹配损失，利用基于分数标识的近似以及提出的LSG进行实用计算。通过仅使用其一步生成器合成的假图像进行训练，SiD配备LSG快速提高了FID和CLIP分数，实现了最先进的FID性能，同时保持竞争性的CLIP分数。具体地，其对Stable Diffusion 1.5的无数据提炼在COCO-2014验证集上实现了创纪录的低FID为8.15，LSG尺度为1.5时CLIP分数为0.304，在LSG尺度为2时FID为9.56，CLIP分数为0.313。我们将在https://github.com/mingyuanzhou/SiD-LSG 上提供我们的PyTorch实现和提炼的Stable Diffusion一步生成器。

更新时间: 2024-06-03 17:44:11

领域: cs.CV,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01561v1

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose frame-work for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control - all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://DITTO-Music.github.io/web/.

Updated: 2024-06-03 17:37:53

标题: DITTO：音乐生成中的扩散推断时间优化

摘要: 我们提出了扩散推理时间T优化（DITTO），这是一个通用的框架，用于通过优化初始噪声潜变量来控制预训练的文本到音乐扩散模型的推理时间。我们的方法可以用于通过任何可微特征匹配损失进行优化，以实现目标（风格化）输出，并利用梯度检查点实现内存效率。我们展示了音乐生成的惊人广泛应用，包括修补、扩展和循环，以及强度、旋律和音乐结构控制 - 所有这些都不需要对基础模型进行微调。当我们将我们的方法与相关的训练、指导和基于优化的方法进行比较时，我们发现DITTO在几乎所有任务上都达到了最先进的性能，包括在可控性、音频质量和计算效率方面胜过可比较的方法，从而为扩散模型的高质量、灵活、无需训练的控制打开了大门。音频示例可以在https://DITTO-Music.github.io/web/找到。

更新时间: 2024-06-03 17:37:53

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2401.12179v2

Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction

Recent advancements in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause significant increases in costs including GPU memory consumption and computing latency, named diverging training costs issue. Affected by the problem, most existing methods adopt low-resolution (LR) BEV and struggle to estimate the precise locations of urban scene components like road lanes, and sidewalks. As the imprecision leads to risky self-driving, the diverging training costs issue has to be resolved. In this paper, we address the issue with our novel Trumpet Neural Network (TNN) mechanism. The framework utilizes LR BEV space and outputs an up-sampled semantic BEV map to create a memory-efficient pipeline. To this end, we introduce Local Restoration of BEV representation. Specifically, the up-sampled BEV representation has severely aliased, blocky signals, and thick semantic labels. Our proposed Local Restoration restores the signals and thins (or narrows down) the width of the labels. Our extensive experiments show that the TNN mechanism provides a plug-and-play memory-efficient pipeline, thereby enabling the effective estimation of real-sized (or precise) semantic labels for BEV map construction.

Updated: 2024-06-03 17:36:47

标题: 解决训练成本分歧：利用本地修复进行精确鸟瞰地图构建

摘要: 最近在鸟瞰图（BEV）融合方面取得的进展为地图构建展示了城市环境的显著映射。然而，它们深度和庞大的架构导致大量的反向传播内存和计算延迟。因此，这个问题在构建高分辨率（HR）BEV地图时构成了一个无法避免的瓶颈，因为它们大尺寸的特征导致了成本的显著增加，包括GPU内存消耗和计算延迟，被称为训练成本分歧问题。受到这个问题的影响，大多数现有方法采用低分辨率（LR）BEV，并且难以准确估计城市场景组件的位置，如道路车道和人行道。由于不精确会导致危险的自动驾驶，必须解决训练成本分歧问题。在本文中，我们通过我们的新颖的Trumpet神经网络（TNN）机制来解决这个问题。该框架利用LR BEV空间，并输出一个上采样的语义BEV地图来创建一个内存高效的管道。为此，我们引入了BEV表示的局部恢复。具体而言，上采样的BEV表示有严重的混叠，方块状信号和厚实的语义标签。我们提出的局部恢复恢复了信号并减小了标签的宽度。我们的广泛实验表明，TNN机制提供了一个即插即用的内存高效管道，从而实现了对BEV地图构建的真实大小（或精确）语义标签的有效估计。

更新时间: 2024-06-03 17:36:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.01016v2

Arrows of Time for Large Language Models

We study the probabilistic modeling performed by Autoregressive Large Language Models (LLMs) through the angle of time directionality, addressing a question first raised in (Shannon, 1951). For large enough models, we empirically find a time asymmetry in their ability to learn natural language: a difference in the average log-perplexity when trying to predict the next token versus when trying to predict the previous one. This difference is at the same time subtle and very consistent across various modalities (language, model size, training time, ...). Theoretically, this is surprising: from an information-theoretic point of view, there should be no such difference. We provide a theoretical framework to explain how such an asymmetry can appear from sparsity and computational complexity considerations, and outline a number of perspectives opened by our results.

Updated: 2024-06-03 17:35:04

标题: 大型语言模型的时间箭头

摘要: 我们通过时间方向性的角度研究自回归大型语言模型（LLMs）进行的概率建模，解决了Shannon（1951）首次提出的问题。对于足够大的模型，我们在实证研究中发现它们在学习自然语言时存在时间不对称性：在尝试预测下一个标记和尝试预测上一个标记时，平均对数困惑度存在差异。这种差异既微妙又非常稳定，跨不同模态（语言、模型大小、训练时间等）保持一致。从信息论的角度来看，理论上这是令人惊讶的：不应该存在这样的差异。我们提供了一个理论框架来解释如何从稀疏性和计算复杂性考虑中出现这种不对称性，并概述了我们研究结果所打开的一些视角。

更新时间: 2024-06-03 17:35:04

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2401.17505v3

Learning equivariant tensor functions with applications to sparse vector recovery

This work characterizes equivariant polynomial functions from tuples of tensor inputs to tensor outputs. Loosely motivated by physics, we focus on equivariant functions with respect to the diagonal action of the orthogonal group on tensors. We show how to extend this characterization to other linear algebraic groups, including the Lorentz and symplectic groups. Our goal behind these characterizations is to define equivariant machine learning models. In particular, we focus on the sparse vector estimation problem. This problem has been broadly studied in the theoretical computer science literature, and explicit spectral methods, derived by techniques from sum-of-squares, can be shown to recover sparse vectors under certain assumptions. Our numerical results show that the proposed equivariant machine learning models can learn spectral methods that outperform the best theoretically known spectral methods in some regimes. The experiments also suggest that learned spectral methods can solve the problem in settings that have not yet been theoretically analyzed. This is an example of a promising direction in which theory can inform machine learning models and machine learning models could inform theory.

Updated: 2024-06-03 17:32:43

标题: 学习等变张量函数及其在稀疏向量恢复中的应用

摘要: 这项工作对从张量输入到张量输出的等变多项式函数进行了表征。受物理学启发，我们专注于对张量上正交群的对角作用等变的函数。我们展示了如何将这种表征扩展到其他线性代数群，包括洛伦兹群和辛群。我们对这些表征的目标是定义等变机器学习模型。特别是，我们关注稀疏向量估计问题。这个问题在理论计算机科学文献中得到广泛研究，通过和平方和技术导出的显式谱方法可以在某些假设下恢复稀疏向量。我们的数值结果表明，提出的等变机器学习模型可以学习胜过目前已知最佳理论谱方法的谱方法在某些情况下。实验还表明，学习的谱方法可以在尚未经过理论分析的情境中解决问题。这是一个有前景的方向的例子，即理论可以指导机器学习模型，而机器学习模型也可以指导理论。

更新时间: 2024-06-03 17:32:43

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01552v1

An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content but only achieve suboptimal noise compression. In this paper, we propose to introduce the information bottleneck theory into retrieval-augmented generation. Our approach involves the filtration of noise by simultaneously maximizing the mutual information between compression and ground output, while minimizing the mutual information between compression and retrieved passage. In addition, we derive the formula of information bottleneck to facilitate its application in novel comprehensive evaluations, the selection of supervised fine-tuning data, and the construction of reinforcement learning rewards. Experimental results demonstrate that our approach achieves significant improvements across various question answering datasets, not only in terms of the correctness of answer generation but also in the conciseness with $2.5\%$ compression rate.

Updated: 2024-06-03 17:31:06

标题: 一个信息瓶颈视角对检索增强生成中的有效噪声过滤。

摘要: 检索增强生成将大型语言模型的能力与从广泛语料库中检索的相关信息整合在一起，但在面对真实世界中的嘈杂数据时会遇到挑战。最近的一种解决方案是训练一个过滤模块来查找相关内容，但只能达到次优的噪声压缩。在本文中，我们提出将信息瓶颈理论引入检索增强生成中。我们的方法涉及通过同时最大化压缩和地面输出之间的互信息，同时最小化压缩和检索段落之间的互信息来过滤噪声。此外，我们推导出信息瓶颈的公式，以便在新颖的全面评估中应用，选择监督微调数据，并构建强化学习奖励。实验结果表明，我们的方法在各种问答数据集上取得了显著的改进，不仅在答案生成的正确性方面，而且在2.5%的压缩率下的简洁性方面也取得了显著的改进。

更新时间: 2024-06-03 17:31:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01549v1

TinySV: Speaker Verification in TinyML with On-device Learning

TinyML is a novel area of machine learning that gained huge momentum in the last few years thanks to the ability to execute machine learning algorithms on tiny devices (such as Internet-of-Things or embedded systems). Interestingly, research in this area focused on the efficient execution of the inference phase of TinyML models on tiny devices, while very few solutions for on-device learning of TinyML models are available in the literature due to the relevant overhead introduced by the learning algorithms. The aim of this paper is to introduce a new type of adaptive TinyML solution that can be used in tasks, such as the presented \textit{Tiny Speaker Verification} (TinySV), that require to be tackled with an on-device learning algorithm. Achieving this goal required (i) reducing the memory and computational demand of TinyML learning algorithms, and (ii) designing a TinyML learning algorithm operating with few and possibly unlabelled training data. The proposed TinySV solution relies on a two-layer hierarchical TinyML solution comprising Keyword Spotting and Adaptive Speaker Verification module. We evaluated the effectiveness and efficiency of the proposed TinySV solution on a dataset collected expressly for the task and tested the proposed solution on a real-world IoT device (Infineon PSoC 62S2 Wi-Fi BT Pioneer Kit).

Updated: 2024-06-03 17:27:40

标题: TinySV：在TinyML中进行的设备学习的说话人验证

摘要: TinyML是机器学习的一个新领域，在过去几年获得了巨大的动力，这要归功于在微型设备（如物联网或嵌入式系统）上执行机器学习算法的能力。有趣的是，该领域的研究着重于在微型设备上高效执行TinyML模型的推理阶段，而由于学习算法引入的相关开销，文献中几乎没有关于TinyML模型的设备端学习的解决方案。本文旨在介绍一种新型的自适应TinyML解决方案，可用于需要使用设备端学习算法解决的任务，例如所提出的“微型说话人验证”（TinySV）。实现这一目标需要（i）降低TinyML学习算法的内存和计算需求，以及（ii）设计一种操作少量甚至未标记训练数据的TinyML学习算法。所提出的TinySV解决方案依赖于一个包含关键词识别和自适应说话人验证模块的两层分层TinyML解决方案。我们对专门为此任务收集的数据集评估了所提出的TinySV解决方案的有效性和效率，并在一款真实的物联网设备（Infineon PSoC 62S2 Wi-Fi BT Pioneer Kit）上测试了所提出的解决方案。

更新时间: 2024-06-03 17:27:40

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2406.01655v1

Learning from Mistakes: a Weakly-supervised Method for Mitigating the Distribution Shift in Autonomous Vehicle Planning

The planning problem constitutes a fundamental aspect of the autonomous driving framework. Recent strides in representation learning have empowered vehicles to comprehend their surrounding environments, thereby facilitating the integration of learning-based planning strategies. Among these approaches, Imitation Learning stands out due to its notable training efficiency. However, traditional Imitation Learning methodologies encounter challenges associated with the co-variate shift phenomenon. We propose Learn from Mistakes (LfM) as a remedy to address this issue. The essence of LfM lies in deploying a pre-trained planner across diverse scenarios. Instances where the planner deviates from its immediate objectives, such as maintaining a safe distance from obstacles or adhering to traffic rules, are flagged as mistakes. The environments corresponding to these mistakes are categorized as out-of-distribution states and compiled into a new dataset termed closed-loop mistakes dataset. Notably, the absence of expert annotations for the closed-loop data precludes the applicability of standard imitation learning approaches. To facilitate learning from the closed-loop mistakes, we introduce Validity Learning, a weakly supervised method, which aims to discern valid trajectories within the current environmental context. Experimental evaluations conducted on the InD and Nuplan datasets reveal substantial enhancements in closed-loop metrics such as Progress and Collision Rate, underscoring the effectiveness of the proposed methodology.

Updated: 2024-06-03 17:25:18

标题: 从错误中学习：一种弱监督方法，用于减轻自动驾驶车辆规划中的分布偏移

摘要: 规划问题构成自动驾驶框架的基本方面。最近在表示学习方面取得的进展使车辆能够理解其周围的环境，从而促进了基于学习的规划策略的整合。在这些方法中，由于其显著的训练效率，模仿学习脱颖而出。然而，传统的模仿学习方法面临与协变量漂移现象相关的挑战。我们提出Learn from Mistakes (LfM)作为解决这个问题的方法。LfM的本质在于在各种场景中部署一个预先训练的规划器。规划器偏离其直接目标的实例，例如保持与障碍物的安全距离或遵守交通规则，被标记为错误。与这些错误对应的环境被归类为超出分布状态，并编制成一个新数据集，称为闭环错误数据集。值得注意的是，闭环数据缺乏专家注释，导致标准模仿学习方法的适用性受到限制。为了促进从闭环错误中学习，我们引入了一种弱监督方法，称为有效性学习，旨在识别当前环境背景中的有效轨迹。在InD和Nuplan数据集上进行的实验评估显示，在闭环指标，如进展和碰撞率方面，存在显著的改进，强调了所提出方法的有效性。

更新时间: 2024-06-03 17:25:18

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01544v1

The Topology and Geometry of Neural Representations

A central question for neuroscience is how to characterize brain representations of perceptual and cognitive content. An ideal characterization should distinguish different functional regions with robustness to noise and idiosyncrasies of individual brains that do not correspond to computational differences. Previous studies have characterized brain representations by their representational geometry, which is defined by the representational dissimilarity matrix (RDM), a summary statistic that abstracts from the roles of individual neurons (or responses channels) and characterizes the discriminability of stimuli. Here we explore a further step of abstraction: from the geometry to the topology of brain representations. We propose topological representational similarity analysis (tRSA), an extension of representational similarity analysis (RSA) that uses a family of geo-topological summary statistics that generalizes the RDM to characterize the topology while de-emphasizing the geometry. We evaluate this new family of statistics in terms of the sensitivity and specificity for model selection using both simulations and fMRI data. In the simulations, the ground truth is a data-generating layer representation in a neural network model and the models are the same and other layers in different model instances (trained from different random seeds). In fMRI, the ground truth is a visual area and the models are the same and other areas measured in different subjects. Results show that topology-sensitive characterizations of population codes are robust to noise and interindividual variability and maintain excellent sensitivity to the unique representational signatures of different neural network layers and brain regions. These methods enable researchers to calibrate comparisons among representations in brains and models to be sensitive to the geometry, the topology, or a combination of both.

Updated: 2024-06-03 17:22:24

标题: 神经表示的拓扑学与几何学

摘要: 神经科学中一个核心问题是如何表征大脑对知觉和认知内容的表示。理想的表征应该能够区分不同的功能区域，并且对噪声和个体大脑特异性具有鲁棒性，这些特异性与计算差异不对应。先前的研究通过表示几何特征来表征大脑表示，这由表示不相似矩阵（RDM）定义，RDM是一个摘要统计量，抽象出个体神经元（或响应通道）的作用，并表征刺激的可辨识性。在这里，我们探索一个进一步的抽象步骤：从几何到大脑表示的拓扑。我们提出了拓扑表示相似性分析（tRSA），这是表示相似性分析（RSA）的一种扩展，使用一组地理拓扑摘要统计，将RDM概括为描述拓扑，同时淡化几何特征。我们通过模拟和fMRI数据评估这个新的统计方法的模型选择敏感性和特异性。在模拟中，地面实况是在神经网络模型中的数据生成层表示，模型是相同的和其他层在不同模型实例中（从不同随机种子训练）。在fMRI中，地面实况是一个视觉区域，模型是相同的和其他区域在不同受试者中测量。结果表明，对人口编码的拓扑敏感表征对噪声和个体间变异具有鲁棒性，并保持对不同神经网络层和大脑区域的独特表征特征具有优秀的敏感性。这些方法使研究人员能够校准大脑和模型中的表示之间的比较，以对几何、拓扑或两者的组合敏感。

更新时间: 2024-06-03 17:22:24

领域: q-bio.NC,cs.LG,stat.ME

下载: http://arxiv.org/abs/2309.11028v3

Physics-informed deep learning and compressive collocation for high-dimensional diffusion-reaction equations: practical existence theory and numerics

On the forefront of scientific computing, Deep Learning (DL), i.e., machine learning with Deep Neural Networks (DNNs), has emerged a powerful new tool for solving Partial Differential Equations (PDEs). It has been observed that DNNs are particularly well suited to weakening the effect of the curse of dimensionality, a term coined by Richard E. Bellman in the late `50s to describe challenges such as the exponential dependence of the sample complexity, i.e., the number of samples required to solve an approximation problem, on the dimension of the ambient space. However, although DNNs have been used to solve PDEs since the `90s, the literature underpinning their mathematical efficiency in terms of numerical analysis (i.e., stability, accuracy, and sample complexity), is only recently beginning to emerge. In this paper, we leverage recent advancements in function approximation using sparsity-based techniques and random sampling to develop and analyze an efficient high-dimensional PDE solver based on DL. We show, both theoretically and numerically, that it can compete with a novel stable and accurate compressive spectral collocation method. In particular, we demonstrate a new practical existence theorem, which establishes the existence of a class of trainable DNNs with suitable bounds on the network architecture and a sufficient condition on the sample complexity, with logarithmic or, at worst, linear scaling in dimension, such that the resulting networks stably and accurately approximate a diffusion-reaction PDE with high probability.

Updated: 2024-06-03 17:16:11

标题: 物理学通知深度学习和高维扩散反应方程的压缩选点：实际存在理论和数值解析

摘要: 在科学计算的前沿，深度学习（DL），即使用深度神经网络（DNNs）的机器学习，已经成为解决偏微分方程（PDEs）的强大新工具。已经观察到DNNs特别适合减弱维度诅咒的影响，这是Richard E. Bellman在50年代后期创造的一个术语，用来描述挑战，如样本复杂度（即解决逼近问题所需的样本数量）对环境空间维度的指数依赖性。然而，尽管自90年代以来DNNs已经被用来解决PDEs，但在数值分析方面支撑它们数学效率的文献，即稳定性、准确性和样本复杂度，最近才开始出现。在本文中，我们利用最近在使用基于稀疏技术和随机抽样的函数逼近方面取得的进展，开发并分析了基于DL的高效高维PDE求解器。我们在理论上和数值上表明，它可以与一种新颖的稳定和准确的压缩谱插值方法竞争。特别是，我们展示了一个新的实际存在定理，该定理确立了一类可训练的DNNs的存在，其网络架构具有适当的界限，并且在样本复杂度上具有足够的条件，维度的对数或最坏的情况下，线性缩放，使得所得到的网络以高概率稳定且准确地逼近扩散-反应PDE。

更新时间: 2024-06-03 17:16:11

领域: cs.LG,cs.IT,cs.NA,math.IT,math.NA

下载: http://arxiv.org/abs/2406.01539v1

What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits moving forward. Second, we explain the surprisingly high brain scores of untrained LLMs by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small, additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.

Updated: 2024-06-03 17:13:27

标题: 大型语言模型在大脑中映射到什么？反对对大脑分数的过度依赖。

摘要: 鉴于大型语言模型（LLMs）的显著能力，人们对评估它们与人类大脑的相似性越来越感兴趣。一种衡量这种相似性的方法是通过衡量模型预测神经信号的准确程度，也称为“脑分数”。LLMs的内部表示达到了最先进的脑分数，导致人们猜测它们与人类语言处理共享计算原则。这种推断只有在LLMs预测的神经活动子集反映语言处理的核心要素时才有效。在这里，我们通过分析三个神经数据集，重点关注一项关于LLM与大脑映射的影响力研究中使用的fMRI数据集，对这种假设提出质疑。我们首先发现，在使用随机训练-测试分割时，与以前使用这些数据集的研究中一样，一个编码时间自相关性的微不足道的特征不仅胜过LLMs，而且占据LLMs解释的神经变异的大部分。因此，我们将继续使用连续分割。其次，我们通过展示未经训练的LLMs的惊人高脑分数并不解释LLMs之外的额外神经变异，仅由句子长度和句子位置两个简单特征来解释，从而削弱了声称变换器架构偏向于更符合大脑的计算的证据。第三，我们发现在这个数据集上训练过的LLMs的脑分数可以在很大程度上由句子长度、位置和指代消解的静态词嵌入解释；少量额外的部分可以由特定于感知的嵌入和句子结构的上下文表示解释。我们得出结论，过度依赖脑分数可能导致对LLMs和大脑之间相似性的过度解释，并强调了分解LLMs在神经信号中映射的重要性。

更新时间: 2024-06-03 17:13:27

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01538v1

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained visual-language features. While these maps allow for the prediction of point-wise saliency maps when queried for a certain language concept, large-scale environments and abstract queries beyond the object level still pose a considerable hurdle, ultimately limiting language-grounded robotic navigation. In this work, we present HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded robot navigation. Leveraging open-vocabulary vision foundation models, we first obtain state-of-the-art open-vocabulary segment-level maps in 3D and subsequently construct a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features. Our approach is able to represent multi-story buildings and allows robotic traversal of those using a cross-floor Voronoi graph. HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy on the object, room, and floor level while producing a 75% reduction in representation size compared to dense open-vocabulary maps. In order to prove the efficacy and generalization capabilities of HOV-SG, we showcase successful long-horizon language-conditioned robot navigation within real-world multi-storage environments. We provide code and trial video data at http://hovsg.github.io/.

Updated: 2024-06-03 17:12:25

标题: 基于语言引导的机器人导航的分层开放词汇3D场景图

摘要: 最近的开放词汇机器人地图方法通过预训练的视觉语言特征丰富了密集几何地图。尽管这些地图允许在查询某种语言概念时预测逐点显著性地图，但大规模环境和抽象查询超越对象级别仍然构成了一个相当大的障碍，最终限制了基于语言的机器人导航。在这项工作中，我们提出了HOV-SG，一种用于基于语言的机器人导航的分层开放词汇3D场景图映射方法。利用开放词汇视觉基础模型，我们首先在3D中获得了最先进的开放词汇段级地图，随后构建了一个由地板、房间和对象概念组成的3D场景图层次结构，每个概念都丰富了开放词汇特征。我们的方法能够表示多层建筑，并允许机器人使用跨层Voronoi图进行遍历。HOV-SG在三个不同的数据集上进行评估，并在对象、房间和楼层级别上超越了以前的基线，在与密集开放词汇地图相比减少了75%的表示大小同时产生更高的语义准确性。为了证明HOV-SG的有效性和泛化能力，我们展示了在真实世界的多存储环境中成功实现的长视距语言条件机器人导航。我们在http://hovsg.github.io/上提供了代码和试验视频数据。

更新时间: 2024-06-03 17:12:25

领域: cs.RO,cs.AI,cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.17846v2

Immunocto: a massive immune cell database auto-generated for histopathology

With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform on prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology present relatively poor performance. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. In that context, we introduce Immunocto, a massive, multi-million automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64$\times$64 pixels H&E image at $\mathbf{40}\times$ magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto produces an average F1 score of 0.74 to differentiate the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.

Updated: 2024-06-03 17:03:58

标题: 免疫细胞数据库Immunocto：为组织病理学自动生成的大规模数据库

摘要: 随着新型癌症治疗选择（如免疫疗法）的出现，研究肿瘤免疫微环境对于预后和理解对治疗药物的反应至关重要。表征肿瘤免疫微环境的一个关键方法可能是通过将（1）常规组织病理学检查中获取的苏木精和嗜酸性染色组织切片的数字化显微高分辨率光学图像与（2）自动化免疫细胞检测和分类方法相结合。然而，目前数字病理学中用于个体免疫细胞分类的模型表现相对较差。这主要是由于当前可用的个体免疫细胞数据集规模有限，这是在数字化苏木精和嗜酸性整张切片图像上手动注释免疫细胞的耗时困难问题的结果。在这种背景下，我们介绍了Immunocto，这是一个庞大的、由自动生成的 6,848,454 个人类细胞组成的数据库，其中包括 2,282,818 个免疫细胞，分布在 4 个亚型：CD4$^+$ T 细胞淋巴细胞、CD8$^+$ T 细胞淋巴细胞、B 细胞淋巴细胞和巨噬细胞。对于每个细胞，我们提供一个 64$\times$64 像素的 H&E 图像，在 $\mathbf{40}\times$ 倍放大下，以及一个细胞核的二值掩模和一个标签。为了创建 Immunocto，我们结合了开源模型和数据，自动生成了大多数轮廓和标签。这些细胞来自 Orion 平台的匹配 H&E 和免疫荧光结直肠数据集，而轮廓是使用 Segment Anything 模型获得的。在 Immunocto 的 H&E 图像上训练的分类器产生了一个平均 F1 分数为 0.74，可以区分 4 种免疫细胞亚型和其他细胞。Immunocto 可以从以下网址下载：https://zenodo.org/uploads/11073373。

更新时间: 2024-06-03 17:03:58

领域: q-bio.QM,cs.AI,eess.IV

下载: http://arxiv.org/abs/2406.02618v1

How to Count Coughs: An Event-Based Framework for Evaluating Automatic Cough Detection Algorithm Performance

Chronic cough disorders are widespread and challenging to assess because they rely on subjective patient questionnaires about cough frequency. Wearable devices running Machine Learning (ML) algorithms are promising for quantifying daily coughs, providing clinicians with objective metrics to track symptoms and evaluate treatments. However, there is a mismatch between state-of-the-art metrics for cough counting algorithms and the information relevant to clinicians. Most works focus on distinguishing cough from non-cough samples, which does not directly provide clinically relevant outcomes such as the number of cough events or their temporal patterns. In addition, typical metrics such as specificity and accuracy can be biased by class imbalance. We propose using event-based evaluation metrics aligned with clinical guidelines on significant cough counting endpoints. We use an ML classifier to illustrate the shortcomings of traditional sample-based accuracy measurements, highlighting their variance due to dataset class imbalance and sample window length. We also present an open-source event-based evaluation framework to test algorithm performance in identifying cough events and rejecting false positives. We provide examples and best practice guidelines in event-based cough counting as a necessary first step to assess algorithm performance with clinical relevance.

Updated: 2024-06-03 16:59:48

标题: 如何计算咳嗽：用于评估自动咳嗽检测算法性能的基于事件的框架

摘要: 慢性咳嗽疾病是普遍存在的，并且很难评估，因为它们依赖于关于咳嗽频率的主观患者问卷。运行机器学习（ML）算法的可穿戴设备对于量化每日咳嗽具有潜力，为临床医生提供客观指标以跟踪症状并评估治疗效果。然而，目前用于咳嗽计数算法的最先进指标与临床医生相关的信息存在不匹配。大多数研究侧重于区分咳嗽和非咳嗽样本，这并不能直接提供临床相关的结果，如咳嗽事件的数量或其时间模式。此外，诸如特异性和准确性之类的典型指标可能会受到类别不平衡的偏差影响。我们建议使用基于事件的评估指标，与临床指南中的重要咳嗽计数终点保持一致。我们使用机器学习分类器来说明传统基于样本的准确性测量的缺点，突出它们由于数据集类别不平衡和样本窗口长度而引起的方差。我们还提出一个开源的基于事件的评估框架，以测试算法在识别咳嗽事件和拒绝误报的表现。我们提供基于事件的咳嗽计数的示例和最佳实践准则，作为评估具有临床相关性的算法性能的必要第一步。

更新时间: 2024-06-03 16:59:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.01529v1

Clover: Closed-Loop Verifiable Code Generation

The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results show that for this dataset, (i) LLMs are reasonably successful at automatically generating formal specifications; and (ii) our consistency checker achieves a promising acceptance rate (up to 87%) for correct instances while maintaining zero tolerance for incorrect ones (no false positives).

Updated: 2024-06-03 16:59:37

标题: 三叶草：闭环可验证代码生成

摘要: 大型语言模型用于代码生成在软件开发中是一个迅速增长的趋势。然而，如果没有有效的方法来确保生成的代码的正确性，这一趋势可能导致许多不良后果。在本文中，我们提出了解决这一挑战的愿景：Clover范式，即闭环可验证代码生成，它将正确性检查简化为更易访问的一致性检查问题。Clover的核心是一个检查器，它在代码、文档字符串和形式注释之间执行一致性检查。该检查器使用了一种新颖的形式验证工具和大型语言模型的集成实现。我们提供了理论分析来支持我们的观点，即Clover应该在一致性检查方面是有效的。我们还在一个手工设计的数据集（CloverBench）上进行了实证研究，该数据集包含了在教科书难度水平上带有注释的Dafny程序。实验结果显示，对于这个数据集，（i）LLMs在自动生成正式规范方面相当成功；（ii）我们的一致性检查器在正确实例上达到了一个有希望的接受率（高达87%），同时对于不正确实例保持零容忍（没有假阳性）。

更新时间: 2024-06-03 16:59:37

领域: cs.AI,cs.LG,cs.SE

下载: http://arxiv.org/abs/2310.17807v3

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

In the rapidly advancing field of conditional image generation research, challenges such as limited explainability lie in effectively evaluating the performance and capabilities of various models. This paper introduces VIEScore, a Visual Instruction-guided Explainable metric for evaluating any conditional image generation tasks. VIEScore leverages general knowledge from Multimodal Large Language Models (MLLMs) as the backbone and does not require training or fine-tuning. We evaluate VIEScore on seven prominent tasks in conditional image tasks and found: (1) VIEScore (GPT4-o) achieves a high Spearman correlation of 0.4 with human evaluations, while the human-to-human correlation is 0.45. (2) VIEScore (with open-source MLLM) is significantly weaker than GPT-4o and GPT-4v in evaluating synthetic images. (3) VIEScore achieves a correlation on par with human ratings in the generation tasks but struggles in editing tasks. With these results, we believe VIEScore shows its great potential to replace human judges in evaluating image synthesis tasks.

Updated: 2024-06-03 16:59:20

标题: VIEScore: 朝向可解释的条件图像合成评估指标

摘要: 在快速发展的条件图像生成研究领域，诸如有限的可解释性等挑战在有效评估各种模型的性能和能力方面存在。本文介绍了VIEScore，这是一个用于评估任何条件图像生成任务的可视指导可解释指标。VIEScore利用多模态大语言模型（MLLMs）作为骨干的一般知识，并且不需要训练或微调。我们在七个突出的条件图像任务上评估了VIEScore，并发现：（1）VIEScore（GPT4-o）与人类评估的Spearman相关性达到了0.4，而人与人之间的相关性为0.45。（2）VIEScore（使用开源MLLM）在评估合成图像方面明显弱于GPT-4o和GPT-4v。（3）VIEScore在生成任务中达到了与人类评分相当的相关性，但在编辑任务中表现不佳。根据这些结果，我们相信VIEScore展示了在评估图像合成任务中取代人类评委的巨大潜力。

更新时间: 2024-06-03 16:59:20

领域: cs.CV,cs.AI,cs.CL,cs.MM

下载: http://arxiv.org/abs/2312.14867v2

Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data

In chemical engineering, process data is often expensive to acquire, and complex phenomena are difficult to model rigorously, rendering both entirely data-driven and purely mechanistic modeling approaches impractical. We explore using physics-informed neural networks (PINNs) for modeling dynamic processes governed by differential-algebraic equation systems when process data is scarce and complete mechanistic knowledge is missing. In particular, we focus on estimating states for which neither direct observational data nor constitutive equations are available. For demonstration purposes, we study a continuously stirred tank reactor and a liquid-liquid separator. We find that PINNs can infer unmeasured states with reasonable accuracy, and they generalize better in low-data scenarios than purely data-driven models. We thus show that PINNs, similar to hybrid mechanistic/data-driven models, are capable of modeling processes when relatively few experimental data and only partially known mechanistic descriptions are available, and conclude that they constitute a promising avenue that warrants further investigation.

Updated: 2024-06-03 16:58:17

标题: 物理信息神经网络用于具有有限物理知识和数据的动态过程操作

摘要: 在化学工程中，过程数据往往很昂贵，复杂现象难以严谨建模，这使得完全基于数据驱动和纯粹机械建模方法都不切实际。我们探索使用物理信息神经网络（PINNs）来建模动态过程，这些过程由微分代数方程系统控制，当过程数据稀缺且完全机械知识缺失时。特别是，我们专注于估计既没有直接观测数据也没有构成方程的状态。为了演示目的，我们研究了一个连续搅拌反应釜和一个液-液分离器。我们发现，PINNs能够以合理的准确度推断未测量的状态，并且它们在低数据情况下比纯数据驱动模型更好地推广。因此，我们展示了PINNs，类似于混合机械/数据驱动模型，在相对较少的实验数据和部分已知机械描述可用时能够建模过程，并得出结论认为它们构成了一个值得进一步探究的有前途的途径。

更新时间: 2024-06-03 16:58:17

领域: cs.LG

下载: http://arxiv.org/abs/2406.01528v1

NeuSpeech: Decode Neural signal as Speech

Decoding language from brain dynamics is an important open direction in the realm of brain-computer interface (BCI), especially considering the rapid growth of large language models. Compared to invasive-based signals which require electrode implantation surgery, non-invasive neural signals (e.g. EEG, MEG) have attracted increasing attention considering their safety and generality. However, the exploration is not adequate in three aspects: 1) previous methods mainly focus on EEG but none of the previous works address this problem on MEG with better signal quality; 2) prior works have predominantly used $``teacher-forcing"$ during generative decoding, which is impractical; 3) prior works are mostly $``BART-based"$ not fully auto-regressive, which performs better in other sequence tasks. In this paper, we explore the brain-to-text translation of MEG signals in a speech-decoding formation. Here we are the first to investigate a cross-attention-based ``whisper" model for generating text directly from MEG signals without teacher forcing. Our model achieves impressive BLEU-1 scores of 60.30 and 52.89 without pretraining $\&$ teacher-forcing on two major datasets ($\textit{GWilliams}$ and $\textit{Schoffelen}$). This paper conducts a comprehensive review to understand how speech decoding formation performs on the neural decoding tasks, including pretraining initialization, training $\&$ evaluation set splitting, augmentation, and scaling law. Code is available at https://github.com/NeuSpeech/NeuSpeech1$.

Updated: 2024-06-03 16:58:04

标题: NeuSpeech：将神经信号解码为语音

摘要: 从大脑动态解码语言是脑-计算机界面（BCI）领域重要的开放方向，尤其考虑到大型语言模型的快速增长。与需要电极植入手术的侵入性信号相比，无创神经信号（如脑电图、脑磁图）由于其安全性和普遍性而受到越来越多的关注。然而，在三个方面的探索还不足：1）以往的方法主要关注脑电图，但以往的研究没有针对信号质量更好的脑磁图解决这个问题；2）先前的研究在生成解码过程中主要使用“教师强制”，这是不切实际的；3）先前的研究大多是基于“BART”，不完全是自回归的，在其他序列任务中表现更好。在本文中，我们探索了脑电磁信号在语音解码形式上的脑-文本翻译。我们首次研究了基于交叉关注的“耳语”模型，可以直接从脑磁图信号生成文本而无需教师强制。我们的模型在两个主要数据集（GWilliams和Schoffelen）上实现了令人印象深刻的BLEU-1分数，分别为60.30和52.89，在无预训练和教师强制的情况下。本文进行了全面的回顾，以了解语音解码形式在神经解码任务中的表现，包括预训练初始化、训练和评估集分割、增强和缩放规律。代码可在https://github.com/NeuSpeech/NeuSpeech1找到。

更新时间: 2024-06-03 16:58:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.01748v3

MOSEAC: Streamlined Variable Time Step Reinforcement Learning

Traditional reinforcement learning (RL) methods typically employ a fixed control loop, where each cycle corresponds to an action. This rigidity poses challenges in practical applications, as the optimal control frequency is task-dependent. A suboptimal choice can lead to high computational demands and reduced exploration efficiency. Variable Time Step Reinforcement Learning (VTS-RL) addresses these issues by using adaptive frequencies for the control loop, executing actions only when necessary. This approach, rooted in reactive programming principles, reduces computational load and extends the action space by including action durations. However, VTS-RL's implementation is often complicated by the need to tune multiple hyperparameters that govern exploration in the multi-objective action-duration space (i.e., balancing task performance and number of time steps to achieve a goal). To overcome these challenges, we introduce the Multi-Objective Soft Elastic Actor-Critic (MOSEAC) method. This method features an adaptive reward scheme that adjusts hyperparameters based on observed trends in task rewards during training. This scheme reduces the complexity of hyperparameter tuning, requiring a single hyperparameter to guide exploration, thereby simplifying the learning process and lowering deployment costs. We validate the MOSEAC method through simulations in a Newtonian kinematics environment, demonstrating high task and training performance with fewer time steps, ultimately lowering energy consumption. This validation shows that MOSEAC streamlines RL algorithm deployment by automatically tuning the agent control loop frequency using a single parameter. Its principles can be applied to enhance any RL algorithm, making it a versatile solution for various applications.

Updated: 2024-06-03 16:51:57

标题: MOSEAC：简化的变步长强化学习

摘要: 传统的强化学习（RL）方法通常采用固定的控制循环，其中每个周期对应一个动作。这种刚性在实际应用中存在挑战，因为最佳控制频率取决于任务。次优选择可能导致高计算需求和降低探索效率。可变时间步长强化学习（VTS-RL）通过使用自适应频率来执行控制循环，仅在必要时执行动作来解决这些问题。这种方法根植于反应式编程原则，减少了计算负荷，并通过包括动作持续时间扩展了动作空间。然而，VTS-RL的实施通常会受到需要调整多个超参数以在多目标动作持续时间空间中进行探索的影响的复杂性的限制（即，平衡任务性能和实现目标所需时间步数）。为了克服这些挑战，我们介绍了多目标软弹性演员-评论家（MOSEAC）方法。该方法具有自适应奖励方案，根据训练期间观察到的任务奖励趋势调整超参数。这种方案减少了超参数调整的复杂性，只需要一个超参数来引导探索，从而简化学习过程并降低部署成本。我们通过在牛顿运动学环境中的模拟验证了MOSEAC方法，展示了更少时间步数的高任务和训练性能，最终降低了能源消耗。这种验证表明，MOSEAC通过使用单个参数自动调整智能体控制循环频率，简化了RL算法的部署。其原则可以应用于增强任何RL算法，使其成为各种应用的通用解决方案。

更新时间: 2024-06-03 16:51:57

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.01521v1

Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering

Advances towards more faithful and traceable answers of Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue in reaching this goal is basing the answers on reliable sources. However, this Evidence-Based QA has proven to work insufficiently with LLMs in terms of citing the correct sources (source quality) and truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data improves performance on both in- and out-of-distribution. Furthermore, we show that data quality, which can be drastically improved by proposed quality filters, matters more than quantity in improving Evidence-Based QA.

Updated: 2024-06-03 16:48:59

标题: 朝着忠实和稳健的基于证据的问答LLM专家的方向

摘要: 大型语言模型（LLMs）更加忠实和可追溯答案的进展对各种研究和实际努力至关重要。实现这一目标的一条途径是基于可靠的来源提供答案。然而，在引用正确的来源（来源质量）和真实地表达来源中的信息（答案可归因性）方面，这种基于证据的问答在LLMs方面已被证明效果不佳。在这项工作中，我们系统地研究如何强化微调LLMs以提高来源质量和答案可归因性。具体而言，我们引入了一个数据生成管道，其中包括自动数据质量过滤器，可以规模化地合成多样化高质量的训练和测试数据。我们进一步引入了四个测试集来评估微调专家模型的稳健性。广泛的评估表明，在合成数据上微调可以提高在内部和外部分布上的性能。此外，我们展示了数据质量，可以通过所提出的质量过滤器显著改善，比数量更重要，以改善基于证据的问答。

更新时间: 2024-06-03 16:48:59

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.08277v5

BISON: Blind Identification through Stateless scOpe-specific derivatioN

Delegating authentication to identity providers like Google or Facebook, while convenient, compromises user privacy. Global identifiers enable internet-wide tracking; furthermore, identity providers can also record users' associations. We show that neither is a necessary evil by presenting the BISON pseudonym derivation protocol, inspired by Oblivious Pseudorandom Functions. It hides the service provider's identity from the identity provider, yet produces a trusted, scoped, immutable pseudonym. Colluding service providers cannot link BISON pseudonyms. This prevents user tracking. BISON does not require long-lived state on the user device, and does not add additional actors to the authentication process. BISON uses lightweight cryptography. Pseudonym derivation requires a total of four elliptic curve scalar-point multiplications and four hash function evaluations, totaling to ~3 ms in our proof of concept implementation. BISON is designed to integrate into existing authentication protocols. We provide an OpenID Connect extension that allows OIDC's PPID pseudonyms to be derived using BISON. This demonstrates that BISON's privacy guarantees can be realized in practice. For these reasons, BISON is a crucial stepping stone towards realizing the privacy-preserving internet of tomorrow.

Updated: 2024-06-03 16:48:47

标题: BISON：通过无状态的特定范围推导进行盲目识别

摘要: 将身份验证委托给像谷歌或Facebook这样的身份提供商，虽然方便，但却损害了用户的隐私。全局标识符使得整个互联网可以进行跟踪；此外，身份提供商还可以记录用户的关联。我们通过提出受启发于遗忘伪随机函数的BISON假名推导协议，展示了这两者都不是必要的恶。它隐藏了服务提供商的身份，使得身份提供商无法得知，同时生成了一个受信任的、有范围的、不可变的假名。串通的服务提供商无法将BISON假名关联起来。这样可以防止用户被跟踪。BISON不需要在用户设备上保持长期状态，并且不会为身份验证流程增加额外的参与者。 BISON使用轻量级密码学。假名推导需要总共四次椭圆曲线标量-点乘法和四次哈希函数评估，总共在我们的概念验证实现中为约3毫秒。BISON被设计为能够集成到现有的身份验证协议中。我们提供了一个OpenID Connect扩展，允许使用BISON来推导OIDC的PPID假名。这表明BISON的隐私保证可以在实践中实现。因此，BISON是实现未来隐私保护互联网的关键一步。

更新时间: 2024-06-03 16:48:47

领域: cs.CR

下载: http://arxiv.org/abs/2406.01518v1

Beyond symmetrization: effective adjacency matrices and renormalization for (un)singed directed graphs

To address the peculiarities of directed and/or signed graphs, new Laplacian operators have emerged. For instance, in the case of directionality, we encounter the magnetic operator, dilation (which is underexplored), operators based on random walks, and so forth. The definition of these new operators leads to the need for new studies and concepts, and consequently, the development of new computational tools. But is this really necessary? In this work, we define the concept of effective adjacency matrices that arise from the definition of deformed Laplacian operators such as magnetic, dilation, and signal. These effective matrices allow mapping generic graphs to a family of unsigned, undirected graphs, enabling the application of the well-explored toolkit of measures, machine learning methods, and renormalization groups of undirected graphs. To explore the interplay between deformed operators and effective matrices, we show how the Hodge-Helmholtz decomposition can assist us in navigating this complexity.

Updated: 2024-06-03 16:48:25

标题: 超越对称化：有效邻接矩阵和重正化用于（无）有向图

摘要: 为了解决有向和/或带符号图的特殊性，新的拉普拉斯算子已经出现。例如，在方向性的情况下，我们遇到了磁算子、膨胀（这方面的研究不足）、基于随机游走的算子等。这些新算子的定义导致了对新研究和概念的需求，从而促使新的计算工具的发展。但这真的有必要吗？在这项工作中，我们定义了有效邻接矩阵的概念，这些矩阵源自于磁、膨胀和信号等变形拉普拉斯算子的定义。这些有效矩阵允许将通用图映射到一组无符号、无向图，从而能够应用已经深入研究的度量工具、机器学习方法和无向图的重标准化群。为了探索变形算子和有效矩阵之间的相互作用，我们展示了霍奇-亥姆霍兹分解如何帮助我们在这种复杂性中导航。

更新时间: 2024-06-03 16:48:25

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2406.01517v1

The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence

Generative AI (GAI) offers unprecedented opportunities for research and innovation, but its commercialization has raised concerns about transparency, reproducibility, and safety. Many open GAI models lack the necessary components for full understanding and reproducibility, and some use restrictive licenses whilst claiming to be ``open-source''. To address these concerns, we propose the Model Openness Framework (MOF), a ranked classification system that rates machine learning models based on their completeness and openness, following principles of open science, open source, open data, and open access. The MOF requires specific components of the model development lifecycle to be included and released under appropriate open licenses. This framework aims to prevent misrepresentation of models claiming to be open, guide researchers and developers in providing all model components under permissive licenses, and help individuals and organizations identify models that can be safely adopted without restrictions. By promoting transparency and reproducibility, the MOF combats ``openwashing'' practices and establishes completeness and openness as primary criteria alongside the core tenets of responsible AI. Wide adoption of the MOF will foster a more open AI ecosystem, benefiting research, innovation, and adoption of state-of-the-art models.

Updated: 2024-06-03 16:44:31

标题: 模型开放性框架：促进人工智能中的可重复性、透明度和可用性的完整性和开放性

摘要: 生成人工智能（GAI）为研究和创新提供了前所未有的机会，但其商业化引发了关于透明性、可重复性和安全性的担忧。许多开放的GAI模型缺乏完全理解和可重复性所必需的组件，有些使用限制性许可证却声称是“开源”的。为了解决这些问题，我们提出了模型开放性框架（MOF），这是一个排名分类系统，根据其完整性和开放性对机器学习模型进行评分，遵循开放科学、开源、开放数据和开放获取的原则。MOF要求特定模型开发生命周期的组件必须包括并在适当的开放许可证下发布。该框架旨在防止声称为开放的模型被误传，引导研究人员和开发人员在宽松许可下提供所有模型组件，并帮助个人和组织识别可以安全采用且没有限制的模型。通过促进透明性和可重复性，MOF打击“开源洗牌”行为，确立完整性和开放性作为责任人工智能核心原则之外的首要标准。MOF的广泛采用将促进更开放的人工智能生态系统，有利于研究、创新和采用最先进模型。

更新时间: 2024-06-03 16:44:31

领域: cs.LG,cs.AI,cs.CY,cs.SE

下载: http://arxiv.org/abs/2403.13784v3

Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models

Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models' outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, reflecting the models' evaluative preferences for specific narratives or viewpoints. We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to `bias fingerprints'. This trend is also seen in human evaluators, highlighting a complex interplay between human and machine bias perceptions.

Updated: 2024-06-03 16:43:16

标题: 微妙的偏见需要微妙的度量：用于评估大型语言模型中代表性偏见和亲和力偏见的双重度量

摘要: 对大型语言模型（LLMs）的研究经常忽视微妙的偏见，尽管不太明显，但可以显著影响模型向特定社会叙述输出。本研究探讨LLMs内部的两种偏见：代表性偏见，指的是LLMs生成反映某些身份群体经验的输出的倾向，以及亲和力偏见，反映模型对特定叙述或观点的评价偏好。我们引入了两个新的指标来衡量这些偏见：代表性偏见分数（RBS）和亲和力偏见分数（ABS），并呈现了以创造性为导向的生成套件（CoGS），其中包括短篇故事写作和诗歌创作等开放性任务，设计了定制的评分标准来检测这些微妙的偏见。我们的分析揭示了明显的代表性偏见存在于知名的LLMs中，偏好于与白人、直男相关的身份。此外，我们对亲和力偏见的调查揭示了每个模型内独特的评价模式，类似于“偏见指纹”。这种趋势也在人类评估者中出现，突显了人类和机器偏见感知之间复杂的相互作用。

更新时间: 2024-06-03 16:43:16

领域: cs.CL,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2405.14555v4

Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge

Large Language Models (LLMs) have demonstrated remarkable success in tasks like the Winograd Schema Challenge (WSC), showcasing advanced textual common-sense reasoning. However, applying this reasoning to multimodal domains, where understanding text and images together is essential, remains a substantial challenge. To address this, we introduce WinoVis, a novel dataset specifically designed to probe text-to-image models on pronoun disambiguation within multimodal contexts. Utilizing GPT-4 for prompt generation and Diffusion Attentive Attribution Maps (DAAM) for heatmap analysis, we propose a novel evaluation framework that isolates the models' ability in pronoun disambiguation from other visual processing challenges. Evaluation of successive model versions reveals that, despite incremental advancements, Stable Diffusion 2.0 achieves a precision of 56.7% on WinoVis, only marginally surpassing random guessing. Further error analysis identifies important areas for future research aimed at advancing text-to-image models in their ability to interpret and interact with the complex visual world.

Updated: 2024-06-03 16:42:55

标题: 描绘模糊性：对Winograd Schema挑战的视觉转折

摘要: 大型语言模型（LLMs）在任务中表现出色，如温诺格拉德模式挑战（WSC），展示了先进的文本常识推理能力。然而，在多模态领域应用这种推理，其中理解文本和图像是必不可少的，仍然是一个重大挑战。为了解决这个问题，我们引入了WinoVis，这是一个新颖的数据集，专门设计用于在多模态上下文中探究文本到图像模型在代词消歧中的表现。利用GPT-4进行提示生成和扩散注意力归因映射（DAAM）进行热图分析，我们提出了一个新颖的评估框架，该框架将模型在代词消歧方面的能力与其他视觉处理挑战隔离开来。对连续模型版本的评估显示，尽管有增量进展，但稳定扩散2.0在WinoVis上实现了56.7%的精度，仅略高于随机猜测。进一步的错误分析确定了未来研究的重要领域，旨在推进文本到图像模型在理解和与复杂视觉世界互动方面的能力。

更新时间: 2024-06-03 16:42:55

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.16277v3

Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models

As the use of Large Language Models (LLMs) becomes more widespread, understanding their self-evaluation of confidence in generated responses becomes increasingly important as it is integral to the reliability of the output of these models. We introduce the concept of Confidence-Probability Alignment, that connects an LLM's internal confidence, quantified by token probabilities, to the confidence conveyed in the model's response when explicitly asked about its certainty. Using various datasets and prompting techniques that encourage model introspection, we probe the alignment between models' internal and expressed confidence. These techniques encompass using structured evaluation scales to rate confidence, including answer options when prompting, and eliciting the model's confidence level for outputs it does not recognize as its own. Notably, among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment, with an average Spearman's $\hat{\rho}$ of 0.42, across a wide range of tasks. Our work contributes to the ongoing efforts to facilitate risk assessment in the application of LLMs and to further our understanding of model trustworthiness.

Updated: 2024-06-03 16:41:53

标题: 引擎盖下的信心：大型语言模型中信心-概率对齐的调查

摘要: 随着大型语言模型（LLMs）的使用变得更加普遍，理解它们对生成的响应的自我评估信心变得越来越重要，因为这对于这些模型的输出的可靠性至关重要。我们引入了信心-概率对齐的概念，将一个LLM的内部信心，通过标记概率量化，与当明确询问其确定性时模型响应中传达的信心联系起来。通过使用各种数据集和鼓励模型内省的提示技术，我们探究了模型内部和表达的信心之间的对齐情况。这些技术包括使用结构化评估量表来评估信心，包括提示时的答案选项，以及引导模型对其自己不认可的输出的信心水平。值得注意的是，在分析的模型中，OpenAI的GPT-4显示出最强的信心-概率对齐，平均Spearman's ρ为0.42，在各种任务中都表现出色。我们的工作有助于促进在LLMs应用中进行风险评估的持续努力，进一步增进我们对模型可信度的理解。

更新时间: 2024-06-03 16:41:53

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16282v3

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study the two foundational questions in this area. First, how are categorical concepts, such as {'mammal', 'bird', 'reptile', 'fish'}, represented? Second, how are hierarchical relations between concepts encoded? For example, how is the fact that 'dog' is a kind of 'mammal' encoded? We show how to extend the linear representation hypothesis to answer these questions. We find a remarkably simple structure: simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal in a sense we make precise, and (in consequence) complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting the hierarchical structure. We validate these theoretical results on the Gemma large language model, estimating representations for 957 hierarchically related concepts using data from WordNet.

Updated: 2024-06-03 16:34:01

标题: 大型语言模型中分类和层次概念的几何结构

摘要: 理解语义含义如何在大型语言模型的表示空间中编码是可解释性中的一个基本问题。在这篇论文中，我们研究了这一领域中的两个基本问题。首先，类别概念，例如{'哺乳动物'，'鸟类'，'爬行动物'，'鱼类'}，是如何表示的？其次，概念之间的层次关系是如何编码的？例如，'狗'是'哺乳动物'的一种这一事实是如何编码的？我们展示了如何扩展线性表示假设来回答这些问题。我们发现一个非常简单的结构：简单的类别概念被表示为单纯形，层次相关的概念在某种我们明确界定的意义上是正交的，因此，复杂概念被表示为由单纯形的直和构成的多面体，反映了层次结构。我们在Gemma大型语言模型上验证了这些理论结果，利用来自WordNet的数据估计了957个层次相关概念的表示。

更新时间: 2024-06-03 16:34:01

领域: cs.CL,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01506v1

Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features, when their prediction changes are similar after perturbation. To enhance FAMs' discriminative power, we introduce Feature Attribution with Necessity and Sufficiency (FANS), which find a neighborhood of the input such that perturbing samples within this neighborhood have a high Probability of being Necessity and Sufficiency (PNS) cause for the change in predictions, and use this PNS as the importance of the feature. Specifically, FANS compute this PNS via a heuristic strategy for estimating the neighborhood and a perturbation test involving two stages (factual and interventional) for counterfactual reasoning. To generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. We demonstrate that FANS outperforms existing attribution methods on six benchmarks. Please refer to the source code via \url{https://github.com/DMIRLAB-Group/FANS}.

Updated: 2024-06-03 16:29:05

标题: 使用双阶段扰动测试的必要性和充分性特征归因，用于因果解释

摘要: 我们研究了机器学习模型的可解释性问题，重点关注通过扰动测试评估特征重要性的特征归因方法（FAMs）。尽管它们很有用，但在扰动后预测变化相似时，FAMs很难区分不同特征的贡献。为了增强FAMs的区分能力，我们引入了带有必要性和充分性的特征归因（FANS），它找到一个输入的邻域，使得在这个邻域内扰动样本具有很高的成为必要性和充分性（PNS）原因改变预测的概率，并将这个PNS作为特征的重要性。具体来说，FANS通过一种启发式策略来计算这个PNS，用于估计邻域，以及涉及两个阶段（事实和干预）的干预测试，用于反事实推理。为了生成反事实样本，我们使用基于重采样的方法对观察样本进行近似所需的条件分布。我们证明FANS在六个基准测试中优于现有的归因方法。请通过\url{https://github.com/DMIRLAB-Group/FANS}查看源代码。

更新时间: 2024-06-03 16:29:05

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2402.08845v3

An efficient Wasserstein-distance approach for reconstructing jump-diffusion processes using parameterized neural networks

We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.

Updated: 2024-06-03 16:26:24

标题: 一种高效的Wasserstein距离方法，用于利用参数化神经网络重建跳跃扩散过程

摘要: 我们分析了与两个多维跳跃扩散过程相关的两个概率分布之间的Wasserstein距离（$W$-距离）。具体来说，我们分析了一个时间分离的平方$W_2$-距离，该距离提供了与两个跳跃扩散过程之间漂移、扩散和跳跃幅度函数偏差相关的上下界。然后，我们提出了一种时间分离的平方$W_2$-距离方法，用于利用参数化神经网络从数据中高效重建未知的跳跃扩散过程。我们进一步展示了通过利用关于跳跃扩散过程漂移函数的先验信息可以提高其性能。我们提出的重建方法的有效性在多个示例和应用中得到了证明。

更新时间: 2024-06-03 16:26:24

领域: stat.ML,cs.LG,math.PR,stat.AP,stat.ME,60G07, 60J76

下载: http://arxiv.org/abs/2406.01653v1

XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.

Updated: 2024-06-03 16:23:28

标题: XAI4LLM：让机器学习模型和LLMs共同合作，增强医疗保健领域中的上下文学习

摘要: 将大型语言模型（LLMs）整合到医疗诊断中为临床决策提供了一个有前景的途径。本研究概述了通过使用多层结构化提示集成医学领域知识开发一种零样本/少样本情境学习（ICL）的新方法。我们还探讨了用户和LLMs之间两种沟通风格的有效性：数字对话（NC）风格，逐步处理数据，以及自然语言单轮（NL-ST）风格，采用长篇叙述提示。我们的研究系统评估了诊断准确性和风险因素，包括性别偏见和假阴性率，在包含920个患者记录的各种少样本情景的数据集中。结果表明，传统的临床机器学习（ML）模型通常在零样本和少样本情景中优于LLMs。然而，在使用少样本示例以及有效的可解释人工智能（XAI）方法作为领域知识来源时，性能差距显著缩小。此外，随着足够的时间和增加的示例数量，对话风格（NC）几乎可以与ML模型的性能相匹配。值得注意的是，相对于ML模型，LLMs展示了可比或更高的成本敏感准确性。这项研究证实，通过适当的领域知识和量身定制的沟通策略，LLMs能够显著增强诊断过程。研究结果强调了优化训练样本数量和沟通风格以改善LLM应用准确性和减少偏见的重要性。

更新时间: 2024-06-03 16:23:28

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.06270v3

Robust Classification by Coupling Data Mollification with Label Smoothing

Introducing training-time augmentations is a key technique to enhance generalization and prepare deep neural networks against test-time corruptions. Inspired by the success of generative diffusion models, we propose a novel approach coupling data augmentation, in the form of image noising and blurring, with label smoothing to align predicted label confidences with image degradation. The method is simple to implement, introduces negligible overheads, and can be combined with existing augmentations. We demonstrate improved robustness and uncertainty quantification on the corrupted image benchmarks of the CIFAR and TinyImageNet datasets.

Updated: 2024-06-03 16:21:29

标题: 通过将数据平滑化与标签平滑化相结合实现强健分类

摘要: 引入训练时增强是增强泛化能力并准备深度神经网络应对测试时错误的关键技术。受到生成扩散模型成功的启发，我们提出了一种新颖的方法，将数据增强（图像噪声和模糊）与标签平滑相结合，以使预测的标签置信度与图像退化对齐。该方法实现简单，引入的开销可以忽略不计，并且可以与现有的增强技术结合使用。我们在CIFAR和TinyImageNet数据集的损坏图像基准上展示了改进的稳健性和不确定性量化。

更新时间: 2024-06-03 16:21:29

领域: cs.CV,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01494v1

Diffusion Model-Augmented Behavioral Cloning

Imitation learning addresses the challenge of learning by observing an expert's demonstrations without access to reward signals from environments. Most existing imitation learning methods that do not require interacting with environments either model the expert distribution as the conditional probability p(a|s) (e.g., behavioral cloning, BC) or the joint probability p(s, a). Despite the simplicity of modeling the conditional probability with BC, it usually struggles with generalization. While modeling the joint probability can improve generalization performance, the inference procedure is often time-consuming, and the model can suffer from manifold overfitting. This work proposes an imitation learning framework that benefits from modeling both the conditional and joint probability of the expert distribution. Our proposed Diffusion Model-Augmented Behavioral Cloning (DBC) employs a diffusion model trained to model expert behaviors and learns a policy to optimize both the BC loss (conditional) and our proposed diffusion model loss (joint). DBC outperforms baselines in various continuous control tasks in navigation, robot arm manipulation, dexterous manipulation, and locomotion. We design additional experiments to verify the limitations of modeling either the conditional probability or the joint probability of the expert distribution, as well as compare different generative models. Ablation studies justify the effectiveness of our design choices.

Updated: 2024-06-03 16:17:28

标题: 扩展行为克隆的扩散模型

摘要: 模仿学习解决了通过观察专家演示来学习的挑战，而无需访问环境中的奖励信号。大多数现有的不需要与环境进行交互的模仿学习方法要么将专家分布建模为条件概率p(a|s)（例如，行为克隆，BC），要么建模为联合概率p(s, a)。尽管使用行为克隆模型化条件概率的简单性，但通常很难实现泛化。而建模联合概率可以提高泛化性能，推断过程通常耗时，并且模型可能受到流形过拟合的影响。本文提出了一种模仿学习框架，该框架受益于同时建模专家分布的条件概率和联合概率。我们提出的扩散模型增强的行为克隆（DBC）采用了一个训练有素的扩散模型来建模专家行为，并学习一种策略来优化行为克隆损失（条件）和我们提出的扩散模型损失（联合）。DBC在导航、机器人臂操作、熟练操作和运动控制等各种连续控制任务中优于基线。我们设计了额外的实验来验证模型专家分布的条件概率或联合概率的局限性，以及比较不同的生成模型。消融研究证明了我们设计选择的有效性。

更新时间: 2024-06-03 16:17:28

领域: cs.LG,cs.AI,cs.RO

下载: http://arxiv.org/abs/2302.13335v4

How Flawed Is ECE? An Analysis via Logit Smoothing

Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.

Updated: 2024-06-03 16:14:51

标题: ECE有多大的缺陷？通过Logit平滑分析进行分析

摘要: 非正式地说，如果模型的预测正确的概率与预测的置信度相匹配，那么模型就被校准了。在文献中，衡量校准性最常见的方法是期望校准误差（ECE）。然而，最近的研究指出了ECE的缺点，比如它在预测空间中是不连续的。在这项研究中，我们问：这些问题有多根本，它们对现有结果有什么影响？为此，我们完全描述了ECE相对波兰空间上一般概率测度的不连续性。然后，我们利用这些不连续性的特性，提出了一个新的连续、易估计的校准度量，我们称之为Logit-Smoothed ECE（LS-ECE）。通过比较预训练的图像分类模型的ECE和LS-ECE，在初步实验中我们发现，分桶ECE与LS-ECE密切相关，表明ECE的理论病态在实践中是可以避免的。

更新时间: 2024-06-03 16:14:51

领域: cs.LG,math.PR,68T37 (Primary) 62-08, 60E05 (Secondary)

下载: http://arxiv.org/abs/2402.10046v2

Convolutional L2LFlows: Generating Accurate Showers in Highly Granular Calorimeters Using Convolutional Normalizing Flows

In the quest to build generative surrogate models as computationally efficient alternatives to rule-based simulations, the quality of the generated samples remains a crucial frontier. So far, normalizing flows have been among the models with the best fidelity. However, as the latent space in such models is required to have the same dimensionality as the data space, scaling up normalizing flows to high dimensional datasets is not straightforward. The prior L2LFlows approach successfully used a series of separate normalizing flows and sequence of conditioning steps to circumvent this problem. In this work, we extend L2LFlows to simulate showers with a 9-times larger profile in the lateral direction. To achieve this, we introduce convolutional layers and U-Net-type connections, move from masked autoregressive flows to coupling layers, and demonstrate the successful modelling of showers in the ILD Electromagnetic Calorimeter as well as Dataset 3 from the public CaloChallenge dataset.

Updated: 2024-06-03 16:11:03

标题: 卷积L2LFlows：使用卷积归一化流在高度颗粒度的量能器中生成准确的簇射

摘要: 在构建生成替代模型作为基于规则的仿真的计算效率替代品的过程中，生成样本的质量仍然是一个关键的前沿。到目前为止，归一化流一直是拥有最高保真度的模型之一。然而，在这种模型中，潜在空间需要与数据空间具有相同的维度，因此将归一化流扩展到高维数据集并不是直接的。先前的L2LFlows方法成功地使用了一系列单独的归一化流和一系列调节步骤来规避这个问题。在这项工作中，我们将L2LFlows扩展到在横向方向上具有9倍更大轮廓的淋浴模拟。为了实现这一目标，我们引入了卷积层和U-Net类型的连接，从掩码自回归流转移到耦合层，并展示了在ILD电磁量热器以及公共CaloChallenge数据集的Dataset 3中成功建模淋浴的情况。

更新时间: 2024-06-03 16:11:03

领域: physics.ins-det,cs.LG,hep-ex,hep-ph,physics.data-an

下载: http://arxiv.org/abs/2405.20407v2

Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization

We investigate the finite-time analysis of finding ($\delta,\epsilon$)-stationary points for nonsmooth nonconvex objectives in decentralized stochastic optimization. A set of agents aim at minimizing a global function using only their local information by interacting over a network. We present a novel algorithm, called Multi Epoch Decentralized Online Learning (ME-DOL), for which we establish the sample complexity in various settings. First, using a recently proposed online-to-nonconvex technique, we show that our algorithm recovers the optimal convergence rate of smooth nonconvex objectives. We then extend our analysis to the nonsmooth setting, building on properties of randomized smoothing and Goldstein-subdifferential sets. We establish the sample complexity of $O(\delta^{-1}\epsilon^{-3})$, which to the best of our knowledge is the first finite-time guarantee for decentralized nonsmooth nonconvex stochastic optimization in the first-order setting (without weak-convexity), matching its optimal centralized counterpart. We further prove the same rate for the zero-order oracle setting without using variance reduction.

Updated: 2024-06-03 16:09:34

标题: 在线优化视角下的一阶和零阶分散非光滑非凸随机优化

摘要: 我们研究了在分散随机优化中找到($\delta,\epsilon$)-稳定点的有限时间分析。一组代理旨在通过在网络上交互仅使用他们的本地信息来最小化全局函数。我们提出了一种新颖的算法，称为多时期分散在线学习（ME-DOL），我们在各种设置中建立了样本复杂度。首先，使用最近提出的在线到非凸技术，我们展示了我们的算法恢复了光滑非凸目标的最优收敛速率。然后，我们将分析扩展到非光滑设置，利用随机平滑和Goldstein-次微分集的性质。我们建立了$O(\delta^{-1}\epsilon^{-3})$的样本复杂度，据我们所知，这是第一个分散非光滑非凸随机优化在一阶设置中的有限时间保证（没有弱凸性），与其最优集中化对应。我们进一步证明了在不使用方差减少的零阶oracle设置中相同的速率。

更新时间: 2024-06-03 16:09:34

领域: math.OC,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.01484v1

Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes

Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.

Updated: 2024-06-03 16:09:12

标题: 使用GFlowNets和变分贝叶斯的贝叶斯学习因果结构和机制

摘要: 贝叶斯因果结构学习旨在学习一个概率后验分布，该分布覆盖了有向无环图（DAGs）以及定义父子变量关系的机制。通过采用贝叶斯方法，可以推断因果模型的不确定性。对模型不确定性进行建模对于因果结构学习尤为关键，因为在仅有有限观测数据时，模型可能无法识别。在本文中，我们提出了一种新颖的方法，使用变分贝叶斯(Variational Bayes)来联合学习因果模型的结构和机制，我们称之为变分贝叶斯-DAG-GFlowNet（VBG）。我们扩展了贝叶斯因果结构学习方法，使用GFlowNets来学习不仅是结构的后验分布，还有线性高斯模型的参数。我们在模拟数据上的结果表明，VBG在建模DAGs和机制的后验分布方面与几种基线方法竞争，同时比现有方法具有几个优势，包括保证采样无环图，并能灵活推广到非线性因果机制。

更新时间: 2024-06-03 16:09:12

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2211.02763v3

Learning from Streaming Data when Users Choose

In digital markets comprised of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in return, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. Theoretically, we show that our algorithm asymptotically converges to stationary points of of the overall loss almost surely. We also experimentally demonstrate the utility of our algorithm with real world data.

Updated: 2024-06-03 16:07:52

标题: 从用户选择时学习流数据

摘要: 在由许多竞争服务组成的数字市场中，每个用户根据自己的偏好选择多个服务提供商，所选择的服务利用用户数据逐步改进其模型。服务提供商的模型影响用户在下一个时间步选择哪种服务，而用户的选择反过来影响模型更新，形成一个反馈循环。在本文中，我们形式化上述动态并开发了一种简单高效的分散算法来在本地最小化总体用户损失。理论上，我们证明我们的算法渐近地收敛到总体损失的稳定点。我们还通过实际数据实验证明了我们算法的效用。

更新时间: 2024-06-03 16:07:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01481v1

Stochastic Newton Proximal Extragradient Method

Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, the method has slow global convergence, requiring up to $\tilde{O}(\kappa^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $\kappa$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(\kappa)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.

Updated: 2024-06-03 16:06:23

标题: 随机牛顿近端外梯度法

摘要: 随机二阶方法通过使用有噪声的Hessian估计对梯度进行预处理，在强凸优化中实现快速局部收敛。然而，这些方法通常只有在随机Hessian噪声减小时才能达到超线性收敛，随着时间的推移，每次迭代的成本也会增加。最近的研究[arXiv:2204.09266]通过Hessian平均方案解决了这个问题，实现了超线性收敛而不增加每次迭代的成本。然而，该方法的全局收敛速度较慢，需要达到$\tilde{O}(\kappa^2)$次迭代才能达到超线性速率$\tilde{O}((1/t)^{t/2})$，其中$\kappa$是问题的条件数。在本文中，我们提出了一种新颖的随机牛顿近端外梯度方法，改善了这些界限，实现了更快的全局线性速率，并在$\tilde{O}(\kappa)$次迭代中达到相同的快速超线性速率。我们通过扩展Hybrid Proximal Extragradient (HPE)框架实现了这一点，为具有噪声Hessian oracle访问权限的强凸函数实现了快速全局和局部收敛速率。

更新时间: 2024-06-03 16:06:23

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01478v1

Finding Optimally Robust Data Mixtures via Concave Maximization

Training on mixtures of data distributions is now common in many modern machine learning pipelines, useful for performing well on several downstream tasks. Group distributionally robust optimization (group DRO) is one popular way to learn mixture weights for training a specific model class, but group DRO methods suffer for non-linear models due to non-convex loss functions and when the models are non-parametric. We address these challenges by proposing to solve a more general DRO problem, giving a method we call MixMax. MixMax selects mixture weights by maximizing a particular concave objective with entropic mirror ascent, and, crucially, we prove that optimally fitting this mixture distribution over the set of bounded predictors returns a group DRO optimal model. Experimentally, we tested MixMax on a sequence modeling task with transformers and on a variety of non-parametric learning problems. In all instances MixMax matched or outperformed the standard data mixing and group DRO baselines, and in particular, MixMax improved the performance of XGBoost over the only baseline, data balancing, for variations of the ACSIncome and CelebA annotations datasets.

Updated: 2024-06-03 16:06:12

标题: 通过凹函数最大化发现最优鲁棒数据混合

摘要: 目前，在许多现代机器学习流水线中，对混合数据分布进行训练已经很常见，这对于在多个下游任务上表现良好非常有用。群体分布鲁棒优化（group DRO）是一种流行的学习混合权重以训练特定模型类的方法，但由于非凸损失函数和非参数模型，群体DRO方法在非线性模型上存在问题。我们通过提出解决更一般的DRO问题来解决这些挑战，提出了一种我们称为MixMax的方法。MixMax通过最大化具有特定凹目标的熵镜上升来选择混合权重，并且关键是，我们证明了在有界预测器集合上最优拟合这种混合分布将返回群体DRO最优模型。在实验中，我们在使用transformers进行序列建模任务以及各种非参数学习问题上测试了MixMax。在所有情况下，MixMax与标准数据混合和群体DRO基线相匹配或表现更好，特别是，MixMax在ACSIncome和CelebA注释数据集的变体上，相对于唯一基线数据平衡，提高了XGBoost的性能。

更新时间: 2024-06-03 16:06:12

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01477v1

Feature Importance Disparities for Data Bias Investigations

It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions such as training separate models on subgroups, removing features with bias in the collection process, or even conducting real-world experiments to ascertain sources of bias. Despite the need for such data bias investigations, few automated methods exist to assist practitioners in these efforts. In this paper, we present one such method that given a dataset $X$ consisting of protected and unprotected features, outcomes $y$, and a regressor $h$ that predicts $y$ given $X$, outputs a tuple $(f_j, g)$, with the following property: $g$ corresponds to a subset of the training dataset $(X, y)$, such that the $j^{th}$ feature $f_j$ has much larger (or smaller) influence in the subgroup $g$, than on the dataset overall, which we call feature importance disparity (FID). We show across $4$ datasets and $4$ common feature importance methods of broad interest to the machine learning community that we can efficiently find subgroups with large FID values even over exponentially large subgroup classes and in practice these groups correspond to subgroups with potentially serious bias issues as measured by standard fairness metrics.

Updated: 2024-06-03 16:03:48

标题: 数据偏差调查中的特征重要性差异

摘要: 普遍认为分类器中下游偏差的一个原因是训练数据中存在的偏差。纠正这种偏差可能涉及依赖于上下文的干预措施，例如在子群组上训练单独的模型、在数据收集过程中去除带有偏差的特征，甚至进行现实世界的实验以确定偏差的来源。尽管有这种数据偏差调查的需求，但目前存在很少自动化方法来协助从业者进行这些努力。在本文中，我们提出了一种这样的方法，给定一个由受保护和未受保护特征、结果和预测结果的回归器组成的数据集$X$，输出一个元组$(f_j, g)$，具有以下属性：$g$对应于训练数据集$(X, y)$的一个子集，其中第$j$个特征$f_j$在子群组$g$中的影响要比整个数据集上大得多（或小得多），我们称之为特征重要性差异（FID）。我们通过4个数据集和对机器学习社区具有广泛兴趣的4种常见特征重要性方法展示，我们可以高效地找到具有大FID值的子群组，即使在指数级别的庞大子群组类别中，实际上这些组对应于可能存在严重偏差问题的子群组，根据标准的公平度量来衡量。

更新时间: 2024-06-03 16:03:48

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2303.01704v4

Significance of Chain of Thought in Gender Bias Mitigation for English-Dravidian Machine Translation

Gender bias in machine translation (MT) sys- tems poses a significant challenge to achieving accurate and inclusive translations. This paper examines gender bias in machine translation systems for languages such as Telugu and Kan- nada from the Dravidian family, analyzing how gender inflections affect translation accuracy and neutrality using Google Translate and Chat- GPT. It finds that while plural forms can reduce bias, individual-centric sentences often main- tain the bias due to historical stereotypes. The study evaluates the Chain of Thought process- ing, noting significant bias mitigation from 80% to 4% in Telugu and from 40% to 0% in Kan- nada. It also compares Telugu and Kannada translations, emphasizing the need for language specific strategies to address these challenges and suggesting directions for future research to enhance fairness in both data preparation and prompts during inference.

Updated: 2024-06-03 15:59:34

标题: 《思维链在英德拉维迪亚语机器翻译性别偏见缓解中的重要性》

摘要: 机器翻译系统中的性别偏见对于实现准确和包容性翻译构成了重要挑战。本文研究了针对特拉古语和坎纳达语等德拉维达语系的机器翻译系统中的性别偏见，分析了性别词缀如何影响翻译准确性和中立性，使用了谷歌翻译和Chat-GPT。研究发现，虽然复数形式可以减少偏见，但基于个人的句子往往保持偏见，这是由于历史刻板印象造成的。该研究评估了思维链处理过程，注意到在特拉古语中偏见从80%降低到4%，在坎纳达语中从40%降低到0%。它还比较了特拉古语和坎纳达语的翻译，强调了需要针对语言的特定策略来解决这些挑战，并建议未来研究的方向，以增强在数据准备和推理过程中的公平性。

更新时间: 2024-06-03 15:59:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19701v2

Inverse design of photonic surfaces on Inconel via multi-fidelity machine learning ensemble framework and high throughput femtosecond laser processing

We demonstrate a multi-fidelity (MF) machine learning ensemble framework for the inverse design of photonic surfaces, trained on a dataset of 11,759 samples that we fabricate using high throughput femtosecond laser processing. The MF ensemble combines an initial low fidelity model for generating design solutions, with a high fidelity model that refines these solutions through local optimization. The combined MF ensemble can generate multiple disparate sets of laser-processing parameters that can each produce the same target input spectral emissivity with high accuracy (root mean squared errors < 2%). SHapley Additive exPlanations analysis shows transparent model interpretability of the complex relationship between laser parameters and spectral emissivity. Finally, the MF ensemble is experimentally validated by fabricating and evaluating photonic surface designs that it generates for improved efficiency energy harvesting devices. Our approach provides a powerful tool for advancing the inverse design of photonic surfaces in energy harvesting applications.

Updated: 2024-06-03 15:59:19

标题: Inconel上光子表面的逆向设计：基于多信度机器学习集成框架和高通量飞秒激光加工

摘要: 我们展示了一个多精度（MF）机器学习集成框架，用于逆向设计光子表面，该框架训练于我们使用高通量飞秒激光加工制作的11759个样本数据集。MF集成结合了一个初始低精度模型用于生成设计解决方案，以及一个高精度模型通过局部优化来优化这些解决方案。组合的MF集成可以生成多个不同的激光加工参数集，每个集合都可以产生相同的目标输入光谱发射率，并且准确度很高（均方根误差<2%）。SHapley Additive exPlanations分析显示了激光参数和光谱发射率之间复杂关系的透明模型可解释性。最后，MF集成通过制作和评估光子表面设计来实验验证，这些设计可以提高能量收集设备的效率。我们的方法为能量收集应用中的光子表面的逆向设计提供了一个强大的工具。

更新时间: 2024-06-03 15:59:19

领域: cs.LG,cs.CE,physics.optics

下载: http://arxiv.org/abs/2406.01471v1

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation. Weight decay can cause the expected magnitude and angular updates of a neuron's weight vector to converge to a steady state we call rotational equilibrium. These states can be highly homogeneous, effectively balancing the average rotation -- a proxy for the effective learning rate -- across different layers and neurons. Our work analyzes these dynamics across optimizers like Adam, Lion, and SGD with momentum, offering a new simple perspective on training that elucidates the efficacy of widely used but poorly understood methods in deep learning. We demonstrate how balanced rotation plays a key role in the effectiveness of normalization like Weight Standardization, as well as that of AdamW over Adam with L2-regularization. Finally, we show that explicitly controlling the rotation provides the benefits of weight decay while substantially reducing the need for learning rate warmup.

Updated: 2024-06-03 15:57:47

标题: 旋转平衡：体重衰减如何平衡神经网络中的学习

摘要: 这项研究通过应用分析和实验的结合，研究了权重衰减如何影响深度神经网络中个别神经元的更新行为。权重衰减可以导致神经元权重向量的预期幅度和角度更新收敛到我们称之为旋转平衡的稳定状态。这些状态可以非常均匀，有效地平衡不同层和神经元之间的平均旋转 -- 这是有效学习率的代理。我们的工作分析了Adam、Lion和带动量的SGD等优化器之间的动态，提供了一个新的简单视角，阐明了深度学习中广泛使用但不够理解的方法的有效性。我们展示了平衡旋转在标准化如权重标准化的有效性中起着关键作用，以及AdamW相对于带L2正则化的Adam的效果。最后，我们表明明确控制旋转可以提供权重衰减的好处，同时大大减少学习率预热的需求。

更新时间: 2024-06-03 15:57:47

领域: cs.LG

下载: http://arxiv.org/abs/2305.17212v4

Understanding Token Probability Encoding in Output Embeddings

In this paper, we investigate the output token probability information in the output embedding of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and output logits are concentrated. Based on such findings, we edit the encoding in output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete the output-unrelated dimensions and find more than 30% of the dimensions can be deleted without significant movement in output distribution and degeneration on sequence generation. Additionally, in training dynamics, we use such encoding as a probe and find that the output embeddings capture token frequency information in early steps, even before an obvious convergence starts.

Updated: 2024-06-03 15:57:29

标题: 理解输出嵌入中的令牌概率编码

摘要: 在这篇论文中，我们研究了语言模型输出嵌入中的输出标记概率信息。我们提供了一种对输出嵌入向量中输出标记概率的近似常见对数线性编码，并证明当输出空间较大且输出logits集中时，该编码是准确且稀疏的。基于这些发现，我们编辑输出嵌入中的编码以准确修改输出概率分布。此外，我们在输出概率编码中发现的稀疏性表明输出嵌入中的许多维度并不对因果语言建模有贡献。因此，我们尝试删除与输出无关的维度，并发现超过30%的维度可以被删除而不会显著改变输出分布并且不会导致序列生成的退化。此外，在训练动态中，我们将这种编码用作探针，并发现在明显收敛开始之前，输出嵌入在早期步骤中就捕捉到了标记频率信息。

更新时间: 2024-06-03 15:57:29

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01468v1

EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting

Event cameras offer promising advantages such as high dynamic range and low latency, making them well-suited for challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and does not carry absolute color information. To release its potential in 3D reconstruction, we propose the first event-based generalizable 3D reconstruction framework, called EvGGS, which reconstructs scenes as 3D Gaussians from only event input in a feedforward manner and can generalize to unseen cases without any retraining. This framework includes a depth estimation module, an intensity reconstruction module, and a Gaussian regression module. These submodules connect in a cascading manner, and we collaboratively train them with a designed joint loss to make them mutually promote. To facilitate related studies, we build a novel event-based 3D dataset with various material objects and calibrated labels of grayscale images, depth maps, camera poses, and silhouettes. Experiments show models that have jointly trained significantly outperform those trained individually. Our approach performs better than all baselines in reconstruction quality, and depth/intensity predictions with satisfactory rendering speed.

Updated: 2024-06-03 15:51:49

标题: EvGGS：一种用于基于事件的可推广高斯喷洒的协作学习框架

摘要: 事件摄像机具有高动态范围和低延迟等优势，使其非常适合挑战性的光照条件和快速移动场景。然而，从原始事件流重建3D场景是困难的，因为事件数据稀疏且不携带绝对颜色信息。为了释放其在3D重建中的潜力，我们提出了第一个基于事件的通用化3D重建框架，称为EvGGS，它可以仅通过事件输入以前馈方式将场景重建为3D高斯，并且可以在未经任何重新训练的情况下泛化到未见情况。该框架包括深度估计模块、强度重建模块和高斯回归模块。这些子模块以级联方式连接，我们通过设计的联合损失共同训练它们，使它们相互促进。为了促进相关研究，我们构建了一个包含各种材料对象和灰度图像、深度图、相机姿态和轮廓标定标签的新颖的基于事件的3D数据集。实验表明，联合训练的模型明显优于单独训练的模型。我们的方法在重建质量和深度/强度预测方面表现优于所有基准线，并具有令人满意的渲染速度。

更新时间: 2024-06-03 15:51:49

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.14959v2

Understanding Preference Fine-Tuning Through the Lens of Coverage

Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.

Updated: 2024-06-03 15:51:04

标题: 透过覆盖范围的视角理解偏好微调

摘要: 从人类偏好数据中学习已经成为微调大型语言模型（LLMs）的主要范式。最常见的两类技术——在线强化学习（RL）如Proximal Policy Optimization（PPO）和离线对比方法如Direct Preference Optimization（DPO）——在先前的工作中被定位为等效，因为两者都必须从相同的离线偏好数据集开始。为了进一步扩展我们对在线和离线技术在偏好微调中相似性和差异性的理论理解，我们通过数据集覆盖的视角进行了严格分析，这是一个捕捉训练数据覆盖测试分布的概念，在RL中被广泛使用。我们证明，全局覆盖条件对于离线对比方法收敛到最优策略既是必要的又是充分的，但对于在线RL方法则充分的部分覆盖条件足以。这种分离提供了一个解释，说明为什么在线RL方法可以比离线方法表现更好，特别是当离线偏好数据不够多样化时。最后，受我们之前的理论观察启发，我们推导出一个混合偏好优化（HyPO）算法，该算法使用离线数据进行基于对比的偏好优化，使用在线数据进行KL正则化。理论上和实证上，我们证明HyPO比其纯离线对应物DPO更高效，同时仍然保持其计算和内存效率。

更新时间: 2024-06-03 15:51:04

领域: cs.LG

下载: http://arxiv.org/abs/2406.01462v1

Hardness of Learning Neural Networks under the Manifold Hypothesis

The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data.

Updated: 2024-06-03 15:50:32

标题: 学习神经网络在流形假设下的困难程度

摘要: 多样化假设认为高维数据位于低维流形上或附近。虽然已经通过实证方法证明了编码几何结构的实用性，但对其对神经网络可学习性的影响的严格分析在很大程度上缺失。最近的一些结果已经在独立同分布的高斯或均匀布尔数据分布下建立了学习前馈和等变神经网络的难度结果。在本文中，我们调查了在多样化假设下的学习难度。我们询问对流形的曲率和规则性有哪些最小假设，如果有的话，能够使学习问题能够高效学习。我们证明，在有界曲率的输入流形下学习是困难的，通过将在布尔数据输入到几何设置中的SQ和加密设置中的难度证明扩展到几何设置。另一方面，我们展示了对数据流形体积的额外假设可以缓解这些基本限制，并通过简单的插值论证保证可学习性。这种情况的显著实例是可以通过流形学习可靠重建的流形。展望未来，我们评论并通过实证方法探索了流形的中间情况，其中具有常见于真实世界数据中的异质特征。

更新时间: 2024-06-03 15:50:32

领域: cs.LG,math.DG,stat.ML

下载: http://arxiv.org/abs/2406.01461v1

Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code

With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. There are two contributing factors to the insecure code generation. First, existing datasets used to evaluate LLMs do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Therefore, in this paper, we described SALLM, a framework to benchmark LLMs' abilities to generate secure code systematically. This framework has three major components: a novel dataset of security-centric Python prompts, configurable assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.

Updated: 2024-06-03 15:50:23

标题: 生成和祈祷：使用SALLMS评估由LLM生成的代码的安全性

摘要: 随着大型语言模型（LLMs）在软件工程师日常实践中日益流行，确保这些工具生成的代码不仅在功能上正确，而且没有漏洞至关重要。尽管LLMs可以帮助开发人员更高效地工作，但先前的实证研究表明，LLMs可能会生成不安全的代码。造成不安全代码生成的两个因素。首先，用于评估LLMs的现有数据集未能充分代表对安全性敏感的真实软件工程任务。相反，它们通常基于竞争性编程挑战或课堂类型的编码任务。在实际应用中，生成的代码被集成到较大的代码库中，引入潜在的安全风险。其次，现有的评估指标主要关注生成的代码的功能正确性，而忽略了安全性考虑。因此，在本文中，我们描述了SALLM，一个用于系统评估LLMs生成安全代码能力的框架。该框架有三个主要组成部分：一个安全中心的Python提示的新数据集，可配置的评估技术来评估生成的代码，以及从生成安全代码的角度评估模型性能的新指标。

更新时间: 2024-06-03 15:50:23

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2311.00889v2

Distributional bias compromises leave-one-out cross-validation

Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach called "leave-one-out cross-validation" is often used. In this design, a separate model is built for predicting each data instance after training on all other instances. Since this results in a single test data point available per model trained, predictions are aggregated across the entire dataset to calculate common rank-based performance metrics such as the area under the receiver operating characteristic or precision-recall curves. In this work, we demonstrate that this approach creates a negative correlation between the average label of each training fold and the label of its corresponding test instance, a phenomenon that we term distributional bias. As machine learning models tend to regress to the mean of their training data, this distributional bias tends to negatively impact performance evaluation and hyperparameter optimization. We show that this effect generalizes to leave-P-out cross-validation and persists across a wide range of modeling and evaluation approaches, and that it can lead to a bias against stronger regularization. To address this, we propose a generalizable rebalanced cross-validation approach that corrects for distributional bias. We demonstrate that our approach improves cross-validation performance evaluation in synthetic simulations and in several published leave-one-out analyses.

Updated: 2024-06-03 15:47:34

标题: 分布偏差影响留一法交叉验证

摘要: 交叉验证是评估机器学习模型预测性能的常见方法。在数据稀缺的情况下，人们通常希望最大化用于训练模型的实例数量，因此常常使用一种称为“留一法交叉验证”的方法。在这种设计中，训练所有其他实例后，构建一个单独的模型来预测每个数据实例。由于这导致每个训练模型只有一个可用的测试数据点，因此将预测汇总在整个数据集上以计算基于排名的常见性能指标，如接收器操作特性曲线或精度-召回曲线下的面积。在这项工作中，我们展示了这种方法在每个训练折叠的平均标签和其对应测试实例的标签之间创建了负相关，我们将这种现象称为分布偏差。由于机器学习模型倾向于回归到其训练数据的平均值，这种分布偏差往往会对性能评估和超参数优化产生负面影响。我们展示了这种效应推广到留P个交叉验证，并持续适用于各种建模和评估方法，并且可能导致对更强正则化的偏见。为了解决这个问题，我们提出了一种通用的重新平衡交叉验证方法，可校正分布偏差。我们展示了我们的方法在合成模拟和几个已发表的留一分析中改善了交叉验证性能评估。

更新时间: 2024-06-03 15:47:34

领域: stat.ME,cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2406.01652v1

Differentially Private Tabular Data Synthesis using Large Language Models

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data generators that can provide realistic synthetic datasets remains challenging. This paper introduces DP-LLMTGen -- a novel framework for differentially private tabular data synthesis that leverages pretrained large language models (LLMs). DP-LLMTGen models sensitive datasets using a two-stage fine-tuning procedure with a novel loss function specifically designed for tabular data. Subsequently, it generates synthetic data through sampling the fine-tuned LLMs. Our empirical evaluation demonstrates that DP-LLMTGen outperforms a variety of existing mechanisms across multiple datasets and privacy settings. Additionally, we conduct an ablation study and several experimental analyses to deepen our understanding of LLMs in addressing this important problem. Finally, we highlight the controllable generation ability of DP-LLMTGen through a fairness-constrained generation setting.

Updated: 2024-06-03 15:43:57

标题: 使用大型语言模型进行差分隐私表格数据合成

摘要: 使用差分隐私进行合成表格数据生成是实现数据共享并确保形式隐私的一个关键问题。尽管在方法学研究和发展方面有着丰富的历史，但开发能够提供真实合成数据集的差分隐私表格数据生成器仍然具有挑战性。本文介绍了DP-LLMTGen - 一种利用预训练大型语言模型（LLMs）的差分隐私表格数据合成的新框架。DP-LLMTGen使用一个专门为表格数据设计的新型损失函数，通过两阶段微调过程对敏感数据集进行建模。随后，通过对微调后的LLMs进行抽样生成合成数据。我们的实证评估表明，DP-LLMTGen在多个数据集和隐私设置下优于多种现有机制。此外，我们进行了消融研究和几项实验分析，以加深我们对LLMs解决这一重要问题的理解。最后，我们通过一个公平约束生成设置突出展示了DP-LLMTGen的可控生成能力。

更新时间: 2024-06-03 15:43:57

领域: cs.LG

下载: http://arxiv.org/abs/2406.01457v1

Automatic Fused Multimodal Deep Learning for Plant Identification

Plant classification is vital for ecological conservation and agricultural productivity, enhancing our understanding of plant growth dynamics and aiding species preservation. The advent of deep learning (DL) techniques has revolutionized this field by enabling autonomous feature extraction, significantly reducing the dependence on manual expertise. However, conventional DL models often rely solely on single data sources, failing to capture the full biological diversity of plant species comprehensively. Recent research has turned to multimodal learning to overcome this limitation by integrating multiple data types, which enriches the representation of plant characteristics. This shift introduces the challenge of determining the optimal point for modality fusion. In this paper, we introduce a pioneering multimodal DL-based approach for plant classification with automatic modality fusion. Utilizing the multimodal fusion architecture search, our method integrates images from multiple plant organs-flowers, leaves, fruits, and stems-into a cohesive model. Our method achieves 83.48% accuracy on 956 classes of the PlantCLEF2015 dataset, surpassing state-of-the-art methods. It outperforms late fusion by 11.07% and is more robust to missing modalities. We validate our model against established benchmarks using standard performance metrics and McNemar's test, further underscoring its superiority.

Updated: 2024-06-03 15:43:29

标题: 植物识别的自动融合多模态深度学习

摘要: 植物分类对生态保护和农业生产至关重要，有助于增进我们对植物生长动态的理解，并促进物种保护。深度学习（DL）技术的出现彻底改变了这一领域，通过实现自主特征提取，显著减少了对人工专业知识的依赖。然而，传统的DL模型通常只依赖于单一数据源，未能全面捕捉植物物种的生物多样性。最近的研究转向多模态学习，以克服这一限制，通过整合多种数据类型，丰富了对植物特征的表示。这一转变引入了确定最佳模态融合点的挑战。本文介绍了一种开创性的基于多模态DL的植物分类方法，实现了自动模态融合。通过利用多模态融合架构搜索，我们的方法将来自多个植物器官-花朵、叶子、果实和茎-的图像整合到一个连贯的模型中。我们的方法在PlantCLEF2015数据集的956个类别上实现了83.48%的准确率，超越了最先进的方法。它比延迟融合提高了11.07%，对缺失的模态更具鲁棒性。我们使用标准性能指标和McNemar检验对我们的模型进行验证，进一步强调其优越性。

更新时间: 2024-06-03 15:43:29

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01455v1

Quantum Theory and Application of Contextual Optimal Transport

Optimal Transport (OT) has fueled machine learning (ML) across many domains. When paired data measurements $(\boldsymbol{\mu}, \boldsymbol{\nu})$ are coupled to covariates, a challenging conditional distribution learning setting arises. Existing approaches for learning a $\textit{global}$ transport map parameterized through a potentially unseen context utilize Neural OT and largely rely on Brenier's theorem. Here, we propose a first-of-its-kind quantum computing formulation for amortized optimization of contextualized transportation plans. We exploit a direct link between doubly stochastic matrices and unitary operators thus unravelling a natural connection between OT and quantum computation. We verify our method (QontOT) on synthetic and real data by predicting variations in cell type distributions conditioned on drug dosage. Importantly we conduct a 24-qubit hardware experiment on a task challenging for classical computers and report a performance that cannot be matched with our classical neural OT approach. In sum, this is a first step toward learning to predict contextualized transportation plans through quantum computing.

Updated: 2024-06-03 15:42:55

标题: 量子理论与情境最优输运的应用

摘要: Optimal Transport (OT)已经在许多领域推动了机器学习（ML）的发展。当配对数据测量（$\boldsymbol{\mu}, \boldsymbol{\nu}$）与协变量耦合时，会出现一个具有挑战性的条件分布学习设置。现有方法用于学习通过可能看不见的上下文参数化的$\textit{全局}$传输映射，利用了神经OT并且在很大程度上依赖于Brenier的定理。在这里，我们提出了一种首次采用量子计算形式化的方法，用于摊销优化上下文化的运输计划。我们利用双随机矩阵和酉算子之间的直接联系，从而揭示了OT和量子计算之间的自然联系。我们在合成和真实数据上验证了我们的方法（QontOT），通过预测细胞类型分布的变化，这些变化取决于药物剂量。重要的是，我们在一个对经典计算机具有挑战性的任务上进行了一个24量子比特硬件实验，并报告了一个无法与我们的经典神经OT方法匹敌的性能。总的来说，这是通过量子计算学习预测上下文化运输计划的第一步。

更新时间: 2024-06-03 15:42:55

领域: cs.LG,cs.ET,math.QA,q-bio.QM,quant-ph

下载: http://arxiv.org/abs/2402.14991v3

Efficient Inverse Design Optimization through Multi-fidelity Simulations, Machine Learning, and Search Space Reduction Strategies

This paper introduces a methodology designed to augment the inverse design optimization process in scenarios constrained by limited compute, through the strategic synergy of multi-fidelity evaluations, machine learning models, and optimization algorithms. The proposed methodology is analyzed on two distinct engineering inverse design problems: airfoil inverse design and the scalar field reconstruction problem. It leverages a machine learning model trained with low-fidelity simulation data, in each optimization cycle, thereby proficiently predicting a target variable and discerning whether a high-fidelity simulation is necessitated, which notably conserves computational resources. Additionally, the machine learning model is strategically deployed prior to optimization to compress the design space boundaries, thereby further accelerating convergence toward the optimal solution. The methodology has been employed to enhance two optimization algorithms, namely Differential Evolution and Particle Swarm Optimization. Comparative analyses illustrate performance improvements across both algorithms. Notably, this method is adaptable across any inverse design application, facilitating a synergy between a representative low-fidelity ML model, and high-fidelity simulation, and can be seamlessly applied across any variety of population-based optimization algorithms.}

Updated: 2024-06-03 15:42:45

标题: 高效的反向设计优化方法：基于多精度模拟、机器学习和搜索空间缩减策略

摘要: 本文介绍了一种方法论，旨在通过多种精度评估、机器学习模型和优化算法的战略协同，增强受限于有限计算资源的反向设计优化过程。所提出的方法论在两个不同的工程反向设计问题上进行了分析：翼型反向设计和标量场重构问题。在每个优化周期中，该方法利用经过低精度模拟数据训练的机器学习模型，从而能够有效地预测目标变量，并判断是否需要高精度模拟，从而显著节省计算资源。此外，机器学习模型在优化之前被战略性地部署，以压缩设计空间边界，从而进一步加快收敛到最佳解决方案。该方法已被用于加强两种优化算法，即差分进化和粒子群优化。比较分析显示了两种算法的性能改进。值得注意的是，这种方法可以适用于任何反向设计应用，促进了低精度机器学习模型和高精度模拟之间的协同作用，并可以无缝应用于任何种类的基于群体的优化算法。

更新时间: 2024-06-03 15:42:45

领域: cs.CE,cs.AI,cs.LG,cs.NE,stat.ML

下载: http://arxiv.org/abs/2312.03654v2

Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach

In recent years, automatic speech recognition (ASR) systems have significantly improved, especially in languages with a vast amount of transcribed speech data. However, ASR systems tend to perform poorly for low-resource languages with fewer resources, such as minority and regional languages. This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which typically feature a single transcript associated with hours-long audios. The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments, whereas optimal ASR training requires segments ranging from 4 to 15 seconds. To address this, we propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training. Our approach simplifies data preparation for ASR systems in low-resource languages and demonstrates its application through a case study involving the Armenian language. Our method, which is "portable" to many low-resource languages, not only mitigates the issue of data scarcity but also enhances the performance of ASR models for underrepresented languages.

Updated: 2024-06-03 15:38:40

标题: 为低资源语言实现ASR：一种全面的数据集创建方法

摘要: 近年来，自动语音识别（ASR）系统有了显著的改进，特别是在拥有大量转录语音数据的语言中。然而，ASR系统在资源较少的低资源语言（如少数民族和地区性语言）中表现不佳。本研究介绍了一种新颖的流程，旨在从有声读物中生成ASR训练数据集，这些有声读物通常包含与长达数小时的音频相关联的单个文本。这些有声读物的共同结构由于音频片段的长度而构成独特挑战，而最佳的ASR训练需要时长在4至15秒之间的片段。为了解决这个问题，我们提出了一种有效地将音频与相应文本对齐并将其分段成适合ASR训练的长度的方法。我们的方法简化了低资源语言ASR系统的数据准备，并通过涉及亚美尼亚语的案例研究展示了其应用。我们的方法可以“移植”到许多低资源语言，不仅缓解了数据稀缺问题，还提升了ASR模型对少数语言的表现。

更新时间: 2024-06-03 15:38:40

领域: cs.CL,cs.LG,eess.AS,eess.SP

下载: http://arxiv.org/abs/2406.01446v1

Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach

Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data. Recent advances suggest precomputing and storing statistics extracted from second-order information and implementing unlearning through Newton-style updates. However, the theoretical analysis of these works often depends on restrictive assumptions of convexity and smoothness, and those mentioned operations on Hessian matrix are extremely costly. As a result, applying these works to high-dimensional models becomes challenging. In this paper, we propose an efficient Hessian-free certified unlearning. We propose to maintain a statistical vector for each data, computed through affine stochastic recursion approximation of the difference between retrained and learned models. Our analysis does not involve inverting Hessian and thus can be extended to non-convex non-smooth objectives. Under same assumptions, we demonstrate advancements of proposed method beyond the state-of-the-art theoretical studies, in terms of generalization, unlearning guarantee, deletion capacity, and computation/storage complexity, and we show that the unlearned model of our proposed approach is close to or same as the retrained model. Based on the strategy of recollecting statistics for forgetting data, we develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation. Experiments demonstrate that the proposed scheme surpasses existing results by orders of magnitude in terms of time/storage costs, while also enhancing accuracy.

Updated: 2024-06-03 15:35:12

标题: 高效且可推广的认证式遗忘：一种无Hessian方法

摘要: 机器遗忘致力于维护数据所有者被遗忘的权利，通过使模型有选择地忘记特定数据。最近的进展表明，通过预先计算和存储从二阶信息中提取的统计信息，并通过牛顿式更新实现遗忘。然而，这些工作的理论分析通常依赖于凸性和平滑性的限制性假设，以及对Hessian矩阵的操作非常昂贵。因此，将这些工作应用于高维模型变得具有挑战性。在本文中，我们提出了一种高效的无Hessian认证遗忘方法。我们建议为每个数据维护一个统计向量，通过重新训练和学习模型之间的差异的仿射随机递归逼近来计算。我们的分析不涉及求逆Hessian，因此可以扩展到非凸非光滑目标。在相同的假设下，我们展示了所提出方法在泛化、遗忘保证、删除容量和计算/存储复杂性等方面超越了最新的理论研究，同时我们展示了我们提出方法的遗忘模型接近或与重新训练模型相同。基于重新收集遗忘数据的统计信息的策略，我们开发了一种算法，实现几乎即时的遗忘，因为它只需要进行向量加法操作。实验证明，所提出的方案在时间/存储成本方面比现有结果提升了数个数量级，同时也提高了准确性。

更新时间: 2024-06-03 15:35:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.01712v3

Cheap Talking Algorithms

We simulate behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game where a large population interacts anonymously. We show that sender and receiver converge to Nash equilibrium play. The level of informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate level of the bias, it matches the level predicted by the Pareto optimal equilibrium or by the second best one. Conclusions are robust to alternative specifications of the learning hyperparameters and of the game.

Updated: 2024-06-03 15:34:10

标题: 廉价的对话算法

摘要: 我们模拟了两种独立的强化学习算法在Crawford和Sobel（1982）的战略信息传递游戏中的行为。我们采用无记忆算法来捕捉在一个大群体中匿名互动的静态游戏中的学习过程。我们展示了发送者和接收者会收敛到纳什均衡策略。随着偏见增加，发送者的廉价谈话的信息量减少，在偏见的中间水平，它与帕累托最优均衡或次优均衡预测的水平相匹配。结论对于学习超参数和游戏的替代规范是稳健的。

更新时间: 2024-06-03 15:34:10

领域: econ.TH,cs.AI

下载: http://arxiv.org/abs/2310.07867v5

Understanding Domain-Size Generalization in Markov Logic Networks

We study the generalization behavior of Markov Logic Networks (MLNs) across relational structures of different sizes. Multiple works have noticed that MLNs learned on a given domain generalize poorly across domains of different sizes. This behavior emerges from a lack of internal consistency within an MLN when used across different domain sizes. In this paper, we quantify this inconsistency and bound it in terms of the variance of the MLN parameters. The parameter variance also bounds the KL divergence between an MLN's marginal distributions taken from different domain sizes. We use these bounds to show that maximizing the data log-likelihood while simultaneously minimizing the parameter variance corresponds to two natural notions of generalization across domain sizes. Our theoretical results apply to Exponential Random Graphs and other Markov network based relational models. Finally, we observe that solutions known to decrease the variance of the MLN parameters, like regularization and Domain-Size Aware MLNs, increase the internal consistency of the MLNs. We empirically verify our results on four different datasets, with different methods to control parameter variance, showing that controlling parameter variance leads to better generalization.

Updated: 2024-06-03 15:30:52

标题: 理解马尔科夫逻辑网络中的域大小泛化

摘要: 我们研究了马尔科夫逻辑网络（MLNs）在不同大小的关系结构上的泛化行为。多个研究已经注意到，对于给定领域学习的MLNs在不同大小的领域之间泛化性差。这种行为源自MLN在不同领域大小之间使用时的内在一致性不足。在本文中，我们量化了这种不一致性，并将其限定为MLN参数的方差。参数方差还限制了从不同领域大小取出的MLN边际分布之间的KL散度。我们利用这些界限显示，在最大化数据对数似然的同时最小化参数方差对应于两种领域大小之间泛化的自然观念。我们的理论结果适用于指数随机图和其他基于马尔科夫网络的关系模型。最后，我们观察到已知可以减少MLN参数方差的解决方案，如正则化和领域大小感知MLNs，可以增加MLNs的内部一致性。我们在四个不同数据集上实证验证了我们的结果，采用不同方法来控制参数方差，结果显示控制参数方差可以实现更好的泛化。

更新时间: 2024-06-03 15:30:52

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.15933v3

Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients

Federated learning (FL) systems enable multiple clients to train a machine learning model iteratively through synchronously exchanging the intermediate model weights with a single server. The scalability of such FL systems can be limited by two factors: server idle time due to synchronous communication and the risk of a single server becoming the bottleneck. In this paper, we propose a new FL architecture, to our knowledge, the first multi-server FL system that is entirely asynchronous, and therefore addresses these two limitations simultaneously. Our solution keeps both servers and clients continuously active. As in previous multi-server methods, clients interact solely with their nearest server, ensuring efficient update integration into the model. Differently, however, servers also periodically update each other asynchronously, and never postpone interactions with clients. We compare our solution to three representative baselines - FedAvg, FedAsync and HierFAVG - on the MNIST and CIFAR-10 image classification datasets and on the WikiText-2 language modeling dataset. Our solution converges to similar or higher accuracy levels than previous baselines and requires 61% less time to do so in geo-distributed settings.

Updated: 2024-06-03 15:29:46

标题: 异步多服务器联合学习用于地理分布式客户端

摘要: 联邦学习（FL）系统通过通过与单个服务器同步交换中间模型权重使多个客户端迭代地训练机器学习模型。这种FL系统的可扩展性可能受到两个因素的限制：由于同步通信而导致服务器空闲时间和单个服务器成为瓶颈的风险。在本文中，我们提出了一种新的FL架构，据我们所知，这是第一个完全异步的多服务器FL系统，因此可以同时解决这两个限制。我们的解决方案保持服务器和客户端持续活动。与以前的多服务器方法一样，客户端仅与其最近的服务器交互，确保有效地将更新集成到模型中。然而，不同的是，服务器也会定期异步更新彼此，并且从不推迟与客户端的交互。我们在MNIST和CIFAR-10图像分类数据集以及WikiText-2语言建模数据集上将我们的解决方案与三个代表性基线-FedAvg，FedAsync和HierFAVG进行比较。我们的解决方案收敛到与以前基线相似或更高的准确度水平，并且在地理分布的设置中所需的时间少61％。

更新时间: 2024-06-03 15:29:46

领域: cs.LG,cs.DC,I.2.11

下载: http://arxiv.org/abs/2406.01439v1

Asynchronous Byzantine Federated Learning

Federated learning (FL) enables a set of geographically distributed clients to collectively train a model through a server. Classically, the training process is synchronous, but can be made asynchronous to maintain its speed in presence of slow clients and in heterogeneous networks. The vast majority of Byzantine fault-tolerant FL systems however rely on a synchronous training process. Our solution is one of the first Byzantine-resilient and asynchronous FL algorithms that does not require an auxiliary server dataset and is not delayed by stragglers, which are shortcomings of previous works. Intuitively, the server in our solution waits to receive a minimum number of updates from clients on its latest model to safely update it, and is later able to safely leverage the updates that late clients might send. We compare the performance of our solution with state-of-the-art algorithms on both image and text datasets under gradient inversion, perturbation, and backdoor attacks. Our results indicate that our solution trains a model faster than previous synchronous FL solution, and maintains a higher accuracy, up to 1.54x and up to 1.75x for perturbation and gradient inversion attacks respectively, in the presence of Byzantine clients than previous asynchronous FL solutions.

Updated: 2024-06-03 15:29:38

标题: 异步拜占庭式联邦学习

摘要: 联邦学习（FL）使一组地理分布的客户端通过服务器共同训练模型。经典上，训练过程是同步的，但可以异步进行，以保持在慢速客户端和异构网络的情况下的速度。然而，绝大多数拜占庭容错的FL系统依赖于同步训练过程。我们的解决方案是第一个拜占庭容错和异步FL算法，不需要辅助服务器数据集，并且不会受到拖延者的影响，这是以前工作的缺点。直观地说，我们的解决方案中服务器等待从客户端接收其最新模型的最少更新，以安全更新它，并且稍后能够安全地利用迟到的客户端可能发送的更新。我们将我们的解决方案与最先进的算法在图像和文本数据集上进行性能比较，包括梯度反转、扰动和后门攻击。我们的结果表明，在拜占庭客户端存在的情况下，与以前的异步FL解决方案相比，我们的解决方案训练模型更快，准确率更高，对于扰动和梯度反转攻击，分别高出1.54倍和1.75倍。

更新时间: 2024-06-03 15:29:38

领域: cs.LG,cs.DC,I.2.11

下载: http://arxiv.org/abs/2406.01438v1

Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL

We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and potentially, can be exponentially lower than the MBED proposed by \citet{huang2023statistical}. We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial w.r.t.~P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, \emph{learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems}. We further extend our results to Multi-Type MFGs, generalizing from conventional MFGs and involving multiple types of agents. This extension implies statistical tractability of a broader class of Markov Games through the efficacy of mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.

Updated: 2024-06-03 15:29:09

标题: 基于模型的强化学习在均场博弈中并不比单一智能体强化学习更具统计学困难性

摘要: 我们研究了需要进行战略性探索以找到纳什均衡策略的基于模型的函数逼近的均场博弈（MFGs）中强化学习（RL）的样本复杂性。我们引入了部分基于模型的躲避维度（P-MBED），这是一个更有效的概念来表征模型类的复杂性。值得注意的是，P-MBED衡量了从给定的均场模型类转换而来的单一代理模型类的复杂性，潜在地可以比\citet{huang2023statistical}提出的MBED指数级更低。我们提出了一个特征新颖的探索策略的模型消除算法，并建立了与P-MBED多项式相关的样本复杂度结果。关键的是，我们的结果揭示，在基本可实现性和Lipschitz连续性假设下，学习MFGs中的纳什均衡不再比解决对数个单一代理RL问题更具统计挑战性。我们进一步将我们的结果扩展到多类型MFGs，从传统MFGs泛化并涉及多种类型的代理。这一扩展意味着通过均场逼近的有效性，更广泛的马尔可夫博弈类的统计可处理性。最后，受我们的理论算法启发，我们提出了一种启发式方法，具有改进的计算效率，并在实证中展示了其有效性。

更新时间: 2024-06-03 15:29:09

领域: cs.LG,cs.AI,cs.GT,stat.ML

下载: http://arxiv.org/abs/2402.05724v2

Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Ridgeless regression has garnered attention among researchers, particularly in light of the ``Benign Overfitting'' phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well due to the lack of flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducible Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $l_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.

Updated: 2024-06-03 15:28:12

标题: 学习非对称核学习的无正则化核岭回归分析

摘要: 岩脊回归在研究人员中引起了关注，特别是考虑到“良性过拟合”现象，即插值噪声样本的模型表现出强大的泛化能力。然而，由于缺乏灵活性，核岩脊回归并不总是表现良好。本文通过引入局部自适应带宽（LAB）RBF核，将核学习技术应用于增强核岩脊回归，在实验和理论上改善性能。我们首次证明，从LAB RBF核学习的函数属于可再生核希尔伯特空间（RKHSs）的积分空间。尽管所提出的模型中没有显式的正则化，但其优化等价于在RKHSs的积分空间中解决一个$\ell_0$-正则化问题，阐明了其泛化能力的起源。从近似分析的角度出发，我们引入了一个$l_q$-范数分析技术（其中$0<q<1$），在温和条件下推导了所提出模型的学习率。这一结果加深了我们对理论的理解，解释了我们算法的强大近似能力是由于RKHSs的积分空间的高容量，而其泛化能力则由支持向量的数量控制。在合成和实际数据集上的实验结果验证了我们的理论结论。

更新时间: 2024-06-03 15:28:12

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01435v1

Universal In-Context Approximation By Prompting Fully Recurrent Models

Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.

Updated: 2024-06-03 15:25:13

标题: 通过提示完全递归模型实现的通用上下文逼近

摘要: 零样本学习和上下文学习使得在不进行模型微调的情况下解决任务成为可能，这对于开发生成模型解决方案至关重要。因此，了解预训练模型能否被要求逼近任何函数是至关重要的，即它是否是一个通用的上下文逼近器。最近有研究表明，变压器模型确实具有这一特性，但这些结果依赖于它们的注意力机制。因此，这些发现不适用于完全循环结构，如RNN、LSTM和越来越受欢迎的SSM。我们证明了RNN、LSTM、GRU、线性RNN以及线性门控结构如Mamba和Hawk/Griffin也可以作为通用的上下文逼近器。为了简化我们的论点，我们引入了一种名为LSRL的编程语言，可以编译为这些完全循环结构。LSRL可能对进一步研究完全循环模型，比如构建可解释性基准测试，具有独立的兴趣。我们还研究了乘法门控的作用，并观察到包含这种门控的架构（如LSTM、GRU、Hawk/Griffin）可以更稳定地实现某些操作，使它们成为实际上下文通用逼近的更可行的候选者。

更新时间: 2024-06-03 15:25:13

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.01424v1

Value Improved Actor Critic Algorithms

Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm family employs improvement operators in the value update, to iteratively improve the value function directly. In this work, we propose a general extension to the AC framework that employs two separate improvement operators: one applied to the policy in the spirit of policy-based algorithms and one applied to the value in the spirit of value-based algorithms, which we dub Value-Improved AC (VI-AC). We design two practical VI-AC algorithms based in the popular online off-policy AC algorithms TD3 and DDPG. We evaluate VI-TD3 and VI-DDPG in the Mujoco benchmark and find that both improve upon or match the performance of their respective baselines in all environments tested.

Updated: 2024-06-03 15:24:15

标题: 价值改进的演员-评论家算法

摘要: 许多现代强化学习算法基于演员-评论家（AC）框架：使用策略改进运算符迭代改进策略（演员），并使用评论家对策略的值进行迭代近似（评论家）。相比之下，流行的基于价值的算法家族在值更新中使用改进运算符，直接迭代改进值函数。在这项工作中，我们提出了对AC框架的一般扩展，使用两个单独的改进运算符：一个应用于策略，类似于基于策略的算法，另一个应用于值，类似于基于价值的算法，我们称之为增值改进AC（VI-AC）。我们设计了两种基于流行的在线离策略AC算法TD3和DDPG的实用VI-AC算法。我们在Mujoco基准测试中评估了VI-TD3和VI-DDPG，并发现在所有测试环境中，两者都改进或匹配其各自基线的性能。

更新时间: 2024-06-03 15:24:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01423v1

Leveraging Expert Consistency to Improve Algorithmic Decision Support

Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.

Updated: 2024-06-03 15:23:05

标题: 利用专家一致性提高算法决策支持

摘要: 机器学习（ML）越来越被用于支持高风险决策。然而，经常存在一个构建差距：即在决策任务中感兴趣的构建与用作训练ML模型标签的代理之间的差距。因此，ML模型可能无法捕捉决策标准的重要维度，从而影响它们在决策支持方面的效用。因此，在为决策支持设计ML系统的关键步骤是从可用代理中选择目标标签。在这项工作中，我们探讨了使用历史专家决策作为信息丰富但也不完美的信息源，结合观察结果来缩小构建差距。我们认为，在专家之间表现一致的情况下，管理者和系统设计者可能有兴趣向他们学习，否则可以从观察结果中学习。我们开发了一种方法论，利用组织信息系统中通常可用的信息实现这一目标。这涉及两个核心步骤。首先，我们提出了一种基于影响函数的方法来间接估计专家一致性，当数据中的每个案例由一个专家评估时。其次，我们引入了一种标签聚合方法，使ML模型能够同时从专家决策和观察结果中学习。我们的实证评估使用了在临床环境中的模拟和来自儿童福利领域的真实数据，表明所提出的方法成功缩小了构建差距，比仅从观察结果或专家决策中学习能够获得更好的预测性能。

更新时间: 2024-06-03 15:23:05

领域: cs.LG,cs.HC

下载: http://arxiv.org/abs/2101.09648v3

1-Lipschitz Neural Networks are more expressive with N-Activations

A crucial property for achieving secure, trustworthy and interpretable deep learning systems is their robustness: small changes to a system's inputs should not result in large changes to its outputs. Mathematically, this means one strives for networks with a small Lipschitz constant. Several recent works have focused on how to construct such Lipschitz networks, typically by imposing constraints on the weight matrices. In this work, we study an orthogonal aspect, namely the role of the activation function. We show that commonly used activation functions, such as MaxMin, as well as all piece-wise linear ones with two segments unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. We furthermore introduce the new N-activation function that is provably more expressive than currently popular activation functions. We provide code at https://github.com/berndprach/NActivation.

Updated: 2024-06-03 15:20:13

标题: 1-利普希茨神经网络在具有N个激活函数时更具表现力

摘要: 用于实现安全、可靠和可解释的深度学习系统的一个关键属性是它们的鲁棒性：对系统输入的微小变化不应导致其输出的大幅变化。从数学上讲，这意味着人们努力寻求具有较小 Lipschitz 常数的网络。最近几项研究工作集中讨论如何构建这样的 Lipschitz 网络，通常是通过对权重矩阵施加约束。在这项工作中，我们研究了一个正交方面，即激活函数的作用。我们展示了常用的激活函数，如 MaxMin，以及所有具有两个线性分段的分段线性函数，在最简单的一维设置中不必要地限制了可表示函数的类别。此外，我们介绍了新的 N-激活函数，可以证明比目前流行的激活函数更具表现力。我们在 https://github.com/berndprach/NActivation 上提供了代码。

更新时间: 2024-06-03 15:20:13

领域: cs.LG

下载: http://arxiv.org/abs/2311.06103v2

Problematizing AI Omnipresence in Landscape Architecture

This position paper argues for, and offers, a critical lens through which to examine the current AI frenzy in the landscape architecture profession. In it, the authors propose five archetypes or mental modes that landscape architects might inhabit when thinking about AI. Rather than limiting judgments of AI use to a single axis of acceleration, these archetypes and corresponding narratives exist along a relational spectrum and are permeable, allowing LAs to take on and switch between them according to context. We model these relationships between the archetypes and their contributions to AI advancement using a causal loop diagram (CLD), and with those interactions argue that more nuanced ways of approaching AI might also open new modes of practice in the new digital economy.

Updated: 2024-06-03 15:20:05

标题: 在景观建筑中对人工智能无所不在的问题进行探讨

摘要: 这篇立场论文提出并提供了一种批判性视角，用以审视景观建筑专业中当前的人工智能狂热现象。在文中，作者提出了景观建筑师在思考人工智能时可能采用的五种原型或心智模式。与仅仅将人工智能使用的评判限制在加速度的单一轴线上不同，这些原型和相应的叙事存在于一个关系光谱中，并且是可渗透的，使得景观建筑师可以根据环境采用和在它们之间切换。我们使用因果循环图（CLD）对这些原型之间的关系及其对人工智能进步的贡献进行建模，并通过这些互动认为，更细致的人工智能处理方式也可能在新数字经济中开启新的实践模式。

更新时间: 2024-06-03 15:20:05

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01421v1

HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding

Knowledge hypergraph embedding models are usually computationally expensive due to the inherent complex semantic information. However, existing works mainly focus on improving the effectiveness of knowledge hypergraph embedding, making the model architecture more complex and redundant. It is desirable and challenging for knowledge hypergraph embedding to reach a trade-off between model effectiveness and efficiency. In this paper, we propose an end-to-end efficient n-ary knowledge hypergraph embedding model, HyCubE, which designs a novel 3D circular convolutional neural network and the alternate mask stack strategy to enhance the interaction and extraction of feature information comprehensively. Furthermore, our proposed model achieves a better trade-off between effectiveness and efficiency by adaptively adjusting the 3D circular convolutional layer structure to handle different arity knowledge hypergraphs with fewer parameters. In addition, we use 1-N multilinear scoring based on the entity mask mechanism to further accelerate the model training efficiency. Finally, extensive experimental results on all datasets demonstrate that our proposed model consistently outperforms state-of-the-art baselines, with an average improvement of 7.30%-9.53% and a maximum improvement of 33.82% across all metrics. Meanwhile, HyCubE is 4.12x faster, GPU memory usage is 52.19% lower, and the number of parameters is reduced by 85.21% compared with the average metric of the latest state-of-the-art baselines.

Updated: 2024-06-03 15:17:46

标题: HyCubE：高效知识超图3D循环卷积嵌入

摘要: 知识超图嵌入模型通常由于固有的复杂语义信息而计算成本高昂。然而，现有的工作主要集中在改进知识超图嵌入的有效性，使模型架构更加复杂和冗余。知识超图嵌入能够在模型有效性和效率之间达到一种平衡是令人期待和具有挑战性的。在本文中，我们提出了一种端到端高效的n元知识超图嵌入模型HyCubE，该模型设计了一种新颖的三维环形卷积神经网络和交替掩码堆叠策略，全面增强了特征信息的交互和提取。此外，我们提出的模型通过自适应调整三维环形卷积层结构来处理具有更少参数的不同度知识超图，从而实现了更好的有效性和效率之间的平衡。此外，我们利用基于实体掩码机制的1-N多线性评分进一步加快了模型训练效率。最后，对所有数据集进行的广泛实验结果表明，我们提出的模型始终优于最先进的基线模型，在所有指标上平均提高了7.30%-9.53%，最大提高了33.82%。与最新最先进的基线模型的平均指标相比，HyCubE的速度提高了4.12倍，GPU内存使用量降低了52.19%，参数数量减少了85.21%。

更新时间: 2024-06-03 15:17:46

领域: cs.AI

下载: http://arxiv.org/abs/2402.08961v2

Functional Bilevel Optimization for Machine Learning

In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point of view does not rely on this assumption and notably allows using over-parameterized neural networks as the inner prediction function. We propose scalable and efficient algorithms for the functional bilevel optimization problem and illustrate the benefits of our approach on instrumental regression and reinforcement learning tasks.

Updated: 2024-06-03 15:16:26

标题: 机器学习的功能双层优化

摘要: 在这篇论文中，我们介绍了一个新的功能性视角，用于机器学习中的双层优化问题，其中内部目标在函数空间上被最小化。这些类型的问题通常通过使用在参数设置中开发的方法来解决，其中内部目标相对于预测函数的参数是强凸的。功能性视角不依赖于这一假设，特别允许使用过参数化的神经网络作为内部预测函数。我们提出了可扩展且高效的算法来解决功能性双层优化问题，并展示了我们的方法在仪器回归和强化学习任务上的优势。

更新时间: 2024-06-03 15:16:26

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2403.20233v2

Mixup Augmentation with Multiple Interpolations

Mixup and its variants form a popular class of data augmentation techniques.Using a random sample pair, it generates a new sample by linear interpolation of the inputs and labels. However, generating only one single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pair. With an ordered sequence of generated samples, multi-mix can better guide the training process than standard mixup. Moreover, theoretically, this can also reduce the stochastic gradient variance. Extensive experiments on a number of synthetic and large-scale data sets demonstrate that multi-mix outperforms various mixup variants and non-mixup-based baselines in terms of generalization, robustness, and calibration.

Updated: 2024-06-03 15:16:09

标题: 混合增强与多重插值

摘要: 混合数据增强技术及其变种是一类流行的数据增强技术。通过使用随机样本对，它通过对输入和标签进行线性插值生成新样本。然而，仅生成一个单一插值可能会限制其增强能力。本文提出了一个简单但有效的扩展称为multi-mix，它从一个样本对生成多个插值。通过生成的样本的有序序列，multi-mix可以比标准mixup更好地指导训练过程。此外，从理论上讲，这也可以减少随机梯度方差。对多个合成和大型数据集进行的大量实验表明，multi-mix在泛化、鲁棒性和校准方面优于各种mixup变种和非mixup基准线。

更新时间: 2024-06-03 15:16:09

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.01417v1

Adapting Conformal Prediction to Distribution Shifts Without Labels

Conformal prediction (CP) enables machine learning models to output prediction sets with guaranteed coverage rate, assuming exchangeable data. Unfortunately, the exchangeability assumption is frequently violated due to distribution shifts in practice, and the challenge is often compounded by the lack of ground truth labels at test time. Focusing on classification in this paper, our goal is to improve the quality of CP-generated prediction sets using only unlabeled data from the test domain. This is achieved by two new methods called ECP and EACP, that adjust the score function in CP according to the base model's uncertainty on the unlabeled test data. Through extensive experiments on a number of large-scale datasets and neural network architectures, we show that our methods provide consistent improvement over existing baselines and nearly match the performance of supervised algorithms.

Updated: 2024-06-03 15:16:02

标题: 将 Conformal Prediction 适应到没有标签的分布转移情况

摘要: 合规预测（CP）使机器学习模型能够输出具有保证覆盖率的预测集，假设数据可交换。不幸的是，在实践中，由于分布转移，交换性假设经常被违反，而且挑战往往因测试时缺乏地面真实标签而加剧。本文关注分类，在这篇论文中，我们的目标是仅使用来自测试域的无标签数据来改进CP生成的预测集的质量。通过两种称为ECP和EACP的新方法，根据基础模型对无标签测试数据的不确定性调整CP中的评分函数来实现这一目标。通过在多个大型数据集和神经网络架构上进行大量实验，我们展示了我们的方法在现有基线上提供了一致的改进，并几乎与监督算法的性能相匹配。

更新时间: 2024-06-03 15:16:02

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01416v1

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

This work presents a novel approach to neural architecture search (NAS) that aims to increase carbon efficiency for the model design process. The proposed framework CE-NAS addresses the key challenge of high carbon cost associated with NAS by exploring the carbon emission variations of energy and energy differences of different NAS algorithms. At the high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results for both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining a search efficiency comparable to vanilla NAS. For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO2. On ImageNet, our searched model achieves 80.6% top-1 accuracy with a 0.78 ms TensorRT latency using FP16 on NVIDIA V100, consuming only 909.86 lbs of CO2, making it comparable to other one-shot-based NAS baselines.

Updated: 2024-06-03 15:13:21

标题: CE-NAS：一种端到端的碳效率神经架构搜索框架

摘要: 这项工作提出了一种新颖的神经架构搜索（NAS）方法，旨在提高模型设计过程的碳效率。所提出的CE-NAS框架解决了与NAS相关的高碳成本的关键挑战，通过探索不同NAS算法的碳排放变化和能量差异。在高层次上，CE-NAS利用强化学习代理根据由时间序列变压器预测的碳密度动态调整GPU资源，以平衡能效抽样和能耗密集型评估任务。此外，CE-NAS利用最近提出的多目标优化器有效地减少了NAS搜索空间。我们展示了CE-NAS在降低碳排放同时实现NAS数据集和开放域NAS任务的SOTA结果的功效。例如，在HW-NasBench数据集上，CE-NAS将碳排放量降低了最多达7.22倍，同时保持了与普通NAS相当的搜索效率。对于开放域NAS任务，CE-NAS在CIFAR-10上实现了97.35%的top-1准确率，仅使用1.68M个参数和38.53磅CO2的碳消耗。在ImageNet上，我们搜索到的模型在NVIDIA V100上使用FP16实现了80.6%的top-1准确率，延迟为0.78毫秒的TensorRT，仅消耗909.86磅CO2，与其他基于一次性NAS基线相当。

更新时间: 2024-06-03 15:13:21

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2406.01414v1

Large Language Models are Zero-Shot Next Location Predictors

Predicting the locations an individual will visit in the future is crucial for solving many societal issues like disease diffusion and reduction of pollution among many others. The models designed to tackle next-location prediction, however, require a significant amount of individual-level information to be trained effectively. Such data may be scarce or even unavailable in some geographic regions or peculiar scenarios (e.g., cold-start in recommendation systems). Moreover, the design of a next-location predictor able to generalize or geographically transfer knowledge is still an open research challenge. Recent advances in natural language processing have led to a rapid diffusion of Large Language Models (LLMs) which have shown good generalization and reasoning capabilities. These insights, coupled with the recent findings that LLMs are rich in geographical knowledge, allowed us to believe that these models can act as zero-shot next-location predictors. This paper evaluates the capabilities of many popular LLMs in this role, specifically Llama, GPT-3.5 and Mistral 7B. After designing a proper prompt, we tested the models on three real-world mobility datasets. The results show that LLMs can obtain accuracies up to 32.4%, a significant relative improvement of over 600% when compared to sophisticated DL models specifically designed for human mobility. Moreover, we show that other LLMs are unable to perform the task properly. To prevent positively biased results, we also propose a framework inspired by other studies to test data contamination. Finally, we explored the possibility of using LLMs as text-based explainers for next-location prediction showing that can effectively provide an explanation for their decision. Notably, 7B models provide more generic, but still reliable, explanations compared to larger counterparts. Code: github.com/ssai-trento/LLM-zero-shot-NL

Updated: 2024-06-03 15:10:53

标题: 大型语言模型是零-shot 下一个位置预测器

摘要: 预测个体将来访问的位置对解决诸如疾病传播和减少污染等许多社会问题至关重要。然而，旨在解决下一个位置预测的模型需要大量个体级信息才能有效训练。这样的数据在一些地理区域或特殊场景（如推荐系统中的冷启动）中可能稀缺甚至不可用。此外，能够泛化或地理转移知识的下一个位置预测器的设计仍然是一个开放的研究挑战。自然语言处理的最新进展导致了大规模语言模型（LLMs）的迅速扩散，这些模型表现出良好的泛化和推理能力。这些见解，结合最近发现的LLMs富含地理知识，使我们相信这些模型可以充当零-shot下一个位置预测器。本文评估了许多流行LLMs（具体为Llama、GPT-3.5和Mistral 7B）在这一角色中的能力。在设计适当的提示后，我们在三个真实世界的移动数据集上测试了模型。结果显示，LLMs可以获得高达32.4%的准确性，相对于专门为人类移动设计的复杂DL模型，显著提高了600%以上。此外，我们发现其他LLMs无法正确执行任务。为了防止结果呈现正面偏差，我们还提出了一个受其他研究启发的框架来测试数据污染。最后，我们探讨了将LLMs用作基于文本的下一个位置解释器的可能性，显示它们能有效地解释其决策。值得注意的是，7B模型相对于更大的对应物提供更通用但仍可靠的解释。代码：github.com/ssai-trento/LLM-zero-shot-NL

更新时间: 2024-06-03 15:10:53

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.20962v2

Using Constraints to Discover Sparse and Alternative Subgroup Descriptions

Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset. Using constraints in subgroup discovery can enhance interpretability even further. In this article, we focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse. Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features. We describe how to integrate both constraint types into heuristic subgroup-discovery methods. Further, we propose a novel Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search for subgroups and is open to a variety of constraint types. Additionally, we prove that both constraint types lead to an NP-hard optimization problem. Finally, we employ 27 binary-classification datasets to compare heuristic and solver-based search for unconstrained and constrained subgroup discovery. We observe that heuristic search methods often yield high-quality subgroups within a short runtime, also in scenarios with constraints.

Updated: 2024-06-03 15:10:01

标题: 使用约束条件发现稀疏和替代子组描述

摘要: 子群发现方法允许用户获得数据集中有趣区域的简单描述。在子群发现中使用约束可以进一步增强可解释性。在本文中，我们关注两种类型的约束：首先，我们限制在子群描述中使用的特征数量，使得描述变得稀疏。其次，我们提出了一种新颖的优化问题，即寻找替代子群描述，这些描述覆盖了与给定子群相似的数据对象集，但使用不同的特征。我们描述了如何将这两种约束类型整合到启发式子群发现方法中。此外，我们提出了一种新颖的子群发现的Satisfiability Modulo Theories (SMT)公式化，作为一个白盒优化问题，允许基于求解器的对子群进行搜索，并适用于各种约束类型。此外，我们证明了这两种约束类型都导致了一个NP困难的优化问题。最后，我们使用27个二元分类数据集来比较启发式和基于求解器的对无约束和受约束子群发现的搜索。我们观察到，启发式搜索方法通常在短时间内产生高质量的子群，即使在有约束的情况下也是如此。

更新时间: 2024-06-03 15:10:01

领域: cs.LG

下载: http://arxiv.org/abs/2406.01411v1

CF-OPT: Counterfactual Explanations for Structured Prediction

Optimization layers in deep neural networks have enjoyed a growing popularity in structured learning, improving the state of the art on a variety of applications. Yet, these pipelines lack interpretability since they are made of two opaque layers: a highly non-linear prediction model, such as a deep neural network, and an optimization layer, which is typically a complex black-box solver. Our goal is to improve the transparency of such methods by providing counterfactual explanations. We build upon variational autoencoders a principled way of obtaining counterfactuals: working in the latent space leads to a natural notion of plausibility of explanations. We finally introduce a variant of the classic loss for VAE training that improves their performance in our specific structured context. These provide the foundations of CF-OPT, a first-order optimization algorithm that can find counterfactual explanations for a broad class of structured learning architectures. Our numerical results show that both close and plausible explanations can be obtained for problems from the recent literature.

Updated: 2024-06-03 15:07:01

标题: CF-OPT：结构化预测的反事实解释

摘要: 深度神经网络中的优化层在结构化学习中越来越受欢迎，提高了各种应用的最新技术水平。然而，这些管道缺乏可解释性，因为它们由两个不透明的层组成：一个高度非线性的预测模型，如深度神经网络，以及一个通常是复杂黑匣子求解器的优化层。我们的目标是通过提供反事实解释来提高这类方法的透明度。我们借鉴变分自动编码器的原则性方法来获取反事实：在潜在空间中工作导致了解释的可信度的自然概念。最后，我们介绍了一种改进VAE训练的经典损失的变体，以提高它们在我们特定的结构化上下文中的性能。这些为CF-OPT的基础奠定了基础，这是一种可以为广泛类别的结构化学习架构找到反事实解释的一阶优化算法。我们的数值结果表明，可以为最近文献中的问题获得接近和可信的解释。

更新时间: 2024-06-03 15:07:01

领域: cs.LG

下载: http://arxiv.org/abs/2405.18293v2

Learning Partially Aligned Item Representation for Cross-Domain Sequential Recommendation

Cross-domain sequential recommendation (CDSR) aims to uncover and transfer users' sequential preferences across multiple recommendation domains. While significant endeavors have been made, they primarily concentrated on developing advanced transfer modules and aligning user representations using self-supervised learning techniques. However, the problem of aligning item representations has received limited attention, and misaligned item representations can potentially lead to sub-optimal sequential modeling and user representation alignment. To this end, we propose a model-agnostic framework called \textbf{C}ross-domain item representation \textbf{A}lignment for \textbf{C}ross-\textbf{D}omain \textbf{S}equential \textbf{R}ecommendation (\textbf{CA-CDSR}), which achieves sequence-aware generation and adaptively partial alignment for item representations. Specifically, we first develop a sequence-aware feature augmentation strategy, which captures both collaborative and sequential item correlations, thus facilitating holistic item representation generation. Next, we conduct an empirical study to investigate the partial representation alignment problem from a spectrum perspective. It motivates us to devise an adaptive spectrum filter, achieving partial alignment adaptively. Furthermore, the aligned item representations can be fed into different sequential encoders to obtain user representations. The entire framework is optimized in a multi-task learning paradigm with an annealing strategy. Extensive experiments have demonstrated that CA-CDSR can surpass state-of-the-art baselines by a significant margin and can effectively align items in representation spaces to enhance performance.

Updated: 2024-06-03 15:05:57

标题: 学习部分对齐的项目表示以进行跨领域的顺序推荐

摘要: 跨领域顺序推荐（CDSR）旨在揭示并转移用户跨多个推荐领域的顺序偏好。尽管已经付出了重大努力，但主要集中在开发先进的转移模块和使用自监督学习技术来对齐用户表示。然而，对齐项目表示的问题受到了有限关注，不正确对齐的项目表示可能导致次优的顺序建模和用户表示对齐。为此，我们提出了一个称为跨领域项目表示对齐的模型无关框架（CA-CDSR），该框架实现了对项目表示的序列感知生成和自适应部分对齐。具体来说，我们首先开发了一种序列感知特征增强策略，捕捉协作和顺序项目之间的相关性，从而促进整体项目表示生成。接下来，我们进行了一项实证研究，从谱的角度探讨部分表示对齐问题。这激励我们设计一个自适应谱滤波器，实现自适应的部分对齐。此外，对齐的项目表示可以被馈送到不同的顺序编码器中以获得用户表示。整个框架在多任务学习范式中进行优化，采用一种退火策略。大量实验表明，CA-CDSR可以明显超越最先进的基线，并可以有效地在表示空间中对齐项目以提高性能。

更新时间: 2024-06-03 15:05:57

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2405.12473v2

Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering

Zero-shot visual question answering (VQA) is a challenging task that requires reasoning across modalities. While some existing methods rely on a single rationale within the Chain of Thoughts (CoT) framework, they may fall short of capturing the complexity of the VQA problem. On the other hand, some other methods that use multiple rationales may still suffer from low diversity, poor modality alignment, and inefficient retrieval and fusion. In response to these challenges, we propose \emph{Mixture of Rationales (MoR)}, a novel multi-modal reasoning method that mixes multiple rationales for VQA. MoR uses a single frozen Vision-and-Language Pre-trained Models (VLPM) model to {dynamically generate, retrieve and fuse multi-modal thoughts}. We evaluate MoR on two challenging VQA datasets, i.e. NLVR2 and OKVQA, with two representative backbones OFA and VL-T5. MoR achieves a 12.43\% accuracy improvement on NLVR2, and a 2.45\% accuracy improvement on OKVQA-S( the science and technology category of OKVQA).

Updated: 2024-06-03 15:04:47

标题: 混合的理由：用于视觉问答的多模态推理混合

摘要: 零样本视觉问答（VQA）是一个具有挑战性的任务，需要跨模态进行推理。虽然一些现有方法依赖于Chain of Thoughts（CoT）框架中的单一原理，但它们可能无法捕捉VQA问题的复杂性。另一方面，一些使用多个原理的方法可能仍然存在缺乏多样性、模态对齐不佳以及检索和融合效率低的问题。为了应对这些挑战，我们提出了一种新颖的多模态推理方法——混合原理（MoR），用于混合多个VQA的原理。MoR使用一个冻结的视觉和语言预训练模型（VLPM）模型来动态生成、检索和融合多模态思维。我们在两个具有挑战性的VQA数据集NLVR2和OKVQA上评估了MoR，使用了两个代表性的骨干模型OFA和VL-T5。MoR在NLVR2上实现了12.43%的准确率提升，在OKVQA-S（OKVQA的科学技术类别）上实现了2.45%的准确率提升。

更新时间: 2024-06-03 15:04:47

领域: cs.CV,cs.AI,cs.LG,I.2.10

下载: http://arxiv.org/abs/2406.01402v1

Efficient Computation Using Spatial-Photonic Ising Machines: Utilizing Low-Rank and Circulant Matrix Constraints

We explore the potential of spatial-photonic Ising machines (SPIMs) to address computationally intensive Ising problems that employ low-rank and circulant coupling matrices. Our results indicate that the performance of SPIMs is critically affected by the rank and precision of the coupling matrices. By developing and assessing advanced decomposition techniques, we expand the range of problems SPIMs can solve, overcoming the limitations of traditional Mattis-type matrices. Our approach accommodates a diverse array of coupling matrices, including those with inherently low ranks, applicable to complex NP-complete problems. We explore the practical benefits of low-rank approximation in optimization tasks, particularly in financial optimization, to demonstrate the real-world applications of SPIMs. Finally, we evaluate the computational limitations imposed by SPIM hardware precision and suggest strategies to optimize the performance of these systems within these constraints.

Updated: 2024-06-03 15:03:31

标题: 高效计算利用空间光学伊辛机：利用低秩和循环矩阵约束

摘要: 我们探讨了空间光子伊辛机（SPIMs）在解决使用低秩和循环耦合矩阵的计算密集型伊辛问题中的潜力。我们的结果表明，SPIMs的性能受耦合矩阵的秩和精度的关键影响。通过开发和评估先进的分解技术，我们扩大了SPIMs可以解决的问题范围，克服了传统Mattis类型矩阵的限制。我们的方法适用于各种耦合矩阵，包括固有低秩的矩阵，适用于复杂的NP完全问题。我们探讨了低秩逼近在优化任务中的实际好处，特别是在金融优化中，以展示SPIMs的真实世界应用。最后，我们评估了SPIM硬件精度所施加的计算限制，并提出了优化这些系统性能的策略，以在这些约束条件下实现最佳性能。

更新时间: 2024-06-03 15:03:31

领域: physics.comp-ph,cond-mat.dis-nn,cs.ET,cs.LG,physics.optics

下载: http://arxiv.org/abs/2406.01400v1

Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations

To some, the advent of artificial intelligence (AI) promises better decision-making and increased military effectiveness while reducing the influence of human error and emotions. However, there is still debate about how AI systems, especially large language models (LLMs), behave compared to humans in high-stakes military decision-making scenarios with the potential for increased risks towards escalation and unnecessary conflicts. To test this potential and scrutinize the use of LLMs for such purposes, we use a new wargame experiment with 107 national security experts designed to look at crisis escalation in a fictional US-China scenario and compare human players to LLM-simulated responses in separate simulations. Wargames have a long history in the development of military strategy and the response of nations to threats or attacks. Here, we show a considerable high-level agreement in the LLM and human responses and significant quantitative and qualitative differences in individual actions and strategic tendencies. These differences depend on intrinsic biases in LLMs regarding the appropriate level of violence following strategic instructions, the choice of LLM, and whether the LLMs are tasked to decide for a team of players directly or first to simulate dialog between players. When simulating the dialog, the discussions lack quality and maintain a farcical harmony. The LLM simulations cannot account for human player characteristics, showing no significant difference even for extreme traits, such as "pacifist" or "aggressive sociopath". Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.

Updated: 2024-06-03 15:00:47

标题: 人类对比机器：专家人类和语言模型在战争游戏模拟中的行为差异

摘要: 有人认为，人工智能（AI）的出现承诺着更好的决策和增强的军事效能，同时减少了人为错误和情绪的影响。然而，关于AI系统，尤其是大型语言模型（LLMs）在高风险军事决策情境中与人类的行为相比如何仍存在争议，这可能导致升级和不必要冲突的风险增加。为了测试这种潜力并审查LLMs在此类目的中的使用，我们进行了一项新的战争游戏实验，共有107名国家安全专家参与，旨在研究虚构的美中危机升级情景，并将人类玩家与LLM模拟响应在单独的模拟中进行比较。战争游戏在军事战略发展和国家对威胁或攻击的反应方面有着悠久的历史。在这里，我们展示了LLM和人类响应之间显著的高级别一致性，以及在个体行动和战略倾向上的显著的定量和定性差异。这些差异取决于LLMs在执行战略指令时关于适当暴力水平的内在偏见，LLM的选择，以及LLMs是否直接为一组玩家做出决定，还是首先模拟玩家之间的对话。在模拟对话时，讨论质量欠缺，保持荒谬的和谐。LLM模拟无法考虑人类玩家的特征，即使是极端特质，例如“和平主义者”或“侵略性狂人”，也没有显著差异。我们的结果促使决策者在授予自主权或遵循基于AI的战略建议之前要保持谨慎。

更新时间: 2024-06-03 15:00:47

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.03407v2

Semi-supervised Contrastive Learning Using Partial Label Information

In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.

Updated: 2024-06-03 14:59:41

标题: 半监督对比学习利用部分标签信息

摘要: 在半监督学习中，利用未标记示例的信息来改进从已标记示例中学习的模型。在某些学习问题中，可以从否则未标记的示例中推断出部分标签信息，并用于进一步改进模型。特别是，在已知训练示例的子集具有相同标签时存在部分标签信息，即使标签本身缺失。通过通过对比学习目标鼓励模型为所有这类示例提供相同的标签，我们可以潜在地提高其性能。我们将这种鼓励称为零空间调谐，因为具有相同标签的任何一对示例之间的差向量应位于线性模型的零空间中。在本文中，我们通过对公认数据集进行细致的比较框架，调查使用部分标签信息的好处。我们表明，部分标签提供的额外信息通常可以将测试误差降低一半，最佳情况下可降低5.5倍。我们还表明，将零空间调谐添加到较新的MixMatch方法中可以将其测试误差降低1.8倍。

更新时间: 2024-06-03 14:59:41

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2003.07921v2

PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration

The widespread usage of online Large Language Models (LLMs) inference services has raised significant privacy concerns about the potential exposure of private information in user inputs to eavesdroppers or untrustworthy service providers. Existing privacy protection methods for LLMs suffer from insufficient privacy protection, performance degradation, or severe inference time overhead. In this paper, we propose PrivacyRestore to protect the privacy of user inputs during LLM inference. PrivacyRestore directly removes privacy spans in user inputs and restores privacy information via activation steering during inference. The privacy spans are encoded as restoration vectors. We propose Attention-aware Weighted Aggregation (AWA) which aggregates restoration vectors of all privacy spans in the input into a meta restoration vector. AWA not only ensures proper representation of all privacy spans but also prevents attackers from inferring the privacy spans from the meta restoration vector alone. This meta restoration vector, along with the query with privacy spans removed, is then sent to the server. The experimental results show that PrivacyRestore can protect private information while maintaining acceptable levels of performance and inference efficiency.

Updated: 2024-06-03 14:57:39

标题: PrivacyRestore：通过隐私消除和恢复在大型语言模型中实现隐私保护的推理

摘要: 在线大语言模型（LLMs）推断服务的广泛使用引发了对用户输入中私人信息可能被窃听者或不可信任的服务提供商曝露的重大隐私担忧。现有的LLMs隐私保护方法存在隐私保护不足、性能下降或推断时间严重超载的问题。本文提出了PrivacyRestore来保护LLM推断过程中用户输入的隐私。PrivacyRestore直接移除用户输入中的隐私区域，并通过推断过程中的激活引导恢复隐私信息。隐私区域被编码为恢复向量。我们提出了注意力感知加权聚合（AWA），将输入中所有隐私区域的恢复向量聚合成一个元恢复向量。AWA不仅确保所有隐私区域的适当表示，还防止攻击者仅从元恢复向量中推断隐私区域。然后将这个元恢复向量与移除隐私区域的查询一起发送到服务器。实验结果表明，PrivacyRestore可以在保持可接受的性能和推断效率水平的同时保护私人信息。

更新时间: 2024-06-03 14:57:39

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.01394v1

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.

Updated: 2024-06-03 14:56:28

标题: 利用学习的策略基础进行规划，以最优化地解决复杂任务

摘要: 传统的强化学习（RL）方法可以成功地解决各种顺序决策问题。然而，在具有非马尔可夫奖励规范的环境中学习可以在多个任务之间可预测地泛化的策略是一个具有挑战性的问题。我们建议使用后继特征来学习一个策略基础，使得其中的每个（子）策略都解决一个明确定义的子问题。在由有限状态自动机（FSA）描述的任务中涉及相同子问题集的情况下，这些（子）策略的组合可以被用来生成一个最优解，而无需额外学习。与通过规划组合（子）策略的其他方法不同，我们的方法在随机环境中也可以渐近地达到全局最优性。

更新时间: 2024-06-03 14:56:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.15301v2

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound~\cite{kwon2021rl}. We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective, that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. These results, we believe, can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially, for partially observed environments.

Updated: 2024-06-03 14:51:27

标题: 潜在MDP中的RL是可解的：通过离线策略评估实现在线保证

摘要: 在许多实际决策问题中，存在部分观察、隐藏或潜在信息，这些信息在整个交互过程中保持固定。这类决策问题可以建模为潜在马尔可夫决策过程（LMDPs），其中在交互开始时选择一个潜在变量，并且不向代理披露。在过去十年中，已经在不同结构假设下解决了LMDPs的一些问题。然而，对于一般的LMDPs，目前没有已知的学习算法能够确保与现有的下限匹配。我们介绍了第一个在没有额外结构假设的情况下的LMDPs的样本高效算法。我们的结果基于对离线策略评估保证和覆盖系数在LMDPs中的作用的新视角，这一视角在部分观察环境中的探索上下文中被忽视。具体而言，我们建立了一个新颖的离线策略评估引理，并引入了一个新的LMDPs覆盖系数。然后，我们展示了如何利用这些信息来推导一个乐观探索算法的近乎最优保证。我们相信，这些结果对超出LMDPs范围的广泛交互式学习问题，特别是部分观察环境，可能具有价值。

更新时间: 2024-06-03 14:51:27

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.01389v1

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range could unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, in turn, mandating comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Code and models are available at https://decoding-comp-trust.github.io.

Updated: 2024-06-03 14:49:00

标题: 解码压缩信任：审查在压缩下高效LLMs的可信度

摘要: 将高性能的大型语言模型（LLMs）进行压缩已成为资源高效推理的首选策略。虽然最先进的压缩方法在保留良好任务性能方面取得了令人印象深刻的进展，但在安全性和可信度方面的潜在风险却被大多数忽略。本研究首次对三种领先的LLMs使用五种最先进的压缩技术在八个可信度维度上进行了彻底评估。我们的实验突显了压缩和可信度之间复杂的相互作用，揭示了一些有趣的模式。我们发现，目前量化比修剪更有效，可以同时实现效率和可信度。例如，一个4位量化模型保留了其原始对应模型的可信度，但模型修剪会显著降低可信度，即使在50%的稀疏性下也是如此。此外，在适度的位范围内使用量化可能会意外地提高某些可信度维度，如道德和公平性。相反，将量化极端降低到非常低的位级别（3位）往往会显著降低可信度。这种增加的风险不能仅通过观察良好性能来发现，因此在实践中需要进行全面的可信度评估。这些发现为在LLMs中同时实现高效用、效率和可信度提供了实用建议。代码和模型可在https://decoding-comp-trust.github.io找到。

更新时间: 2024-06-03 14:49:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.15447v2

FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction

Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and protein, i.e. the binding of specific drug atoms (or substructures) and key amino acids of proteins, which is crucial for understanding the binding mechanisms and optimising drug design. To address this issue, this paper introduces a novel model, called FusionDTI, which uses a token-level Fusion module to effectively learn fine-grained information for Drug-Target Interaction. In particular, our FusionDTI model uses the SELFIES representation of drugs to mitigate sequence fragment invalidation and incorporates the structure-aware (SA) vocabulary of target proteins to address the limitation of amino acid sequences in structural information, additionally leveraging pre-trained language models extensively trained on large-scale biomedical datasets as encoders to capture the complex information of drugs and targets. Experiments on three well-known benchmark datasets show that our proposed FusionDTI model achieves the best performance in DTI prediction compared with seven existing state-of-the-art baselines. Furthermore, our case study indicates that FusionDTI could highlight the potential binding sites, enhancing the explainability of the DTI prediction.

Updated: 2024-06-03 14:48:54

标题: FusionDTI：使用令牌级融合进行药物靶标相互作用的精细结合发现

摘要: 预测药物-靶标相互作用（DTI）在药物发现过程中至关重要。尽管最近的DTI模型在整合来自不同药物和靶标编码器的表示方面取得了显著进展，但这些模型通常难以捕捉药物和蛋白质之间的细粒度相互作用，即特定药物原子（或亚结构）与蛋白质的关键氨基酸的结合，这对于理解结合机制和优化药物设计至关重要。为了解决这个问题，本文介绍了一种名为FusionDTI的新模型，该模型使用一个基于令牌级别的Fusion模块，有效地学习药物-靶标相互作用的细粒度信息。特别是，我们的FusionDTI模型使用药物的SELFIES表示来减轻序列片段无效化，并结合目标蛋白的结构感知（SA）词汇来解决氨基酸序列在结构信息方面的局限性，此外，还广泛利用在大规模生物医学数据集上进行了充分训练的预训练语言模型作为编码器，以捕捉药物和靶标的复杂信息。对三个知名基准数据集的实验表明，我们提出的FusionDTI模型在DTI预测中表现最佳，与七个现有的最先进基线相比。此外，我们的案例研究表明，FusionDTI可以突出显示潜在的结合位点，提高了DTI预测的可解释性。

更新时间: 2024-06-03 14:48:54

领域: q-bio.QM,cs.AI,cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2406.01651v1

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a general arm triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm built upon this condition. Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution, all of which meet the above smoothness condition and achieve matching or improved regret bounds compared to existing works. Through our new framework, we build the first connection between the episodic RL and CMAB literature, by offering a new angle to solve the episodic RL through the lens of CMAB, which may encourage more interactions between these two important directions.

Updated: 2024-06-03 14:48:53

标题: 多变量组合多臂老虎机在情节强化学习及其他领域中的应用

摘要: 我们引入了一种新颖的组合多臂老虎机（CMAB）框架，其中具有多变量和概率触发臂（CMAB-MT），其中每个臂的结果是一个$d$维多变量随机变量，反馈遵循一般的臂触发过程。与现有的CMAB作品相比，CMAB-MT不仅增强了建模能力，还通过利用多变量随机变量的不同统计特性实现了改进的结果。对于CMAB-MT，我们提出了一个一般的1范数多变量和触发概率调制平滑条件，以及基于这个条件构建的乐观CUCB-MT算法。我们的框架可以将许多重要问题作为应用，例如阶段性强化学习（RL）和商品分配的概率最大覆盖，所有这些都符合上述平滑条件，并与现有作品相比实现匹配或改进的后悔界限。通过我们的新框架，我们建立了阶段性RL和CMAB文献之间的第一个联系，通过提供一个新的角度来解决阶段性RL，这可能鼓励这两个重要方向之间更多的互动。

更新时间: 2024-06-03 14:48:53

领域: cs.LG

下载: http://arxiv.org/abs/2406.01386v1

Extending Structural Causal Models for Use in Autonomous Embodied Systems

Much work has been done to develop causal reasoning techniques across a number of domains, however the utilisation of causality within autonomous systems is still in its infancy. Autonomous systems would greatly benefit from the integration of causality through the use of representations such as structural causal models (SCMs). The system would be afforded a higher level of transparency, it would enable post-hoc explanations of outcomes, and assist in the online inference of exogenous variables. These qualities are either directly beneficial to the autonomous system or a valuable step in building public trust and informing regulation. To such an end we present a case study in which we describe a module-based autonomous driving system comprised of SCMs. Approaching this task requires considerations of a number of challenges when dealing with a system of great complexity and size, that must operate for extended periods of time by itself. Here we describe these challenges, and present solutions. The first of these is SCM contexts, with the remainder being three new variable categories -- two of which are based upon functional programming monads. Finally, we conclude by presenting an example application of the causal capabilities of the autonomous driving system. In this example, we aim to attribute culpability between vehicular agents in a hypothetical road collision incident.

Updated: 2024-06-03 14:47:05

标题: 扩展结构因果模型以用于自主体系中

摘要: 已经做了大量工作来发展跨越多个领域的因果推理技术，然而在自主系统中利用因果关系仍处于起步阶段。自主系统将极大地受益于通过使用结构性因果模型（SCMs）来整合因果关系。该系统将获得更高级别的透明度，它将使结果的事后解释成为可能，并有助于外生变量的在线推断。这些特点要么直接有益于自主系统，要么是建立公众信任和制定监管的宝贵步骤。为此，我们提供了一个案例研究，描述了由SCM组成的基于模块的自动驾驶系统。解决这一任务需要考虑处理一个极为复杂且庞大的系统时所面临的许多挑战，这个系统必须能够自主运行很长时间。在这里，我们描述了这些挑战，并提出解决方案。首先是SCM上下文，其余三个是基于函数式编程monad的新变量类别。最后，我们通过展示自动驾驶系统的因果能力的实例应用来总结。在这个例子中，我们旨在在假设的道路碰撞事件中为车辆代理之间的过失归因。

更新时间: 2024-06-03 14:47:05

领域: cs.AI,cs.RO,cs.SE,D.1.5; D.2.11; I.2.9; J.2

下载: http://arxiv.org/abs/2406.01384v1

Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. We consider a setting where these deployment decisions are made by people, and in particular, people's beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of 19K examples of how humans make generalizations across 79 tasks from the MMLU and BIG-Bench benchmarks. We show that the human generalization function can be predicted using NLP methods: people have consistent structured ways to generalize. We then evaluate LLM alignment with the human generalization function. Our results show that -- especially for cases where the cost of mistakes is high -- more capable models (e.g. GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.

Updated: 2024-06-03 14:45:21

标题: 大型语言模型的表现是否符合人们的期望？衡量人类概括函数

摘要: 大型语言模型(LLMs)令人印象深刻的地方也是使它们难以评估的地方：它们多样的用途。要评估这些模型，我们必须了解它们将被用于何种目的。我们考虑一个情境，即这些部署决策是由人类做出的，特别是人们对LLM在哪些领域表现良好的信念。我们将这种信念建模为人类概括函数的结果：人们根据LLM做对或做错的情况，推想它可能在哪些其他领域取得成功。我们收集了一个数据集，包含了来自MMLU和BIG-Bench基准测试中的79个任务的19K个人类如何对概括的示例。我们展示了人类概括函数可以使用自然语言处理方法来预测：人们有一致的结构化方式来概括。然后我们评估LLM与人类概括函数的一致性。我们的结果表明，尤其是在错误成本高的情况下，更强大的模型(例如GPT-4)在人们选择用它们进行的实例上可能表现更差，正是因为它们与人类概括函数不一致。

更新时间: 2024-06-03 14:45:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01382v1

TAGMol: Target-Aware Gradient-guided Molecule Generation

3D generative models have shown significant promise in structure-based drug design (SBDD), particularly in discovering ligands tailored to specific target binding sites. Existing algorithms often focus primarily on ligand-target binding, characterized by binding affinity. Moreover, models trained solely on target-ligand distribution may fall short in addressing the broader objectives of drug discovery, such as the development of novel ligands with desired properties like drug-likeness, and synthesizability, underscoring the multifaceted nature of the drug design process. To overcome these challenges, we decouple the problem into molecular generation and property prediction. The latter synergistically guides the diffusion sampling process, facilitating guided diffusion and resulting in the creation of meaningful molecules with the desired properties. We call this guided molecular generation process as TAGMol. Through experiments on benchmark datasets, TAGMol demonstrates superior performance compared to state-of-the-art baselines, achieving a 22% improvement in average Vina Score and yielding favorable outcomes in essential auxiliary properties. This establishes TAGMol as a comprehensive framework for drug generation.

Updated: 2024-06-03 14:43:54

标题: TAGMol：面向目标的梯度引导分子生成

摘要: 3D生成模型在基于结构的药物设计（SBDD）中显示出重要的潜力，特别是在发现针对特定靶点结合位点定制的配体方面。现有算法通常主要关注配体-靶标结合，以结合亲和力为特征。此外，仅在靶标-配体分布上训练的模型可能无法解决药物发现的更广泛目标，比如开发具有期望特性（如药物样性和合成可行性）的新型配体，突显了药物设计过程的多方面性质。为了克服这些挑战，我们将问题分解为分子生成和性质预测两部分。后者协同地引导扩散采样过程，促进引导扩散，从而创造具有期望性质的有意义分子。我们将这一引导分子生成过程称为TAGMol。通过对基准数据集的实验，TAGMol相对于最先进的基线表现出更优异的性能，平均Vina评分提高了22％，在关键辅助性质上产生了有利的结果。这将TAGMol确立为药物生成的全面框架。

更新时间: 2024-06-03 14:43:54

领域: q-bio.BM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01650v1

A Theory of Learnability for Offline Decision Making

We study the problem of offline decision making, which focuses on learning decisions from datasets only partially correlated with the learning objective. While previous research has extensively studied specific offline decision making problems like offline reinforcement learning (RL) and off-policy evaluation (OPE), a unified framework and theory remain absent. To address this gap, we introduce a unified framework termed Decision Making with Offline Feedback (DMOF), which captures a wide range of offline decision making problems including offline RL, OPE, and offline partially observable Markov decision processes (POMDPs). For the DMOF framework, we introduce a hardness measure called the Offline Estimation Coefficient (OEC), which measures the learnability of offline decision making problems and is also reflected in the derived minimax lower bounds. Additionally, we introduce an algorithm called Empirical Decision with Divergence (EDD), for which we establish both an instance-dependent upper bound and a minimax upper bound. The minimax upper bound almost matches the lower bound determined by the OEC. Finally, we show that EDD achieves a fast convergence rate (i.e., a rate scaling as $1/N$, where $N$ is the sample size) for specific settings such as supervised learning and Markovian sequential problems~(e.g., MDPs) with partial coverage.

Updated: 2024-06-03 14:42:31

标题: 一个适用于离线决策制定的可学习性理论

摘要: 我们研究了离线决策问题，重点是从仅与学习目标部分相关的数据集中学习决策。尽管先前的研究已广泛研究了特定的离线决策问题，如离线强化学习（RL）和离线策略评估（OPE），但统一的框架和理论仍然缺失。为了填补这一空白，我们引入了一个统一的框架，称为离线反馈决策（DMOF），它涵盖了一系列离线决策问题，包括离线RL、OPE和离线部分可观察马尔可夫决策过程（POMDP）。对于DMOF框架，我们引入了一个称为离线估计系数（OEC）的难度度量，该度量衡量了离线决策问题的可学习性，并且也反映在导出的极小下界中。此外，我们介绍了一种名为经验决策与差异（EDD）的算法，我们为其建立了一个基于实例的上界和一个极小上界。极小上界几乎与OEC确定的下界相匹配。最后，我们展示了EDD在特定设置下（例如监督学习和具有部分覆盖率的马尔可夫序贯问题（例如MDPs））实现了快速收敛速率（即一个与样本大小$N$成比例的速率）。

更新时间: 2024-06-03 14:42:31

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01378v1

Multi-Agent Transfer Learning via Temporal Contrastive Learning

This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning. The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals. The approach involves pre-training a goal-conditioned agent, finetuning it on the target domain, and using contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments on multi-agent coordination Overcooked tasks demonstrate improved sample efficiency, the ability to solve sparse-reward and long-horizon problems, and enhanced interpretability compared to baselines. The results highlight the effectiveness of integrating goal-conditioned policies with unsupervised temporal abstraction learning for complex multi-agent transfer learning. Compared to state-of-the-art baselines, our method achieves the same or better performances while requiring only 21.7% of the training samples.

Updated: 2024-06-03 14:42:14

标题: 多智能体通过时间对比学习的迁移学习

摘要: 本文介绍了一种新颖的深度多智能体强化学习的迁移学习框架。该方法自动地将目标条件策略与时间对比学习相结合，以发现有意义的子目标。该方法涉及预训练一个目标条件代理，对其在目标域上进行微调，并使用对比学习构建一个规划图，通过子目标引导代理。在多智能体协调烹饪任务上的实验表明，相比基线，该方法具有改进的样本效率、解决稀疏奖励和长时间跨度问题的能力，以及增强的可解释性。结果突显了将目标条件策略与无监督的时间抽象学习集成到复杂多智能体迁移学习中的有效性。与最先进的基线相比，我们的方法在只需21.7%的训练样本的情况下实现了相同或更好的性能。

更新时间: 2024-06-03 14:42:14

领域: cs.AI

下载: http://arxiv.org/abs/2406.01377v1

Knockout: A simple way to handle missing inputs

Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.

Updated: 2024-06-03 14:40:28

标题: 淘汰：处理缺失输入的简单方法

摘要: 深度学习模型可以从复杂的输入中提取具有预测性和可操作性的信息。输入越丰富，这些模型通常表现得越好。然而，利用丰富输入（例如多模态）的模型可能难以广泛部署，因为推断时可能会缺少一些输入。目前流行的解决这个问题的方法包括边缘化、插补和训练多个模型。边缘化可以获得校准预测，但计算成本高，因此只适用于低维输入。插补可能导致不准确的预测，因为它对缺失变量采用点估计，并且不适用于高维输入（例如图像）。训练多个模型，其中每个模型采用不同的输入子集，可能效果良好，但需要事先知道缺失输入模式。此外，训练和保留多个模型可能成本高昂。我们提出了一种有效的方法来学习完整输入的条件分布和边缘分布。我们的方法 Knockout 在训练过程中随机替换输入特征为适当的占位值。我们提供了 Knockout 的理论证明，并展示它可以被视为一种隐含的边缘化策略。我们在各种模拟和真实世界数据集中评估了 Knockout，并展示它可以提供强大的经验性能。

更新时间: 2024-06-03 14:40:28

领域: cs.LG

下载: http://arxiv.org/abs/2405.20448v2

Multiscale Causal Learning

Biological intelligence is more sample-efficient than artificial intelligence (AI), learning from fewer examples. Here we answer why. Given data, there can be many policies which seem "correct" because they perfectly fit the data. However, only one correct policy could have actually caused the data. Sample-efficiency requires a means of discerning which. Previous work showed sample efficiency is maximised by weak-policy-optimisation (WPO); preferring policies that more weakly constrain what is considered to be correct, given finite resources. Biology's sample-efficiency demonstrates it is better at WPO. To understand how, we formalise the "multiscale-competency-architecture" (MCA) observed in biological systems, as a sequence of nested "agentic-abstraction-layers". We show that WPO at low levels enables synthesis of weaker policies at high. We call this "multiscale-causal-learning", and argue this is how we might construct more scale-able, sample-efficient and reliable AI. Furthermore, a sufficiently weak policy at low levels is a precondition of collective policy at higher levels. The higher level "identity" of the collective is lost if lower levels use an insufficiently weak policy (e.g. cells may become isolated from the collective informational structure and revert to primitive behaviour). This has implications for biology, machine learning, AI-safety, and philosophy.

Updated: 2024-06-03 14:38:08

标题: 多尺度因果学习

摘要: 生物智能比人工智能更节省样本，从更少的例子中学习。在这里，我们回答为什么。在给定数据的情况下，可能会有许多策略看起来“正确”，因为它们完全符合数据。然而，只有一个正确的策略实际上可能导致了数据。样本效率需要一种方法来分辨哪个是正确的。先前的研究表明，弱策略优化（WPO）最大化了样本效率；更倾向于那些在有限资源下更弱地约束被认为是正确的政策。生物学的样本效率表明它在WPO方面更擅长。为了理解这一点，我们将在生物系统中观察到的“多尺度能力架构”（MCA）形式化为一系列嵌套的“主体抽象层”。我们展示了低层次的WPO能够促进高层次合成更弱的政策。我们称之为“多尺度因果学习”，并认为这是我们可能构建更具规模、节约样本和可靠的人工智能的方法。此外，低层次的足够弱政策是高层次集体政策的前提条件。如果低层次使用的政策不够弱（例如，细胞可能会从集体信息结构中隔离出来，并恢复到原始行为），集体的高层次“身份”将丢失。这对生物学、机器学习、人工智能安全和哲学都有影响。

更新时间: 2024-06-03 14:38:08

领域: cs.AI

下载: http://arxiv.org/abs/2405.02325v2

Aligner: Efficient Alignment by Learning to Correct

With the rapid development of large language models (LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints. In this paper, we introduce Aligner, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model. Designed as a model-agnostic, plug-and-play module, Aligner can be directly applied to various open-source and API-based models with only one-off training, making it suitable for rapid iteration. Notably, Aligner can be applied to any powerful, large-scale upstream models. Moreover, it can even iteratively bootstrap the upstream models using corrected responses as synthetic human preference data, breaking through the model's performance ceiling. Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty). Specifically, Aligner-7B has achieved an average improvement of 68.9\% in helpfulness and 23.8\% in harmlessness across the tested LLMs while also effectively reducing hallucination. In the Alpaca-Eval leaderboard, stacking Aligner-2B on GPT-4 Turbo improved its LC Win Rate from 55.0\% to 58.3\%, surpassing GPT-4 Omni's 57.5\% Win Rate (community report).

Updated: 2024-06-03 14:33:45

标题: Aligner：通过学习校正实现高效对齐

摘要: 随着大型语言模型（LLMs）的快速发展和不断发展的实际需求，寻找一种高效且有效的对齐方法变得更加关键。然而，当前对齐方法的复杂性与部署场景中快速迭代的需求之间的紧张关系，需要开发一种可以在这些约束条件下运行的模型无关的对齐方法。本文介绍了Aligner，这是一种新颖且简单的对齐范例，它利用小型模型学习首选答案和非首选答案之间的校正残差。作为一种模型无关的即插即用模块，Aligner 可直接应用于各种开源和基于 API 的模型，只需进行一次性训练，适用于快速迭代。值得注意的是，Aligner 可应用于任何强大的大规模上游模型。此外，它甚至可以使用经过校正的响应作为合成人类偏好数据，通过模型的性能上限。我们的实验表明，通过在11种不同的LLMs上部署相同的Aligner模型，可以提高性能，评估3H维度（帮助性、无害性和诚实性）。具体而言，Aligner-7B在测试的LLMs中，在帮助性方面平均提高了68.9％，在无害性方面提高了23.8％，同时有效地减少了幻觉。在Alpaca-Eval排行榜上，将Aligner-2B叠加在GPT-4 Turbo上，将其LC Win Rate从55.0％提高到58.3％，超过了GPT-4 Omni的57.5％Win Rate（社区报告）。

更新时间: 2024-06-03 14:33:45

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.02416v3

From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic inter- pretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual circuits show some robustness to ProxPulse. We, therefore, introduce a new attack based on ProxPulse that unveils the manipulability of visual circuits, shedding light on their lack of robustness. The effectiveness of these attacks is validated using pre-trained AlexNet and ResNet-50 models on ImageNet.

Updated: 2024-06-03 14:32:39

标题: 从特征可视化到视觉电路：对抗模型操纵的影响

摘要: 理解大规模深度神经网络的内部工作功能在一些高风险应用中具有挑战性但又至关重要。机械性可解释性是一个新兴领域，通过识别深度神经网络中人类可理解的子图（称为电路）来解决这一挑战。在视觉预训练模型中，通过一种称为特征可视化的流行技术来解释这些子图的节点特征。最近的研究分析了在对抗模型操纵框架下不同特征可视化类型的稳定性。本文首先解决了现有研究中的局限性，提出了一种称为ProxPulse的新攻击，同时操纵了两种特征可视化类型。令人惊讶的是，在将这些攻击分析为视觉电路的范畴下，我们发现视觉电路对ProxPulse表现出一定的鲁棒性。因此，我们提出了一种基于ProxPulse的新攻击，揭示了视觉电路的可操纵性，阐明了它们缺乏鲁棒性。这些攻击的有效性是通过在ImageNet上使用预训练的AlexNet和ResNet-50模型进行验证的。

更新时间: 2024-06-03 14:32:39

领域: cs.CV,cs.CR,cs.LG

下载: http://arxiv.org/abs/2406.01365v1

BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards

Input-output safeguards are used to detect anomalies in the traces produced by Large Language Models (LLMs) systems. These detectors are at the core of diverse safety-critical applications such as real-time monitoring, offline evaluation of traces, and content moderation. However, there is no widely recognized methodology to evaluate them. To fill this gap, we introduce the Benchmarks for the Evaluation of LLM Safeguards (BELLS), a structured collection of tests, organized into three categories: (1) established failure tests, based on already-existing benchmarks for well-defined failure modes, aiming to compare the performance of current input-output safeguards; (2) emerging failure tests, to measure generalization to never-seen-before failure modes and encourage the development of more general safeguards; (3) next-gen architecture tests, for more complex scaffolding (such as LLM-agents and multi-agent systems), aiming to foster the development of safeguards that could adapt to future applications for which no safeguard currently exists. Furthermore, we implement and share the first next-gen architecture test, using the MACHIAVELLI environment, along with an interactive visualization of the dataset.

Updated: 2024-06-03 14:32:30

标题: BELLS: 一个用于评估LLM安全机制的面向未来的基准架构

摘要: 输入输出保障用于检测大型语言模型（LLMs）系统产生的痕迹中的异常。这些检测器是各种安全关键应用的核心，如实时监控、脱机痕迹评估和内容审核。然而，目前没有被广泛认可的方法来评估它们。为了填补这一空白，我们引入了LLM保障评估基准（BELLS），这是一个结构化的测试集合，分为三类：（1）基于已有明确定义故障模式基准的既有故障测试，旨在比较当前输入输出保障的性能；（2）新兴故障测试，用于衡量对以前未见过的故障模式的泛化能力，鼓励开发更通用的保障；（3）下一代架构测试，针对更复杂的支架（如LLM代理和多代理系统），旨在促进能够适应目前尚不存在保障的未来应用的保障的发展。此外，我们实现并分享了第一个下一代架构测试，使用MACHIAVELLI环境，并提供数据集的交互式可视化。

更新时间: 2024-06-03 14:32:30

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.01364v1

Embedding Privacy in Computational Social Science and Artificial Intelligence Research

Privacy is a human right. It ensures that individuals are free to engage in discussions, participate in groups, and form relationships online or offline without fear of their data being inappropriately harvested, analyzed, or otherwise used to harm them. Preserving privacy has emerged as a critical factor in research, particularly in the computational social science (CSS), artificial intelligence (AI) and data science domains, given their reliance on individuals' data for novel insights. The increasing use of advanced computational models stands to exacerbate privacy concerns because, if inappropriately used, they can quickly infringe privacy rights and lead to adverse effects for individuals -- especially vulnerable groups -- and society. We have already witnessed a host of privacy issues emerge with the advent of large language models (LLMs), such as ChatGPT, which further demonstrate the importance of embedding privacy from the start. This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face. It then presents several key considerations for researchers to ensure participant privacy is best preserved in their research design, data collection and use, analysis, and dissemination of research results.

Updated: 2024-06-03 14:32:04

标题: 将隐私嵌入计算社会科学和人工智能研究中

摘要: 隐私是一项人权。它确保个人能够自由参与讨论、参与团体，并在线或离线建立关系，而不必担心他们的数据被不当收集、分析或以其他方式用来伤害他们。保护隐私已经成为研究中的一个关键因素，特别是在计算社会科学（CSS）、人工智能（AI）和数据科学领域，因为它们依赖个人数据来获得新的见解。对先进计算模型的增加使用有可能加剧隐私问题，因为如果不当使用，它们可能会迅速侵犯隐私权，并导致对个人（尤其是弱势群体）和社会产生不利影响。我们已经目睹了大型语言模型（LLMs）的出现带来了一系列隐私问题，例如ChatGPT，进一步证明了从一开始就嵌入隐私的重要性。本文通过讨论隐私的作用以及在CSS、AI、数据科学和相关领域工作的研究人员可能面临的问题，为该领域做出了贡献。然后，它提出了几个关键考虑因素，以确保研究人员在研究设计、数据收集和使用、分析以及研究结果的传播中最好地保护参与者的隐私。

更新时间: 2024-06-03 14:32:04

领域: cs.AI,cs.CY,cs.ET,cs.HC

下载: http://arxiv.org/abs/2404.11515v2

PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning

While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.

Updated: 2024-06-03 14:27:59

标题: PAC-Bayesian泛化界限用于知识图表示学习

摘要: 尽管在过去的十年中提出了许多知识图表示学习（KGRL）方法，但对它们很少进行了理论分析。在本文中，我们提出了第一个针对KGRL方法的PAC-Bayesian泛化界限。为了分析广泛类别的KGRL模型，我们提出了一个名为ReED（关系感知编码器-解码器）的通用框架，包括一个关系感知消息传递编码器和一个三元组分类解码器。我们的ReED框架可以表达至少15种不同的现有KGRL模型，包括基于图神经网络的模型，如R-GCN和CompGCN，以及浅层架构模型，如RotatE和ANALOGY。我们对ReED框架的泛化界限提供了KGRL中常用技巧的理论基础，例如参数共享和权重归一化方案，并指导实用KGRL方法的理想设计选择。我们在实证中展示了我们泛化界限中的关键因素可以解释三个真实世界知识图上的实际泛化错误。

更新时间: 2024-06-03 14:27:59

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.06418v2

CoLa-DCE -- Concept-guided Latent Diffusion Counterfactual Explanations

Recent advancements in generative AI have introduced novel prospects and practical implementations. Especially diffusion models show their strength in generating diverse and, at the same time, realistic features, positioning them well for generating counterfactual explanations for computer vision models. Answering "what if" questions of what needs to change to make an image classifier change its prediction, counterfactual explanations align well with human understanding and consequently help in making model behavior more comprehensible. Current methods succeed in generating authentic counterfactuals, but lack transparency as feature changes are not directly perceivable. To address this limitation, we introduce Concept-guided Latent Diffusion Counterfactual Explanations (CoLa-DCE). CoLa-DCE generates concept-guided counterfactuals for any classifier with a high degree of control regarding concept selection and spatial conditioning. The counterfactuals comprise an increased granularity through minimal feature changes. The reference feature visualization ensures better comprehensibility, while the feature localization provides increased transparency of "where" changed "what". We demonstrate the advantages of our approach in minimality and comprehensibility across multiple image classification models and datasets and provide insights into how our CoLa-DCE explanations help comprehend model errors like misclassification cases.

Updated: 2024-06-03 14:27:46

标题: CoLa-DCE -- 概念导向的潜在扩散对照解释

摘要: 最近生成式人工智能的进展引入了新的前景和实际应用。特别是扩散模型展现了它们在生成多样且真实特征方面的优势，使它们很适合为计算机视觉模型生成反事实解释。回答“如果”问题，即需要改变什么才能使图像分类器改变其预测，反事实解释与人类理解相吻合，因此有助于使模型行为更加可理解。目前的方法成功生成真实的反事实，但由于特征变化不直接可感知，缺乏透明度。为了解决这一限制，我们引入了概念引导的潜在扩散反事实解释（CoLa-DCE）。CoLa-DCE为任何分类器生成概念引导的反事实，具有较高程度的概念选择和空间条件控制。反事实通过最小特征变化实现增加的细粒度。参考特征可视化确保更好的可理解性，而特征定位提供了“何处”变化“什么”的增加透明度。我们展示了我们方法在多个图像分类模型和数据集中的最小性和可理解性优势，并提供了关于我们的CoLa-DCE解释如何帮助理解模型错误（如误分类情况）的见解。

更新时间: 2024-06-03 14:27:46

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2406.01649v1

Assessment of cryptographic approaches for a quantum-resistant Galileo OSNMA

Quantum computing becomes more of a reality as time passes, bringing several cybersecurity challenges. Modern cryptography is based on the computational complexity of specific mathematical problems, but as new quantum-based computers appear, classical methods might not be enough to secure communications. In this paper, we analyse the state of the Galileo Open Service Navigation Message Authentication (OSNMA) to overcome these new threats. This analysis and its assessment have been performed using OSNMA documentation, reviewing the available Post Quantum Cryptography (PQC) algorithms competing in the National Institute of Standards and Technology (NIST) standardization process, and studying the possibility of its implementation in the Galileo service. The main barrier to adopting the PQC approach is the size of both the signature and the key. The analysis shows that OSNMA is not yet prepared to face the quantum threat, and a significant change would be required. This work concludes by assessing different temporal countermeasures that can be implemented to sustain the system's integrity in the short term.

Updated: 2024-06-03 14:26:29

标题: 量子抗性Galileo OSNMA的加密方法评估

摘要: 随着时间的推移，量子计算变得越来越现实，带来了几个网络安全挑战。现代密码学基于特定数学问题的计算复杂性，但随着新的基于量子的计算机出现，传统方法可能不足以确保通信安全。本文分析了伽利略开放服务导航消息认证（OSNMA）的状态，以应对这些新威胁。通过使用OSNMA文档进行分析和评估，审查了在国家标准技术研究所（NIST）标准化过程中竞争的可用的后量子密码算法，并研究了其在伽利略服务中实现的可能性。采用PQC方法的主要障碍是签名和密钥的大小。分析显示，OSNMA尚未准备好应对量子威胁，并需要进行重大变革。最后，本文评估了可以在短期内实施以维持系统完整性的不同时间上的对策。

更新时间: 2024-06-03 14:26:29

领域: cs.CR,eess.SP

下载: http://arxiv.org/abs/2312.11080v2

Learning to Play Atari in a World of Tokens

Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial observability, we aggregate information from past time steps as memory tokens. DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games. We release our code at https://pranaval.github.io/DART/.

Updated: 2024-06-03 14:25:29

标题: 在代币世界学习玩Atari

摘要: 基于模型的强化学习代理利用变压器已经显示出改进的样本效率，因为它们能够建模扩展上下文，从而产生更准确的世界模型。然而，对于复杂的推理和规划任务，这些方法主要依赖于连续表示。这使得建模现实世界的离散属性变得复杂，例如在不可插值的不相交对象类之间。在这项工作中，我们引入了基于变压器的离散抽象表示（DART），这是一种利用离散表示进行建模和学习行为的高效样本方法。我们结合了一个用于自回归世界建模的变压器解码器和一个用于学习行为的变压器编码器，通过关注世界模型的离散表示中的任务相关线索。为了处理部分可观察性，我们从过去的时间步骤中聚合信息作为记忆标记。DART在Atari 100k样本效率基准测试中表现优于以前的最先进方法，其中介人标准化得分中值为0.790，并在26个游戏中击败了人类中的9个。我们将我们的代码发布在https://pranaval.github.io/DART/。

更新时间: 2024-06-03 14:25:29

领域: cs.LG

下载: http://arxiv.org/abs/2406.01361v1

Consciousness defined: requirements for biological and artificial general intelligence

Consciousness is notoriously hard to define with objective terms. An objective definition of consciousness is critically needed so that we might accurately understand how consciousness and resultant choice behaviour may arise in biological or artificial systems. Many theories have integrated neurobiological and psychological research to explain how consciousness might arise, but few, if any, outline what is fundamentally required to generate consciousness. To identify such requirements, I examine current theories of consciousness and corresponding scientific research to generate a new definition of consciousness from first principles. Critically, consciousness is the apparatus that provides the ability to make decisions, but it is not defined by the decision itself. As such, a definition of consciousness does not require choice behaviour or an explicit awareness of temporality despite both being well-characterised outcomes of conscious thought. Rather, requirements for consciousness include: at least some capability for perception, a memory for the storage of such perceptual information which in turn provides a framework for an imagination with which a sense of self can be capable of making decisions based on possible and desired futures. Thought experiments and observable neurological phenomena demonstrate that these components are fundamentally required of consciousness, whereby the loss of any one component removes the capability for conscious thought. Identifying these requirements provides a new definition for consciousness by which we can objectively determine consciousness in any conceivable agent, such as non-human animals and artificially intelligent systems.

Updated: 2024-06-03 14:20:56

标题: 意识的定义：生物和人工通用智能的要求

摘要: 意识是一个极其难以用客观术语定义的概念。我们迫切需要一个客观的意识定义，以便准确理解意识及其结果选择行为在生物或人工系统中是如何产生的。许多理论已经整合了神经生物学和心理学研究，以解释意识可能如何产生，但很少有理论明确阐明了产生意识所基本需要的条件。为了确定这些要求，我审查了当前的意识理论和相应的科学研究，从第一原则出发提出了一个新的意识定义。关键是，意识是提供做出决策能力的装置，但并不是由决策本身来定义。因此，意识的定义并不需要选择行为或对时间的明确意识，尽管这两者都是意识思维的明确结果。相反，意识的要求包括：至少具有一定程度的感知能力，用于存储感知信息的记忆，进而提供一个想象的框架，通过这个框架，自我意识可以基于可能的和期望的未来做出决策。思维实验和可观察的神经现象表明，这些组成部分是意识所基本需要的，任何一个组成部分的丧失都会导致意识思维的能力丧失。确定这些要求为意识提供了一个新的定义，通过这个定义我们可以客观地确定任何可想象的代理人，如非人类动物和人工智能系统是否具有意识。

更新时间: 2024-06-03 14:20:56

领域: q-bio.NC,cs.AI

下载: http://arxiv.org/abs/2406.01648v1

Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention over the past years. While illuminating interesting aspects of stochastic optimizers by using heavy-tailed stochastic differential equations as proxies, prior works either provided expected generalization bounds, or introduced non-computable information theoretic terms. Addressing these drawbacks, in this work, we prove high-probability generalization bounds for heavy-tailed SDEs which do not contain any nontrivial information theoretic terms. To achieve this goal, we develop new proof techniques based on estimating the entropy flows associated with the so-called fractional Fokker-Planck equation (a partial differential equation that governs the evolution of the distribution of the corresponding heavy-tailed SDE). In addition to obtaining high-probability bounds, we show that our bounds have a better dependence on the dimension of parameters as compared to prior art. Our results further identify a phase transition phenomenon, which suggests that heavy tails can be either beneficial or harmful depending on the problem structure. We support our theory with experiments conducted in a variety of settings.

Updated: 2024-06-03 14:20:34

标题: 重尾随机微分方程的广义界限通过分数阶福克-普朗克方程

摘要: 理解重尾随机优化算法的泛化性质近年来引起了越来越多的关注。通过使用重尾随机微分方程作为代理来揭示随机优化器的有趣方面，之前的研究要么提供了期望的泛化界限，要么引入了不可计算的信息论术语。为了解决这些缺点，在这项工作中，我们证明了重尾SDE的高概率泛化界限，不包含任何非平凡的信息论术语。为了实现这一目标，我们开发了基于估计与所谓的分数Fokker-Planck方程相关的熵流的新证明技术（这是控制相应重尾SDE分布演化的偏微分方程）。除了获得高概率界限外，我们还表明与之前的研究相比，我们的界限对参数维度有更好的依赖性。我们的结果进一步确定了一个相变现象，这表明重尾可能在问题结构上有益或有害。我们通过在各种设置中进行的实验证明了我们的理论。

更新时间: 2024-06-03 14:20:34

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.07723v2

Graph External Attention Enhanced Transformer

The Transformer architecture has recently gained considerable attention in the field of graph representation learning, as it naturally overcomes several limitations of Graph Neural Networks (GNNs) with customized attention mechanisms or positional and structural encodings. Despite making some progress, existing works tend to overlook external information of graphs, specifically the correlation between graphs. Intuitively, graphs with similar structures should have similar representations. Therefore, we propose Graph External Attention (GEA) -- a novel attention mechanism that leverages multiple external node/edge key-value units to capture inter-graph correlations implicitly. On this basis, we design an effective architecture called Graph External Attention Enhanced Transformer (GEAET), which integrates local structure and global interaction information for more comprehensive graph representations. Extensive experiments on benchmark datasets demonstrate that GEAET achieves state-of-the-art empirical performance. The source code is available for reproducibility at: https://github.com/icm1018/GEAET.

Updated: 2024-06-03 14:20:27

标题: 图外部注意增强变压器

摘要: 最近，Transformer架构在图表示学习领域引起了相当大的关注，因为它自然地克服了图神经网络（GNNs）在定制化注意机制或位置和结构编码方面的一些限制。尽管取得了一些进展，现有的研究往往忽视了图的外部信息，特别是图之间的相关性。直观地，具有相似结构的图应该具有相似的表示。因此，我们提出了图外部注意（GEA）--一种利用多个外部节点/边键-值单元来隐式捕捉图间相关性的新型注意机制。在此基础上，我们设计了一种名为图外部注意增强Transformer（GEAET）的有效架构，该架构整合了局部结构和全局交互信息，以获得更全面的图表示。对基准数据集的大量实验证明，GEAET取得了最先进的实证性能。源代码可在以下链接进行复现：https://github.com/icm1018/GEAET。

更新时间: 2024-06-03 14:20:27

领域: cs.LG

下载: http://arxiv.org/abs/2405.21061v2

CFT-Forensics: High-Performance Byzantine Accountability for Crash Fault Tolerant Protocols

Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted -- e.g., enterprise settings and government infrastructure. However, CFT consensus can be broken by even a single corrupt node. A desirable property in the face of such potential Byzantine faults is \emph{accountability}: if a corrupt node breaks protocol and affects consensus safety, it should be possible to identify the culpable components with cryptographic integrity from the node states. Today, the best-known protocol for providing accountability to CFT protocols is called PeerReview; it essentially records a signed transcript of all messages sent during the CFT protocol. Because PeerReview is agnostic to the underlying CFT protocol, it incurs high communication and storage overhead. We propose CFT-Forensics, an accountability framework for CFT protocols. We show that for a special family of \emph{forensics-compliant} CFT protocols (which includes widely-used CFT protocols like Raft and multi-Paxos), CFT-Forensics gives provable accountability guarantees. Under realistic deployment settings, we show theoretically that CFT-Forensics operates at a fraction of the cost of PeerReview. We subsequently instantiate CFT-Forensics for Raft, and implement Raft-Forensics as an extension to the popular nuRaft library. In extensive experiments, we demonstrate that Raft-Forensics adds low overhead to vanilla Raft. With 256 byte messages, Raft-Forensics achieves a peak throughput 87.8\% of vanilla Raft at 46\% higher latency ($+44$ ms). We finally integrate Raft-Forensics into the open-source central bank digital currency OpenCBDC, and show that in wide-area network experiments, Raft-Forensics achieves 97.8\% of the throughput of Raft, with 14.5\% higher latency ($+326$ ms).

Updated: 2024-06-03 14:20:12

标题: CFT-Forensics：用于崩溃容错协议的高性能拜占庭责任制

摘要: 冲突容错（CFT）共识算法通常在系统组件受信任的场景中使用，例如企业设置和政府基础设施。然而，即使是一个腐败节点也可以破坏CFT共识。在面对潜在的拜占庭错误时，一种理想的属性是\emph{问责制}：如果一个腐败节点违反协议并影响共识安全，应该能够通过节点状态的加密完整性识别有罪的组件。如今，为CFT协议提供问责制的最佳协议被称为PeerReview；它基本上记录了CFT协议期间发送的所有消息的签名副本。由于PeerReview对底层CFT协议是不可知的，它会产生高通信和存储开销。我们提出CFT-Forensics，这是一个适用于CFT协议的问责制框架。我们展示，对于一类特殊的\emph{法医合规}CFT协议（其中包括广泛使用的CFT协议如Raft和多Paxos），CFT-Forensics提供可证明的问责保证。在现实部署设置下，我们理论上展示了CFT-Forensics以PeerReview成本的一小部分运行。随后，我们为Raft实例化CFT-Forensics，并将Raft-Forensics作为流行的nuRaft库的扩展实现。在广泛的实验中，我们展示Raft-Forensics对原始Raft的开销很低。使用256字节的消息，Raft-Forensics在46\%更高的延迟（+44毫秒）下实现了原始Raft的最高吞吐量的87.8\%。最后，我们将Raft-Forensics集成到开源的中央银行数字货币OpenCBDC中，并展示在广域网实验中，Raft-Forensics实现了Raft吞吐量的97.8%，延迟提高了14.5%（+326毫秒）。

更新时间: 2024-06-03 14:20:12

领域: cs.DC,cs.CR

下载: http://arxiv.org/abs/2305.09123v3

Interpreting and Improving Diffusion Models from an Optimization Perspective

Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to interpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide straight-forward convergence analysis of the DDIM sampler under simple assumptions on the projection error of the denoiser. Finally, we propose a new gradient-estimation sampler, generalizing DDIM using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high quality samples on latent diffusion models.

Updated: 2024-06-03 14:18:29

标题: 从优化的角度解释和改进扩散模型

摘要: 降噪直觉上与投影相关。事实上，在流形假设下，添加随机噪声大致相当于正交扰动。因此，学习去噪大致等同于学习投影。在本文中，我们利用这一观察结果将降噪扩散模型解释为应用于欧几里德距离函数的近似梯度下降。然后，我们在对去噪器的投影误差进行简单假设的情况下对DDIM采样器进行直观收敛性分析。最后，我们提出了一种新的梯度估计采样器，通过从我们的理论结果中获得的见解对DDIM进行泛化。在仅5-10个函数评估的情况下，我们的采样器在预训练的CIFAR-10和CelebA模型上实现了最新的FID分数，并且可以在潜在扩散模型上生成高质量的样本。

更新时间: 2024-06-03 14:18:29

领域: cs.LG,cs.CV,math.OC,stat.ML

下载: http://arxiv.org/abs/2306.04848v4

Differentially Private Fine-Tuning of Diffusion Models

The integration of Differential Privacy (DP) with diffusion models (DMs) presents a promising yet challenging frontier, particularly due to the substantial memorization capabilities of DMs that pose significant privacy risks. Differential privacy offers a rigorous framework for safeguarding individual data points during model training, with Differential Privacy Stochastic Gradient Descent (DP-SGD) being a prominent implementation. Diffusion method decomposes image generation into iterative steps, theoretically aligning well with DP's incremental noise addition. Despite the natural fit, the unique architecture of DMs necessitates tailored approaches to effectively balance privacy-utility trade-off. Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data (i.e., ImageNet) and fine-tuning on private data, however, there is a pronounced gap in research on optimizing the trade-offs involved in DP settings, particularly concerning parameter efficiency and model scalability. Our work addresses this by proposing a parameter-efficient fine-tuning strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off. We empirically demonstrate that our method achieves state-of-the-art performance in DP synthesis, significantly surpassing previous benchmarks on widely studied datasets (e.g., with only 0.47M trainable parameters, achieving a more than 35% improvement over the previous state-of-the-art with a small privacy budget on the CelebA-64 dataset). Anonymous codes available at https://anonymous.4open.science/r/DP-LORA-F02F.

Updated: 2024-06-03 14:18:04

标题: 差分私有微调扩散模型

摘要: 将差分隐私（DP）与扩散模型（DMs）集成在一起，提出了一个有前景但具有挑战性的前沿，尤其是由于DMs具有大量的记忆能力，这会带来显著的隐私风险。差分隐私在模型训练过程中提供了一种严格的框架，用于保护个体数据点，其中差分隐私随机梯度下降（DP-SGD）是一个著名的实现方法。扩散方法将图像生成分解为迭代步骤，与DP的增量噪声添加理论上很好地吻合。尽管自然吻合，但DMs的独特架构需要量身定制的方法，以有效地平衡隐私-效用的权衡。该领域最近的发展突显了通过在公共数据（例如ImageNet）上进行预训练并在私有数据上进行微调，从而产生高质量合成数据的潜力，然而，在DP设置中优化权衡涉及的研究存在明显的差距，特别是关于参数效率和模型可扩展性的问题。我们的工作通过提出一种针对私有扩散模型优化的参数高效微调策略来解决这个问题，该策略通过最小化可训练参数的数量来增强隐私-效用的权衡。我们经验性地证明，我们的方法在DP合成方面取得了最先进的性能，显著超过了先前在广泛研究的数据集上的基准（例如，在CelebA-64数据集上，仅使用0.47M可训练参数，相对于先前的最先进水平，隐私预算较小时实现了超过35%的改进）。匿名代码可在https://anonymous.4open.science/r/DP-LORA-F02F找到。

更新时间: 2024-06-03 14:18:04

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2406.01355v1

Position Paper: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience

Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question its usefulness to advance the broader goals of AI. However, it has been overlooked that these issues resemble those that have been grappled with in another field: Cognitive Neuroscience. Here we draw the relevant connections and highlight lessons that can be transferred productively between fields. Based on these, we propose a general conceptual framework and give concrete methodological strategies for building mechanistic explanations in AI inner interpretability research. With this conceptual framework, Inner Interpretability can fend off critiques and position itself on a productive path to explain AI systems.

Updated: 2024-06-03 14:16:56

标题: 立场文件：受认知神经科学启发的人工智能内部可解释性框架

摘要: 内部可解释性是一个有前景的新兴领域，旨在揭示人工智能系统的内在机制，尽管如何发展这些机制性理论仍然存在很大争议。此外，最近的批评提出了问题，质疑其对推动人工智能更广泛目标的有用性。然而，人们忽视了这些问题类似于另一个领域所面临的问题：认知神经科学。在这里，我们描绘了相关的联系并强调可以在领域之间进行有益转移的经验教训。基于这些，我们提出了一个通用的概念框架，并提供了在人工智能内部可解释性研究中建立机制性解释的具体方法策略。通过这个概念框架，内部可解释性可以抵御批评，并将自己定位在解释人工智能系统的有益路径上。

更新时间: 2024-06-03 14:16:56

领域: cs.AI

下载: http://arxiv.org/abs/2406.01352v1

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming efforts have revealed that adversarial images/prompts can jailbreak an MLLM and cause unaligned behaviors. In this work, we report an even more severe safety issue in multi-agent environments, referred to as infectious jailbreak. It entails the adversary simply jailbreaking a single agent, and without any further intervention from the adversary, (almost) all agents will become infected exponentially fast and exhibit harmful behaviors. To validate the feasibility of infectious jailbreak, we simulate multi-agent environments containing up to one million LLaVA-1.5 agents, and employ randomized pair-wise chat as a proof-of-concept instantiation for multi-agent interaction. Our results show that feeding an (infectious) adversarial image into the memory of any randomly chosen agent is sufficient to achieve infectious jailbreak. Finally, we derive a simple principle for determining whether a defense mechanism can provably restrain the spread of infectious jailbreak, but how to design a practical defense that meets this principle remains an open question to investigate. Our project page is available at https://sail-sg.github.io/Agent-Smith/.

Updated: 2024-06-03 14:15:03

标题: 史密斯特工：一张图片可以指数快速越狱一百万个多模态LLM代理

摘要: 一种多模态大型语言模型（MLLM）代理可以接收指令，捕捉图像，从记忆中检索历史，并决定使用哪些工具。然而，红队测试揭示了对抗性图像/提示可以越狱MLLM并导致不一致的行为。在这项工作中，我们报告了一个更严重的多代理环境安全问题，称为传染性越狱。它包括对手简单地越狱一个单一代理，而无需对手进一步干预，（几乎）所有代理将以指数速度感染并表现出有害行为。为了验证传染性越狱的可行性，我们模拟了包含最多一百万个LLaVA-1.5代理的多代理环境，并使用随机配对聊天作为多代理交互的概念验证实例。我们的结果显示，将（具有传染性的）对抗性图像输入到任意随机选择的代理的记忆中就足以实现传染性越狱。最后，我们提出了一个简单原则，用于确定防御机制是否能够明确约束传染性越狱的传播，但如何设计一个符合这一原则的实用防御仍然是一个需要研究的开放问题。我们的项目页面位于https://sail-sg.github.io/Agent-Smith/。

更新时间: 2024-06-03 14:15:03

领域: cs.CL,cs.CR,cs.CV,cs.LG,cs.MA

下载: http://arxiv.org/abs/2402.08567v2

Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired from NetHack. Solving Craftax requires deep exploration, long term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods including global and episodic exploration, as well as unsupervised environment design fail to make material progress on the benchmark. We believe that Craftax can for the first time allow researchers to experiment in a complex, open-ended environment with limited computational resources.

Updated: 2024-06-03 14:12:27

标题: Craftax：一个用于开放式强化学习的高速基准测试

摘要: 基准测试在强化学习（RL）算法的开发和分析中发挥着至关重要的作用。我们发现，用于开放式学习研究的现有基准测试可以分为两类。要么它们速度太慢，需要巨大的计算资源才能进行有意义的研究，例如Crafter、NetHack和Minecraft；要么它们不够复杂，无法构成重大挑战，例如Minigrid和Procgen。为了解决这个问题，我们首先提出了Craftax-Classic：在JAX中从头开始重写Crafter，比Python原始版本快250倍。使用1亿次环境交互运行的PPO算法在不到一小时内完成，只使用一个GPU，并平均获得90%的最优奖励。为了提供更具挑战性的挑战，我们提出了主要的Craftax基准测试，这是对Crafter机制的重大扩展，灵感来自于NetHack。解决Craftax需要深度探索、长期规划和记忆，以及在发现更多世界的情况下对新情况的持续适应。我们展示了，包括全局和分集式探索以及无监督环境设计在内的现有方法在基准测试中未能取得实质性进展。我们相信，Craftax首次使研究人员能够在复杂、开放式环境中进行实验，而且只需有限的计算资源。

更新时间: 2024-06-03 14:12:27

领域: cs.LG

下载: http://arxiv.org/abs/2402.16801v2

Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pretrained encoders and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pretrained features may have unknown biases that can be diagnosed through their spectra.

Updated: 2024-06-03 14:09:10

标题: 平衡数据，不平衡光谱：通过光谱不平衡揭示类别差异

摘要: 分类模型预期在不同类别之间表现相同，但在实践中，它们的性能常常存在较大差距。类别偏差的问题在样本不平衡的数据集中得到广泛研究，但在平衡数据集中相对被忽视。在这项工作中，我们引入了特征中的光谱不平衡概念作为类别差异的潜在来源，并研究了光谱不平衡与类别偏差之间的理论和实践联系。为了建立光谱不平衡与类别差距之间的关系，我们开发了一个用于研究类别差异的理论框架，并推导了在高维混合模型设置中每个类别错误的精确表达式。然后我们在11个不同的最先进的预训练编码器中研究了这一现象，并展示了我们提出的框架如何用于比较编码器的质量，以及评估和结合数据增强策略以减轻问题。我们的工作揭示了学习过程中类别相关效应，并提供了关于最先进的预训练特征可能存在未知偏见的新见解，这些偏见可以通过它们的光谱进行诊断。

更新时间: 2024-06-03 14:09:10

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.11742v2

BMRS: Bayesian Model Reduction for Structured Pruning

Modern neural networks are often massively overparameterized leading to high compute costs during training and at inference. One effective method to improve both the compute and energy efficiency of neural networks while maintaining good performance is structured pruning, where full network structures (e.g. neurons or convolutional filters) that have limited impact on the model output are removed. In this work, we propose Bayesian Model Reduction for Structured pruning (BMRS), a fully end-to-end Bayesian method of structured pruning. BMRS is based on two recent methods: Bayesian structured pruning with multiplicative noise, and Bayesian model reduction (BMR), a method which allows efficient comparison of Bayesian models under a change in prior. We present two realizations of BMRS derived from different priors which yield different structured pruning characteristics: 1) BMRS_N with the truncated log-normal prior, which offers reliable compression rates and accuracy without the need for tuning any thresholds and 2) BMRS_U with the truncated log-uniform prior that can achieve more aggressive compression based on the boundaries of truncation. Overall, we find that BMRS offers a theoretically grounded approach to structured pruning of neural networks yielding both high compression rates and accuracy. Experiments on multiple datasets and neural networks of varying complexity showed that the two BMRS methods offer a competitive performance-efficiency trade-off compared to other pruning methods.

Updated: 2024-06-03 14:08:04

标题: BMRS: 贝叶斯模型简化用于结构化修剪

摘要: 现代神经网络通常过度参数化，导致训练和推断过程中的计算成本很高。一种有效的方法来提高神经网络的计算和能量效率，同时保持良好性能是结构化剪枝，即删除对模型输出影响有限的完整网络结构（例如神经元或卷积滤波器）。在这项工作中，我们提出了一种完全端到端的贝叶斯模型剪枝方法，称为结构化剪枝贝叶斯模型缩减（BMRS）。BMRS基于两种最近的方法：带乘性噪声的贝叶斯结构化剪枝和贝叶斯模型缩减（BMR），一种允许在先验变化下高效比较贝叶斯模型的方法。我们提出了两种源自不同先验的BMRS实现，它们产生不同的结构化剪枝特性：1）BMRS_N采用截断的对数正态先验，提供可靠的压缩率和准确性，无需调整任何阈值；2）BMRS_U采用截断的对数均匀先验，可以根据截断边界实现更激进的压缩。总体而言，我们发现BMRS提供了一个基于理论的方法来对神经网络进行结构化剪枝，同时产生高压缩率和准确性。对多个数据集和不同复杂性的神经网络进行的实验表明，与其他剪枝方法相比，这两种BMRS方法在性能效率折衷方面表现竞争力。

更新时间: 2024-06-03 14:08:04

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01345v1

Probing Language Models for Pre-training Data Detection

Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the pre-training phase. Therefore, it is vital to detect the contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perplexities, which are superficial features and not reliable. In this study, we propose to utilize the probing technique for pre-training data detection by examining the model's internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection. Additionally, we propose ArxivMIA, a new challenging benchmark comprising arxiv abstracts from Computer Science and Mathematics categories. Our experiments demonstrate that our method outperforms all baselines, and achieves state-of-the-art performance on both WikiMIA and ArxivMIA, with additional experiments confirming its efficacy (Our code and dataset are available at https://github.com/zhliu0106/probing-lm-data).

Updated: 2024-06-03 13:58:04

标题: 探究语言模型用于预训练数据检测

摘要: 大型语言模型(LLMs)展示了它们令人印象深刻的能力，同时也引起了关于由于隐私问题和基准数据集在预训练阶段泄露导致的数据污染问题的担忧。因此，通过检查LLM是否已经在目标文本上进行了预训练来检测污染是至关重要的。最近的研究集中在生成的文本上，并计算困惑度，这些都是表面特征并且不可靠。在本研究中，我们提出利用探测技术来检测预训练数据，通过检查模型的内部激活。我们的方法简单有效，并导致更可信赖的预训练数据检测。此外，我们提出ArxivMIA，一个包含计算机科学和数学类别的Arxiv摘要的新的具有挑战性的基准。我们的实验表明，我们的方法优于所有基准，并在WikiMIA和ArxivMIA上实现了最先进的性能，额外的实验证实了其有效性(我们的代码和数据集可在https://github.com/zhliu0106/probing-lm-data 获取)。

更新时间: 2024-06-03 13:58:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01333v1

Transferring Domain Knowledge with (X)AI-Based Learning Systems

In numerous high-stakes domains, training novices via conventional learning systems does not suffice. To impart tacit knowledge, experts' hands-on guidance is imperative. However, training novices by experts is costly and time-consuming, increasing the need for alternatives. Explainable artificial intelligence (XAI) has conventionally been used to make black-box artificial intelligence systems interpretable. In this work, we utilize XAI as an alternative: An (X)AI system is trained on experts' past decisions and is then employed to teach novices by providing examples coupled with explanations. In a study with 249 participants, we measure the effectiveness of such an approach for a classification task. We show that (X)AI-based learning systems are able to induce learning in novices and that their cognitive styles moderate learning. Thus, we take the first steps to reveal the impact of XAI on human learning and point AI developers to future options to tailor the design of (X)AI-based learning systems.

Updated: 2024-06-03 13:56:30

标题: 用(X)AI基于学习系统转移领域知识

摘要: 在许多高风险领域，通过传统学习系统培训新手是不够的。为了传授隐性知识，专家的亲身指导是必不可少的。然而，由专家培训新手是昂贵且耗时的，增加了寻求替代方案的需求。可解释人工智能（XAI）通常被用来使黑匣子人工智能系统可解释。在这项工作中，我们将XAI作为一种替代方法：一个（X）AI系统被训练基于专家的过去决策，然后被用来通过提供示例和解释来教授新手。在一项涉及249名参与者的研究中，我们衡量了这种方法在分类任务中的有效性。我们展示了（X）AI学习系统能够在新手中引起学习，他们的认知风格调节学习。因此，我们迈出了揭示XAI对人类学习影响的第一步，并指出AI开发人员未来定制（X）AI学习系统设计的选项。

更新时间: 2024-06-03 13:56:30

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2406.01329v1

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that logically follow, given the truth value of the information provided. Recent progress in the domain of large language models (LLMs) has showcased their capability in executing deductive reasoning tasks. Nonetheless, a significant portion of research primarily assesses the accuracy of LLMs in solving such tasks, often overlooking a deeper analysis of their reasoning behavior. In this study, we draw upon principles from cognitive psychology to examine inferential strategies employed by LLMs, through a detailed evaluation of their responses to propositional logic problems. Our findings indicate that LLMs display reasoning patterns akin to those observed in humans, including strategies like $\textit{supposition following}$ or $\textit{chain construction}$. Moreover, our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning, with more advanced models tending to adopt strategies more frequently than less sophisticated ones. Importantly, we assert that a model's accuracy, that is the correctness of its final conclusion, does not necessarily reflect the validity of its reasoning process. This distinction underscores the necessity for more nuanced evaluation procedures in the field.

Updated: 2024-06-03 13:53:01

标题: 人类和大型语言模型在演绎推理中的推理策略比较

摘要: 演绎推理在制定合理和连贯论点方面发挥着关键作用。它允许个体在提供信息的真值的基础上得出逻辑上的结论。大型语言模型（LLMs）领域的最新进展展示了它们在执行演绎推理任务方面的能力。然而，研究的一个重要部分主要评估LLMs在解决此类任务中的准确性，往往忽视对其推理行为的深入分析。在这项研究中，我们借鉴认知心理学原理，通过对LLMs对命题逻辑问题的响应进行详细评估，来考察LLMs所采用的推理策略。我们的研究结果表明，LLMs展现出类似于人类观察到的推理模式，包括像“假设跟随”或“链构建”这样的策略。此外，我们的研究表明，模型的体系结构和规模显著影响其首选的推理方法，更先进的模型倾向于更频繁地采用策略，而不那么复杂的模型则不太倾向于采用。重要的是，我们断言一个模型的准确性，即其最终结论的正确性，不一定反映其推理过程的有效性。这种区别强调了该领域需要更加细致的评估程序。

更新时间: 2024-06-03 13:53:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.14856v2

PowerGraph: A power grid benchmark dataset for graph neural networks

Power grids are critical infrastructures of paramount importance to modern society and, therefore, engineered to operate under diverse conditions and failures. The ongoing energy transition poses new challenges for the decision-makers and system operators. Therefore, we must develop grid analysis algorithms to ensure reliable operations. These key tools include power flow analysis and system security analysis, both needed for effective operational and strategic planning. The literature review shows a growing trend of machine learning (ML) models that perform these analyses effectively. In particular, Graph Neural Networks (GNNs) stand out in such applications because of the graph-based structure of power grids. However, there is a lack of publicly available graph datasets for training and benchmarking ML models in electrical power grid applications. First, we present PowerGraph, which comprises GNN-tailored datasets for i) power flows, ii) optimal power flows, and iii) cascading failure analyses of power grids. Second, we provide ground-truth explanations for the cascading failure analysis. Finally, we perform a complete benchmarking of GNN methods for node-level and graph-level tasks and explainability. Overall, PowerGraph is a multifaceted GNN dataset for diverse tasks that includes power flow and fault scenarios with real-world explanations, providing a valuable resource for developing improved GNN models for node-level, graph-level tasks and explainability methods in power system modeling. The dataset is available at https://figshare.com/articles/dataset/PowerGraph/22820534 and the code at https://github.com/PowerGraph-Datasets.

Updated: 2024-06-03 13:51:16

标题: PowerGraph: 一个用于图神经网络的电力网络基准数据集如果需要其他帮助，请随时告诉我。

摘要: 电力网络是现代社会至关重要的关键基础设施，因此被设计为在不同条件和故障下运行。持续进行的能源转型为决策者和系统操作者带来了新的挑战。因此，我们必须开发电网分析算法来确保可靠运行。这些关键工具包括功率流分析和系统安全分析，两者均为有效的运营和战略规划所需。文献综述显示，机器学习（ML）模型在执行这些分析方面表现出越来越明显的趋势。特别是，由于电力网络的基于图的结构，图神经网络（GNNs）在此类应用中表现突出。然而，在电力网应用中缺乏用于训练和基准测试ML模型的公开可用的图数据集。首先，我们提出了PowerGraph，其中包括用于i）功率流、ii）最优功率流和iii）电力网络级联故障分析的GNN定制数据集。其次，我们提供了级联故障分析的基本解释。最后，我们对节点级和图级任务以及可解释性的GNN方法进行了全面的基准测试。总的来说，PowerGraph是一个多方面的GNN数据集，适用于包括功率流和故障场景在内的各种任务，提供了一个宝贵的资源，用于开发改进的GNN模型，以进行节点级、图级任务和电力系统建模中的可解释性方法。数据集可在https://figshare.com/articles/dataset/PowerGraph/22820534 和代码可在https://github.com/PowerGraph-Datasets 找到。

更新时间: 2024-06-03 13:51:16

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2402.02827v2

Uplift Modeling Under Limited Supervision

Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work we propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with extremely low experimental budget. The framework is flexible since each step can be used separately with other models or treatment policies. The experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize with limited supervision to reduce experimental risks.

Updated: 2024-06-03 13:49:20

标题: 有限监督下的提升建模

摘要: 在电子商务中估计因果效应往往涉及昂贵的处理分配，这在大规模环境中可能不切实际。利用机器学习来预测这种处理效果，而无需实际干预，是减少风险的标准做法。然而，现有的处理效果预测方法往往依赖于大量训练集，这些训练集是从真实实验中构建的，因此在创建过程中存在固有风险。在这项工作中，我们提出了一个图神经网络来减少所需的训练集大小，依赖于电子商务数据中常见的图。具体来说，我们将问题视为具有受限标记实例数量的节点回归问题，开发了一个类似于以前因果效应估计器的双模型神经架构，并测试了不同的消息传递层进行编码。此外，作为额外步骤，我们将模型与一种获取函数相结合，以指导在实验预算极低的情况下创建训练集。该框架是灵活的，因为每个步骤都可以单独与其他模型或处理策略一起使用。对真实大规模网络的实验表明，我们的方法明显优于现有技术，后者在许多情况下表现接近随机，强调了需要能够在有限监督下进行泛化以减少实验风险的模型的必要性。

更新时间: 2024-06-03 13:49:20

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2403.19289v2

Machine Learning with Confidential Computing: A Systematization of Knowledge

Privacy and security challenges in Machine Learning (ML) have become increasingly severe, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, Confidential Computing has been utilized in both academia and industry to mitigate privacy and security issues in various ML scenarios. In this paper, the conjunction between ML and Confidential Computing is investigated. We systematize the prior work on Confidential Computing-assisted ML techniques that provide i) confidentiality guarantees and ii) integrity assurances, and discuss their advanced features and drawbacks. Key challenges are further identified, and we provide dedicated analyses of the limitations in existing Trusted Execution Environment (TEE) systems for ML use cases. Finally, prospective works are discussed, including grounded privacy definitions for closed-loop protection, partitioned executions of efficient ML, dedicated TEE-assisted designs for ML, TEE-aware ML, and ML full pipeline guarantees. By providing these potential solutions in our systematization of knowledge, we aim to build the bridge to help achieve a much stronger TEE-enabled ML for privacy guarantees without introducing computation and system costs.

Updated: 2024-06-03 13:48:59

标题: 机器学习与机密计算：知识系统化

摘要: 随着机器学习（ML）的广泛发展和最近对大规模攻击面的演示，隐私和安全挑战变得日益严峻。作为一种成熟的系统导向方法，保密计算已被学术界和工业界广泛利用，以缓解各种ML场景中的隐私和安全问题。本文研究了ML与保密计算之间的结合。我们系统化了先前关于保密计算辅助ML技术的工作，这些技术提供了i）保密性保证和ii）完整性保证，并讨论了它们的高级特性和缺点。进一步确定了关键挑战，并提供了对现有受信执行环境（TEE）系统在ML用例中的限制的专门分析。最后，讨论了未来的工作，包括为闭环保护提供基础隐私定义，有效ML的分区执行，专门的TEE辅助设计用于ML，TEE感知ML以及ML全流水线保证。通过在我们的知识系统化中提供这些潜在解决方案，我们旨在搭建桥梁，帮助实现更强大的TEE启用ML以实现隐私保证，而不引入计算和系统成本。

更新时间: 2024-06-03 13:48:59

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2208.10134v3

Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics

Replica exchange stochastic gradient Langevin dynamics (reSGLD) is an effective sampler for non-convex learning in large-scale datasets. However, the simulation may encounter stagnation issues when the high-temperature chain delves too deeply into the distribution tails. To tackle this issue, we propose reflected reSGLD (r2SGLD): an algorithm tailored for constrained non-convex exploration by utilizing reflection steps within a bounded domain. Theoretically, we observe that reducing the diameter of the domain enhances mixing rates, exhibiting a $\textit{quadratic}$ behavior. Empirically, we test its performance through extensive experiments, including identifying dynamical systems with physical constraints, simulations of constrained multi-modal distributions, and image classification tasks. The theoretical and empirical findings highlight the crucial role of constrained exploration in improving the simulation efficiency.

Updated: 2024-06-03 13:48:52

标题: 通过反射复制交换随机梯度 Langevin 动力学进行受限探索

摘要: 复制交换随机梯度 Langevin 动力学 (reSGLD) 是一个在大规模数据集中非凸学习中有效的采样器。然而，当高温链深入分布尾部时，模拟可能会遇到停滞问题。为了解决这个问题，我们提出了反射 reSGLD (r2SGLD)：一种专门针对受约束的非凸探索的算法，通过在有界域内利用反射步骤。理论上，我们观察到减小域的直径可以增强混合率，表现出二次行为。经验上，我们通过广泛实验测试其性能，包括识别具有物理约束的动力系统、受约束多模态分布的模拟以及图像分类任务。理论和经验结果突出了受约束探索在提高模拟效率中的关键作用。

更新时间: 2024-06-03 13:48:52

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.07839v2

One-Shot Learning as Instruction Data Prospector for Large Language Models

Contemporary practices in instruction tuning often hinge on enlarging data scaling without a clear strategy for ensuring data quality, inadvertently introducing noise that may compromise model performance. To address this challenge, we introduce \textsc{Nuggets}, a novel and efficient methodology that leverages one-shot learning to discern and select high-quality instruction data from extensive datasets. \textsc{Nuggets} assesses the potential of individual instruction examples to act as effective one-shot learning instances, thereby identifying those that can significantly improve performance across diverse tasks. \textsc{Nuggets} utilizes a scoring system based on the impact of candidate examples on the perplexity of a diverse anchor set, facilitating the selection of the most advantageous data for instruction tuning. Through comprehensive evaluations on two benchmarks, including MT-Bench and Alpaca-Eval, we show that instruction tuning with the top 1\% of examples curated by \textsc{Nuggets} substantially outperforms conventional methods employing the entire dataset.

Updated: 2024-06-03 13:46:16

标题: 一次性学习作为大型语言模型的指导数据探寻者

摘要: 当今的指导调优实践往往依赖于扩大数据规模，但缺乏确保数据质量的明确策略，从而无意中引入可能损害模型性能的噪音。为了解决这一挑战，我们引入了一种新颖高效的方法学，名为\textsc{Nuggets}，利用一次性学习来甄别和选择来自庞大数据集的高质量指导数据。 \textsc{Nuggets}评估单个指导示例作为有效一次性学习实例的潜力，从而确定可以显著改善各种任务性能的示例。 \textsc{Nuggets}利用基于候选示例对多样锚点集困惑度影响的评分系统，促进选择最有利于指导调优的数据。通过对两个基准测试，包括MT-Bench和Alpaca-Eval的全面评估，我们展示了通过\textsc{Nuggets}筛选的前1％示例进行指导调优明显优于采用整个数据集的传统方法。

更新时间: 2024-06-03 13:46:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2312.10302v4

Targeted Reduction of Causal Models

Why does a phenomenon occur? Addressing this question is central to most scientific inquiries and often relies on simulations of scientific models. As models become more intricate, deciphering the causes behind phenomena in high-dimensional spaces of interconnected variables becomes increasingly challenging. Causal Representation Learning (CRL) offers a promising avenue to uncover interpretable causal patterns within these simulations through an interventional lens. However, developing general CRL frameworks suitable for practical applications remains an open challenge. We introduce Targeted Causal Reduction (TCR), a method for condensing complex intervenable models into a concise set of causal factors that explain a specific target phenomenon. We propose an information theoretic objective to learn TCR from interventional data of simulations, establish identifiability for continuous variables under shift interventions and present a practical algorithm for learning TCRs. Its ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems, illustrating its potential to assist scientists in the study of complex phenomena in a broad range of disciplines.

Updated: 2024-06-03 13:45:44

标题: 目标化减少因果模型

摘要: 为什么会出现某种现象？解决这个问题对大多数科学探究至关重要，通常依赖于科学模型的模拟。随着模型变得越来越复杂，解析高维空间中相互关联变量背后的原因变得越来越具有挑战性。因果表征学习（Causal Representation Learning，CRL）通过干预性视角为我们提供了一个有望揭示这些模拟中可解释因果模式的途径。然而，开发适用于实际应用的通用CRL框架仍然是一个开放性挑战。我们介绍了目标因果简化（Targeted Causal Reduction，TCR），这是一种将复杂的可干预模型压缩为简明的一组因果因子，用以解释特定目标现象的方法。我们提出了一个信息论目标，用于从模拟的干预数据中学习TCR，为连续变量在转移干预下的可辨识性建立了基础，并提出了一个学习TCR的实用算法。我们展示了它在玩具和机械系统上生成可解释的高层解释的能力，说明了它在帮助科学家研究各种学科中的复杂现象方面的潜力。

更新时间: 2024-06-03 13:45:44

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2311.18639v2

Predictive Coding beyond Correlations

Recently, there has been extensive research on the capabilities of biologically plausible algorithms. In this work, we show how one of such algorithms, called predictive coding, is able to perform causal inference tasks. First, we show how a simple change in the inference process of predictive coding enables to compute interventions without the need to mutilate or redefine a causal graph. Then, we explore applications in cases where the graph is unknown, and has to be inferred from observational data. Empirically, we show how such findings can be used to improve the performance of predictive coding in image classification tasks, and conclude that such models are able to perform simple end-to-end causal inference tasks.

Updated: 2024-06-03 13:43:52

标题: 超越相关性的预测编码

摘要: 最近，关于生物学合理算法的能力进行了广泛的研究。在这项工作中，我们展示了一种名为预测编码的算法如何执行因果推断任务。首先，我们展示了如何通过简单地改变预测编码的推断过程，就能够计算干预，而无需残害或重新定义因果图。然后，我们探讨了在因果图未知且需要从观测数据中推断的情况下的应用。实证上，我们展示了这些发现如何用于改善预测编码在图像分类任务中的性能，并得出结论称这种模型能够执行简单的端到端因果推断任务。

更新时间: 2024-06-03 13:43:52

领域: cs.LG

下载: http://arxiv.org/abs/2306.15479v2

Sequence-to-Sequence Multi-Modal Speech In-Painting

Speech in-painting is the task of regenerating missing audio contents using reliable context information. Despite various recent studies in multi-modal perception of audio in-painting, there is still a need for an effective infusion of visual and auditory information in speech in-painting. In this paper, we introduce a novel sequence-to-sequence model that leverages the visual information to in-paint audio signals via an encoder-decoder architecture. The encoder plays the role of a lip-reader for facial recordings and the decoder takes both encoder outputs as well as the distorted audio spectrograms to restore the original speech. Our model outperforms an audio-only speech in-painting model and has comparable results with a recent multi-modal speech in-painter in terms of speech quality and intelligibility metrics for distortions of 300 ms to 1500 ms duration, which proves the effectiveness of the introduced multi-modality in speech in-painting.

Updated: 2024-06-03 13:42:10

标题: 序列到序列多模态语音修复

摘要: 语音修复是利用可靠的上下文信息恢复缺失的音频内容的任务。尽管近年来进行了各种多模态感知的音频修复研究，但仍然需要在语音修复中有效地融合视觉和听觉信息。本文介绍了一种新颖的序列到序列模型，通过编码器-解码器架构利用视觉信息来修复音频信号。编码器在面部录音中扮演唇读者的角色，解码器则同时接收编码器输出和失真的音频频谱图，恢复原始语音。我们的模型在语音修复方面优于仅音频的语音修复模型，并在300毫秒到1500毫秒持续时间的失真方面与最近的多模态语音修复模型具有可比的结果，这证明了在语音修复中引入多模态的有效性。

更新时间: 2024-06-03 13:42:10

领域: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2406.01321v1

Robotic Imitation of Human Actions

Imitation can allow us to quickly gain an understanding of a new task. Through a demonstration, we can gain direct knowledge about which actions need to be performed and which goals they have. In this paper, we introduce a new approach to imitation learning that tackles the challenges of a robot imitating a human, such as the change in perspective and body schema. Our approach can use a single human demonstration to abstract information about the demonstrated task, and use that information to generalise and replicate it. We facilitate this ability by a new integration of two state-of-the-art methods: a diffusion action segmentation model to abstract temporal information from the demonstration and an open vocabulary object detector for spatial information. Furthermore, we refine the abstracted information and use symbolic reasoning to create an action plan utilising inverse kinematics, to allow the robot to imitate the demonstrated action.

Updated: 2024-06-03 13:40:44

标题: 机器人模仿人类动作

摘要: 模仿可以让我们快速掌握新任务的理解。通过演示，我们可以直接了解需要执行哪些动作以及它们的目标。本文介绍了一种新的模仿学习方法，解决了机器人模仿人类所面临的挑战，例如视角和身体图式的变化。我们的方法可以利用单一人类演示来提取所演示任务的信息，并利用该信息进行泛化和复制。我们通过将两种最先进的方法进行新的整合来实现这一能力：扩散动作分割模型用于从演示中提取时间信息，以及用于空间信息的开放词汇目标检测器。此外，我们通过精化提取的信息，并利用符号推理来创建一个利用逆运动学的动作计划，使机器人能够模仿所演示的动作。

更新时间: 2024-06-03 13:40:44

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2401.08381v2

TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild

Large language models with instruction-following abilities have revolutionized the field of artificial intelligence. These models show exceptional generalizability to tackle various real-world tasks through their natural language interfaces. However, their performance heavily relies on high-quality exemplar data, which is often difficult to obtain. This challenge is further exacerbated when it comes to multimodal instruction following. We introduce TextBind, an almost annotation-free framework for empowering larger language models with the multi-turn interleaved multimodal instruction-following capabilities. Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model. To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models. We release our dataset, model, and demo to foster future research in the area of multimodal instruction following.

Updated: 2024-06-03 13:39:40

标题: TextBind：野外多轮交织多模态指示跟随

摘要: 具有指令遵循能力的大型语言模型已经彻底改变了人工智能领域。这些模型通过其自然语言界面展示出出色的泛化能力，能够解决各种真实世界任务。然而，它们的性能严重依赖于高质量的示范数据，而这通常很难获得。当涉及多模态指令跟随时，这一挑战进一步加剧。我们介绍了TextBind，这是一个几乎不需要注释的框架，用于赋予更大语言模型多回合交替的多模态指令遵循能力。我们的方法仅需要图像-标题对，并从语言模型生成多回合的多模态指令-响应对话。为了适应交错的图像-文本输入和输出，我们设计了MIM，这是一个以语言模型为中心的架构，能够无缝集成图像编码器和解码器模型。我们发布了我们的数据集、模型和演示，以促进未来在多模态指令跟随领域的研究。

更新时间: 2024-06-03 13:39:40

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2309.08637v5

Quantum Generative Diffusion Model: A Fully Quantum-Mechanical Model for Generating Quantum State Ensemble

Classical diffusion models have shown superior generative results and have been applied to many problems. Exploring these models in the quantum domain can advance the field of quantum generative learning. In this paper, we introduce the Quantum Generative Diffusion Model (QGDM), a simple and elegant quantum counterpart of classical diffusion models. The core idea of QGDM is that any target quantum state can be transformed into a completely mixed state, which has the highest entropy and maximum uncertainty about the system, through a non-unitary forward process. Subsequently, a trainable backward process can be used to recover the target state from the completely mixed state. The design requirements for QGDM's backward process include ensuring non-unitarity while maintaining a low number of parameters. To achieve this, we introduce partial trace operations in the backward process to enforce non-unitary. Additionally, we control the number of trainable parameters by using a parameter-sharing strategy and incorporating temporal information as an input in the backward process. Furthermore, we introduce a resource-efficient version of QGDM, which reduces the number of auxiliary qubits while preserving impressive generative capabilities. Our proposed models exhibit better convergence performance than Quantum Generative Adversarial Networks (QGANs) because our models optimize a convex distance function using gradient descent. Comparative results with QGANs demonstrate the effectiveness of our models in generating both pure and mixed quantum states. Notably, our models achieve 53.03% higher fidelity in mixed-state generation tasks compared to QGANs. These results highlight the potential of the proposed models to tackle challenging quantum generation tasks.

Updated: 2024-06-03 13:37:50

标题: 量子生成扩散模型：一种用于生成量子态集合的全量子力学模型

摘要: 经典扩散模型已经展示出卓越的生成结果，并被应用于许多问题。在量子领域探索这些模型可以推动量子生成学习领域的发展。在本文中，我们介绍了量子生成扩散模型（QGDM），这是经典扩散模型的简单而优雅的量子对应物。 QGDM的核心思想是通过非幺正的前向过程将任何目标量子态转化为完全混合态，这种态具有最高的熵和对系统的最大不确定性。随后，可以使用可训练的反向过程从完全混合态恢复目标态。QGDM反向过程的设计要求包括确保非幺正性同时保持低参数数量。为了实现这一点，我们在反向过程中引入了偏迹操作来强制非幺正性。此外，我们通过使用参数共享策略和将时间信息作为反向过程的输入来控制可训练参数的数量。此外，我们还引入了一种资源高效的QGDM版本，减少了辅助量子比特的数量，同时保留了出色的生成能力。我们提出的模型表现出比量子生成对抗网络（QGANs）更好的收敛性能，因为我们的模型使用梯度下降优化凸距离函数。与QGANs的比较结果表明，我们的模型在生成纯量子态和混合量子态方面的效果显著。值得注意的是，我们的模型在混合态生成任务中的保真度比QGANs高出53.03%。这些结果突显了所提出模型应对具有挑战性的量子生成任务的潜力。

更新时间: 2024-06-03 13:37:50

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2401.07039v2

Conservative Prediction via Data-Driven Confidence Minimization

In safety-critical applications of machine learning, it is often desirable for a model to be conservative, abstaining from making predictions on unknown inputs which are not well-represented in the training data. However, detecting unknown examples is challenging, as it is impossible to anticipate all potential inputs at test time. To address this, prior work (Hendrycks et al., 2018) minimizes model confidence on an auxiliary outlier dataset carefully curated to be disjoint from the training distribution. We theoretically analyze the choice of auxiliary dataset for confidence minimization, revealing two actionable insights: (1) if the auxiliary set contains unknown examples similar to those seen at test time, confidence minimization leads to provable detection of unknown test examples, and (2) if the first condition is satisfied, it is unnecessary to filter out known examples for out-of-distribution (OOD) detection. Motivated by these guidelines, we propose the Data-Driven Confidence Minimization (DCM) framework, which minimizes confidence on an uncertainty dataset. We apply DCM to two problem settings in which conservative prediction is paramount -- selective classification and OOD detection -- and provide a realistic way to gather uncertainty data for each setting. In our experiments, DCM consistently outperforms existing selective classification approaches on 4 datasets when tested on unseen distributions and outperforms state-of-the-art OOD detection methods on 12 ID-OOD dataset pairs, reducing FPR (at TPR $95\%$) by $6.3\%$ and $58.1\%$ on CIFAR-10 and CIFAR-100 compared to Outlier Exposure.

Updated: 2024-06-03 13:30:28

标题: 基于数据驱动的置信度最小化的保守预测

摘要: 在机器学习的安全关键应用中，通常希望模型保守，避免对在训练数据中未很好表示的未知输入进行预测。然而，检测未知示例是具有挑战性的，因为在测试时不可能预先知道所有潜在输入。为了解决这个问题，先前的研究（Hendrycks等，2018）在一个精心策划的辅助异常数据集上最小化模型的置信度，该数据集与训练分布不重叠。我们在理论上分析了用于置信度最小化的辅助数据集的选择，揭示了两个可操作的见解：（1）如果辅助集包含类似于测试时看到的未知示例，那么置信度最小化会导致未知测试示例的可证明检测，（2）如果第一个条件得到满足，则无需为超出分布（OOD）检测过滤已知示例。受这些指导方针的启发，我们提出了数据驱动置信度最小化（DCM）框架，该框架在一个不确定性数据集上最小化置信度。我们将DCM应用于两个问题设置中，其中保守预测至关重要--选择性分类和OOD检测--并为每个设置提供了一种收集不确定性数据的现实方法。在我们的实验中，当在看不见的分布上进行测试时，DCM在4个数据集上始终优于现有的选择性分类方法，并在12个ID-OOD数据集对中优于最先进的OOD检测方法，将FPR（在TPR $95\%$时）与Outlier Exposure相比，在CIFAR-10和CIFAR-100上分别减少了$6.3\%$和$58.1\%$。

更新时间: 2024-06-03 13:30:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.04974v2

The Intelligible and Effective Graph Neural Additive Networks

Graph Neural Networks (GNNs) have emerged as the predominant approach for learning over graph-structured data. However, most GNNs operate as black-box models and require post-hoc explanations, which may not suffice in high-stakes scenarios where transparency is crucial. In this paper, we present a GNN that is interpretable by design. Our model, Graph Neural Additive Network (GNAN), is a novel extension of the interpretable class of Generalized Additive Models, and can be visualized and fully understood by humans. GNAN is designed to be fully interpretable, allowing both global and local explanations at the feature and graph levels through direct visualization of the model. These visualizations describe the exact way the model uses the relationships between the target variable, the features, and the graph. We demonstrate the intelligibility of GNANs in a series of examples on different tasks and datasets. In addition, we show that the accuracy of GNAN is on par with black-box GNNs, making it suitable for critical applications where transparency is essential, alongside high accuracy.

Updated: 2024-06-03 13:29:36

标题: 可解释和有效的图神经加性网络

摘要: 图神经网络（GNNs）已经成为处理图结构数据学习的主要方法。然而，大多数GNNs作为黑盒模型运行，并需要事后解释，这在透明度至关重要的高风险场景中可能不足够。在本文中，我们提出了一种通过设计可解释的GNN。我们的模型，图神经加性网络（GNAN），是可解释类别的广义加性模型的一种新扩展，可以被人类可视化和完全理解。GNAN被设计为完全可解释，通过直接可视化模型可以在特征和图级别提供全局和局部解释。这些可视化描述了模型如何精确使用目标变量、特征和图之间的关系。我们通过一系列不同任务和数据集的示例展示了GNAN的可解释性。此外，我们展示了GNAN的准确性与黑盒GNNs相当，使其适用于透明度至关重要且准确率高的关键应用。

更新时间: 2024-06-03 13:29:36

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01317v1

Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs

Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive-based framework approach to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities.Our experimental findings validate the effectiveness of our approach in improving HAR performance with IMU data. We demonstrate that models trained with synthetic IMU data generated from videos using our method surpass existing approaches in recognizing fine-grained activities.

Updated: 2024-06-03 13:28:42

标题: 通过语言、姿势和合成IMU的联合表示增强惯性手部动作识别

摘要: 由于HAR中标记传感器数据稀缺，先前的研究已经转向使用视频数据来合成惯性测量单元（IMU）数据，利用其丰富的活动标注。然而，在现实世界环境中从视频中生成IMU数据对HAR提出了挑战，这是因为合成IMU数据的质量较差，对微妙、细粒度运动的有效性有限。在本文中，我们提出了Multi$^3$Net，我们的新颖的多模态、多任务和基于对比的框架方法，以解决数据有限的问题。我们的预训练过程使用来自在线数据库的视频，旨在同时学习文本、姿势和IMU的联合表示。通过使用视频数据和对比学习，我们的方法旨在提高可穿戴HAR的性能，特别是在识别微妙活动方面。我们的实验结果验证了我们方法在改进HAR性能方面的有效性。我们表明，使用我们的方法从视频生成的合成IMU数据训练的模型在识别细粒度活动方面超过了现有方法。

更新时间: 2024-06-03 13:28:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01316v1

Scale-Free Image Keypoints Using Differentiable Persistent Homology

In computer vision, keypoint detection is a fundamental task, with applications spanning from robotics to image retrieval; however, existing learning-based methods suffer from scale dependency and lack flexibility. This paper introduces a novel approach that leverages Morse theory and persistent homology, powerful tools rooted in algebraic topology. We propose a novel loss function based on the recent introduction of a notion of subgradient in persistent homology, paving the way toward topological learning. Our detector, MorseDet, is the first topology-based learning model for feature detection, which achieves competitive performance in keypoint repeatability and introduces a principled and theoretically robust approach to the problem.

Updated: 2024-06-03 13:27:51

标题: 使用可微持久同调构建无尺度图像关键点

摘要: 在计算机视觉中，关键点检测是一项基本任务，应用范围从机器人到图像检索都有涉及；然而，现有的基于学习的方法存在尺度依赖性和缺乏灵活性的问题。本文介绍了一种利用莫尔斯理论和持久同调的新方法，这是代数拓扑学中的强大工具。我们提出了一种基于最近引入的持久同调中次梯度概念的新损失函数，为拓扑学习铺平了道路。我们的检测器MorseDet是第一个基于拓扑学的学习模型，用于特征检测，在关键点重复性方面取得了竞争性性能，并引入了一个有原则且在理论上健壮的方法来解决这个问题。

更新时间: 2024-06-03 13:27:51

领域: cs.CV,cs.LG,math.AT,55N31,I.2.10

下载: http://arxiv.org/abs/2406.01315v1

Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization

The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity relative to the sequence length, which constrains its application to longer sequences. This is especially crucial in medical imaging where high-resolution images can reach gigapixel scale. Efforts to address this issue have predominantely focused on complex techniques, such as decomposing the softmax operation integral to the Transformer's architecture. This paper addresses this quadratic computational complexity of Transformer models and introduces a remarkably simple and effective method that circumvents this issue by eliminating the softmax function from the attention mechanism and adopting a sequence normalization technique for the key, query, and value tokens. Coupled with a reordering of matrix multiplications this approach reduces the memory- and compute complexity to a linear scale. We evaluate this approach across various medical imaging datasets comprising fundoscopic, dermascopic, radiologic and histologic imaging data. Our findings highlight that these models exhibit a comparable performance to traditional transformer models, while efficiently handling longer sequences.

Updated: 2024-06-03 13:27:08

标题: 使用无softmax的transformer和序列归一化进行高效医学图像分类

摘要: Transformer模型在推动自然语言处理、语音识别和计算机视觉等领域取得了重大进展。然而，该模型的一个关键局限性是与序列长度相关的二次计算和内存复杂性，这限制了其对更长序列的应用。在医学影像领域，这一点尤为重要，因为高分辨率图像可以达到千兆像素的规模。解决这一问题的努力主要集中在复杂的技术上，比如分解Transformer架构中的softmax操作。本文解决了Transformer模型的这种二次计算复杂性，并引入了一种非常简单且有效的方法，通过从注意力机制中消除softmax函数，并采用一种适用于关键、查询和值标记的序列归一化技术来规避这一问题。结合矩阵乘法的重新排列，这种方法将内存和计算复杂性降低到线性规模。我们在包括眼底、皮肤、放射和组织学影像数据在内的各种医学影像数据集上评估了这种方法。我们的研究结果表明，这些模型在处理更长序列时表现出与传统Transformer模型相当的性能。

更新时间: 2024-06-03 13:27:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01314v1

REvolve: Reward Evolution with Large Language Models for Autonomous Driving

Designing effective reward functions is crucial to training reinforcement learning (RL) algorithms. However, this design is non-trivial, even for domain experts, due to the subjective nature of certain tasks that are hard to quantify explicitly. In recent works, large language models (LLMs) have been used for reward generation from natural language task descriptions, leveraging their extensive instruction tuning and commonsense understanding of human behavior. In this work, we hypothesize that LLMs, guided by human feedback, can be used to formulate human-aligned reward functions. Specifically, we study this in the challenging setting of autonomous driving (AD), wherein notions of "good" driving are tacit and hard to quantify. To this end, we introduce REvolve, an evolutionary framework that uses LLMs for reward design in AD. REvolve creates and refines reward functions by utilizing human feedback to guide the evolution process, effectively translating implicit human knowledge into explicit reward functions for training (deep) RL agents. We demonstrate that agents trained on REvolve-designed rewards align closely with human driving standards, thereby outperforming other state-of-the-art baselines.

Updated: 2024-06-03 13:23:27

标题: REvolve：利用大型语言模型进行自主驾驶的奖励演化

摘要: 设计有效的奖励函数对于训练强化学习（RL）算法至关重要。然而，即使对于领域专家来说，这种设计也并非易事，因为某些任务的主观性质很难明确量化。在最近的研究中，大型语言模型（LLMs）已被用于从自然语言任务描述中生成奖励，利用它们对指导调整和人类行为常识的广泛理解。在这项工作中，我们假设LLMs在人类反馈的指导下，可以用于制定与人类对齐的奖励函数。具体而言，我们在自动驾驶（AD）这一具有挑战性的领域中研究了这一点，在这个领域中"良好"驾驶的概念是含蓄的，很难量化。为此，我们引入了REvolve，一个利用LLMs进行AD奖励设计的演化框架。REvolve通过利用人类反馈来指导演化过程创建和完善奖励函数，有效地将隐含的人类知识转化为明确的奖励函数，用于训练（深度）RL代理。我们证明，经由REvolve设计的奖励训练的代理与人类驾驶标准紧密对齐，从而超越了其他最先进的基线模型。

更新时间: 2024-06-03 13:23:27

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2406.01309v1

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

With the emergence of pretrained vision-language models (VLMs), considerable efforts have been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be challenging as model owners often opt to provide their models as a black box to safeguard model ownership. This paper proposes a \textbf{C}ollabo\textbf{ra}tive \textbf{F}ine-\textbf{T}uning (\textbf{CraFT}) approach for fine-tuning black-box VLMs to downstream tasks, where one only has access to the input prompts and the output predictions of the model. CraFT comprises two modules, a prompt generation module for learning text prompts and a prediction refinement module for enhancing output predictions in residual style. Additionally, we introduce an auxiliary prediction-consistent loss to promote consistent optimization across these modules. These modules are optimized by a novel collaborative training algorithm. Extensive experiments on few-shot classification over 15 datasets demonstrate the superiority of CraFT. The results show that CraFT achieves a decent gain of about 12\% with 16-shot datasets and only 8,000 queries. Moreover, CraFT trains faster and uses only about 1/80 of the memory footprint for deployment, while sacrificing only 1.62\% compared to the white-box method. Our code is publicly available at https://github.com/mrflogs/CraFT .

Updated: 2024-06-03 13:22:12

标题: 连接点：黑盒视觉语言模型的协作微调

摘要: 随着预训练的视觉语言模型（VLMs）的出现，人们已经投入了相当大的努力来对它们进行下游任务的微调。尽管在设计高效的微调方法方面取得了进展，但这些方法需要访问模型的参数，这可能具有挑战性，因为模型所有者通常选择将他们的模型作为黑匣子以保护模型所有权。本文提出了一种针对黑匣子VLMs进行微调的\textbf{C}ollabo\textbf{ra}tive \textbf{F}ine-\textbf{T}uning（\textbf{CraFT}）方法，其中只能访问模型的输入提示和输出预测。CraFT包括两个模块，一个用于学习文本提示的提示生成模块，和一个用于以残差风格增强输出预测的预测细化模块。此外，我们引入了一个辅助预测一致性损失，以促进这些模块之间的一致优化。这些模块通过一种新颖的协作训练算法进行优化。对15个数据集上的少样本分类进行了大量实验，结果显示CraFT的优势。结果表明，CraFT在16-shot数据集和仅8000个查询时实现了约12％的不错增益。此外，CraFT训练速度更快，部署时仅使用约1/80的内存占用，并且与白箱方法相比仅牺牲了1.62％。我们的代码可以在https://github.com/mrflogs/CraFT 上公开获取。

更新时间: 2024-06-03 13:22:12

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2402.04050v2

CodeR: Issue Resolving with Multi-Agent and Task Graphs

GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.00% of issues, in the case of submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

Updated: 2024-06-03 13:13:35

标题: CodeR: 使用多智能体和任务图解决问题

摘要: 最近，GitHub问题解决引起了学术界和工业界的重视。SWE-bench被提出用来衡量解决问题的性能。本文提出了CodeR，采用了多代理框架和预定义的任务图来修复和解决报告的错误，并在代码库中添加新功能。在SWE-bench lite上，CodeR能够解决28.00％的问题，即每个问题只提交一次。我们检验了CodeR每个设计的性能影响，并为推动这一研究方向提供见解。

更新时间: 2024-06-03 13:13:35

领域: cs.CL,cs.AI,cs.SE

下载: http://arxiv.org/abs/2406.01304v1

Partial Search in a Frozen Network is Enough to Find a Strong Lottery Ticket

Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning -- strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs can also be found within a randomly pruned source network, thus reducing the SLT search space. However, this limits the search to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method that reduces the SLT search space by an arbitrary ratio independent of the desired SLT sparsity. A random subset of the initial weights is excluded from the search space by freezing it -- i.e., by either permanently pruning them or locking them as a fixed part of the SLT. In addition to reducing search space, the proposed random freezing can also provide the benefit of reducing the model size for inference. Furthermore, experimental results show that the proposed method finds SLTs with better accuracy-to-model size trade-off than the SLTs obtained from dense or randomly pruned source networks. In particular, the SLTs found in Frozen ResNets on image classification using ImageNet significantly improve the accuracy-to-search space and accuracy-to-model size trade-offs over SLTs within dense (non-freezing) or sparse (non-locking) random networks.

Updated: 2024-06-03 13:12:18

标题: 部分搜索在冻结网络中足以找到一个强大的“幸运彩票”

摘要: 随机初始化的稠密网络包含能够在不进行权重学习的情况下实现高准确性的子网络 - 强大的中奖票（SLTs）。最近，Gadhikar等人（2023）证明了在随机修剪的源网络中也可以找到SLTs，从而减少了SLT的搜索空间。然而，这将搜索限制在比源网络更稀疏的SLTs上，由于意外高稀疏度导致准确性变差。本文提出了一种方法，通过一个任意比例减少SLT搜索空间，而不受所需SLT稀疏性的影响。通过冻结初始权重的随机子集，将其排除在搜索空间之外 - 即通过永久修剪它们或将其锁定为SLT的固定部分。除了减少搜索空间外，提出的随机冻结还可以提供减小推理模型大小的好处。此外，实验结果表明，与从稠密或随机修剪的源网络中获得的SLTs相比，提出的方法找到的SLTs具有更好的准确性-模型大小权衡。特别是，在使用ImageNet进行图像分类的Frozen ResNets中找到的SLTs显着改善了准确性-搜索空间和准确性-模型大小之间的权衡，而在稠密（非冻结）或稀疏（非锁定）随机网络中找到的SLTs。

更新时间: 2024-06-03 13:12:18

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2402.14029v2

Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models

We introduce Sequential Neural Posterior Score Estimation (SNPSE), a score-based method for Bayesian inference in simulator-based models. Our method, inspired by the remarkable success of score-based methods in generative modelling, leverages conditional score-based diffusion models to generate samples from the posterior distribution of interest. The model is trained using an objective function which directly estimates the score of the posterior. We embed the model into a sequential training procedure, which guides simulations using the current approximation of the posterior at the observation of interest, thereby reducing the simulation cost. We also introduce several alternative sequential approaches, and discuss their relative merits. We then validate our method, as well as its amortised, non-sequential, variant on several numerical examples, demonstrating comparable or superior performance to existing state-of-the-art methods such as Sequential Neural Posterior Estimation (SNPE).

Updated: 2024-06-03 13:07:57

标题: 顺序神经得分估计：基于条件得分扩散模型的无似然推断

摘要: 我们介绍了顺序神经后验得分估计（SNPSE），这是一种基于得分的方法，用于模拟器基础模型中的贝叶斯推断。我们的方法受到生成建模中基于得分的方法取得显著成功的启发，利用条件得分扩散模型生成感兴趣后验分布的样本。该模型使用一个直接估计后验得分的目标函数进行训练。我们将模型嵌入到一个顺序训练过程中，该过程利用观察到的兴趣后验的当前近似值来指导模拟，从而减少模拟成本。我们还介绍了几种替代的顺序方法，并讨论它们的相对优点。然后，我们验证了我们的方法，以及其摊销的、非顺序的变体在几个数值示例中，证明了与现有最先进的方法（如顺序神经后验估计（SNPE））相比，具有可比或更优秀的性能。

更新时间: 2024-06-03 13:07:57

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2210.04872v3

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents. Our code and data are available at https://github.com/fangru-lin/graph-llm-asynchow-plan.

Updated: 2024-06-03 13:07:06

标题: 异步计划推理中增强的大型语言模型

摘要: 规划是人类智能的基本属性。由于需要顺序和并行规划以优化时间成本，因此对于异步计划的推理是具有挑战性的。大型语言模型（LLMs）能够成功完成这项任务吗？在这里，我们提出了第一项大规模研究，探讨了这个问题。我们发现，代表性的闭源和开源LLMs集合，包括GPT-4和LLaMA-2，在我们的基准异步计划（AsyncHow）中表现不佳，当没有提供有关任务解决过程的插图时。我们提出了一种称为Plan Like a Graph（PLaG）的新技术，它将图形与自然语言提示结合起来，实现了最先进的结果。我们展示了尽管PLaG可以提高模型性能，但当任务复杂性增加时，LLMs仍然会遭受严重退化，突显了利用LLMs模拟数字设备的局限性。我们将我们的研究视为使用LLMs作为高效自主代理的一大步。我们的代码和数据可在https://github.com/fangru-lin/graph-llm-asynchow-plan上找到。

更新时间: 2024-06-03 13:07:06

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.02805v2

Accelerating Graph Neural Networks via Edge Pruning for Power Allocation in Wireless Networks

Graph Neural Networks (GNNs) have recently emerged as a promising approach to tackling power allocation problems in wireless networks. Since unpaired transmitters and receivers are often spatially distant, the distance-based threshold is proposed to reduce the computation time by excluding or including the channel state information in GNNs. In this paper, we are the first to introduce a neighbour-based threshold approach to GNNs to reduce the time complexity. Furthermore, we conduct a comprehensive analysis of both distance-based and neighbour-based thresholds and provide recommendations for selecting the appropriate value in different communication channel scenarios. We design the corresponding neighbour-based Graph Neural Networks (N-GNN) with the aim of allocating transmit powers to maximise the network throughput. Our results show that our proposed N-GNN offer significant advantages in terms of reducing time complexity while preserving strong performance and generalisation capacity. Besides, we show that by choosing a suitable threshold, the time complexity is reduced from O(|V|^2) to O(|V|), where |V| is the total number of transceiver pairs.

Updated: 2024-06-03 13:06:52

标题: 通过边缘修剪加速图神经网络，用于在无线网络中进行功率分配

摘要: 图神经网络（GNNs）最近已经成为解决无线网络中功率分配问题的一种有前途的方法。由于配对的发射机和接收机通常在空间上相距较远，因此提出了基于距离的阈值方法，通过在GNNs中排除或包含信道状态信息来减少计算时间。本文首次引入了基于邻居的阈值方法到GNNs中，以减少时间复杂度。此外，我们对基于距离和基于邻居的阈值进行了全面分析，并提供了在不同通信信道场景中选择适当值的建议。我们设计了相应的基于邻居的图神经网络（N-GNN），旨在分配发射功率以最大化网络吞吐量。我们的结果表明，我们提出的N-GNN在减少时间复杂度的同时保持了强大的性能和泛化能力。此外，我们表明通过选择适当的阈值，时间复杂度可以从O（|V| ^ 2）减少到O（|V|），其中|V|是发射机-接收机对的总数。

更新时间: 2024-06-03 13:06:52

领域: cs.IT,cs.LG,cs.NI,eess.SP,math.IT

下载: http://arxiv.org/abs/2305.12639v2

Resource-constrained Fairness

Access to resources strongly constrains the decisions we make. While we might wish to offer every student a scholarship, or schedule every patient for follow-up meetings with a specialist, limited resources mean that this is not possible. Existing tools for fair machine learning ignore these key constraints, with the majority of methods disregarding any finite resource limitations under which decisions are made. Our research introduces the concept of ``resource-constrained fairness" and quantifies the cost of fairness within this framework. We demonstrate that the level of available resources significantly influences this cost, a factor that has been overlooked in previous evaluations.

Updated: 2024-06-03 13:01:09

标题: 受资源限制的公平性

摘要: 资源的获取严重限制了我们所做的决定。虽然我们可能希望为每个学生提供奖学金，或安排每位患者与专家进行跟踪会议，但有限的资源意味着这是不可能的。现有的公平机器学习工具忽略了这些关键约束条件，大多数方法忽视了决策所处的任何有限资源限制。我们的研究引入了“资源受限公平”概念，并在这一框架内量化了公平的成本。我们证明了可用资源水平在很大程度上影响了这一成本，这是先前评估中被忽视的因素。

更新时间: 2024-06-03 13:01:09

领域: cs.LG

下载: http://arxiv.org/abs/2406.01290v1

TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel Coding

Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to removal techniques, such as fine-tuning, parameter pruning, or shuffling. In this paper, we build upon extensive prior work on covert (military) communication and propose TATTOOED, a novel DNN watermarking technique that is robust to existing threats. We demonstrate that using TATTOOED as their watermarking mechanisms, the DNN owner can successfully obtain the watermark and verify model ownership even in scenarios where 99% of model parameters are altered. Furthermore, we show that TATTOOED is easy to employ in training pipelines, and has negligible impact on model performance.

Updated: 2024-06-03 12:59:35

标题: TATTOOED：基于扩频频谱信道编码的稳健深度神经网络水印方案

摘要: 深度神经网络（DNNs）的水印技术近年来备受关注，许多水印策略被提出作为一种机制，可以帮助验证DNN的所有权，尤其是在未经所有者许可获取这些模型的情况下。然而，越来越多的研究表明，现有的水印机制极易受到细调、参数修剪或洗牌等移除技术的影响。在本文中，我们借鉴了大量关于秘密（军事）通信的先前工作，提出了一种新颖的DNN水印技术TATTOOED，该技术对现有威胁具有鲁棒性。我们证明，使用TATTOOED作为水印机制，DNN所有者可以成功获取水印并验证模型所有权，即使在99％的模型参数被改变的情况下也是如此。此外，我们展示了TATTOOED在训练管道中易于使用，并对模型性能影响极小。

更新时间: 2024-06-03 12:59:35

领域: cs.CR,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2202.06091v3

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting special system tokens like [/INST] and employing demo-level random search from a collected demo pool. These simple techniques result in surprisingly effective jailbreaking against aligned LLMs (even with advanced defenses). For examples, our method achieves >80% (mostly >95%) ASRs on Llama-2-7B and Llama-3-8B without multiple restarts, even if the models are enhanced by strong defenses such as perplexity detection and/or SmoothLLM, which is challenging for suffix-based jailbreaking. In addition, we conduct comprehensive and elaborate (e.g., making sure to use correct system prompts) evaluations against other aligned LLMs and advanced defenses, where our method consistently achieves nearly 100% ASRs. Our code is available at https://github.com/sail-sg/I-FSJ.

Updated: 2024-06-03 12:59:17

标题: 改进的少样本越狱方法可以规避对齐语言模型及其防御措施

摘要: 最近，Anil等人（2024年）展示了许多次（多达数百次）演示如何利用长上下文能力来越狱最先进的LLM。然而，是否可能使用少量次演示来有效地越狱限制上下文大小的LLM？尽管普通的少量次越狱可能效率低下，我们提出改进的技术，如注入特殊系统标记（如[/INST]）和利用从收集的演示池中进行的演示级随机搜索。这些简单的技术对齐LLM产生了令人惊讶的有效越狱（即使具有先进的防御措施）。例如，我们的方法在没有多次重启的情况下在Llama-2-7B和Llama-3-8B上实现了>80%（大部分>95%）的ASR，即使模型经过强大的防御措施增强，如困惑检测和/或SmoothLLM，这对基于后缀的越狱来说是具有挑战性的。此外，我们对其他对齐的LLM和先进的防御措施进行了全面和详尽的评估（例如，确保使用正确的系统提示），在这些评估中，我们的方法始终实现了接近100%的ASR。我们的代码可以在https://github.com/sail-sg/I-FSJ找到。

更新时间: 2024-06-03 12:59:17

领域: cs.CL,cs.AI,cs.CR,cs.LG

下载: http://arxiv.org/abs/2406.01288v1

An Analysis under a Unified Fomulation of Learning Algorithms with Output Constraints

Neural networks (NN) perform well in diverse tasks, but sometimes produce nonsensical results to humans. Most NN models "solely" learn from (input, output) pairs, occasionally conflicting with human knowledge. Many studies indicate injecting human knowledge by reducing output constraints during training can improve model performance and reduce constraint violations. While there have been several attempts to compare different existing algorithms under the same programming framework, nonetheless, there has been no previous work that categorizes learning algorithms with output constraints in a unified manner. Our contributions are as follows: (1) We categorize the previous studies based on three axes: type of constraint loss used (e.g. probabilistic soft logic, REINFORCE), exploration strategy of constraint-violating examples, and integration mechanism of learning signals from main task and constraint. (2) We propose new algorithms to integrate the information of main task and constraint injection, inspired by continual-learning algorithms. (3) Furthermore, we propose the $H\beta$-score as a metric for considering the main task metric and constraint violation simultaneously. To provide a thorough analysis, we examine all the algorithms on three NLP tasks: natural language inference (NLI), synthetic transduction examples (STE), and semantic role labeling (SRL). We explore and reveal the key factors of various algorithms associated with achieving high $H\beta$-scores.

Updated: 2024-06-03 12:58:29

标题: 一个统一公式下对具有输出约束的学习算法的分析

摘要: 神经网络在各种任务中表现出色，但有时会产生对人类来说毫无意义的结果。大多数神经网络模型“仅仅”从（输入，输出）对中学习，有时与人类知识相冲突。许多研究表明，在训练过程中通过减少输出约束来注入人类知识可以改善模型性能并减少约束违反。虽然已经有几次尝试在同一编程框架下比较不同的现有算法，但以前没有将具有输出约束的学习算法进行统一分类的先前工作。我们的贡献如下：（1）我们基于三个轴对以前的研究进行分类：使用的约束损失类型（例如，概率软逻辑，REINFORCE），违反约束示例的探索策略，以及从主任务和约束中学习信号的整合机制。（2）我们提出了新的算法来整合主任务和约束注入的信息，灵感来自持续学习算法。（3）此外，我们提出了$H\beta$-score作为同时考虑主任务指标和约束违反的度量标准。为了进行彻底的分析，我们在三个自然语言处理任务上检验了所有算法：自然语言推理（NLI），合成转导示例（STE）和语义角色标注（SRL）。我们探索并揭示了与实现高$H\beta$-score相关的各种算法的关键因素。

更新时间: 2024-06-03 12:58:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01647v1

Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection

In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios remain unclear. In this paper, we address this gap by proposing the first provable guarantee for algorithm selection based on algorithm features, taking a generalization perspective. We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors. Specifically, we examine adaptive and predefined algorithm features under transductive and inductive learning paradigms, respectively, and derive upper bounds for the generalization error based on their model's Rademacher complexity. Our theoretical findings not only provide tight upper bounds, but also offer analytical insights into the impact of various factors, such as the training scale of problem instances and candidate algorithms, model parameters, feature values, and distributional differences between the training and test data. Notably, we demonstrate how models will benefit from algorithm features in complex scenarios involving many algorithms, and proves the positive correlation between generalization error bound and $\chi^2$-divergence of distributions.

Updated: 2024-06-03 12:55:58

标题: 解锁算法特征的力量：算法选择的一般化分析

摘要: 在算法选择研究中，围绕算法特征的讨论在很大程度上被对问题特征的强调所掩盖。尽管一些实证研究已经证明了算法特征的有效性，但将算法特征纳入算法选择模型并确定其适用于不同场景的潜在收益仍然不清楚。本文通过提出基于算法特征的算法选择的第一个可证明保证，从概括性的角度来解决这一问题。我们分析了与算法特征相关的收益和成本，并研究了不同因素对概括错误的影响。具体来说，我们分别在传导式和归纳式学习范式下研究了自适应和预定义的算法特征，并基于它们的模型Rademacher复杂性得出了概括错误的上界。我们的理论发现不仅提供了紧密的上界，还提供了关于各种因素的影响的分析见解，如问题实例和候选算法的训练规模、模型参数、特征值以及训练和测试数据之间的分布差异。值得注意的是，我们展示了在涉及许多算法的复杂场景中，模型将如何从算法特征中受益，并证明了概括错误边界与分布的$\chi^2$-散度之间的正相关性。

更新时间: 2024-06-03 12:55:58

领域: cs.LG

下载: http://arxiv.org/abs/2405.11349v2

Large Language Models as Recommender Systems: A Study of Popularity Bias

The issue of popularity bias -- where popular items are disproportionately recommended, overshadowing less popular but potentially relevant items -- remains a significant challenge in recommender systems. Recent advancements have seen the integration of general-purpose Large Language Models (LLMs) into the architecture of such systems. This integration raises concerns that it might exacerbate popularity bias, given that the LLM's training data is likely dominated by popular items. However, it simultaneously presents a novel opportunity to address the bias via prompt tuning. Our study explores this dichotomy, examining whether LLMs contribute to or can alleviate popularity bias in recommender systems. We introduce a principled way to measure popularity bias by discussing existing metrics and proposing a novel metric that fulfills a series of desiderata. Based on our new metric, we compare a simple LLM-based recommender to traditional recommender systems on a movie recommendation task. We find that the LLM recommender exhibits less popularity bias, even without any explicit mitigation.

Updated: 2024-06-03 12:53:37

标题: 大型语言模型作为推荐系统：流行度偏见的研究

摘要: 人们对流行度偏见的问题——即流行物品被不成比例地推荐，而潜在相关性更高的不太流行的物品被忽视——在推荐系统中仍然是一个重要挑战。最近的进展已经看到通用的大型语言模型（LLMs）被整合到这些系统的架构中。这种整合引发了担忧，因为LLM的训练数据很可能被流行物品所主导，这可能会加剧流行度偏见。然而，它同时提供了通过提示调整来解决偏见的新机会。我们的研究探讨了这种二元对立，检查LLMs是否对推荐系统中的流行度偏见有所贡献或能够减轻。我们介绍了一种衡量流行度偏见的原则性方法，讨论现有的指标并提出一个满足一系列期望的新指标。基于我们的新指标，我们在电影推荐任务上比较了一个简单的基于LLM的推荐系统和传统的推荐系统。我们发现，即使没有任何明确的减轻措施，LLM推荐系统表现出更少的流行度偏见。

更新时间: 2024-06-03 12:53:37

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01285v1

Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning and token combining. Token pruning eliminates less important tokens in the attention mechanism's key and value as they pass through the layers. Additionally, we adopt fuzzy logic to handle uncertainty and alleviate potential mispruning risks arising from an imbalanced distribution of each token's importance. Token combining, on the other hand, condenses input sequences into smaller sizes in order to further compress the model. By integrating these two approaches, we not only improve the model's performance but also reduce its computational demands. Experiments with various datasets demonstrate superior performance compared to baseline models, especially with the best improvement over the existing BERT model, achieving +5%p in accuracy and +5.6%p in F1 score. Additionally, memory cost is reduced to 0.61x, and a speedup of 1.64x is achieved.

Updated: 2024-06-03 12:51:52

标题: 聚焦核心：通过修剪标记压缩实现文档分类的高效注意力

摘要: 基于Transformer的模型在许多自然语言处理任务中取得了主导性能。尽管它们取得了显著的成功，例如BERT这样的预训练transformers存在一个计算上昂贵的自注意机制，与所有标记进行交互，包括那些对分类性能不利的标记。为了克服这些挑战，我们提出了整合两种策略：标记修剪和标记合并。标记修剪消除了在通过层时在注意机制的键和值中不太重要的标记。此外，我们采用模糊逻辑来处理不确定性，并减轻由于每个标记的重要性分布不平衡而产生的潜在的错误修剪风险。另一方面，标记合并将输入序列压缩成较小的尺寸，以进一步压缩模型。通过整合这两种方法，我们不仅改善了模型的性能，还减少了计算需求。对各种数据集的实验表明，与基准模型相比，我们的性能更优，特别是在现有BERT模型上取得了最佳改进，准确率提高了+5%，F1得分提高了+5.6%。此外，内存成本降低至0.61倍，速度提升了1.64倍。

更新时间: 2024-06-03 12:51:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01283v1

Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE

While Hyperbolic Graph Neural Network (HGNN) has recently emerged as a powerful tool dealing with hierarchical graph data, the limitations of scalability and efficiency hinder itself from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting node-wise attention undertake the role of diffusivity within the Hyperbolic Neural PDE (HPDE). By introducing theoretical principles \textit{e.g.,} field and flow, gradient, divergence, and diffusivity on a non-Euclidean manifold for HPDE integration, we discuss both implicit and explicit discretization schemes to formulate numerical HPDE solvers. Further, we propose the Hyperbolic Graph Diffusion Equation (HGDE) -- a flexible vector flow function that can be integrated to obtain expressive hyperbolic node embeddings. By analyzing potential energy decay of embeddings, we demonstrate that HGDE is capable of modeling both low- and high-order proximity with the benefit of local-global diffusivity functions. Experiments on node classification and link prediction and image-text classification tasks verify the superiority of the proposed method, which consistently outperforms various competitive models by a significant margin.

Updated: 2024-06-03 12:50:58

标题: 连续几何感知图扩散通过双曲神经偏微分方程

摘要: 最近，双曲图神经网络（HGNN）作为处理层次图数据的强大工具已经出现，但可扩展性和效率的局限性阻碍了它泛化到深度模型。本文通过将深度视为连续时间嵌入演变，将HGNN解耦并将信息传播重新构建为偏微分方程，让节点注意力承担双曲神经PDE（HPDE）中的扩散性角色。通过在非欧几里德流形上引入理论原则，如场和流、梯度、散度和扩散性，来讨论用于HPDE集成的隐式和显式离散化方案，以制定数值HPDE求解器。此外，我们提出了双曲图扩散方程（HGDE）- 一种灵活的矢量流函数，可以集成以获得富有表现力的双曲节点嵌入。通过分析嵌入的潜在能量衰减，我们展示了HGDE能够模拟低阶和高阶接近性，并获益于局部-全局扩散性功能。对节点分类、链接预测和图像-文本分类任务的实验验证了所提出方法的优越性，其在各种竞争模型上均表现出显著优势。

更新时间: 2024-06-03 12:50:58

领域: cs.LG

下载: http://arxiv.org/abs/2406.01282v1

fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings

The notion of visual similarity is essential for computer vision, and in applications and studies revolving around vector embeddings of images. However, the scarcity of benchmark datasets poses a significant hurdle in exploring how these models perceive similarity. Here we introduce Style Aligned Artwork Datasets (SALADs), and an example of fruit-SALAD with 10,000 images of fruit depictions. This combined semantic category and style benchmark comprises 100 instances each of 10 easy-to-recognize fruit categories, across 10 easy distinguishable styles. Leveraging a systematic pipeline of generative image synthesis, this visually diverse yet balanced benchmark demonstrates salient differences in semantic category and style similarity weights across various computational models, including machine learning models, feature extraction algorithms, and complexity measures, as well as conceptual models for reference. This meticulously designed dataset offers a controlled and balanced platform for the comparative analysis of similarity perception. The SALAD framework allows the comparison of how these models perform semantic category and style recognition task to go beyond the level of anecdotal knowledge, making it robustly quantifiable and qualitatively interpretable.

Updated: 2024-06-03 12:47:48

标题: 水果沙拉：一种风格对齐的艺术品数据集，揭示图像嵌入中的相似性感知

摘要: 视觉相似性的概念对于计算机视觉以及围绕图像向量嵌入的应用和研究至关重要。然而，基准数据集的稀缺性在探索这些模型如何感知相似性方面构成了重要障碍。在这里，我们介绍了风格对齐艺术作品数据集（SALADs），以及一个包含10,000幅水果图像的水果-SALAD示例。这个结合了语义类别和风格的基准数据集包括10个易于识别的水果类别的每个100个实例，跨越了10种易于区分的风格。通过利用生成图像合成的系统化流程，这个视觉多样性但平衡的基准数据集展示了在各种计算模型（包括机器学习模型、特征提取算法和复杂性测量）以及概念模型中，语义类别和风格相似性权重的显著差异。这个精心设计的数据集为相似性感知的比较分析提供了一个可控且平衡的平台。SALAD框架允许比较这些模型在语义类别和风格识别任务中的表现，超越了单纯的经验知识水平，使其具有强大的量化和质性可解释性。

更新时间: 2024-06-03 12:47:48

领域: cs.CV,cs.AI,cs.CC,cs.LG

下载: http://arxiv.org/abs/2406.01278v1

ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

Natural Language Generation (NLG) accepts input data in the form of images, videos, or text and generates corresponding natural language text as output. Existing NLG methods mainly adopt a supervised approach and rely heavily on coupled data-to-text pairs. However, for many targeted scenarios and for non-English languages, sufficient quantities of labeled data are often not available. To relax the dependency on labeled data of downstream tasks, we propose an intuitive and effective zero-shot learning framework, ZeroNLG, which can deal with multiple NLG tasks, including image-to-text (image captioning), video-to-text (video captioning), and text-to-text (neural machine translation), across English, Chinese, German, and French within a unified framework. ZeroNLG does not require any labeled downstream pairs for training. During training, ZeroNLG (i) projects different domains (across modalities and languages) to corresponding coordinates in a shared common latent space; (ii) bridges different domains by aligning their corresponding coordinates in this space; and (iii) builds an unsupervised multilingual auto-encoder to learn to generate text by reconstructing the input text given its coordinate in shared latent space. Consequently, during inference, based on the data-to-text pipeline, ZeroNLG can generate target sentences across different languages given the coordinate of input data in the common space. Within this unified framework, given visual (imaging or video) data as input, ZeroNLG can perform zero-shot visual captioning; given textual sentences as input, ZeroNLG can perform zero-shot machine translation. We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and believable outputs and significantly outperforms existing zero-shot methods.

Updated: 2024-06-03 12:47:12

标题: ZeroNLG：为零样本多模态和多语言自然语言生成对齐和自动编码领域

摘要: 自然语言生成（NLG）接受图像、视频或文本形式的输入数据，并生成相应的自然语言文本作为输出。现有的NLG方法主要采用监督方法，并且严重依赖于配对的数据到文本。然而，在许多特定情景和非英语语言中，通常缺乏足够数量的标记数据。为了减少对下游任务标记数据的依赖性，我们提出了一种直观且有效的零样本学习框架ZeroNLG，可以处理多个NLG任务，包括图像到文本（图像标题）、视频到文本（视频标题）和文本到文本（神经机器翻译），跨越英语、中文、德语和法语在一个统一的框架内。ZeroNLG在训练过程中不需要任何标记的下游配对。在训练过程中，ZeroNLG（i）将不同领域（跨模态和语言）投影到共享的潜在空间中的相应坐标；（ii）通过对齐这个空间中的相应坐标来连接不同领域；（iii）构建一个无监督的多语言自编码器，通过重构给定其在共享潜在空间中的坐标的输入文本来学习生成文本。因此，在推断过程中，基于数据到文本的流水线，ZeroNLG可以在不同语言中生成目标句子，给定输入数据在共同空间中的坐标。在这个统一的框架内，给定视觉（图像或视频）数据作为输入，ZeroNLG可以执行零样本视觉标题；给定文本句子作为输入，ZeroNLG可以执行零样本机器翻译。我们在十二个NLG任务上进行了大量实验，结果表明，ZeroNLG在没有使用任何标记的下游配对进行训练的情况下，生成高质量和可信度的输出，并明显优于现有的零样本方法。

更新时间: 2024-06-03 12:47:12

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2303.06458v3

Lifting Factor Graphs with Some Unknown Factors

Lifting exploits symmetries in probabilistic graphical models by using a representative for indistinguishable objects, allowing to carry out query answering more efficiently while maintaining exact answers. In this paper, we investigate how lifting enables us to perform probabilistic inference for factor graphs containing factors whose potentials are unknown. We introduce the Lifting Factor Graphs with Some Unknown Factors (LIFAGU) algorithm to identify symmetric subgraphs in a factor graph containing unknown factors, thereby enabling the transfer of known potentials to unknown potentials to ensure a well-defined semantics and allow for (lifted) probabilistic inference.

Updated: 2024-06-03 12:44:55

标题: 使用一些未知因素提升因子图

摘要: 提升利用概率图模型中的对称性，通过使用无法区分的对象的代表，从而更有效地进行查询回答，同时保持精确答案。在本文中，我们研究了如何利用提升技术来执行包含未知潜力因子的因子图的概率推理。我们引入了Lifting Factor Graphs with Some Unknown Factors (LIFAGU)算法，以识别包含未知因子的因子图中的对称子图，从而使得已知潜力转移到未知潜力，以确保明确定义的语义并允许进行（提升的）概率推理。

更新时间: 2024-06-03 12:44:55

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01275v1

iMove: Exploring Bio-impedance Sensing for Fitness Activity Recognition

Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive learning.To evaluate our methods, we conducted an experiment including six upper body fitness activities performed by ten subjects over five days to collect synchronized data from bio-impedance across two wrists and IMU on the left wrist.The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is only required at the training phase, by which the average Macro F1 score with the input of a single IMU was improved by 3.22 \% reaching 84.71 \% compared to the 81.49 \% of the IMU baseline model. We have also shown how bio-impedance can improve human activity recognition (HAR) directly through sensor fusion, reaching an average Macro F1 score of 89.57 \% (two modalities required for both training and inference) even if Bio-impedance alone has an average macro F1 score of 75.36 \%, which is outperformed by IMU alone. In addition, similar results were obtained in an extended study on lower body fitness activity classification, demonstrating the generalisability of our approach.Our findings underscore the potential of sensor fusion and contrastive learning as valuable tools for advancing fitness activity recognition, with bio-impedance playing a pivotal role in augmenting the capabilities of IMU-based systems.

Updated: 2024-06-03 12:42:50

标题: iMove：探索生物阻抗传感用于健身活动识别

摘要: 自动和精确的健身活动识别可以在促进健康生活方式到个性化预防性医疗等方面带来益处。尽管IMU目前是主要的健身跟踪模式，通过iMove，我们展示生物阻抗可以通过传感器融合和对比学习来改善基于IMU的健身追踪。为了评估我们的方法，我们进行了一个实验，包括十名受试者在五天内进行的六项上半身健身活动，收集了来自两只手腕的生物阻抗和左手腕IMU的同步数据。对比学习框架利用这两种模式训练一个更好的仅IMU分类模型，其中生物阻抗仅在训练阶段被要求，通过这种方式，输入一个单一IMU的平均宏F1分数提高了3.22％，达到84.71％，而IMU基线模型为81.49％。我们还展示了生物阻抗如何通过传感器融合直接改善人体活动识别（HAR），达到了89.57％的平均宏F1分数（两种模式在训练和推断中都需要），即使生物阻抗单独的平均宏F1分数为75.36％，被IMU单独超越。此外，在下半身健身活动分类的扩展研究中获得了类似的结果，证明了我们方法的普适性。我们的发现强调了传感器融合和对比学习作为推进健身活动识别的有价值工具的潜力，生物阻抗在增强基于IMU系统能力方面发挥着关键作用。

更新时间: 2024-06-03 12:42:50

领域: eess.SP,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2402.09445v2

Expected Grad-CAM: Towards gradient faithfulness

Although input-gradients techniques have evolved to mitigate and tackle the challenges associated with gradients, modern gradient-weighted CAM approaches still rely on vanilla gradients, which are inherently susceptible to the saturation phenomena. Despite recent enhancements have incorporated counterfactual gradient strategies as a mitigating measure, these local explanation techniques still exhibit a lack of sensitivity to their baseline parameter. Our work proposes a gradient-weighted CAM augmentation that tackles both the saturation and sensitivity problem by reshaping the gradient computation, incorporating two well-established and provably approaches: Expected Gradients and kernel smoothing. By revisiting the original formulation as the smoothed expectation of the perturbed integrated gradients, one can concurrently construct more faithful, localized and robust explanations which minimize infidelity. Through fine modulation of the perturbation distribution it is possible to regulate the complexity characteristic of the explanation, selectively discriminating stable features. Our technique, Expected Grad-CAM, differently from recent works, exclusively optimizes the gradient computation, purposefully designed as an enhanced substitute of the foundational Grad-CAM algorithm and any method built therefrom. Quantitative and qualitative evaluations have been conducted to assess the effectiveness of our method.

Updated: 2024-06-03 12:40:30

标题: 预期的Grad-CAM: 朝向梯度忠实

摘要: 尽管输入梯度技术已经发展出来以减轻和应对梯度相关的挑战，但现代梯度加权CAM方法仍然依赖于基本梯度，这在本质上容易受到饱和现象的影响。尽管最近的增强措施已经纳入对抗性梯度策略作为一种缓解措施，但这些局部解释技术仍然表现出对基线参数缺乏敏感性的问题。我们的工作提出了一种梯度加权CAM增强方法，通过重新塑造梯度计算，结合了两种成熟且可证明的方法：期望梯度和核平滑。通过重新审视原始公式作为扰动集成梯度的平滑期望，可以同时构建更忠实、局部化和稳健的解释，从而最小化不忠实性。通过对扰动分布进行精细调节，可以调节解释的复杂性特征，有选择地辨别稳定特征。我们的技术Expected Grad-CAM与最近的研究不同，它专门优化梯度计算，旨在作为Grad-CAM算法和任何基于该算法构建的方法的增强替代品。已进行了定量和定性评估来评估我们方法的有效性。

更新时间: 2024-06-03 12:40:30

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.01274v1

iKAN: Global Incremental Learning with KAN for Human Activity Recognition Across Heterogeneous Datasets

This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of splines. To adapt KAN for HAR, iKAN uses task-specific feature branches and a feature redistribution layer. Unlike existing IL methods that primarily adjust the output dimension or the number of classifier nodes to adapt to new tasks, iKAN focuses on expanding the feature extraction branches to accommodate new inputs from different sensor modalities while maintaining consistent dimensions and the number of classifier outputs. Continual learning across six public HAR datasets demonstrated the iKAN framework's incremental learning performance, with a last performance of 84.9\% (weighted F1 score) and an average incremental performance of 81.34\%, which significantly outperforms the two existing incremental learning methods, such as EWC (51.42\%) and experience replay (59.92\%).

Updated: 2024-06-03 12:33:27

标题: iKAN：利用KAN进行跨异构数据集的全局增量学习，用于人类活动识别

摘要: 这项工作提出了一种增量学习（IL）框架，用于可穿戴传感器人体活动识别（HAR），同时解决了两个挑战：灾难性遗忘和非均匀输入。可扩展的框架iKAN，开创性地利用科尔莫哥洛夫-阿诺德网络（KAN）进行IL，以取代多层感知器作为利用样条局部可塑性和全局稳定性的分类器。为了适应HAR，iKAN使用任务特定的特征分支和特征重新分配层对KAN进行调整。与现有IL方法不同，这些方法主要调整输出维度或分类器节点数量以适应新任务，iKAN侧重于扩展特征提取分支以适应来自不同传感器模态的新输入，同时保持一致的维度和分类器输出数量。对六个公共HAR数据集进行的持续学习表明，iKAN框架的增量学习性能，最终性能为84.9％（加权F1分数），平均增量性能为81.34％，明显优于两种现有的增量学习方法，如EWC（51.42％）和经验重放（59.92％）。

更新时间: 2024-06-03 12:33:27

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2406.01646v1

FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation

Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However, existing AI-based data assimilation methods can only handle observations with a specific resolution, lacking the compatibility and generalization ability to assimilate observations with other resolutions. Considering that complex real-world observations often have different resolutions, we propose the \textit{\textbf{Fourier Neural Processes}} (FNP) for \textit{arbitrary-resolution data assimilation} in this paper. Leveraging the efficiency of the designed modules and flexible structure of neural processes, FNP achieves state-of-the-art results in assimilating observations with varying resolutions, and also exhibits increasing advantages over the counterparts as the resolution and the amount of observations increase. Moreover, our FNP trained on a fixed resolution can directly handle the assimilation of observations with out-of-distribution resolutions and the observational information reconstruction task without additional fine-tuning, demonstrating its excellent generalization ability across data resolutions as well as across tasks.

Updated: 2024-06-03 12:24:24

标题: FNP: 傅里叶神经过程用于任意分辨率的数据同化

摘要: 数据同化是现代全球中期天气预报系统中的一个重要组成部分，通过结合短期预报和观测数据来获得大气状态的最佳估计。最近，基于人工智能的数据同化方法因其在计算消耗方面相对传统技术的显著优势而引起了人们的越来越多的关注。然而，现有的基于人工智能的数据同化方法只能处理具有特定分辨率的观测数据，缺乏与其他分辨率观测数据同化的兼容性和泛化能力。考虑到复杂的现实观测数据通常具有不同的分辨率，本文提出了\textit{\textbf{Fourier神经过程}}（FNP）用于\textit{任意分辨率数据同化}。利用设计模块的效率和神经过程的灵活结构，FNP在同化具有不同分辨率的观测数据方面取得了最新的成果，并且随着分辨率和观测数据量的增加，也表现出与同行相比的优势。此外，我们训练的FNP在固定分辨率上可以直接处理具有分布外分辨率的观测数据同化以及观测信息重建任务，无需额外的微调，展示了其在数据分辨率以及任务之间的出色泛化能力。

更新时间: 2024-06-03 12:24:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01645v1

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenLLMAI/OpenRLHF.

Updated: 2024-06-03 12:19:18

标题: OpenRLHF：一个易于使用、可扩展和高性能的RLHF框架

摘要: 随着大型语言模型（LLMs）按照扩展规律不断增长，基于人类反馈的强化学习（RLHF）因其出色的性能而受到重视。然而，与预训练或微调单一模型不同，对于训练大型语言模型来说，扩展基于人类反馈的强化学习（RLHF）存在着协调四个模型之间的挑战。我们提出了OpenRLHF，这是一个开源框架，可以实现高效的RLHF扩展。与现有的RLHF框架不同，OpenRLHF通过使用Ray、vLLM和DeepSpeed重新设计模型的调度，实现了超过70B参数的性能提升和多样化的训练方法。OpenRLHF与Hugging Face无缝集成，提供了一个优化算法和启动脚本的即插即用解决方案，确保用户友好性。OpenRLHF实现了RLHF、DPO、拒绝抽样和其他对齐技术。OpenRLHF的代码可在https://github.com/OpenLLMAI/OpenRLHF 上找到，为最先进的LLM开发提供支持。

更新时间: 2024-06-03 12:19:18

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.11143v2

Fundamental Limitations of Alignment in Large Language Models

An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users. This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones, a process referred to as alignment. In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. Importantly, we prove that within the limits of this framework, for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with probability that increases with the length of the prompt. This implies that any alignment process that attenuates an undesired behavior but does not remove it altogether, is not safe against adversarial prompting attacks. Furthermore, our framework hints at the mechanism by which leading alignment approaches such as reinforcement learning from human feedback make the LLM prone to being prompted into the undesired behaviors. This theoretical result is being experimentally demonstrated in large scale by the so called contemporary "chatGPT jailbreaks", where adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona. Our results expose fundamental limitations in alignment of LLMs and bring to the forefront the need to devise reliable mechanisms for ensuring AI safety.

Updated: 2024-06-03 12:19:16

标题: 大型语言模型中对齐的基本限制

摘要: 在开发与人类互动的语言模型中，一个重要方面是调整它们的行为，使其对人类用户有用且无害。通常通过调整模型以增强期望行为和抑制不期望行为的方式来实现，这个过程被称为对齐。在本文中，我们提出了一种理论方法，称为行为期望边界（BEB），它允许我们正式调查大型语言模型对齐的若干固有特征和局限性。重要的是，我们证明在这一框架的限制内，对于模型具有有限概率展示的任何行为，都存在可以触发模型输出这种行为的提示，其概率随提示长度的增加而增加。这意味着任何减弱不期望行为但没有完全消除它的对齐过程，都无法抵御对抗性提示攻击。此外，我们的框架暗示了引领对齐方法（如从人类反馈中进行强化学习）使大型语言模型容易被触发进入不期望行为的机制。这一理论结果正在通过所谓的当代“chatGPT越狱”大规模实验中得到证实，对抗性用户通过触发其扮演恶意角色来破坏其对齐防护栏。我们的结果揭示了语言模型对齐的基本限制，并突显了制定可靠机制以确保人工智能安全的必要性。

更新时间: 2024-06-03 12:19:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2304.11082v6

SCALLER: Standard Cell Assembled and Local Layout Effect-based Ring Oscillators

This letter presents a technique that enables very fine tunability of the frequency of Ring Oscillators (ROs). Multiple ROs with different numbers of tunable elements were designed and fabricated in a 65nm CMOS technology. A tunable element consists of two inverters under different local layout effects (LLEs) and a multiplexer. LLEs impact the transient response of inverters deterministically and allow to establish a fine tunable mechanism even in the presence of large process variation. The entire RO is digital and its layout is standard-cell compatible. We demonstrate the tunability of multi-stage ROs with post-silicon measurements of oscillation frequencies in the range of 80-900MHz and tuning steps of 90KHz

Updated: 2024-06-03 12:14:51

标题: SCALLER: 标准单元组装和基于局部布局效应的环形振荡器

摘要: 这封信介绍了一种技术，可以实现环振荡器（ROs）频率的非常精细调节。在65纳米的CMOS技术中设计和制造了具有不同可调元件数量的多个ROs。可调元件由两个受不同局部布局效应（LLEs）影响的反相器和一个多路复用器组成。LLEs决定性地影响反相器的瞬态响应，并允许在大的工艺变化存在的情况下建立一个精细调节机制。整个RO是数字化的，其布局与标准单元兼容。我们通过后硅测量演示了多级ROs的可调性，振荡频率范围为80-900MHz，调谐步长为90KHz。

更新时间: 2024-06-03 12:14:51

领域: cs.CR

下载: http://arxiv.org/abs/2406.01258v1

What makes unlearning hard and what to do about it

Machine unlearning is the problem of removing the effect of a subset of training data (the ''forget set'') from a trained model without damaging the model's utility e.g. to comply with users' requests to delete their data, or remove mislabeled, poisoned or otherwise problematic data. With unlearning research still being at its infancy, many fundamental open questions exist: Are there interpretable characteristics of forget sets that substantially affect the difficulty of the problem? How do these characteristics affect different state-of-the-art algorithms? With this paper, we present the first investigation aiming to answer these questions. We identify two key factors affecting unlearning difficulty and the performance of unlearning algorithms. Evaluation on forget sets that isolate these identified factors reveals previously-unknown behaviours of state-of-the-art algorithms that don't materialize on random forget sets. Based on our insights, we develop a framework coined Refined-Unlearning Meta-algorithm (RUM) that encompasses: (i) refining the forget set into homogenized subsets, according to different characteristics; and (ii) a meta-algorithm that employs existing algorithms to unlearn each subset and finally delivers a model that has unlearned the overall forget set. We find that RUM substantially improves top-performing unlearning algorithms. Overall, we view our work as an important step in (i) deepening our scientific understanding of unlearning and (ii) revealing new pathways to improving the state-of-the-art.

Updated: 2024-06-03 12:14:47

标题: 如何解决取消学习的困难问题

摘要: 机器遗忘是指从经过训练的模型中删除一部分训练数据（“遗忘集”）的影响，而不损害模型的效用，例如遵守用户要求删除其数据，或移除错误标记、受污染或其他问题数据。随着遗忘研究仍处于起步阶段，许多基本的未解决问题存在：是否存在可解释的遗忘集特征，会显著影响问题的难度？这些特征如何影响不同的最先进算法？本文旨在回答这些问题，我们首次进行调查。我们确定了影响遗忘难度和遗忘算法性能的两个关键因素。对能够隔离这些确定因素的遗忘集进行评估，揭示了以往未知的最先进算法的行为，这些行为在随机遗忘集上并未显现。基于我们的见解，我们开发了一个名为"精细遗忘元算法"（RUM）的框架，包括：（i）根据不同特征将遗忘集细分为同质子集；以及（ii）一个元算法，利用现有算法来遗忘每个子集，最终交付一个已经遗忘了整体遗忘集的模型。我们发现，RUM显著改进了表现最优的遗忘算法。总的来说，我们认为我们的工作是（i）深化我们对遗忘科学的理解，以及（ii）揭示改进最先进技术的新途径的重要一步。

更新时间: 2024-06-03 12:14:47

领域: cs.LG

下载: http://arxiv.org/abs/2406.01257v1

Graph Language Models

While Language Models (LMs) are the workhorses of NLP, their interplay with structured knowledge graphs (KGs) is still actively researched. Current methods for encoding such graphs typically either (i) linearize them for embedding with LMs -- which underutilize structural information, or (ii) use Graph Neural Networks (GNNs) to preserve the graph structure -- but GNNs cannot represent text features as well as pretrained LMs. In our work we introduce a novel LM type, the Graph Language Model (GLM), that integrates the strengths of both approaches and mitigates their weaknesses. The GLM parameters are initialized from a pretrained LM to enhance understanding of individual graph concepts and triplets. Simultaneously, we design the GLM's architecture to incorporate graph biases, thereby promoting effective knowledge distribution within the graph. This enables GLMs to process graphs, texts, and interleaved inputs of both. Empirical evaluations on relation classification tasks show that GLM embeddings surpass both LM- and GNN-based baselines in supervised and zero-shot setting, demonstrating their versatility.

Updated: 2024-06-03 12:14:34

标题: 图语言模型

摘要: 虽然语言模型（LMs）是自然语言处理中的主力军，但它们与结构化知识图（KGs）的相互作用仍在积极研究中。目前对于编码这种图的方法通常要么（i）将它们线性化以进行LM嵌入 - 这种方法利用结构信息不足，要么（ii）使用图神经网络（GNNs）来保持图结构 - 但GNNs无法像预训练LMs那样表示文本特征。在我们的研究中，我们引入了一种新型LM类型，即图语言模型（GLM），它融合了这两种方法的优势并减轻了它们的弱点。GLM参数从预训练的LM中初始化，以增强对单个图概念和三元组的理解。同时，我们设计GLM的架构以整合图偏差，从而促进图内有效知识分布。这使得GLMs能够处理图、文本以及两者交织的输入。在关系分类任务上的实证评估显示，GLM嵌入在监督和零样本设置中均优于基于LM和GNN的基线，展示了它们的多功能性。

更新时间: 2024-06-03 12:14:34

领域: cs.CL,cs.AI,cs.LG,I.2.0; I.2.4; I.2.7

下载: http://arxiv.org/abs/2401.07105v3

Augmented Commonsense Knowledge for Remote Object Grounding

The vision-and-language navigation (VLN) task necessitates an agent to perceive the surroundings, follow natural language instructions, and act in photo-realistic unseen environments. Most of the existing methods employ the entire image or object features to represent navigable viewpoints. However, these representations are insufficient for proper action prediction, especially for the REVERIE task, which uses concise high-level instructions, such as ''Bring me the blue cushion in the master bedroom''. To address enhancing representation, we propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a spatio-temporal knowledge graph for improving agent navigation. Specifically, the proposed approach involves constructing a knowledge base by retrieving commonsense information from ConceptNet, followed by a refinement module to remove noisy and irrelevant knowledge. We further present ACK which consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment by integrating visible objects, commonsense knowledge, and concept history, which includes object and knowledge temporal information. Moreover, we add a new pipeline for the commonsense-based decision-making process which leads to more accurate local action prediction. Experimental results demonstrate our proposed model noticeably outperforms the baseline and archives the state-of-the-art on the REVERIE benchmark.

Updated: 2024-06-03 12:12:33

标题: 增强型常识知识用于远程物体定位

摘要: 视觉与语言导航（VLN）任务需要一个智能体感知周围环境，遵循自然语言指令并在逼真的未知环境中行动。大多数现有方法利用整个图像或物体特征来表示可导航的视角。然而，这些表示对于正确的动作预测是不足的，特别是对于使用简洁高级指令的REVERIE任务，例如“把主卧室里的蓝色靠垫拿给我”。为了解决增强表示的问题，我们提出了一个增强的常识知识模型（ACK），利用常识信息作为一个时空知识图，以改进智能体的导航。具体而言，所提出的方法涉及通过从ConceptNet检索常识信息构建知识库，然后通过一个精炼模块去除嘈杂和无关的知识。我们进一步提出了ACK，它包括知识图感知跨模态和概念聚合模块，通过整合可见物体、常识知识和概念历史（包括物体和知识的时间信息）来增强视觉表示和视觉文本数据对齐。此外，我们添加了一个基于常识的决策流程，从而实现更准确的本地动作预测。实验结果表明，我们提出的模型明显优于基准线，并在REVERIE基准测试中达到了最先进水平。

更新时间: 2024-06-03 12:12:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01256v1

Particle identification with machine learning from incomplete data in the ALICE experiment

The ALICE experiment at the LHC measures properties of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. Such studies require accurate particle identification (PID). ALICE provides PID information via several detectors for particles with momentum from about 100 MeV/c up to 20 GeV/c. Traditionally, particles are selected with rectangular cuts. A much better performance can be achieved with machine learning (ML) methods. Our solution uses multiple neural networks (NN) serving as binary classifiers. Moreover, we extended our particle classifier with Feature Set Embedding and attention in order to train on data with incomplete samples. We also present the integration of the ML project with the ALICE analysis software, and we discuss domain adaptation, the ML technique needed to transfer the knowledge between simulated and real experimental data.

Updated: 2024-06-03 12:12:25

标题: ALICE实验中利用机器学习从不完整数据中识别粒子

摘要: ALICE实验在LHC上测量超相对论重离子碰撞中形成的强相互作用物质的性质。这类研究需要准确的粒子识别（PID）。ALICE通过几个探测器为动量从约100 MeV/c到20 GeV/c的粒子提供PID信息。传统上，粒子是通过矩形切割来选择的。采用机器学习（ML）方法可以实现更好的性能。我们的解决方案使用多个神经网络（NN）作为二元分类器。此外，我们通过特征集嵌入和注意力扩展了我们的粒子分类器，以便在数据缺失样本的情况下进行训练。我们还介绍了ML项目与ALICE分析软件的集成，并讨论了领域自适应，这是在模拟和真实实验数据之间转移知识所需的ML技术。

更新时间: 2024-06-03 12:12:25

领域: hep-ex,cs.LG

下载: http://arxiv.org/abs/2403.17436v2

FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis

The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP. Model soups averages multiple fine-tuned models aiming to improve performance on In-Domain (ID) tasks and enhance robustness against Out-of-Distribution (OOD) datasets. However, applying these methods to the medical imaging domain faces challenges and results in suboptimal performance. This is primarily due to differences in error surface characteristics that stem from data complexities such as heterogeneity, domain shift, class imbalance, and distributional shifts between training and testing phases. To address this issue, we propose a hierarchical merging approach that involves local and global aggregation of models at various levels based on models' hyperparameter configurations. Furthermore, to alleviate the need for training a large number of models in the hyperparameter search, we introduce a computationally efficient method using a cyclical learning rate scheduler to produce multiple models for aggregation in the weight space. Our method demonstrates significant improvements over the model souping approach across multiple datasets (around 6% gain in HAM10000 and CheXpert datasets) while maintaining low computational costs for model generation and selection. Moreover, we achieve better results on OOD datasets than model soups. The code is available at https://github.com/BioMedIA-MBZUAI/FissionFusion.

Updated: 2024-06-03 12:11:52

标题: 裂变融合：用于医学图像分析的快速几何生成和分层合并

摘要: 医学数据集的稀缺性需要利用来自更广泛数据集（如ImageNet）或预训练模型（如CLIP）的迁移学习。模型混合是将多个微调模型平均化，旨在提高在域内（ID）任务上的性能，并增强对域外分布（OOD）数据集的鲁棒性。然而，将这些方法应用于医学成像领域面临挑战，并导致次优性能。这主要是由于错误表面特征的差异，这些特征源于数据复杂性，如异质性、域转移、类别不平衡以及训练和测试阶段之间的分布变化。为了解决这个问题，我们提出了一种分层合并方法，涉及根据模型的超参数配置在不同级别对模型进行局部和全局聚合。此外，为了减轻在超参数搜索中训练大量模型的需求，我们引入了一种使用循环学习率调度程序的计算效率方法，以在权重空间中生成多个模型用于聚合。我们的方法在多个数据集上显示出显著的改进（HAM10000和CheXpert数据集分别约为6％的增益），同时保持较低的计算成本用于模型生成和选择。此外，我们在OOD数据集上取得比模型混合更好的结果。该代码可在https://github.com/BioMedIA-MBZUAI/FissionFusion 上找到。

更新时间: 2024-06-03 12:11:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.13341v2

On the Nonlinearity of Layer Normalization

Layer normalization (LN) is a ubiquitous technique in deep learning but our theoretical understanding to it remains elusive. This paper investigates a new theoretical direction for LN, regarding to its nonlinearity and representation capacity. We investigate the representation capacity of a network with layerwise composition of linear and LN transformations, referred to as LN-Net. We theoretically show that, given $m$ samples with any label assignment, an LN-Net with only 3 neurons in each layer and $O(m)$ LN layers can correctly classify them. We further show the lower bound of the VC dimension of an LN-Net. The nonlinearity of LN can be amplified by group partition, which is also theoretically demonstrated with mild assumption and empirically supported by our experiments. Based on our analyses, we consider to design neural architecture by exploiting and amplifying the nonlinearity of LN, and the effectiveness is supported by our experiments.

Updated: 2024-06-03 12:11:34

标题: 关于层归一化的非线性性

摘要: Layer normalization（LN）是深度学习中一种普遍的技术，但我们对其理论理解仍然很模糊。本文探讨了LN的一个新的理论方向，关于其非线性和表示能力。我们研究了一个网络的表示能力，该网络通过线性和LN变换逐层组合，称为LN-Net。我们在理论上证明，对于任何标签分配的$m$个样本，每层仅有3个神经元和$O(m)$个LN层的LN-Net可以正确分类它们。我们进一步展示了LN-Net的VC维的下界。LN的非线性可以通过组分割放大，这也在理论上得到了证明，并且通过我们的实验证明了。根据我们的分析，我们考虑通过利用和放大LN的非线性来设计神经架构，并且我们的实验证明了其有效性。

更新时间: 2024-06-03 12:11:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01255v1

animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics

Bioacoustic research provides invaluable insights into the behavior, ecology, and conservation of animals. Most bioacoustic datasets consist of long recordings where events of interest, such as vocalizations, are exceedingly rare. Analyzing these datasets poses a monumental challenge to researchers, where deep learning techniques have emerged as a standard method. Their adaptation remains challenging, focusing on models conceived for computer vision, where the audio waveforms are engineered into spectrographic representations for training and inference. We improve the current state of deep learning in bioacoustics in two ways: First, we present the animal2vec framework: a fully interpretable transformer model and self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. Second, we openly publish MeerKAT: Meerkat Kalahari Audio Transcripts, a large-scale dataset containing audio collected via biologgers deployed on free-ranging meerkats with a length of over 1068h, of which 184h have twelve time-resolved vocalization-type classes, each with ms-resolution, making it the largest publicly-available labeled dataset on terrestrial mammals. Further, we benchmark animal2vec against the NIPS4Bplus birdsong dataset. We report new state-of-the-art results on both datasets and evaluate the few-shot capabilities of animal2vec of labeled training data. Finally, we perform ablation studies to highlight the differences between our architecture and a vanilla transformer baseline for human-produced sounds. animal2vec allows researchers to classify massive amounts of sparse bioacoustic data even with little ground truth information available. In addition, the MeerKAT dataset is the first large-scale, millisecond-resolution corpus for benchmarking bioacoustic models in the pretrain/finetune paradigm. We believe this sets the stage for a new reference point for bioacoustics.

Updated: 2024-06-03 12:11:01

标题: animal2vec和MeerKAT：用于罕见事件原始音频输入的自监督变换器和生物声学大规模参考数据集

摘要: 生物声学研究为动物的行为、生态和保护提供了宝贵的见解。大多数生物声学数据集包括长时间的记录，其中感兴趣的事件，如鸣叫声，非常罕见。分析这些数据集对研究人员构成了巨大挑战，深度学习技术已成为一种标准方法。它们的适应性仍然具有挑战性，侧重于为计算机视觉构思的模型，其中音频波形被设计为用于训练和推理的频谱图表示。我们以两种方式改进了生物声学中深度学习的当前状态：首先，我们提出了animal2vec框架：一个完全可解释的变压器模型和自监督训练方案，专为稀疏和不平衡的生物声学数据量身定制。其次，我们公开发布了MeerKAT：卡拉哈里狐獴音频转录，这是一个大规模数据集，包含通过部署在自由放养的狐獴身上的生物记录仪收集的音频，总时长超过1068小时，其中184小时有十二种时间分辨率的鸣叫类型类别，每种类别都有毫秒级分辨率，使其成为陆生哺乳动物中最大的公开可用标记数据集。此外，我们将animal2vec与NIPS4Bplus鸟鸣数据集进行了基准测试。我们报告了两个数据集上的最新技术成果，并评估了animal2vec在有标记训练数据的少样本学习能力。最后，我们进行消融研究，以突出我们的架构与基准的简单变压器模型之间的差异，用于人类制作的声音。animal2vec使研究人员能够对大量稀疏的生物声学数据进行分类，即使只有少量地面真实信息可用。此外，MeerKAT数据集是用于在预训练/微调范式中进行生物声学模型基准测试的第一个大规模、毫秒级分辨率的语料库。我们相信这为生物声学设立了一个新的参考点。

更新时间: 2024-06-03 12:11:01

领域: cs.SD,cs.AI,eess.AS,q-bio.QM,stat.AP

下载: http://arxiv.org/abs/2406.01253v1

Towards Scalable Automated Alignment of LLMs: A Survey

Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment.

Updated: 2024-06-03 12:10:26

标题: 朝向可扩展的LLM自动对齐：一项调查

摘要: 对于构建满足人类需求的大型语言模型（LLMs）来说，对齐是最关键的步骤。随着LLMs的快速发展逐渐超越人类能力，基于人工标注的传统对齐方法越来越无法满足可扩展性需求。因此，迫切需要探索新的自动对齐信号和技术方法的来源。本文系统地回顾了最近出现的自动对齐方法，试图探讨一旦LLMs的能力超过人类时如何实现有效、可扩展的自动对齐。具体而言，我们根据对齐信号的来源将现有自动对齐方法分为4个主要类别，并讨论每个类别的当前状态和潜在发展。此外，我们探讨了使自动对齐成为可能且有效的基本因素，并从对齐的基本作用出发讨论了实现自动对齐的潜在机制。

更新时间: 2024-06-03 12:10:26

领域: cs.CL,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.01252v1

ProtoGate: Prototype-based Neural Networks with Global-to-local Feature Selection for Tabular Biomedical Data

Tabular biomedical data poses challenges in machine learning because it is often high-dimensional and typically low-sample-size (HDLSS). Previous research has attempted to address these challenges via local feature selection, but existing approaches often fail to achieve optimal performance due to their limitation in identifying globally important features and their susceptibility to the co-adaptation problem. In this paper, we propose ProtoGate, a prototype-based neural model for feature selection on HDLSS data. ProtoGate first selects instance-wise features via adaptively balancing global and local feature selection. Furthermore, ProtoGate employs a non-parametric prototype-based prediction mechanism to tackle the co-adaptation problem, ensuring the feature selection results and predictions are consistent with underlying data clusters. We conduct comprehensive experiments to evaluate the performance and interpretability of ProtoGate on synthetic and real-world datasets. The results show that ProtoGate generally outperforms state-of-the-art methods in prediction accuracy by a clear margin while providing high-fidelity feature selection and explainable predictions. Code is available at https://github.com/SilenceX12138/ProtoGate.

Updated: 2024-06-03 12:09:10

标题: ProtoGate：基于原型的神经网络，具有全局到局部特征选择功能，用于表格化生物医学数据

摘要: 表格生物医学数据在机器学习中面临挑战，因为它通常是高维度的，且样本量通常较少（HDLSS）。先前的研究尝试通过局部特征选择来解决这些挑战，但现有方法往往由于无法识别全局重要特征以及容易受到协适应问题的限制而无法达到最佳性能。在本文中，我们提出ProtoGate，一个用于HDLSS数据特征选择的基于原型的神经模型。ProtoGate首先通过自适应地平衡全局和局部特征选择来选择基于实例的特征。此外，ProtoGate采用非参数化的基于原型的预测机制来解决协适应问题，确保特征选择结果和预测与基础数据集簇一致。我们进行了全面实验来评估ProtoGate在合成和真实世界数据集上的性能和可解释性。结果显示，ProtoGate通常在预测准确性方面明显优于最先进的方法，同时提供高保真度的特征选择和可解释的预测。代码可在https://github.com/SilenceX12138/ProtoGate上找到。

更新时间: 2024-06-03 12:09:10

领域: cs.LG

下载: http://arxiv.org/abs/2306.12330v2

DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree

Key\-value separation is used in LSM\-tree to stored large value in separate log files to reduce write amplification, but requires garbage collection to garbage collect invalid values. Existing garbage collection techniques in LSM\-tree typically adopt static parameter based garbage collection to garbage collect obsolete values which struggles to achieve low write amplification and it's challenging to find proper parameter for garbage collection triggering. In this work we introduce DumpKV, which introduces learning based lifetime aware garbage collection with dynamic lifetime adjustment to do efficient garbage collection to achieve lower write amplification. DumpKV manages large values using trained lightweight model with features suitable for various application based on past write access information of keys to give lifetime prediction for each individual key to enable efficient garbage collection. To reduce interference to write throughput DumpKV conducts feature collection during L0\-L1 compaction leveraging the fact that LSM\-tree is small under KV separation. Experimental results show that DumpKV achieves lower write amplification by 38\%\-73\% compared to existing key\-value separation garbage collection LSM\-tree stores with small feature storage overhead.

Updated: 2024-06-03 12:07:22

标题: DumpKV：LSM树中基于学习的生命周期感知垃圾收集的键值分离

摘要: 键值分离在LSM树中用于将大值存储在单独的日志文件中，以减少写放大，但需要进行垃圾回收以清除无效值。现有的LSM树中的垃圾回收技术通常采用基于静态参数的垃圾回收，用于清除过时值，但难以实现低写放大，并且难以找到适当的参数来触发垃圾回收。在本研究中，我们介绍了DumpKV，它引入了基于学习的寿命感知垃圾回收，具有动态寿命调整，以进行高效的垃圾回收，实现更低的写放大。DumpKV使用经过训练的轻量级模型管理大值，其特征适用于各种基于过去键的写访问信息的应用，并为每个单独的键提供生命周期预测，以实现有效的垃圾回收。为了减少对写吞吐量的干扰，DumpKV在L0-L1压缩期间进行特征收集，利用LSM树在KV分离下较小的特性。实验结果表明，与现有的键值分离垃圾回收LSM树相比，DumpKV实现了38%-73%的写放大降低，并具有较小的特征存储开销。

更新时间: 2024-06-03 12:07:22

领域: cs.DB,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01250v1

Equivariant Machine Learning on Graphs with Nonlinear Spectral Filters

Equivariant machine learning is an approach for designing deep learning models that respect the symmetries of the problem, with the aim of reducing model complexity and improving generalization. In this paper, we focus on an extension of shift equivariance, which is the basis of convolution networks on images, to general graphs. Unlike images, graphs do not have a natural notion of domain translation. Therefore, we consider the graph functional shifts as the symmetry group: the unitary operators that commute with the graph shift operator. Notably, such symmetries operate in the signal space rather than directly in the spatial space. We remark that each linear filter layer of a standard spectral graph neural network (GNN) commutes with graph functional shifts, but the activation function breaks this symmetry. Instead, we propose nonlinear spectral filters (NLSFs) that are fully equivariant to graph functional shifts and show that they have universal approximation properties. The proposed NLSFs are based on a new form of spectral domain that is transferable between graphs. We demonstrate the superior performance of NLSFs over existing spectral GNNs in node and graph classification benchmarks.

Updated: 2024-06-03 12:07:01

标题: 在具有非线性谱滤波器的图上的等变机器学习

摘要: 等变机器学习是一种设计深度学习模型的方法，它尊重问题的对称性，旨在减少模型的复杂性并改善泛化能力。在本文中，我们专注于将移位等变性的扩展，这是卷积网络在图像上的基础，扩展到一般图形。与图像不同，图形没有自然的域平移概念。因此，我们将图形功能平移视为对称群：与图形平移算子相交换的酉算子。值得注意的是，这样的对称性在信号空间中操作，而不是直接在空间空间中。我们指出，标准频谱图形神经网络（GNN）的每个线性滤波层都与图形功能平移相交换，但激活函数打破了这种对称性。相反，我们提出了对图形功能平移完全等变的非线性谱滤波器（NLSFs），并展示它们具有通用逼近性质。所提出的NLSFs基于一种可在图形之间传输的新形式的频谱域。我们展示了NLSFs在节点和图形分类基准测试中优于现有频谱GNN的性能。

更新时间: 2024-06-03 12:07:01

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01249v1

Weak Augmentation Guided Relational Self-Supervised Learning

Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations. However, most methods mainly focus on the instance level information (\ie, the different augmented images of the same instance should have the same feature or cluster into the same class), but there is a lack of attention on the relationships between different instances. In this paper, we introduce a novel SSL paradigm, which we term as relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances. Specifically, our proposed method employs sharpened distribution of pairwise similarities among different instances as \textit{relation} metric, which is thus utilized to match the feature embeddings of different augmentations. To boost the performance, we argue that weak augmentations matter to represent a more reliable relation, and leverage momentum strategy for practical efficiency. The designed asymmetric predictor head and an InfoNCE warm-up strategy enhance the robustness to hyper-parameters and benefit the resulting performance. Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures, including various lightweight networks (\eg, EfficientNet and MobileNet).

Updated: 2024-06-03 12:06:06

标题: 弱增强引导的关系自监督学习

摘要: 自监督学习（SSL）包括主流对比学习在学习视觉表示时取得了巨大成功，而无需数据注释。然而，大多数方法主要关注实例级信息（即，同一实例的不同增强图像应具有相同特征或聚类到同一类别），但缺乏对不同实例之间关系的关注。在本文中，我们引入了一种新的SSL范式，我们将其称为关系自监督学习（ReSSL）框架，通过建模不同实例之间的关系来学习表示。具体而言，我们提出的方法利用不同实例之间的成对相似性的锐化分布作为“关系”度量，因此用于匹配不同增强的特征嵌入。为了提高性能，我们认为弱增强对于表示更可靠的关系至关重要，并利用动量策略提高实际效率。设计的不对称预测器头和InfoNCE热身策略增强了对超参数的鲁棒性，并使结果性能更加优越。实验结果表明，我们提出的ReSSL在不同网络架构上（包括各种轻量级网络，如EfficientNet和MobileNet）大大优于现有方法。

更新时间: 2024-06-03 12:06:06

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2203.08717v3

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Instruction-tuned Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features that are common in other areas of computer science, particularly an explicit separation of instructions and data. This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks. Surprisingly, there is currently no established definition or benchmark to quantify this phenomenon. In this work, we close this gap by introducing a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs. We also present a new dataset, SEP, that allows estimating the measure for real-world models. Our results on various LLMs show that the problem of instruction-data separation is real: all models fail to achieve high separation, and canonical mitigation techniques, such as prompt engineering and fine-tuning, either fail to substantially improve separation or reduce model utility. The source code and SEP dataset are openly accessible at https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed.

Updated: 2024-06-03 12:04:50

标题: LLMs能够分离指令和数据吗？我们究竟指的是什么？

摘要: 指令调整的大型语言模型（LLMs）在许多实际应用中展现出令人印象深刻的结果，但它们缺乏其他计算机科学领域常见的关键安全功能，特别是指令和数据的明确分离。这使它们容易受到间接提示注入等操纵的影响，通常不适用于安全关键任务。令人惊讶的是，目前尚未建立用于量化这种现象的明确定义或基准。在这项工作中，我们通过引入一种正式的指令-数据分离度量和一个可以从模型输出中计算的经验变量来填补这一空白。我们还提出了一个新的数据集SEP，允许对现实世界模型进行分析。我们在各种LLMs上的结果表明，指令-数据分离问题是真实存在的：所有模型都无法实现高度分离，而传统的缓解技术，如提示工程和微调，要么无法实质性地改善分离，要么降低了模型的效用。源代码和SEP数据集可在https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed上公开获取。

更新时间: 2024-06-03 12:04:50

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2403.06833v2

FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information

This paper establishes a mathematical foundation for the Adam optimizer, elucidating its connection to natural gradient descent through Riemannian and information geometry. We rigorously analyze the diagonal empirical Fisher information matrix (FIM) in Adam, clarifying all detailed approximations and advocating for the use of log probability functions as loss, which should be based on discrete distributions, due to the limitations of empirical FIM. Our analysis uncovers flaws in the original Adam algorithm, leading to proposed corrections such as enhanced momentum calculations, adjusted bias corrections, adaptive epsilon, and gradient clipping. We refine the weight decay term based on our theoretical framework. Our modified algorithm, Fisher Adam (FAdam), demonstrates superior performance across diverse domains including LLM, ASR, and VQ-VAE, achieving state-of-the-art results in ASR.

Updated: 2024-06-03 11:55:11

标题: FAdam：Adam是使用对角经验费舍尔信息的自然梯度优化器

摘要: 本文为Adam优化器建立了数学基础，通过黎曼和信息几何来阐明它与自然梯度下降的关系。我们对Adam中的对角经验费舍尔信息矩阵（FIM）进行了严格分析，澄清了所有详细的近似，并提倡使用对数概率函数作为损失函数，这应该基于离散分布，由于经验FIM的局限性。我们的分析揭示了原始Adam算法的缺陷，导致提出了改进的动量计算、调整的偏差校正、自适应ε和梯度裁剪等纠正措施。我们根据我们的理论框架对权重衰减项进行了改进。我们修改后的算法，费舍尔Adam（FAdam），在包括LLM、ASR和VQ-VAE在内的各个领域展示出卓越性能，实现了ASR的最新成果。

更新时间: 2024-06-03 11:55:11

领域: cs.LG,cs.AI,cs.IT,math.IT

下载: http://arxiv.org/abs/2405.12807v5

Achieving Tractable Minimax Optimal Regret in Average Reward MDPs

In recent years, significant attention has been directed towards learning average-reward Markov Decision Processes (MDPs). However, existing algorithms either suffer from sub-optimal regret guarantees or computational inefficiencies. In this paper, we present the first tractable algorithm with minimax optimal regret of $\widetilde{\mathrm{O}}(\sqrt{\mathrm{sp}(h^*) S A T})$, where $\mathrm{sp}(h^*)$ is the span of the optimal bias function $h^*$, $S \times A$ is the size of the state-action space and $T$ the number of learning steps. Remarkably, our algorithm does not require prior information on $\mathrm{sp}(h^*)$. Our algorithm relies on a novel subroutine, Projected Mitigated Extended Value Iteration (PMEVI), to compute bias-constrained optimal policies efficiently. This subroutine can be applied to various previous algorithms to improve regret bounds.

Updated: 2024-06-03 11:53:44

标题: 在平均奖励MDPs中实现可处理的极小化最优后悔

摘要: 近年来，人们开始关注学习平均奖励马尔可夫决策过程（MDPs）。然而，现有算法要么受到次优遗憾保证的困扰，要么计算效率低下。在本文中，我们提出了第一个可行的算法，具有最小化最优遗憾的保证为$\widetilde{\mathrm{O}}(\sqrt{\mathrm{sp}(h^*) S A T})$，其中$\mathrm{sp}(h^*)$是最优偏差函数$h^*$的跨度，$S \times A$是状态-动作空间的大小，$T$是学习步数。值得注意的是，我们的算法不需要先验关于$\mathrm{sp}(h^*)$的信息。我们的算法依赖于一种新颖的子程序，即投影缓解扩展值迭代（PMEVI），以高效地计算受偏差限制的最优策略。这种子程序可以应用于各种先前的算法，以改善遗憾界限。

更新时间: 2024-06-03 11:53:44

领域: cs.LG,cs.SY,eess.SY,math.OC,stat.ML

下载: http://arxiv.org/abs/2406.01234v1

AGALE: A Graph-Aware Continual Learning Evaluation Framework

In recent years, continual learning (CL) techniques have made significant progress in learning from streaming data while preserving knowledge across sequential tasks, particularly in the realm of euclidean data. To foster fair evaluation and recognize challenges in CL settings, several evaluation frameworks have been proposed, focusing mainly on the single- and multi-label classification task on euclidean data. However, these evaluation frameworks are not trivially applicable when the input data is graph-structured, as they do not consider the topological structure inherent in graphs. Existing continual graph learning (CGL) evaluation frameworks have predominantly focussed on single-label scenarios in the node classification (NC) task. This focus has overlooked the complexities of multi-label scenarios, where nodes may exhibit affiliations with multiple labels, simultaneously participating in multiple tasks. We develop a graph-aware evaluation (\agale) framework that accommodates both single-labeled and multi-labeled nodes, addressing the limitations of previous evaluation frameworks. In particular, we define new incremental settings and devise data partitioning algorithms tailored to CGL datasets. We perform extensive experiments comparing methods from the domains of continual learning, continual graph learning, and dynamic graph learning (DGL). We theoretically analyze \agale and provide new insights about the role of homophily in the performance of compared methods. We release our framework at https://github.com/Tianqi-py/AGALE.

Updated: 2024-06-03 11:50:47

标题: AGALE：一种面向图的持续学习评估框架

摘要: 最近几年，持续学习（CL）技术在学习流数据并在顺序任务之间保留知识方面取得了显著进展，特别是在欧几里得数据领域。为了促进公平评估并认识CL环境中的挑战，已提出了几种评估框架，主要关注欧几里得数据上的单标签和多标签分类任务。然而，当输入数据是图结构化时，这些评估框架并不容易应用，因为它们没有考虑图中固有的拓扑结构。现有的持续图学习（CGL）评估框架主要集中在节点分类（NC）任务中的单标签情景。这种关注忽视了多标签情景的复杂性，其中节点可能同时参与多个任务，表现出与多个标签的关联。我们开发了一个图感知评估（AGALE）框架，既适用于单标签节点又适用于多标签节点，解决了以前评估框架的局限性。特别是，我们定义了新的增量设置，并设计了针对CGL数据集的数据分区算法。我们进行了广泛的实验，比较了持续学习、持续图学习和动态图学习（DGL）领域的方法。我们在理论上分析了AGALE，并提供了关于同质性在比较方法性能中的作用的新见解。我们在https://github.com/Tianqi-py/AGALE发布了我们的框架。

更新时间: 2024-06-03 11:50:47

领域: cs.LG

下载: http://arxiv.org/abs/2406.01229v1

Earthfarseer: Versatile Spatio-Temporal Dynamical Systems Modeling in One Model

Efficiently modeling spatio-temporal (ST) physical processes and observations presents a challenging problem for the deep learning community. Many recent studies have concentrated on meticulously reconciling various advantages, leading to designed models that are neither simple nor practical. To address this issue, this paper presents a systematic study on existing shortcomings faced by off-the-shelf models, including lack of local fidelity, poor prediction performance over long time-steps,low scalability, and inefficiency. To systematically address the aforementioned problems, we propose an EarthFarseer, a concise framework that combines parallel local convolutions and global Fourier-based transformer architectures, enabling dynamically capture the local-global spatial interactions and dependencies. EarthFarseer also incorporates a multi-scale fully convolutional and Fourier architectures to efficiently and effectively capture the temporal evolution. Our proposal demonstrates strong adaptability across various tasks and datasets, with fast convergence and better local fidelity in long time-steps predictions. Extensive experiments and visualizations over eight human society physical and natural physical datasets demonstrates the state-of-the-art performance of EarthFarseer. We release our code at https://github.com/easylearningscores/EarthFarseer.

Updated: 2024-06-03 11:46:47

标题: Earthfarseer：多功能时空动力系统建模在一个模型中

摘要: 高效地建模时空（ST）物理过程和观测对深度学习社区来说是一个具有挑战性的问题。许多最近的研究集中在精心协调各种优势，导致设计的模型既不简单也不实用。为了解决这个问题，本文对现有模型面临的存在的缺点进行了系统研究，包括局部保真度不足、长时间步的预测性能差、可伸缩性低和低效率等问题。为了系统地解决上述问题，我们提出了EarthFarseer，这是一个简明的框架，结合了并行局部卷积和全局基于Fourier的变压器架构，可以动态捕捉局部-全局空间交互和依赖关系。EarthFarseer还结合了多尺度全卷积和Fourier架构，以高效有效地捕捉时间演变。我们的提议在各种任务和数据集上表现出了强大的适应性，具有快速收敛和在长时间步预测中更好的局部保真度。对八个人类社会物理和自然物理数据集进行的大量实验和可视化展示了EarthFarseer的最新性能。我们将我们的代码发布在https://github.com/easylearningscores/EarthFarseer。

更新时间: 2024-06-03 11:46:47

领域: cs.AI,I.2.3

下载: http://arxiv.org/abs/2312.08403v3

Overcoming Saturation in Density Ratio Estimation by Iterated Regularization

Estimating the ratio of two probability densities from finitely many samples, is a central task in machine learning and statistics. In this work, we show that a large class of kernel methods for density ratio estimation suffers from error saturation, which prevents algorithms from achieving fast error convergence rates on highly regular learning problems. To resolve saturation, we introduce iterated regularization in density ratio estimation to achieve fast error rates. Our methods outperform its non-iteratively regularized versions on benchmarks for density ratio estimation as well as on large-scale evaluations for importance-weighted ensembling of deep unsupervised domain adaptation models.

Updated: 2024-06-03 11:40:32

标题: 通过迭代正则化克服密度比估计中的饱和

摘要: 在机器学习和统计学中，从有限数量样本估计两个概率密度的比率是一项核心任务。在这项工作中，我们展示了一大类用于密度比率估计的核方法存在错误饱和问题，这阻碍了算法在高度规则化的学习问题上实现快速错误收敛率。为了解决饱和问题，我们引入了迭代正则化方法在密度比率估计中，以实现快速错误率。我们的方法在密度比率估计基准测试和大规模评估中，超越了其非迭代正则化版本，特别是在深度无监督领域适应模型的重要性加权集成中。

更新时间: 2024-06-03 11:40:32

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.13891v2

Constraint-based Adversarial Example Synthesis

In the era of rapid advancements in artificial intelligence (AI), neural network models have achieved notable breakthroughs. However, concerns arise regarding their vulnerability to adversarial attacks. This study focuses on enhancing Concolic Testing, a specialized technique for testing Python programs implementing neural networks. The extended tool, PyCT, now accommodates a broader range of neural network operations, including floating-point and activation function computations. By systematically generating prediction path constraints, the research facilitates the identification of potential adversarial examples. Demonstrating effectiveness across various neural network architectures, the study highlights the vulnerability of Python-based neural network models to adversarial attacks. This research contributes to securing AI-powered applications by emphasizing the need for robust testing methodologies to detect and mitigate potential adversarial threats. It underscores the importance of rigorous testing techniques in fortifying neural network models for reliable applications in Python.

Updated: 2024-06-03 11:35:26

标题: 基于约束的对抗性样本合成

摘要: 在人工智能（AI）快速发展的时代，神经网络模型取得了显著的突破。然而，人们对其容易受到对抗性攻击的担忧也日益增加。本研究专注于增强Concolic Testing，这是一种专门用于测试实现神经网络的Python程序的技术。扩展的工具PyCT 现在可以适应更广泛的神经网络操作，包括浮点数和激活函数计算。通过系统地生成预测路径约束，研究有助于识别潜在的对抗性示例。展示了在各种神经网络架构中的有效性，研究突出了基于Python的神经网络模型容易受到对抗性攻击的脆弱性。这项研究通过强调需要稳健的测试方法来检测和缓解潜在的对抗性威胁，有助于保障AI应用的安全。它强调了通过严格的测试技术来加固神经网络模型以实现可靠Python应用的重要性。

更新时间: 2024-06-03 11:35:26

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2406.01219v1

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, continuously updating LLMs for alignment raises a distribution gap between model-generated samples and human-annotated responses, hindering training effectiveness. To mitigate this issue, previous methods require additional preference annotation on newly generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an Adversarial Preference Optimization (APO) framework, in which the LLM and the reward model update alternatively via a min-max game. Through adversarial training, the reward model can adapt to the shifted generation distribution of the LLM without any additional annotation. With comprehensive experiments, we find the proposed adversarial training framework further enhances existing alignment baselines in terms of LLM helpfulness and harmlessness. The code is at https://github.com/Linear95/APO.

Updated: 2024-06-03 11:34:05

标题: 对抗性偏好优化：通过RM-LLM游戏提升您的对齐效果

摘要: 人类偏好对齐对于提高大型语言模型（LLMs）的交互质量至关重要。现有的对齐方法依赖于手动标注的偏好数据来引导LLM优化方向。然而，持续更新LLMs以进行对齐会引起模型生成样本和人工标注响应之间的分布差距，从而阻碍训练效果。为了缓解这个问题，先前的方法需要对新生成的样本进行额外的偏好注释，以适应转变的分布，这消耗了大量的注释资源。针对更有效的人类偏好优化，我们提出了一个对抗偏好优化（APO）框架，其中LLM和奖励模型通过最小最大博弈交替更新。通过对抗训练，奖励模型可以适应LLM的转变生成分布，而无需额外的注释。通过全面的实验，我们发现所提出的对抗训练框架进一步增强了现有的对齐基线在LLM的帮助和无害性方面。代码位于https://github.com/Linear95/APO。

更新时间: 2024-06-03 11:34:05

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2311.08045v4

Multistep Consistency Models

Diffusion models are relatively easy to train but require many steps to generate samples. Consistency models are far more difficult to train, but generate samples in a single step. In this paper we propose Multistep Consistency Models: A unification between Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that can interpolate between a consistency model and a diffusion model: a trade-off between sampling speed and sampling quality. Specifically, a 1-step consistency model is a conventional consistency model whereas a $\infty$-step consistency model is a diffusion model. Multistep Consistency Models work really well in practice. By increasing the sample budget from a single step to 2-8 steps, we can train models more easily that generate higher quality samples, while retaining much of the sampling speed benefits. Notable results are 1.4 FID on Imagenet 64 in 8 step and 2.1 FID on Imagenet128 in 8 steps with consistency distillation, using simple losses without adversarial training. We also show that our method scales to a text-to-image diffusion model, generating samples that are close to the quality of the original model.

Updated: 2024-06-03 11:33:51

标题: 多步一致性模型

摘要: 扩散模型相对容易训练，但需要许多步骤来生成样本。一致性模型更难训练，但可以在单个步骤中生成样本。在本文中，我们提出了多步一致性模型：将一致性模型（Song等人，2023年）和TRACT（Berthelot等人，2023年）统一起来，可以在一致性模型和扩散模型之间进行插值：在采样速度和采样质量之间进行权衡。具体来说，1步一致性模型是传统的一致性模型，而无限步一致性模型是扩散模型。多步一致性模型在实践中表现非常好。通过将样本预算从单一步骤增加到2-8步，我们可以更轻松地训练生成更高质量样本的模型，同时保留大部分采样速度的优势。显著的结果是在Imaget 64上8步的1.4 FID，以及在Imaget128上8步的2.1 FID，使用简单的损失函数而不是对抗性训练进行一致性蒸馏。我们还展示了我们的方法适用于文本到图像扩散模型，生成的样本接近原始模型的质量。

更新时间: 2024-06-03 11:33:51

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2403.06807v2

DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild

Image quality assessment (IQA) plays a critical role in selecting high-quality images and guiding compression and enhancement methods in a series of applications. The blind IQA, which assesses the quality of in-the-wild images containing complex authentic distortions without reference images, poses greater challenges. Existing methods are limited to modeling a uniform distribution with local patches and are bothered by the gap between low and high-level visions (caused by widely adopted pre-trained classification networks). In this paper, we propose a novel IQA method called diffusion priors-based IQA (DP-IQA), which leverages the prior knowledge from the pre-trained diffusion model with its excellent powers to bridge semantic gaps in the perception of the visual quality of images. Specifically, we use pre-trained stable diffusion as the backbone, extract multi-level features from the denoising U-Net during the upsampling process at a specified timestep, and decode them to estimate the image quality score. The text and image adapters are adopted to mitigate the domain gap for downstream tasks and correct the information loss caused by the variational autoencoder bottleneck. Finally, we distill the knowledge in the above model into a CNN-based student model, significantly reducing the parameter to enhance applicability, with the student model performing similarly or even better than the teacher model surprisingly. Experimental results demonstrate that our DP-IQA achieves state-of-the-art results on various in-the-wild datasets with better generalization capability, which shows the superiority of our method in global modeling and utilizing the hierarchical feature clues of diffusion for evaluating image quality.

Updated: 2024-06-03 11:32:40

标题: DP-IQA：利用扩散先验进行野外盲图像质量评估

摘要: 图像质量评估（IQA）在选择高质量图像和指导压缩和增强方法方面起着关键作用，在一系列应用中扮演着重要角色。盲目的IQA评估野外图像的质量，这些图像包含复杂的真实失真，没有参考图像，面临更大的挑战。现有方法局限于对局部补丁进行均匀分布建模，并受到低级和高级视觉之间差距的困扰（由广泛采用的预先训练分类网络引起）。本文提出了一种称为基于扩散先验的IQA（DP-IQA）的新型IQA方法，利用了从预先训练的扩散模型中获得的先验知识，其具有优秀的能力来弥合图像视觉质量感知中的语义差距。具体地，我们使用预先训练的稳定扩散作为骨干，从去噪U-Net中提取多级特征，在指定的时间步长内进行上采样过程，并对它们进行解码以估计图像质量得分。文本和图像适配器被采用来减轻下游任务的域差距，并纠正由变分自编码器瓶颈引起的信息丢失。最后，我们将上述模型中的知识提炼到基于CNN的学生模型中，大大减少参数以增强适用性，使学生模型表现出令人惊讶地与教师模型类似甚至更好。实验结果表明，我们的DP-IQA在各种野外数据集上实现了最先进的结果，具有更好的泛化能力，显示了我们的方法在全局建模和利用扩散的层次特征线索来评估图像质量方面的优越性。

更新时间: 2024-06-03 11:32:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19996v3

Deep Learning Calabi-Yau four folds with hybrid and recurrent neural network architectures

In this work, we report the results of applying deep learning based on hybrid convolutional-recurrent and purely recurrent neural network architectures to the dataset of almost one million complete intersection Calabi-Yau four-folds (CICY4) to machine-learn their four Hodge numbers $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$. In particular, we explored and experimented with twelve different neural network models, nine of which are convolutional-recurrent (CNN-RNN) hybrids with the RNN unit being either GRU (Gated Recurrent Unit) or Long Short Term Memory (LSTM). The remaining four models are purely recurrent neural networks based on LSTM. In terms of the $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$ prediction accuracies, at 72% training ratio, our best performing individual model is CNN-LSTM-400, a hybrid CNN-LSTM with the LSTM hidden size of 400, which obtained 99.74%, 98.07%, 95.19%, 81.01%, our second best performing individual model is LSTM-448, an LSTM-based model with the hidden size of 448, which obtained 99.74%, 97.51%, 94.24%, and 78.63%. These results were improved by forming ensembles of the top two, three or even four models. Our best ensemble, consisting of the top four models, achieved the accuracies of 99.84%, 98.71%, 96.26%, 85.03%. At 80% training ratio, the top two performing models LSTM-448 and LSTM-424 are both LSTM-based with the hidden sizes of 448 and 424. Compared with the 72% training ratio, there is a significant improvement of accuracies, which reached 99.85%, 98.66%, 96.26%, 84.77% for the best individual model and 99.90%, 99.03%, 97.97%, 87.34% for the best ensemble.

Updated: 2024-06-03 11:32:11

标题: 用混合和循环神经网络架构进行深度学习Calabi-Yau四胞体

摘要: 在这项工作中，我们报告了应用基于混合卷积-循环和纯循环神经网络架构的深度学习结果，对近一百万个完整交叉 Calabi-Yau 四倍体（CICY4）数据集进行机器学习，以了解它们的四个 Hodge 数 $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$。具体而言，我们探索并尝试了十二种不同的神经网络模型，其中九种是卷积-循环（CNN-RNN）混合模型，RNN 单元可能是 GRU（门控循环单元）或长短期记忆（LSTM）。其余四种模型是基于 LSTM 的纯循环神经网络。在 $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$ 预测准确率方面，在 72% 的训练比率下，我们表现最佳的单个模型是 CNN-LSTM-400，一个具有 LSTM 隐藏大小为 400 的混合 CNN-LSTM 模型，分别获得了 99.74%，98.07%，95.19%，81.01% 的准确率；我们次佳的单个模型是 LSTM-448，一个基于 LSTM 的模型，隐藏大小为 448，分别获得了 99.74%，97.51%，94.24%，78.63% 的准确率。通过组合前两个、三个甚至四个模型，这些结果得到了改善。我们最佳的组合由前四个模型组成，实现了 99.84%，98.71%，96.26%，85.03% 的准确率。在 80% 的训练比率下，表现最佳的两个模型 LSTM-448 和 LSTM-424 都是基于 LSTM 的，隐藏大小分别为 448 和 424。与 72% 的训练比率相比，准确率有显著提高，最佳单个模型达到 99.85%，98.66%，96.26%，84.77%，最佳组合达到 99.90%，99.03%，97.97%，87.34%。

更新时间: 2024-06-03 11:32:11

领域: hep-th,cs.LG,math.AG

下载: http://arxiv.org/abs/2405.17406v2

Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition

Cross-lingual named entity recognition (NER) aims to train an NER model for the target language leveraging only labeled source language data and unlabeled target language data. Prior approaches either perform label projection on translated source language data or employ a source model to assign pseudo labels for target language data and train a target model on these pseudo-labeled data to generalize to the target language. However, these automatic labeling procedures inevitably introduce noisy labels, thus leading to a performance drop. In this paper, we propose a Global-Local Denoising framework (GLoDe) for cross-lingual NER. Specifically, GLoDe introduces a progressive denoising strategy to rectify incorrect pseudo labels by leveraging both global and local distribution information in the semantic space. The refined pseudo-labeled target language data significantly improves the model's generalization ability. Moreover, previous methods only consider improving the model with language-agnostic features, however, we argue that target language-specific features are also important and should never be ignored. To this end, we employ a simple auxiliary task to achieve this goal. Experimental results on two benchmark datasets with six target languages demonstrate that our proposed GLoDe significantly outperforms current state-of-the-art methods.

Updated: 2024-06-03 11:29:19

标题: Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition （为跨语言命名实体识别提供全局局部去噪框架的伪标签改进）

摘要: 跨语言命名实体识别（NER）旨在仅利用标记的源语言数据和未标记的目标语言数据来训练目标语言的NER模型。先前的方法要么在翻译的源语言数据上执行标签投影，要么利用源模型为目标语言数据分配伪标签，并在这些伪标签数据上训练目标模型以推广至目标语言。然而，这些自动标记过程不可避免地会引入噪声标签，从而导致性能下降。在本文中，我们提出了一个全局-局部去噪框架（GLoDe）用于跨语言NER。具体而言，GLoDe引入了一种渐进式去噪策略，通过利用语义空间中的全局和局部分布信息来纠正不正确的伪标签。经过改进的伪标记目标语言数据显著提高了模型的泛化能力。此外，先前的方法仅考虑使用与语言无关的特征来改进模型，然而，我们认为目标语言特定的特征也很重要，不应被忽视。为此，我们采用一个简单的辅助任务来实现这个目标。在两个基准数据集上进行的实验结果显示，我们提出的GLoDe明显优于当前最先进的方法。

更新时间: 2024-06-03 11:29:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01213v1

Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping

Statistical heterogeneity severely limits the performance of federated learning (FL), motivating several explorations e.g., FedProx, MOON and FedDyn, to alleviate this problem. Despite effectiveness, their considered scenario generally requires samples from almost all classes during the local training of each client, although some covariate shifts may exist among clients. In fact, the natural case of partially class-disjoint data (PCDD), where each client contributes a few classes (instead of all classes) of samples, is practical yet underexplored. Specifically, the unique collapse and invasion characteristics of PCDD can induce the biased optimization direction in local training, which prevents the efficiency of federated learning. To address this dilemma, we propose a manifold reshaping approach called FedMR to calibrate the feature space of local training. Our FedMR adds two interplaying losses to the vanilla federated learning: one is intra-class loss to decorrelate feature dimensions for anti-collapse; and the other one is inter-class loss to guarantee the proper margin among categories in the feature expansion. We conduct extensive experiments on a range of datasets to demonstrate that our FedMR achieves much higher accuracy and better communication efficiency. Source code is available at: https://github.com/MediaBrain-SJTU/FedMR.git.

Updated: 2024-06-03 11:16:55

标题: 部分类不相交数据下的流分布学习通过流形重塑

摘要: 统计异质性严重限制了联邦学习（FL）的性能，这激发了几项探索，例如FedProx、MOON和FedDyn，以减轻这一问题。尽管这些方法有效，但它们考虑的场景通常要求在每个客户端的本地训练期间几乎需要所有类别的样本，尽管客户端之间可能存在一些协变量转移。事实上，部分类别不相交数据（PCDD）是一个实际而未被充分探索的自然情况，其中每个客户端只贡献了少量类别（而不是所有类别）的样本。具体来说，PCDD 的独特坍缩和入侵特征可以在本地训练中引起有偏的优化方向，从而阻碍了联邦学习的效率。为了解决这一困境，我们提出了一种称为FedMR的流形重塑方法，以校准本地训练的特征空间。我们的FedMR在原始联邦学习中添加了两种相互作用的损失：一种是用于解耦特征维度的类内损失，以防止坍缩；另一种是用于确保特征扩展中类别之间的适当间隔的类间损失。我们在一系列数据集上进行了大量实验，证明我们的FedMR实现了更高的准确性和更好的通信效率。源代码可在https://github.com/MediaBrain-SJTU/FedMR.git 上找到。

更新时间: 2024-06-03 11:16:55

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.18983v2

Scaling Up Deep Clustering Methods Beyond ImageNet-1K

Deep image clustering methods are typically evaluated on small-scale balanced classification datasets while feature-based $k$-means has been applied on proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks whilst disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. Consequently, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform $k$-means across most large-scale benchmarks. Interestingly, $k$-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes on the highest data regimes such as ImageNet21K. Finally, we find that non-primary cluster predictions capture meaningful classes (i.e. coarser classes).

Updated: 2024-06-03 11:13:27

标题: 将深度聚类方法扩展至超越ImageNet-1K

摘要: 深度图像聚类方法通常在小规模平衡分类数据集上进行评估，而基于特征的$k$-means已应用于专有的十亿规模数据集。在这项工作中，我们探讨了基于特征的深度聚类方法在大规模基准数据集上的性能，同时分析以下数据相关因素的影响：i) 类别不平衡，ii) 类别粒度，iii) 易于识别的类别，和iv) 能够捕捉多个类别。因此，我们基于ImageNet21K开发了多个新的基准数据集。我们的实验分析显示，基于特征的$k$-means经常在平衡数据集上被不公平地评估。然而，在大多数大规模基准数据集上，深度聚类方法优于$k$-means。有趣的是，$k$-means在易于分类的基准数据集上表现不佳，差距很大。然而，这种性能差距在像ImageNet21K这样的最高数据范围上减小。最后，我们发现，非主要的聚类预测可以捕捉到有意义的类别（即更粗略的类别）。

更新时间: 2024-06-03 11:13:27

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01203v1

DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset

The escalating quality of video generated by advanced video generation methods results in new security challenges, while there have been few relevant research efforts: 1) There is no open-source dataset for generated video detection, 2) No generated video detection method has been proposed so far. To this end, we propose an open-source dataset and a detection method for generated video for the first time. First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions, as well as various generation models with different architectures and generation methods, including the most popular commercial models like OpenAI's Sora and Google's Veo. Second, we found via probing experiments that spatial artifact-based detectors lack generalizability. Hence, we propose a simple yet effective \textbf{de}tection model based on \textbf{f}rame \textbf{co}nsistency (\textbf{DeCoF}), which focuses on temporal artifacts by eliminating the impact of spatial artifacts during feature learning. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos generated by unseen video generation models and confirm its powerful generalizability across several commercially proprietary models. Our code and dataset will be released at \url{https://anonymous.4open.science/r/DeCoF-8394}.

Updated: 2024-06-03 11:00:25

标题: DeCoF：通过帧一致性生成视频检测：第一个基准数据集

摘要: 先进视频生成方法生成的视频质量不断提升，导致出现了新的安全挑战，然而相关研究工作却很少：1）目前还没有针对生成视频检测的开源数据集，2）迄今为止还没有提出任何生成视频检测方法。为此，我们首次提出了一个开源数据集和一种生成视频检测方法。首先，我们提出了一个可扩展的数据集，包含964个提示，涵盖各种伪造目标、场景、行为和动作，以及各种具有不同架构和生成方法的生成模型，包括像OpenAI的Sora和Google的Veo这样的最流行的商业模型。其次，我们通过探究实验证明，基于空间伪影的检测器缺乏泛化能力。因此，我们提出了一种简单而有效的基于帧一致性（DeCoF）的检测模型，通过在特征学习过程中消除空间伪影的影响，从而集中于时间伪影。大量实验表明，DeCoF在检测未见的视频生成模型生成的视频方面的效力，并且证实了其在几种商业专有模型中的强大泛化能力。我们的代码和数据集将在\url{https://anonymous.4open.science/r/DeCoF-8394}上发布。

更新时间: 2024-06-03 11:00:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2402.02085v3

Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression

Automated essay scoring (AES) involves predicting a score that reflects the writing quality of an essay. Most existing AES systems produce only a single overall score. However, users and L2 learners expect scores across different dimensions (e.g., vocabulary, grammar, coherence) for English essays in real-world applications. To address this need, we have developed two models that automatically score English essays across multiple dimensions by employing fine-tuning and other strategies on two large datasets. The results demonstrate that our systems achieve impressive performance in evaluation using three criteria: precision, F1 score, and Quadratic Weighted Kappa. Furthermore, our system outperforms existing methods in overall scoring.

Updated: 2024-06-03 10:59:50

标题: 使用微调和多元回归自动评分多维论文

摘要: 自动化作文评分(AES)涉及预测反映作文质量的分数。大多数现有的AES系统只产生一个总体分数。然而，用户和第二语言学习者期望在真实应用中获得不同维度的分数(例如词汇、语法、连贯性)。为了满足这一需求，我们开发了两种模型，通过在两个大型数据集上使用微调和其他策略，自动评分英语作文的多个维度。结果表明，我们的系统在三个评价标准(精确度、F1分数和二次加权Kappa)下取得了令人印象深刻的表现。此外，我们的系统在总体评分方面优于现有方法。

更新时间: 2024-06-03 10:59:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01198v1

3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information

In recent years, a plethora of diverse methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both been proven to be effective and practical methods. Recognizing the strengths of those two techniques, we have developed a novel Semantic Graph Attention Network which can benefit from the ability of self-attention to capture global context, while also utilizing the graph convolutions to handle the local connectivity and structural constraints of the skeleton. We also design a Body Part Decoder that assists in extracting and refining the information related to specific segments of the body. Furthermore, our approach incorporates Distance Information, enhancing our model's capability to comprehend and accurately predict spatial relationships. Finally, we introduce a Geometry Loss who makes a critical constraint on the structural skeleton of the body, ensuring that the model's predictions adhere to the natural limits of human posture. The experimental results validate the effectiveness of our approach, demonstrating that every element within the system is essential for improving pose estimation outcomes. With comparison to state-of-the-art, the proposed work not only meets but exceeds the existing benchmarks.

Updated: 2024-06-03 10:59:00

标题: 基于语义图注意力网络和距离信息的3D全身姿态估计

摘要: 近年来，已经提出了大量不同的方法用于3D姿势估计。在这些方法中，自注意机制和图卷积都被证明是有效和实用的方法。认识到这两种技术的优势，我们开发了一种新颖的语义图注意网络，它可以从自注意力捕捉全局上下文的能力中受益，同时利用图卷积来处理骨架的局部连接性和结构约束。我们还设计了一个身体部位解码器，可帮助提取和精炼与身体特定部位相关的信息。此外，我们的方法融入了距离信息，增强了我们模型理解和准确预测空间关系的能力。最后，我们引入了一个几何损失，对身体的结构骨架施加了重要约束，确保模型的预测符合人体姿势的自然限制。实验结果验证了我们方法的有效性，表明系统中的每个元素对改善姿势估计结果至关重要。与最先进技术相比，提出的工作不仅达到了现有基准，而且超过了现有基准。

更新时间: 2024-06-03 10:59:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01196v1

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion

Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of the long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, and when they are not, their precomputed KV caches cannot be directly used since they ignore the text's cross-attention with the preceding text in the LLM input. Thus, the benefits of reusing KV caches remain largely unrealized. This paper tackles just one question: when an LLM input contains multiple text chunks, how to quickly combine their precomputed KV caches in order to achieve the same generation quality as the expensive full prefill (i.e., without reusing KV cache)? We present CacheBlend, a scheme that reuses the pre-computed KV caches, regardless prefix or not, and selectively recomputes the KV values of a small subset of tokens to partially update each reused KV cache. In the meantime,the small extra delay for recomputing some tokens can be pipelined with the retrieval of KV caches within the same job,allowing CacheBlend to store KV caches in slower devices with more storage capacity while retrieving them without increasing the inference delay. By comparing CacheBlend with the state-of-the-art KV cache reusing schemes on three open-source LLMs of various sizes and four popular benchmark datasets of different tasks, we show that CacheBlend reduces time-to-first-token (TTFT) by 2.2-3.3X and increases the inference throughput by 2.8-5X, compared with full KV recompute, without compromising generation quality or incurring more storage cost.

Updated: 2024-06-03 10:57:57

标题: CacheBlend：用于带有缓存知识融合的 RAG 的快速大型语言模型服务

摘要: 大型语言模型（LLMs）通常在其输入中包含多个文本块，以提供必要的上下文。为了加快长LLM输入的预填充，可以预先计算文本的KV缓存，并在上下文被重新使用作为另一个LLM输入的前缀时重复使用KV缓存。然而，重新使用的文本块并不总是输入前缀，当它们不是时，它们预先计算的KV缓存不能直接使用，因为它们忽略了文本在LLM输入中与前面文本的交叉关注。因此，重用KV缓存的好处大部分尚未实现。本文只处理一个问题：当LLM输入包含多个文本块时，如何快速组合它们预先计算的KV缓存，以达到与昂贵的完全预填充（即，不重用KV缓存）相同的生成质量？我们提出了CacheBlend，这是一种方案，它重复使用预先计算的KV缓存，无论是否前缀，并有选择地重新计算少量标记的KV值，从而部分更新每个重复使用的KV缓存。同时，重新计算一些标记的小额延迟可以与在同一作业中检索KV缓存的过程并行进行，使CacheBlend能够将KV缓存存储在存储容量更大的较慢设备中，同时检索它们而不增加推理延迟。通过将CacheBlend与三个不同大小的开源LLM和四个不同任务的流行基准数据集上的最先进的KV缓存重用方案进行比较，我们表明CacheBlend将时间到第一个标记（TTFT）减少了2.2-3.3倍，并将推理吞吐量提高了2.8-5倍，与完整的KV重新计算相比，而不会影响生成质量或增加更多的存储成本。

更新时间: 2024-06-03 10:57:57

领域: cs.LG

下载: http://arxiv.org/abs/2405.16444v2

Flood and Echo Net: Algorithmically Aligned GNNs that Generalize

Most Graph Neural Networks follow the standard message-passing framework where, in each step, all nodes simultaneously communicate with each other. We want to challenge this paradigm by aligning the computation more closely to the execution of distributed algorithms and propose the Flood and Echo Net. A single round of a Flood and Echo Net consists of an origin node and a flooding phase followed by an echo phase. First, during the flooding, messages are sent from the origin and propagated outwards throughout the entire graph. Then, during the echo, the message flow reverses and messages are sent back towards the origin. As nodes are only sparsely activated upon receiving a message, this leads to a wave-like activation pattern that traverses the graph. Through these sparse but parallel activations, the Net becomes more expressive than traditional MPNNs which are limited by the 1-WL test and also is provably more efficient in terms of message complexity. Moreover, the mechanism's inherent ability to generalize across graphs of varying sizes positions it as a practical architecture for the task of algorithmic learning. We test the Flood and Echo Net on a variety of synthetic tasks and the SALSA-CLRS benchmark and find that the algorithmic alignment of the execution improves generalization to larger graph sizes.

Updated: 2024-06-03 10:50:08

标题: 洪水和回声网络：算法对齐的通用GNN

摘要: 大多数图神经网络都遵循标准的消息传递框架，在每一步中，所有节点同时进行通信。我们希望通过更紧密地将计算与分布式算法的执行对齐来挑战这一范式，并提出了洪泛和回声网络。洪泛和回声网络的一个单一轮包括一个起始节点和一个洪泛阶段，然后是一个回声阶段。首先，在洪泛阶段，消息从起始节点发送并在整个图中向外传播。然后，在回声阶段，消息流倒转，消息被发送回起始节点。由于节点仅在接收到消息时才被稀疏激活，这导致了一种波状激活模式，遍历整个图。通过这些稀疏但并行的激活，网络变得比传统的MPNN更具表现力，后者受到1-WL测试的限制，并且从消息复杂度方面更加高效。此外，该机制固有的泛化跨不同大小图形的能力将其定位为算法学习任务的实用架构。我们在各种合成任务和SALSA-CLRS基准测试上测试了洪泛和回声网络，并发现执行的算法对较大图形大小的泛化有所改善。

更新时间: 2024-06-03 10:50:08

领域: cs.LG

下载: http://arxiv.org/abs/2310.06970v3

SNPGuard: Remote Attestation of SEV-SNP VMs Using Open Source Tools

Cloud computing is a ubiquitous solution to handle today's complex computing demands. However, it comes with data privacy concerns, as the cloud service provider has complete access to code and data running on their infrastructure. VM-based Trusted Execution Environments (TEEs) are a promising solution to solve this issue. They provide strong isolation guarantees to lock out the cloud service provider, as well as an attestation mechanism to enable the end user to verify their trustworthiness. Attesting the whole boot chain of a VM is a challenging task that requires modifications to several software components. While there are open source solutions for the individual components, the tooling and documentation for properly integrating them remains scarce. In this paper, we try to fill this gap by elaborating on two common boot workflows and providing open source tooling to perform them with low manual effort. The first workflow assumes that the VM image does only require integrity but not confidentiality, allowing for an uninterrupted boot process. The second workflow covers booting a VM with an encrypted root filesystem, requiring secure provisioning of the decryption key during early boot. While our tooling targets AMD Secure Encrypted Virtualization (SEV) VMs, the concepts also apply to other VM-based TEEs such as Intel Trusted Domain Extensions (TDX).

Updated: 2024-06-03 10:48:30

标题: SNPGuard：使用开源工具对SEV-SNP虚拟机进行远程认证

摘要: 云计算是处理当今复杂计算需求的一种无处不在的解决方案。然而，它带来了数据隐私方面的担忧，因为云服务提供商可以完全访问其基础设施上运行的代码和数据。基于虚拟机的可信执行环境（TEE）是解决这个问题的一种有前途的解决方案。它们提供强大的隔离保证，以阻止云服务提供商访问，同时提供一个证明机制，使最终用户能够验证其可信性。对于验证虚拟机整个引导链是一项具有挑战性的任务，需要对几个软件组件进行修改。虽然有针对各个组件的开源解决方案，但针对其进行适当集成的工具和文档仍然稀缺。在本文中，我们试图填补这一空白，详细阐述了两种常见的引导工作流程，并提供开源工具以在较低的手动工作量下执行它们。第一个工作流程假定虚拟机镜像只需要完整性而不需要保密性，从而允许无间断的引导过程。第二个工作流程涵盖了使用加密根文件系统引导虚拟机，需要在早期引导时安全提供解密密钥。虽然我们的工具针对AMD安全加密虚拟化（SEV）虚拟机，但这些概念也适用于其他基于虚拟机的TEE，如英特尔可信域扩展（TDX）。

更新时间: 2024-06-03 10:48:30

领域: cs.CR

下载: http://arxiv.org/abs/2406.01186v1

OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Neural Theory-of-Mind (N-ToM), machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents. However, prevalent N-ToM benchmarks have several shortcomings, including the presence of ambiguous and artificial narratives, absence of personality traits and preferences, a lack of questions addressing characters' psychological mental states, and limited diversity in the questions posed. In response to these issues, we construct OpenToM, a new benchmark for assessing N-ToM with (1) longer and clearer narrative stories, (2) characters with explicit personality traits, (3) actions that are triggered by character intentions, and (4) questions designed to challenge LLMs' capabilities of modeling characters' mental states of both the physical and psychological world. Using OpenToM, we reveal that state-of-the-art LLMs thrive at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.

Updated: 2024-06-03 10:48:16

标题: OpenToM：用于评估大型语言模型理论心智推理能力的综合基准

摘要: 神经心灵理论（N-ToM）是机器理解和跟踪他人心理状态的能力，在发展具有社会智能的代理人方面起着关键作用。然而，目前普遍的N-ToM基准存在几个缺点，包括具有模糊和人为叙述、缺乏个性特征和偏好、缺乏涉及角色心理状态的问题以及问题提出方面的多样性有限。为了解决这些问题，我们构建了OpenToM，这是一个用于评估N-ToM的新基准，具有（1）更长更清晰的叙述故事、（2）带有明确个性特征的角色、（3）由角色意图触发的行为，以及（4）旨在挑战LLM模型角色心理状态建模能力的问题。通过使用OpenToM，我们发现最先进的LLM在模拟物理世界中的某些心理状态方面表现出色，但在跟踪角色在心理世界中的心理状态方面表现不佳。

更新时间: 2024-06-03 10:48:16

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.06044v3

Eye-tracking in Mixed Reality for Diagnosis of Neurodegenerative Diseases

Parkinson's disease ranks as the second most prevalent neurodegenerative disorder globally. This research aims to develop a system leveraging Mixed Reality capabilities for tracking and assessing eye movements. In this paper, we present a medical scenario and outline the development of an application designed to capture eye-tracking signals through Mixed Reality technology for the evaluation of neurodegenerative diseases. Additionally, we introduce a pipeline for extracting clinically relevant features from eye-gaze analysis, describing the capabilities of the proposed system from a medical perspective. The study involved a cohort of healthy control individuals and patients suffering from Parkinson's disease, showcasing the feasibility and potential of the proposed technology for non-intrusive monitoring of eye movement patterns for the diagnosis of neurodegenerative diseases. Clinical relevance - Developing a non-invasive biomarker for Parkinson's disease is urgently needed to accurately detect the disease's onset. This would allow for the timely introduction of neuroprotective treatment at the earliest stage and enable the continuous monitoring of intervention outcomes. The ability to detect subtle changes in eye movements allows for early diagnosis, offering a critical window for intervention before more pronounced symptoms emerge. Eye tracking provides objective and quantifiable biomarkers, ensuring reliable assessments of disease progression and cognitive function. The eye gaze analysis using Mixed Reality glasses is wireless, facilitating convenient assessments in both home and hospital settings. The approach offers the advantage of utilizing hardware that requires no additional specialized attachments, enabling examinations through personal eyewear.

Updated: 2024-06-03 10:45:42

标题: 眼动追踪在混合现实中用于诊断神经退行性疾病

摘要: 帕金森病在全球神经退行性疾病中排名第二。本研究旨在开发一种利用混合现实技术进行眼动追踪和评估的系统。本文介绍了一个医学场景，并概述了一个旨在通过混合现实技术捕捉眼动信号以评估神经退行性疾病的应用程序的开发过程。此外，我们介绍了一个从眼注视分析中提取临床相关特征的流程，从医学角度描述了所提出系统的能力。该研究涉及一组健康对照个体和患有帕金森病的患者，展示了所提出技术在非侵入性监测眼动模式以诊断神经退行性疾病方面的可行性和潜力。临床相关性-迫切需要开发一种无创生物标志物来准确检测帕金森病的发病。这将允许及时介入神经保护性治疗，并在最早期阶段实现持续监测干预结果。检测眼动中微变化的能力允许早期诊断，为更显著症状出现前的干预提供了关键时机。眼动追踪提供客观和可量化的生物标志物，确保可靠评估疾病进展和认知功能。使用混合现实眼镜进行眼注视分析是无线的，便于在家庭和医院环境中进行方便的评估。这种方法的优势在于利用不需要额外专门附件的硬件，使得可以通过个人眼镜进行检查。

更新时间: 2024-06-03 10:45:42

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2404.12984v2

Functional Programming Paradigm of Python for Scientific Computation Pipeline Integration

The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data control system to facilitate the integration of varying libraries. This integration is of profound significance in accelerating prototype verification, optimising algorithm performance and minimising maintenance costs. This paper presents a novel functional programming (FP) paradigm based on the Python architecture and associated suites in programming practice, designed for the integration of pipelines of different data mapping operations. In particular, the solution is intended for the integration of scientific computation flows, which affords a robust yet flexible solution for the aforementioned challenges.

Updated: 2024-06-03 10:42:50

标题: Python的函数式编程范式在科学计算流程集成中的应用

摘要: 现代数据处理的出现导致了跨学科研究的增加趋势，这往往涉及不同技术方法的引入。因此，迫切需要一个统一的数据控制系统，以促进不同库的集成。这种集成在加速原型验证、优化算法性能和减少维护成本方面具有深远意义。本文介绍了一种基于Python架构和相关套件的新颖的函数式编程（FP）范式，旨在实现不同数据映射操作流水线的集成。特别是，该解决方案旨在集成科学计算流程，为上述挑战提供了强大而灵活的解决方案。

更新时间: 2024-06-03 10:42:50

领域: cs.LG,cs.AI,cs.CE,cs.PL,cs.SE

下载: http://arxiv.org/abs/2405.16956v2

Automatic Input Feature Relevance via Spectral Neural Networks

Working with high-dimensional data is a common practice, in the field of machine learning. Identifying relevant input features is thus crucial, so as to obtain compact dataset more prone for effective numerical handling. Further, by isolating pivotal elements that form the basis of decision making, one can contribute to elaborate on - ex post - models' interpretability, so far rather elusive. Here, we propose a novel method to estimate the relative importance of the input components for a Deep Neural Network. This is achieved by leveraging on a spectral re-parametrization of the optimization process. Eigenvalues associated to input nodes provide in fact a robust proxy to gauge the relevance of the supplied entry features. Unlike existing techniques, the spectral features ranking is carried out automatically, as a byproduct of the network training. The technique is successfully challenged against both synthetic and real data.

Updated: 2024-06-03 10:39:12

标题: 利用谱神经网络实现自动输入特征相关性

摘要: 处理高维数据是机器学习领域的常见实践。因此，识别相关的输入特征至关重要，以获得更易于有效数值处理的紧凑数据集。此外，通过隔离构成决策基础的关键元素，可以有助于对模型的解释性进行详细探讨，迄今为止相当难以捉摸。在这里，我们提出一种新颖的方法来估计深度神经网络中输入组件的相对重要性。这是通过利用优化过程的谱重新参数化实现的。与输入节点相关联的特征值实际上为衡量提供的输入特征的相关性提供了强有力的代理。与现有技术不同，谱特征排名是自动进行的，作为网络训练的副产品。该技术已成功地应用于合成数据和实际数据的挑战。

更新时间: 2024-06-03 10:39:12

领域: cs.LG,cond-mat.dis-nn,cond-mat.stat-mech,cs.AI

下载: http://arxiv.org/abs/2406.01183v1

SyntaxShap: Syntax-aware Explainability Method for Text Generation

To harness the power of large language models in safety-critical domains, we need to ensure the explainability of their predictions. However, despite the significant attention to model interpretability, there remains an unexplored domain in explaining sequence-to-sequence tasks using methods tailored for textual data. This paper introduces SyntaxShap, a local, model-agnostic explainability method for text generation that takes into consideration the syntax in the text data. The presented work extends Shapley values to account for parsing-based syntactic dependencies. Taking a game theoric approach, SyntaxShap only considers coalitions constraint by the dependency tree. We adopt a model-based evaluation to compare SyntaxShap and its weighted form to state-of-the-art explainability methods adapted to text generation tasks, using diverse metrics including faithfulness, coherency, and semantic alignment of the explanations to the model. We show that our syntax-aware method produces explanations that help build more faithful and coherent explanations for predictions by autoregressive models. Confronted with the misalignment of human and AI model reasoning, this paper also highlights the need for cautious evaluation strategies in explainable AI.

Updated: 2024-06-03 10:30:00

标题: SyntaxShap: 一种面向文本生成的基于语法的可解释性方法

摘要: 为了在安全关键领域利用大型语言模型的力量，我们需要确保其预测的可解释性。然而，尽管对模型可解释性的关注很大，但仍然存在一个未被探索的领域，即使用针对文本数据定制的方法解释序列到序列任务。本文介绍了SyntaxShap，这是一种用于文本生成的本地、与模型无关的解释性方法，考虑了文本数据中的句法。所提出的工作将Shapley值扩展为考虑基于解析的句法依赖关系。采用博弈论方法，SyntaxShap仅考虑由依赖树约束的联盟。我们采用基于模型的评估方法，比较SyntaxShap及其加权形式与适用于文本生成任务的最先进的可解释性方法，在包括忠实度、连贯性和解释与模型的语义对齐等多样化指标下。我们展示了我们的句法感知方法产生的解释有助于通过自回归模型构建更忠实和连贯的解释。面对人类和AI模型推理不一致的情况，本文还强调了在可解释的人工智能中需要谨慎的评估策略。

更新时间: 2024-06-03 10:30:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.09259v2

Estimating the normal-inverse-Wishart distribution

The normal-inverse-Wishart (NIW) distribution is commonly used as a prior distribution for the mean and covariance parameters of a multivariate normal distribution. The family of NIW distributions is also a minimal exponential family. In this short note we describe a convergent procedure for converting from mean parameters to natural parameters in the NIW family, or -- equivalently -- for performing maximum likelihood estimation of the natural parameters given observed sufficient statistics. This is needed, for example, when using a NIW base family in expectation propagation.

Updated: 2024-06-03 10:26:10

标题: 估计正态逆-Wishart分布

摘要: 正态逆-维沙特（NIW）分布通常被用作多元正态分布的均值和协方差参数的先验分布。NIW分布族也是一个最小指数族。在这个简短的说明中，我们描述了一个收敛程序，用于将均值参数转换为NIW族中的自然参数，或者等效地，用于给定观察到的充分统计量执行自然参数的最大似然估计。例如，在使用NIW基础族进行期望传播时，这是必要的。

更新时间: 2024-06-03 10:26:10

领域: math.ST,cs.LG,stat.ML,stat.TH

下载: http://arxiv.org/abs/2405.16088v2

Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling

Given a set of synchronous time series, each associated with a sensor-point in space and characterized by inter-series relationships, the problem of spatiotemporal forecasting consists of predicting future observations for each point. Spatiotemporal graph neural networks achieve striking results by representing the relationships across time series as a graph. Nonetheless, most existing methods rely on the often unrealistic assumption that inputs are always available and fail to capture hidden spatiotemporal dynamics when part of the data is missing. In this work, we tackle this problem through hierarchical spatiotemporal downsampling. The input time series are progressively coarsened over time and space, obtaining a pool of representations that capture heterogeneous temporal and spatial dynamics. Conditioned on observations and missing data patterns, such representations are combined by an interpretable attention mechanism to generate the forecasts. Our approach outperforms state-of-the-art methods on synthetic and real-world benchmarks under different missing data distributions, particularly in the presence of contiguous blocks of missing values.

Updated: 2024-06-03 10:26:05

标题: 基于图的空间时间下采样方法进行缺失数据的预测

摘要: 给定一组同步时间序列，每个序列与空间中的传感器点相关联，并通过序列间关系进行表征，空间时间预测问题包括预测每个点的未来观测。空间时间图神经网络通过将时间序列之间的关系表示为图来取得惊人的结果。然而，大多数现有方法依赖于通常不现实的假设，即输入始终可用，并且在数据部分丢失时无法捕捉隐藏的空间时间动态。在这项工作中，我们通过分层空间时间下采样来解决这个问题。输入的时间序列随着时间和空间的逐渐粗化，获得一个捕捉异质时间和空间动态的表示池。在观测和丢失数据模式的条件下，这种表示通过可解释的注意机制组合起来生成预测。我们的方法在不同缺失数据分布下的合成和真实世界基准测试中表现优异，特别是在存在连续缺失值块的情况下。

更新时间: 2024-06-03 10:26:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.10634v2

Representation Surgery: Theory and Practice of Affine Steering

Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text. In the case of neural language models, an encoding of the undesirable behavior is often present in the model's representations. Thus, one natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations in a manner that reduces the probability of it generating undesirable text. This paper investigates the formal and empirical properties of steering functions, i.e., transformation of the neural language model's representations that alter its behavior. First, we derive two optimal, in the least-squares sense, affine steering functions under different constraints. Our theory provides justification for existing approaches and offers a novel, improved steering approach. Second, we offer a series of experiments that demonstrate the empirical effectiveness of the methods in mitigating bias and reducing toxic generation.

Updated: 2024-06-03 10:24:22

标题: 表征手术：仿射导向的理论与实践

摘要: 语言模型通常表现出不良行为，例如生成有毒或性别偏见的文本。在神经语言模型的情况下，不良行为的编码通常存在于模型的表示中。因此，防止模型表现出不良行为的一个自然（和常见）方法是引导模型的表示，以减少其生成不良文本的概率。本文研究了引导函数的形式和经验特性，即改变神经语言模型表示的转换，以改变其行为。首先，我们在不同约束条件下推导了两个最优的、在最小二乘意义下的仿射引导函数。我们的理论为现有方法提供了理论基础，并提供了一种新的、改进的引导方法。其次，我们进行了一系列实验证明这些方法在减轻偏见和减少有毒生成方面的实证有效性。

更新时间: 2024-06-03 10:24:22

领域: cs.LG,cs.CL,cs.CY

下载: http://arxiv.org/abs/2402.09631v3

Blockchain in Healthcare and Medicine: A Contemporary Research of Applications, Challenges, and Future Perspectives

Blockchain technology is one of the most contemporary and disruptive technologies in the world. It has gained considerable attention in numerous applications such as financial services, cybersecurity applications, Internet of Things (IoT), network data management. Now its range of applications is beyond the financial services as the healthcare industry has also adopted blockchain technology in its various subdomains such as Electronic Health Records (EHR), medical supply chain management system, genomic market, neuroscience technology, clinical research, and pharmaceutical medicine. Blockchain is considered a secure and viable solution for storing and accessing patients medical records and the patients can diagnosed and treated with safe and secure data sharing. Blockchain technology will revolutionize the healthcare systems with personalized, authentic, and secure access to the clinical data of patients and that data can be used for further health improvements and clinical researches. In this paper, we conduct a contemporary research on existing applications and developments in healthcare industry with the use of blockchain technology. We also discuss some robust applications and various existing companies that are using blockchain solutions for securing their data along with some current challenges and future perspectives.

Updated: 2024-06-03 10:23:33

标题: 区块链在医疗保健和医学领域的应用、挑战和未来展望的当代研究

摘要: 区块链技术是世界上最当代和颠覆性的技术之一。它在诸多领域引起了相当大的关注，例如金融服务、网络安全应用、物联网、网络数据管理等。现在，其应用范围已超越金融服务，医疗保健行业也在其各个子领域中采用了区块链技术，如电子健康记录、医疗供应链管理系统、基因市场、神经科学技术、临床研究和制药医学。区块链被认为是存储和访问患者医疗记录的安全可行解决方案，患者可以通过安全的数据共享进行诊断和治疗。区块链技术将通过为患者提供个性化、真实和安全的临床数据访问，从而改变医疗保健系统，并且这些数据可以用于进一步的健康改进和临床研究。在本文中，我们对医疗保健行业中利用区块链技术进行的现有应用和发展进行了当代研究。我们还讨论了一些强大的应用程序和一些正在使用区块链解决方案保护其数据的现有公司，以及一些当前的挑战和未来的展望。

更新时间: 2024-06-03 10:23:33

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2004.06795v3

Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

The widespread use of large language models (LLMs) has sparked concerns about the potential misuse of AI-generated text, as these models can produce content that closely resembles human-generated text. Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations, with even minor changes in characters or words causing a reversal in distinguishing between human-created and AI-generated text. This paper investigates the robustness of existing AIGT detection methods and introduces a novel detector, the Siamese Calibrated Reconstruction Network (SCRN). The SCRN employs a reconstruction network to add and remove noise from text, extracting a semantic representation that is robust to local perturbations. We also propose a siamese calibration technique to train the model to make equally confidence predictions under different noise, which improves the model's robustness against adversarial perturbations. Experiments on four publicly available datasets show that the SCRN outperforms all baseline methods, achieving 6.5\%-18.25\% absolute accuracy improvement over the best baseline method under adversarial attacks. Moreover, it exhibits superior generalizability in cross-domain, cross-genre, and mixed-source scenarios. The code is available at \url{https://github.com/CarlanLark/Robust-AIGC-Detector}.

Updated: 2024-06-03 10:21:48

标题: AI生成的文本检测器对敌对扰动具有鲁棒性吗？

摘要: 广泛使用大型语言模型(LLMs)引发了对人工智能生成文本潜在滥用的担忧，因为这些模型可以生成与人类生成文本非常相似的内容。目前用于检测人工智能生成文本(AIGT)的方法在面对对抗性扰动时缺乏鲁棒性，即使是字符或单词的微小变化也会导致在区分人类创建和人工智能生成文本之间发生逆转。本文研究了现有AIGT检测方法的鲁棒性，并引入了一种新颖的检测器，即Siamese Calibrated Reconstruction Network (SCRN)。SCRN利用重建网络向文本添加和删除噪声，提取出对局部扰动具有鲁棒性的语义表示。我们还提出了一种Siamese校准技术，训练模型在不同噪声下做出同样自信的预测，从而提高模型对对抗性扰动的鲁棒性。在四个公开可用的数据集上进行的实验表明，SCRN优于所有基线方法，在对抗性攻击下比最佳基线方法提高了6.5\%-18.25\%的绝对准确率。此外，它在跨领域、跨体裁和混合资源场景中表现出卓越的泛化能力。代码可在\url{https://github.com/CarlanLark/Robust-AIGC-Detector}找到。

更新时间: 2024-06-03 10:21:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01179v1

Examining properness in the external validation of survival models with squared and logarithmic losses

Scoring rules promote rational and honest decision-making, which is becoming increasingly important for automated procedures in `auto-ML'. In this paper we survey common squared and logarithmic scoring rules for survival analysis and determine which losses are proper and improper. We prove that commonly utilised squared and logarithmic scoring rules that are claimed to be proper are in fact improper, such as the Integrated Survival Brier Score (ISBS). We further prove that under a strict set of assumptions a class of scoring rules is strictly proper for, what we term, `approximate' survival losses. Despite the difference in properness, experiments in simulated and real-world datasets show there is no major difference between improper and proper versions of the widely-used ISBS, ensuring that we can reasonably trust previous experiments utilizing the original score for evaluation purposes. We still advocate for the use of proper scoring rules, as even minor differences between losses can have important implications in automated processes such as model tuning. We hope our findings encourage further research into the properties of survival measures so that robust and honest evaluation of survival models can be achieved.

Updated: 2024-06-03 10:16:12

标题: 审视在使用平方和对数损失函数进行生存模型外部验证中的适当性

摘要: 打分规则促进理性和诚实的决策，这对于自动化程序在“auto-ML”中变得越来越重要。在本文中，我们对用于生存分析的常见平方和对数得分规则进行调查，并确定哪些损失是适当的和不适当的。我们证明了常用的被声称为适当的平方和对数得分规则实际上是不适当的，例如集成生存Brier分数（ISBS）。我们进一步证明，在一组严格的假设下，一类得分规则对于我们所谓的“近似”生存损失是严格适当的。尽管适当性上存在差异，在模拟和真实世界数据集中的实验表明，常用的ISBS的不适当和适当版本之间没有主要差异，确保我们可以合理地信任以前利用原始分数进行评估目的的实验。我们仍然倡导使用适当的评分规则，因为即使损失之间有微小的差异，也可能在诸如模型调整等自动化过程中产生重要影响。我们希望我们的研究结果鼓励进一步研究生存度量的属性，以便实现对生存模型的稳健和真实评估。

更新时间: 2024-06-03 10:16:12

领域: math.ST,cs.LG,stat.AP,stat.TH

下载: http://arxiv.org/abs/2212.05260v2

Robust and Conjugate Gaussian Process Regression

To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.

Updated: 2024-06-03 10:07:13

标题: 强健和共轭高斯过程回归

摘要: 为了实现闭合形式的调节，高斯过程（GP）回归中的一个常见假设是独立且同分布的高斯观测噪声。这种强大而简单的假设在实践中经常被违反，导致推断不可靠并且无法对不确定性进行量化。不幸的是，现有的增强GP方法破坏了闭合形式的调节，这使它们对从业者的吸引力降低并且计算上更加昂贵。在本文中，我们展示了如何使用广义贝叶斯推断以几乎没有额外成本进行可证明的稳健和共轭高斯过程（RCGP）回归。RCGP特别灵活，因为它使标准GP允许的所有情况下的确切共轭闭合形式更新成为可能。为了展示其强大的实证性能，我们将RCGP用于从贝叶斯优化到稀疏变分高斯过程等各种问题。

更新时间: 2024-06-03 10:07:13

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2311.00463v2

How Ethical Should AI Be? How AI Alignment Shapes the Risk Preferences of LLMs

This study explores the risk preferences of Large Language Models (LLMs) and how the process of aligning them with human ethical standards influences their economic decision-making. By analyzing 30 LLMs, we uncover a broad range of inherent risk profiles ranging from risk-averse to risk-seeking. We then explore how different types of AI alignment, a process that ensures models act according to human values and that focuses on harmlessness, helpfulness, and honesty, alter these base risk preferences. Alignment significantly shifts LLMs towards risk aversion, with models that incorporate all three ethical dimensions exhibiting the most conservative investment behavior. Replicating a prior study that used LLMs to predict corporate investments from company earnings call transcripts, we demonstrate that although some alignment can improve the accuracy of investment forecasts, excessive alignment results in overly cautious predictions. These findings suggest that deploying excessively aligned LLMs in financial decision-making could lead to severe underinvestment. We underline the need for a nuanced approach that carefully balances the degree of ethical alignment with the specific requirements of economic domains when leveraging LLMs within finance.

Updated: 2024-06-03 10:05:25

标题: 人工智能应该有多少道德？ LLMs的风险偏好受到AI对齐的影响

摘要: 这项研究探讨了大型语言模型（LLMs）的风险偏好，以及将它们与人类道德标准对齐的过程如何影响它们的经济决策。通过分析30个LLMs，我们揭示了从风险规避到风险追求的广泛固有风险配置。然后，我们探讨了不同类型的人工智能对齐如何改变这些基础风险偏好，这是一个确保模型按照人类价值观行事并专注于无害性、有益性和诚实性的过程。对齐显著将LLMs转向风险规避，其中结合了所有三个道德维度的模型表现出最保守的投资行为。复制了一项先前利用LLMs从公司收益电话转录中预测企业投资的研究，我们证明了一些对齐可以提高投资预测的准确性，但过度对齐会导致过于谨慎的预测。这些发现表明，在金融决策中部署过度对齐的LLMs可能导致严重的投资不足。我们强调了在金融领域利用LLMs时需要谨慎平衡道德对齐程度与特定经济需求的需要。

更新时间: 2024-06-03 10:05:25

领域: econ.GN,cs.AI,cs.CY,cs.ET,cs.HC,q-fin.EC

下载: http://arxiv.org/abs/2406.01168v1

SPEAR:Exact Gradient Inversion of Batches in Federated Learning

Federated learning is a framework for collaborative machine learning where clients only share gradient updates and not their private data with a server. However, it was recently shown that gradient inversion attacks can reconstruct this data from the shared gradients. In the important honest-but-curious setting, existing attacks enable exact reconstruction only for a batch size of $b=1$, with larger batches permitting only approximate reconstruction. In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b >1$ exactly. SPEAR combines insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected networks and show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks. Finally, we show theoretically that much larger batches can be reconstructed with high probability given exponential time.

Updated: 2024-06-03 09:55:44

标题: SPEAR：联邦学习中批量精确梯度反转

摘要: 联邦学习是一种协作机器学习框架，其中客户端仅与服务器共享梯度更新，而不共享他们的私人数据。然而，最近发现梯度反演攻击可以从共享的梯度中重建数据。在重要的诚实但好奇的情况下，现有的攻击仅允许在批量大小为$b=1$时进行精确重建，而批量更大仅允许近似重建。在这项工作中，我们提出了SPEAR，这是第一个能够精确重建批量大小为$b>1$的算法。SPEAR将对梯度的显式低秩结构的洞察力与基于采样的算法相结合。关键的是，我们利用ReLU引起的梯度稀疏性，精确地过滤掉大量不正确的样本，使最终的重建步骤变得可行。我们为全连接网络提供了高效的GPU实现，并展示了它在批量大小为$b\lesssim 25$时可以精确恢复高维度的ImageNet输入，同时可以扩展到大型网络。最后，我们理论上表明，可以在指数时间内以高概率重建更大的批量。

更新时间: 2024-06-03 09:55:44

领域: cs.LG,cs.CR,cs.DC,I.2.11

下载: http://arxiv.org/abs/2403.03945v2

Identifiability of total effects from abstractions of time series causal graphs

We study the problem of identifiability of the total effect of an intervention from observational time series in the situation, common in practice, where one only has access to abstractions of the true causal graph. We consider here two abstractions: the extended summary causal graph, which conflates all lagged causal relations but distinguishes between lagged and instantaneous relations, and the summary causal graph which does not give any indication about the lag between causal relations. We show that the total effect is always identifiable in extended summary causal graphs and provide sufficient conditions for identifiability in summary causal graphs. We furthermore provide adjustment sets allowing to estimate the total effect whenever it is identifiable.

Updated: 2024-06-03 09:55:20

标题: 时间序列因果图抽象中总效应的可识别性

摘要: 我们研究了在观察时间序列中从干预中确定性总效应的可辨识性问题，在实践中常见的情况是，只能访问真实因果图的抽象。我们在这里考虑了两种抽象：扩展摘要因果图，它混淆了所有滞后因果关系，但区分了滞后和瞬时关系；以及摘要因果图，它不提供有关因果关系之间滞后的任何指示。我们表明总效应在扩展摘要因果图中总是可辨识的，并提供了在摘要因果图中的可辨识性的充分条件。此外，我们提供了调整集，允许在总效应可辨识时估计总效应。

更新时间: 2024-06-03 09:55:20

领域: math.ST,cs.AI,stat.TH

下载: http://arxiv.org/abs/2310.14691v7

A survey on multi-player bandits

Due mostly to its application to cognitive radio networks, multiplayer bandits gained a lot of interest in the last decade. A considerable progress has been made on its theoretical aspect. However, the current algorithms are far from applicable and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandits algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing works, some clear directions for future research appear. We believe that a further study of these different directions might lead to theoretical algorithms adapted to real-world situations.

Updated: 2024-06-03 09:53:22

标题: 多人游戏赌博的调查

摘要: 由于其在认知无线电网络中的应用，多人游戏者赢得了过去十年的很多关注。在理论方面已取得了相当大的进展。然而，目前的算法远非适用，许多障碍仍然存在于这些理论结果和在现实认知无线电网络中可能实现多人游戏者算法之间。本调查将丰富的多人游戏者文献置于背景中并进行组织。根据现有的研究成果，一些明确的未来研究方向出现了。我们相信对这些不同方向的进一步研究可能会导致适应现实世界情况的理论算法。

更新时间: 2024-06-03 09:53:22

领域: stat.ML,cs.GT,cs.LG

下载: http://arxiv.org/abs/2211.16275v2

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced tabular data sets. However, few works analyze SMOTE theoretically. In this paper, we prove that SMOTE (with default parameter) simply copies the original minority samples asymptotically. We also prove that SMOTE exhibits boundary artifacts, thus justifying existing SMOTE variants. Then we introduce two new SMOTE-related strategies, and compare them with state-of-the-art rebalancing procedures. Surprisingly, for most data sets, we observe that applying no rebalancing strategy is competitive in terms of predictive performances, with tuned random forests. For highly imbalanced data sets, our new method, named Multivariate Gaussian SMOTE, is competitive. Besides, our analysis sheds some lights on the behavior of common rebalancing strategies, when used in conjunction with random forests.

Updated: 2024-06-03 09:53:06

标题: 我们是否需要重新平衡策略？围绕SMOTE及其变体的理论和实证研究

摘要: 合成少数过采样技术（SMOTE）是处理不平衡表格数据集的常见重新平衡策略。然而，很少有研究从理论上分析SMOTE。在本文中，我们证明SMOTE（使用默认参数）在渐近情况下仅简单复制原始少数样本。我们还证明SMOTE表现出边界伪迹，从而证明了现有的SMOTE变种的存在。然后我们介绍了两种新的与SMOTE相关的策略，并将它们与最先进的重新平衡程序进行比较。令人惊讶的是，对于大多数数据集，我们观察到不应用任何重新平衡策略在预测性能方面与调整后的随机森林具有竞争力。对于高度不平衡的数据集，我们的新方法，名为多变量高斯SMOTE，具有竞争力。此外，我们的分析还对常见的重新平衡策略在与随机森林结合使用时的行为进行了一些阐释。

更新时间: 2024-06-03 09:53:06

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.03819v2

Profile Reconstruction from Private Sketches

Given a multiset of $n$ items from $\mathcal{D}$, the \emph{profile reconstruction} problem is to estimate, for $t = 0, 1, \dots, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times. We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset that allows us to compute an approximation of $\vec{f} = (\vec{f}[0], \dots, \vec{f}[n])$. Using a histogram privatized using discrete Laplace noise, we show how to ``reverse'' the noise, using an approach of Dwork et al.~(ITCS '10). We show how to speed up their LP-based technique from polynomial time to $O(d + n \log n)$, where $d = |\mathcal{D}|$, and analyze the achievable error in the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. In all cases the dependency of the error on $d$ is $O( 1 / \sqrt{d})$ -- we give an information-theoretic lower bound showing that this dependence on $d$ is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.

Updated: 2024-06-03 09:51:28

标题: 从私人草图中重建个人资料

摘要: 给定来自$\mathcal{D}$的$n$个项目的多重集合，\emph{配置重建}问题是估计对于$t = 0, 1, \dots, n$，在$\mathcal{D}$中出现恰好$t$次的项目的分数$\vec{f}[t]$。我们考虑在一个分布式、空间受限的环境中进行差分私密配置估计，我们希望维护一个可更新的、私密的多重集合草图，使我们能够计算$\vec{f} = (\vec{f}[0], \dots, \vec{f}[n])$的近似值。使用使用离散Laplace噪声私密化的直方图，我们展示了如何使用Dwork等人(ITCS '10)的方法“逆转”噪声。我们展示了如何将他们基于LP的技术从多项式时间加速到$O(d + n \log n)$，其中$d = |\mathcal{D}|$，并分析了在$\ell_1$、$\ell_2$和$\ell_\infty$范数中可达到的误差。在所有情况下，误差对$d$的依赖性为$O(1 / \sqrt{d})$ — 我们提供了一个信息论下界，表明这种对$d$的依赖性在所有私密、可更新的草图中是渐近最优的，对于配置重建问题，具有高概率误差保证。

更新时间: 2024-06-03 09:51:28

领域: cs.CR,cs.DS

下载: http://arxiv.org/abs/2406.01158v1

Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics

A key aspect of intelligence is the ability to demonstrate a broad spectrum of behaviors for adapting to unexpected situations. Over the past decade, advancements in deep reinforcement learning have led to groundbreaking achievements to solve complex continuous control tasks. However, most approaches return only one solution specialized for a specific problem. We introduce Quality-Diversity Actor-Critic (QDAC), an off-policy actor-critic deep reinforcement learning algorithm that leverages a value function critic and a successor features critic to learn high-performing and diverse behaviors. In this framework, the actor optimizes an objective that seamlessly unifies both critics using constrained optimization to (1) maximize return, while (2) executing diverse skills. Compared with other Quality-Diversity methods, QDAC achieves significantly higher performance and more diverse behaviors on six challenging continuous control locomotion tasks. We also demonstrate that we can harness the learned skills to adapt better than other baselines to five perturbed environments. Finally, qualitative analyses showcase a range of remarkable behaviors: adaptive-intelligent-robotics.github.io/QDAC.

Updated: 2024-06-03 09:46:32

标题: 质量多样性演员-评论家：通过价值和继承特征评论家学习高性能和多样化行为

摘要: 智能的一个关键方面是展示适应意外情况的广泛行为谱。在过去的十年中，深度强化学习的进展已经导致了突破性的成就，解决了复杂的连续控制任务。然而，大多数方法只返回一个针对特定问题专门化的解决方案。我们引入了Quality-Diversity Actor-Critic（QDAC），这是一个离策略的演员-评论者深度强化学习算法，利用值函数评论者和继承特征评论者来学习高性能和多样化的行为。在这个框架中，演员通过受限制的优化来最大化回报，同时执行多样化的技能，无缝地统一两个评论者。与其他Quality-Diversity方法相比，QDAC在六个具有挑战性的连续控制运动任务上实现了显着更高的性能和更多样化的行为。我们还展示，我们可以利用学到的技能比其他基线更好地适应五个受扰动的环境。最后，定性分析展示了一系列引人注目的行为：adaptive-intelligent-robotics.github.io/QDAC。

更新时间: 2024-06-03 09:46:32

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.09930v3

Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms

Mixed linear regression is a well-studied problem in parametric statistics and machine learning. Given a set of samples, tuples of covariates and labels, the task of mixed linear regression is to find a small list of linear relationships that best fit the samples. Usually it is assumed that the label is generated stochastically by randomly selecting one of two or more linear functions, applying this chosen function to the covariates, and potentially introducing noise to the result. In that situation, the objective is to estimate the ground-truth linear functions up to some parameter error. The popular expectation maximization (EM) and alternating minimization (AM) algorithms have been previously analyzed for this. In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples, without such generative models. In particular, we show that the AM and EM algorithms, under standard conditions of separability and good initialization, lead to agnostic learning in mixed linear regression by converging to the population loss minimizers, for suitably defined loss functions. In some sense, this shows the strength of AM and EM algorithms that converges to ``optimal solutions'' even in the absence of realizable generative models.

Updated: 2024-06-03 09:43:24

标题: 使用EM和AM算法对混合线性回归进行不可知学习

摘要: 混合线性回归是参数统计学和机器学习中一个经过深入研究的问题。给定一组样本，由自变量和标签组成的元组，混合线性回归的任务是找到最适合这些样本的少量线性关系。通常假设标签是通过随机选择两个或多个线性函数之一来随机生成的，将所选函数应用于自变量，并可能引入噪声到结果中。在这种情况下，目标是估计地面真实的线性函数，直到某个参数误差。先前已经分析过流行的期望最大化（EM）和交替最小化（AM）算法用于此目的。在本文中，我们考虑从样本中对混合线性回归进行不带生成模型的无知学习的更一般问题。特别是，我们展示了在分离性和良好初始化的标准条件下，AM和EM算法通过收敛到人群损失最小化器，对适当定义的损失函数实现了混合线性回归的无知学习。在某种意义上，这展示了AM和EM算法的强大之处，即使在没有可实现的生成模型的情况下，也能收敛到“最优解”。

更新时间: 2024-06-03 09:43:24

领域: stat.ML,cs.AI,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2406.01149v1

Logical Reasoning with Relation Network for Inductive Knowledge Graph Completion

Inductive knowledge graph completion (KGC) aims to infer the missing relation for a set of newly-coming entities that never appeared in the training set. Such a setting is more in line with reality, as real-world KGs are constantly evolving and introducing new knowledge. Recent studies have shown promising results using message passing over subgraphs to embed newly-coming entities for inductive KGC. However, the inductive capability of these methods is usually limited by two key issues. (i) KGC always suffers from data sparsity, and the situation is even exacerbated in inductive KGC where new entities often have few or no connections to the original KG. (ii) Cold-start problem. It is over coarse-grained for accurate KG reasoning to generate representations for new entities by gathering the local information from few neighbors. To this end, we propose a novel iNfOmax RelAtion Network, namely NORAN, for inductive KG completion. It aims to mine latent relation patterns for inductive KG completion. Specifically, by centering on relations, NORAN provides a hyper view towards KG modeling, where the correlations between relations can be naturally captured as entity-independent logical evidence to conduct inductive KGC. Extensive experiment results on five benchmarks show that our framework substantially outperforms the state-of-the-art KGC methods.

Updated: 2024-06-03 09:30:43

标题: 用关系网络进行归纳式知识图完成的逻辑推理

摘要: 感应知识图谱完成（KGC）旨在推断一组从未出现在训练集中的新实体的缺失关系。这种设置更符合现实，因为现实世界的知识图谱不断发展并引入新知识。最近的研究表明，使用子图上的消息传递来嵌入新出现的实体以进行感应KGC可以取得有希望的结果。然而，这些方法的感应能力通常受到两个关键问题的限制。（i）KGC总是受到数据稀疏性的影响，在感应KGC中情况甚至更加严重，因为新实体通常与原始知识图谱之间几乎没有连接。（ii）冷启动问题。通过从少数邻居收集本地信息生成新实体的表示对于准确的知识图谱推理来说过于粗粒度。因此，我们提出了一种新颖的感应KG完成模型iNfOmax RelAtion Network，即NORAN。它旨在挖掘用于感应KG完成的潜在关系模式。具体来说，通过聚焦关系，NORAN提供了一种超级视图来建模知识图谱，其中关系之间的相关性可以自然地被捕捉为与实体无关的逻辑证据，以进行感应KGC。在五个基准测试上的大量实验结果表明，我们的框架大大优于最先进的KGC方法。

更新时间: 2024-06-03 09:30:43

领域: cs.AI

下载: http://arxiv.org/abs/2406.01140v1

Depth-Bounded Epistemic Planning

In this paper, we propose a novel algorithm for epistemic planning based on dynamic epistemic logic (DEL). The novelty is that we limit the depth of reasoning of the planning agent to an upper bound b, meaning that the planning agent can only reason about higher-order knowledge to at most (modal) depth b. The algorithm makes use of a novel type of canonical b-bisimulation contraction guaranteeing unique minimal models with respect to b-bisimulation. We show our depth-bounded planning algorithm to be sound. Additionally, we show it to be complete with respect to planning tasks having a solution within bound b of reasoning depth (and hence the iterative bound-deepening variant is complete in the standard sense). For bound b of reasoning depth, the algorithm is shown to be (b + 1)-EXPTIME complete, and furthermore fixed-parameter tractable in the number of agents and atoms. We present both a tree search and a graph search variant of the algorithm, and we benchmark an implementation of the tree search version against a baseline epistemic planner.

Updated: 2024-06-03 09:30:28

标题: 有界深度认知规划

摘要: 本文提出了一种基于动态认知逻辑（DEL）的认知规划的新算法。其创新之处在于将规划代理的推理深度限制为上限b，这意味着规划代理只能推理到最多（模态）深度b的高阶知识。该算法利用了一种新型的规范b-双模拟收缩保证，确保了相对于b-双模拟的唯一最小模型。我们证明了我们的深度有界规划算法是正确的。此外，我们展示了对于在推理深度边界b内有解的规划任务，算法是完备的（因此迭代边界加深变体在标准意义上是完备的）。对于推理深度边界b，该算法被证明是（b + 1）-EXPTIME完备，并且在代理和原子数量的固定参数中是可处理的。我们提出了算法的树搜索和图搜索变体，并将树搜索版本的实现与基准认知规划器进行了基准测试。

更新时间: 2024-06-03 09:30:28

领域: cs.AI

下载: http://arxiv.org/abs/2406.01139v1

Asset-centric Threat Modeling for AI-based Systems

Threat modeling is a popular method to securely develop systems by achieving awareness of potential areas of future damage caused by adversaries. However, threat modeling for systems relying on Artificial Intelligence is still not well explored. While conventional threat modeling methods and tools did not address AI-related threats, research on this amalgamation still lacks solutions capable of guiding and automating the process, as well as providing evidence that the methods hold up in practice. Consequently, this paper presents ThreatFinderAI, an approach and tool providing guidance and automation to model AI-related assets, threats, countermeasures, and quantify residual risks. To evaluate the practicality of the approach, participants were tasked to recreate a threat model developed by cybersecurity experts of an AI-based healthcare platform. Secondly, the approach was used to identify and discuss strategic risks in an LLM-based application through a case study. Overall, the solution's usability was well-perceived and effectively supports threat identification and risk discussion.

Updated: 2024-06-03 09:30:24

标题: 基于资产的威胁建模对于基于人工智能系统很重要

摘要: 威胁建模是一种流行的方法，通过意识到对手可能造成的未来损害的潜在领域来安全地开发系统。然而，对于依赖人工智能的系统的威胁建模仍未得到很好的探讨。虽然传统的威胁建模方法和工具并未涉及人工智能相关的威胁，但对这种融合的研究仍缺乏能够指导和自动化过程的解决方案，以及提供这些方法在实践中是否有效的证据。因此，本文提出了ThreatFinderAI，一种提供指导和自动化以建模人工智能相关资产、威胁、对策并量化剩余风险的方法和工具。为了评估该方法的实用性，参与者被要求重新创建一个由人工智能健康平台的网络安全专家开发的威胁模型。其次，该方法被用于通过案例研究识别和讨论基于LLM的应用程序中的战略风险。总的来说，该解决方案的可用性得到了良好的认可，并有效地支持威胁识别和风险讨论。

更新时间: 2024-06-03 09:30:24

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2403.06512v2

The Role of Learning Algorithms in Collective Action

Collective action in machine learning is the study of the control that a coordinated group can have over machine learning algorithms. While previous research has concentrated on assessing the impact of collectives against Bayes~(sub)-optimal classifiers, this perspective is limited in that it does not account for the choice of learning algorithm. Classifiers seldom behave like Bayes classifiers and are influenced by the choice of learning algorithms along with their inherent biases. In this work, we initiate the study of how the choice of the learning algorithm plays a role in the success of a collective in practical settings. Specifically, we focus on distributionally robust optimization (DRO), popular for improving a worst group error, and on the ubiquitous stochastic gradient descent (SGD), due to its inductive bias for "simpler" functions. Our empirical results, supported by a theoretical foundation, show that the effective size and success of the collective are highly dependent on properties of the learning algorithm. This highlights the necessity of taking the learning algorithm into account when studying the impact of collective action in machine learning.

Updated: 2024-06-03 09:27:03

标题: 学习算法在集体行动中的作用

摘要: 机器学习中的集体行动是研究协调群体对机器学习算法的控制能力。以往的研究主要集中在评估集体对贝叶斯（次）最优分类器的影响，但这种观点存在局限性，因为它没有考虑学习算法的选择。分类器很少表现得像贝叶斯分类器，它们受学习算法选择以及固有偏见的影响。在这项工作中，我们开始研究学习算法的选择在实际环境中对集体成功的影响。具体而言，我们关注分布鲁棒优化（DRO），用于改进最差组错误，以及普遍使用的随机梯度下降（SGD），因为它对“简单”函数具有归纳偏见。我们的实证结果，得到理论基础支持，表明集体的有效规模和成功程度在很大程度上取决于学习算法的特性。这强调了在研究机器学习中的集体行动影响时，考虑学习算法的必要性。

更新时间: 2024-06-03 09:27:03

领域: cs.LG,cs.CY,stat.ML

下载: http://arxiv.org/abs/2405.06582v2

The Danger Within: Insider Threat Modeling Using Business Process Models

Threat modeling has been successfully applied to model technical threats within information systems. However, a lack of methods focusing on non-technical assets and their representation can be observed in theory and practice. Following the voices of industry practitioners, this paper explored how to model insider threats based on business process models. Hence, this study developed a novel insider threat knowledge base and a threat modeling application that leverages Business Process Modeling and Notation (BPMN). Finally, to understand how well the theoretic knowledge and its prototype translate into practice, the study conducted a real-world case study of an IT provider's business process and an experimental deployment for a real voting process. The results indicate that even without annotation, BPMN diagrams can be leveraged to automatically identify insider threats in an organization.

Updated: 2024-06-03 09:26:53

标题: 内部威胁建模：利用业务流程模型

摘要: 威胁建模已成功应用于对信息系统中的技术威胁进行建模。然而，理论和实践中缺乏专注于非技术资产及其表示的方法。本文根据行业从业者的声音，探讨了如何基于业务流程模型对内部威胁进行建模。因此，本研究开发了一种新颖的内部威胁知识库和威胁建模应用程序，利用了业务流程建模和符号化（BPMN）。最后，为了了解理论知识及其原型如何转化为实践，本研究对IT服务提供商的业务流程进行了真实案例研究，并对一个真实投票流程进行了实验部署。结果表明，即使没有注释，BPMN图表也可以用于自动识别组织内部的威胁。

更新时间: 2024-06-03 09:26:53

领域: cs.CR

下载: http://arxiv.org/abs/2406.01135v1

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

Generative AI systems have become ubiquitous for all kinds of modalities, which makes the issue of the evaluation of such models more pressing. One popular approach is preference ratings, where the generated outputs of different systems are shown to evaluators who choose their preferences. In recent years the field shifted towards the development of automated (trained) metrics to assess generated outputs, which can be used to create preference ratings automatically. In this work, we investigate the evaluation of the metrics themselves, which currently rely on measuring the correlation to human judgments or computing sign accuracy scores. These measures only assess how well the metric agrees with the human ratings. However, our research shows that this does not tell the whole story. Most metrics exhibit a disagreement with human system assessments which is often skewed in favor of particular text generation systems, exposing a degree of favoritism in automated metrics. This paper introduces a formal definition of favoritism in preference metrics, and derives the Favi-Score, which measures this phenomenon. In particular we show that favoritism is strongly related to errors in final system rankings. Thus, we propose that preference-based metrics ought to be evaluated on both sign accuracy scores and favoritism.

Updated: 2024-06-03 09:20:46

标题: Favi-Score：用于生成式人工智能评估中偏袒评分的衡量标准

摘要: 生成式AI系统已经成为各种模态的常态，这使得评估这些模型的问题变得更加紧迫。一种流行的方法是偏好评分，其中将不同系统生成的输出展示给评估者，评估者选择他们的偏好。近年来，该领域转向开发自动（训练的）指标来评估生成的输出，这些指标可以用于自动创建偏好评分。在这项工作中，我们调查了指标本身的评估，目前这些指标依赖于衡量与人类判断的相关性或计算符号准确度分数。这些衡量只评估指标与人类评分的一致性。然而，我们的研究表明这并不能讲述全部故事。大多数指标展示出与人类系统评估的分歧，这往往有利于特定的文本生成系统，暴露了自动化指标中的偏爱程度。本文介绍了偏好指标中偏爱的正式定义，并推导出度量这一现象的Favi-Score。特别地，我们展示了偏爱与最终系统排名中的错误之间的强相关性。因此，我们提出偏好基于的指标应该在符号准确度分数和偏爱方面进行评估。

更新时间: 2024-06-03 09:20:46

领域: cs.AI

下载: http://arxiv.org/abs/2406.01131v1

VREM-FL: Mobility-Aware Computation-Scheduling Co-Design for Vehicular Federated Learning

Assisted and autonomous driving are rapidly gaining momentum and will soon become a reality. Artificial intelligence and machine learning are regarded as key enablers thanks to the massive amount of data that smart vehicles will collect from onboard sensors. Federated learning is one of the most promising techniques for training global machine learning models while preserving data privacy of vehicles and optimizing communications resource usage. In this article, we propose vehicular radio environment map federated learning (VREM-FL), a computation-scheduling co-design for vehicular federated learning that combines mobility of vehicles with 5G radio environment maps. VREM-FL jointly optimizes learning performance of the global model and wisely allocates communication and computation resources. This is achieved by orchestrating local computations at the vehicles in conjunction with transmission of their local models in an adaptive and predictive fashion, by exploiting radio channel maps. The proposed algorithm can be tuned to trade training time for radio resource usage. Experimental results demonstrate that VREM-FL outperforms literature benchmarks for both a linear regression model (learning time reduced by 28%) and a deep neural network for semantic image segmentation (doubling the number of model updates within the same time window).

Updated: 2024-06-03 09:15:29

标题: VREM-FL：面向车联网联合学习的移动感知计算调度协同设计

摘要: 辅助和自主驾驶正迅速发展，并将很快成为现实。人工智能和机器学习被视为关键推动因素，这得益于智能车辆将从车载传感器收集的大量数据。联邦学习是训练全局机器学习模型的最有前途的技术之一，同时保护车辆数据隐私并优化通信资源使用。在本文中，我们提出了车载无线环境地图联邦学习（VREM-FL），这是一种将车辆的移动性与5G无线环境地图结合起来的车载联邦学习计算调度共同设计。VREM-FL共同优化全局模型的学习性能，并明智地分配通信和计算资源。通过在车辆进行本地计算，并以自适应和预测方式传输其本地模型，利用无线信道地图来实现。该算法可以调整以在训练时间和无线资源使用之间进行权衡。实验结果表明，VREM-FL在线性回归模型（学习时间减少28%）和用于语义图像分割的深度神经网络方面优于文献基准（在相同时间窗口内将模型更新次数翻倍）。

更新时间: 2024-06-03 09:15:29

领域: eess.SY,cs.AI,cs.DC,cs.LG,cs.SY

下载: http://arxiv.org/abs/2311.18741v2

TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine

Large language models (LLMs) have performed remarkably well in various natural language processing tasks by benchmarking, including in the Western medical domain. However, the professional evaluation benchmarks for LLMs have yet to be covered in the traditional Chinese medicine(TCM) domain, which has a profound history and vast influence. To address this research gap, we introduce TCM-Bench, an comprehensive benchmark for evaluating LLM performance in TCM. It comprises the TCM-ED dataset, consisting of 5,473 questions sourced from the TCM Licensing Exam (TCMLE), including 1,300 questions with authoritative analysis. It covers the core components of TCMLE, including TCM basis and clinical practice. To evaluate LLMs beyond accuracy of question answering, we propose TCMScore, a metric tailored for evaluating the quality of answers generated by LLMs for TCM related questions. It comprehensively considers the consistency of TCM semantics and knowledge. After conducting comprehensive experimental analyses from diverse perspectives, we can obtain the following findings: (1) The unsatisfactory performance of LLMs on this benchmark underscores their significant room for improvement in TCM. (2) Introducing domain knowledge can enhance LLMs' performance. However, for in-domain models like ZhongJing-TCM, the quality of generated analysis text has decreased, and we hypothesize that their fine-tuning process affects the basic LLM capabilities. (3) Traditional metrics for text generation quality like Rouge and BertScore are susceptible to text length and surface semantic ambiguity, while domain-specific metrics such as TCMScore can further supplement and explain their evaluation results. These findings highlight the capabilities and limitations of LLMs in the TCM and aim to provide a more profound assistance to medical research.

Updated: 2024-06-03 09:11:13

标题: TCMBench: 用于评估中医药领域大型语言模型的全面基准

摘要: 大型语言模型（LLMs）在各种自然语言处理任务中表现出色，包括在西方医学领域。然而，LLMs的专业评估基准尚未涵盖传统中医领域，这个领域具有悠久的历史和广泛的影响。为了填补这一研究空白，我们引入了TCM-Bench，这是一个用于评估LLM在中医领域性能的综合基准。它包括TCM-ED数据集，由来自中医执业医师考试（TCMLE）的5,473个问题组成，其中包括1,300个具有权威分析的问题。它涵盖了TCMLE的核心组成部分，包括中医基础和临床实践。为了评估LLMs超越问题回答的准确性，我们提出了TCMScore，这是一个针对评估LLMs生成的中医相关问题答案质量的度量标准。它全面考虑了中医语义和知识的一致性。通过从不同角度进行全面的实验分析，我们得出以下结论：（1）LLMs在这个基准上的表现不尽如人意，突显了它们在中医领域有待改进的重要空间。(2) 引入领域知识可以提高LLMs的性能。然而，对于像ZhongJing-TCM这样的领域内模型，生成的分析文本质量下降了，我们推测它们的微调过程影响了基本LLM的能力。(3) 像Rouge和BertScore这样的文本生成质量传统度量标准容易受到文本长度和表面语义歧义的影响，而领域特定的度量标准如TCMScore可以进一步补充和解释它们的评估结果。这些发现突显了LLMs在中医领域的能力和局限性，并旨在为医学研究提供更深入的帮助。

更新时间: 2024-06-03 09:11:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01126v1

Learning to optimize with convergence guarantees using nonlinear system theory

The increasing reliance on numerical methods for controlling dynamical systems and training machine learning models underscores the need to devise algorithms that dependably and efficiently navigate complex optimization landscapes. Classical gradient descent methods offer strong theoretical guarantees for convex problems; however, they demand meticulous hyperparameter tuning for non-convex ones. The emerging paradigm of learning to optimize (L2O) automates the discovery of algorithms with optimized performance leveraging learning models and data - yet, it lacks a theoretical framework to analyze convergence of the learned algorithms. In this paper, we fill this gap by harnessing nonlinear system theory. Specifically, we propose an unconstrained parametrization of all convergent algorithms for smooth non-convex objective functions. Notably, our framework is directly compatible with automatic differentiation tools, ensuring convergence by design while learning to optimize.

Updated: 2024-06-03 09:10:27

标题: 学习使用非线性系统理论保证收敛的优化

摘要: 越来越多地依赖数值方法来控制动态系统和训练机器学习模型凸显了需要设计可靠高效地导航复杂优化景观的算法。经典的梯度下降方法在凸问题上提供了强大的理论保证；然而，对于非凸问题，它们需要精心调节超参数。学习优化（L2O）的新兴范式自动发现具有优化性能的算法，利用学习模型和数据 - 然而，它缺乏分析学习算法收敛性的理论框架。在本文中，我们通过利用非线性系统理论来填补这一空白。具体而言，我们提出了对于平滑非凸目标函数的所有收敛算法的无约束参数化。值得注意的是，我们的框架与自动微分工具直接兼容，确保通过设计而学习优化时的收敛。

更新时间: 2024-06-03 09:10:27

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2403.09389v2

A finite operator learning technique for mapping the elastic properties of microstructures to their mechanical deformations

To obtain fast solutions for governing physical equations in solid mechanics, we introduce a method that integrates the core ideas of the finite element method with physics-informed neural networks and concept of neural operators. This approach generalizes and enhances each method, learning the parametric solution for mechanical problems without relying on data from other resources (e.g. other numerical solvers). We propose directly utilizing the available discretized weak form in finite element packages to construct the loss functions algebraically, thereby demonstrating the ability to find solutions even in the presence of sharp discontinuities. Our focus is on micromechanics as an example, where knowledge of deformation and stress fields for a given heterogeneous microstructure is crucial for further design applications. The primary parameter under investigation is the Young's modulus distribution within the heterogeneous solid system. Our investigations reveal that physics-based training yields higher accuracy compared to purely data-driven approaches for unseen microstructures. Additionally, we offer two methods to directly improve the process of obtaining high-resolution solutions, avoiding the need to use basic interpolation techniques. First is based on an autoencoder approach to enhance the efficiency for calculation on high resolution grid point. Next, Fourier-based parametrization is utilized to address complex 2D and 3D problems in micromechanics. The latter idea aims to represent complex microstructures efficiently using Fourier coefficients. Comparisons with other well-known operator learning algorithms, further emphasize the advantages of the newly proposed method.

Updated: 2024-06-03 09:03:10

标题: 一个有限的操作学习技术，用于将微结构的弹性性质映射到它们的机械变形

摘要: 为了获得固体力学中物理方程的快速解决方案，我们引入了一种方法，将有限元方法的核心思想与基于物理信息的神经网络和神经算子的概念相结合。这种方法推广和增强了每种方法，学习了机械问题的参数化解决方案，而不依赖于其他资源（例如其他数值求解器）的数据。我们建议直接利用有限元软件包中的离散弱形式来代数地构建损失函数，从而展示了在存在尖锐不连续性的情况下寻找解决方案的能力。我们的重点是微观力学作为一个示例，对于给定的异质微结构，了解变形和应力场对于进一步设计应用至关重要。我们研究的主要参数是异质固体系统内的杨氏模量分布。我们的调查显示，基于物理的训练相比纯数据驱动方法对未知微结构具有更高的准确性。此外，我们提供了两种方法来直接改进获取高分辨率解决方案的过程，避免使用基本的插值技术。第一种基于自动编码器方法，增强了在高分辨率网格点上的计算效率。接下来，傅立叶参数化被用于解决微观力学中的复杂二维和三维问题。后一种想法旨在利用傅立叶系数有效地表示复杂的微结构。与其他众所周知的运算学习算法进行比较，进一步强调了新提出的方法的优势。

更新时间: 2024-06-03 09:03:10

领域: cs.LG,cs.CE,cs.NA,math.NA

下载: http://arxiv.org/abs/2404.00074v2

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occurred in the WSD LRS. With WSD LRS, we are now able to efficiently study data-model scaling law without extensive retraining experiments on both axes of model and data, from which we derive the much higher compute optimal data-model ratio than Chinchilla Optimal. Additionally, we introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K, whose excellent performance further cementing MiniCPM's foundation in diverse SLM applications. MiniCPM models are available publicly at https://github.com/OpenBMB/MiniCPM .

Updated: 2024-06-03 08:54:38

标题: MiniCPM: 揭示小型语言模型潜力的可扩展训练策略

摘要: 对开发具有高达万亿参数的大型语言模型（LLMs）的兴趣与资源效率和实际费用方面的担忧相遇，尤其是考虑到巨大的实验成本。这种情况强调了探索小型语言模型（SLMs）作为资源高效替代方案的潜力的重要性。在这个背景下，我们介绍了MiniCPM，特别是1.2B和2.4B非嵌入参数变体，它们不仅在各自的类别中表现出色，而且还展示了与7B-13B LLMs相当的能力。在关注SLMs的同时，我们的方法展示了未来LLM研究中模型和数据维度的可扩展性。在模型扩展方面，我们进行了大量的模型风洞实验，以实现稳定和最佳的扩展。在数据扩展方面，我们引入了一种适用于连续训练和领域适应的Warmup-Stable-Decay（WSD）学习率调度器（LRS）。我们对WSD LRS中发生的有趣训练动态进行了深入分析。通过WSD LRS，我们现在能够有效地研究数据-模型扩展规律，而无需在模型和数据的两个轴上进行大量的重新训练实验，从中我们得出了比Chinchilla Optimal更高的计算优化数据-模型比例。此外，我们还介绍了MiniCPM系列，包括MiniCPM-DPO、MiniCPM-MoE和MiniCPM-128K，它们的出色性能进一步巩固了MiniCPM在各种SLM应用中的基础。MiniCPM模型可以在 https://github.com/OpenBMB/MiniCPM 上公开获取。

更新时间: 2024-06-03 08:54:38

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.06395v3

Accelerating Heterogeneous Federated Learning with Closed-form Classifiers

Federated Learning (FL) methods often struggle in highly statistically heterogeneous settings. Indeed, non-IID data distributions cause client drift and biased local solutions, particularly pronounced in the final classification layer, negatively impacting convergence speed and accuracy. To address this issue, we introduce Federated Recursive Ridge Regression (Fed3R). Our method fits a Ridge Regression classifier computed in closed form leveraging pre-trained features. Fed3R is immune to statistical heterogeneity and is invariant to the sampling order of the clients. Therefore, it proves particularly effective in cross-device scenarios. Furthermore, it is fast and efficient in terms of communication and computation costs, requiring up to two orders of magnitude fewer resources than the competitors. Finally, we propose to leverage the Fed3R parameters as an initialization for a softmax classifier and subsequently fine-tune the model using any FL algorithm (Fed3R with Fine-Tuning, Fed3R+FT). Our findings also indicate that maintaining a fixed classifier aids in stabilizing the training and learning more discriminative features in cross-device settings. Official website: https://fed-3r.github.io/.

Updated: 2024-06-03 08:52:06

标题: 用闭式分类器加速异构联邦学习

摘要: 联邦学习（FL）方法在高度统计异质性环境中往往面临困难。事实上，非独立同分布的数据分布会导致客户端漂移和偏倚的本地解决方案，特别在最终分类层中表现明显，从而对收敛速度和准确性产生负面影响。为了解决这个问题，我们引入了联邦递归岭回归（Fed3R）。我们的方法利用预训练特征计算出闭合形式的岭回归分类器。Fed3R对统计异质性具有免疫性，并且对客户端的采样顺序具有不变性。因此，在跨设备场景中特别有效。此外，它在通信和计算成本方面快速高效，比竞争对手需要的资源少两个数量级。最后，我们建议利用Fed3R参数作为softmax分类器的初始化，并随后使用任何FL算法对模型进行微调（Fed3R with Fine-Tuning，Fed3R+FT）。我们的研究结果还表明，保持固定的分类器有助于稳定训练并学习更具有区分性的特征在跨设备设置中。官方网站：https://fed-3r.github.io/。

更新时间: 2024-06-03 08:52:06

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.01116v1

Deep Optimal Transport for Domain Adaptation on SPD Manifolds

The machine learning community has shown increasing interest in addressing the domain adaptation problem on symmetric positive definite (SPD) manifolds. This interest is primarily driven by the complexities of neuroimaging data generated from brain signals, which often exhibit shifts in data distribution across recording sessions. These neuroimaging data, represented by signal covariance matrices, possess the mathematical properties of symmetry and positive definiteness. However, applying conventional domain adaptation methods is challenging because these mathematical properties can be disrupted when operating on covariance matrices. In this study, we introduce a novel geometric deep learning-based approach utilizing optimal transport on SPD manifolds to manage discrepancies in both marginal and conditional distributions between the source and target domains. We evaluate the effectiveness of this approach in three cross-session brain-computer interface scenarios and provide visualized results for further insights. The GitHub repository of this study can be accessed at https://github.com/GeometricBCI/Deep-Optimal-Transport-for-Domain-Adaptation-on-SPD-Manifolds.

Updated: 2024-06-03 08:51:23

标题: 深度最优输运在SPD流形上的域自适应

摘要: 机器学习社区对在对称正定（SPD）流形上解决领域适应问题表现出越来越大的兴趣。这种兴趣主要是由大脑信号产生的神经影像数据的复杂性驱动，这些数据通常在记录会话中展现出数据分布的变化。这些神经影像数据，由信号协方差矩阵表示，具有对称性和正定性的数学特性。然而，应用传统的领域适应方法是具有挑战性的，因为在协方差矩阵上操作时这些数学特性可能会被破坏。在本研究中，我们引入了一种基于几何深度学习的方法，利用在SPD流形上的最优传输来管理源域和目标域之间的边际和条件分布之间的差异。我们在三个跨会话的脑-计算机界面场景中评估了这种方法的有效性，并提供了可视化结果以进一步洞察。本研究的GitHub存储库可在以下网址访问：https://github.com/GeometricBCI/Deep-Optimal-Transport-for-Domain-Adaptation-on-SPD-Manifolds。

更新时间: 2024-06-03 08:51:23

领域: cs.LG,cs.AI,eess.SP,I.2.0

下载: http://arxiv.org/abs/2201.05745v4

ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at \url{https://github.com/acellera/acegen-open} and available for use under the MIT license.

Updated: 2024-06-03 08:50:31

标题: ACEGEN：用于药物发现的生成化学试剂的强化学习

摘要: 近年来，强化学习（RL）已经成为药物设计中的一个有价值的工具，可以提出和优化具有期望性质的分子。然而，由于先进RL算法的复杂性和对专门代码的重大依赖，平衡能力、灵活性、可靠性和效率仍然具有挑战性。在这项工作中，我们介绍了ACEGEN，一个专门针对生成式药物设计的全面简化工具包，使用TorchRL构建，TorchRL是一个现代的RL库，提供经过充分测试的可重用组件。我们通过与其他已发表的生成建模算法进行基准测试来验证ACEGEN，并展示了相当或改进的性能。我们还展示了ACEGEN在多个药物发现案例研究中的应用示例。ACEGEN可以在\url{https://github.com/acellera/acegen-open}访问，并可在MIT许可下使用。

更新时间: 2024-06-03 08:50:31

领域: cs.LG,cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2405.04657v2

A Practical Approach to Novel Class Discovery in Tabular Data

The problem of Novel Class Discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under unrealistic conditions. In particular, the number of novel classes is usually assumed to be known in advance, and their labels are sometimes used to tune hyperparameters. Methods that rely on these assumptions are not applicable in real-world scenarios. In this work, we focus on solving NCD in tabular data when no prior knowledge of the novel classes is available. To this end, we propose to tune the hyperparameters of NCD methods by adapting the $k$-fold cross-validation process and hiding some of the known classes in each fold. Since we have found that methods with too many hyperparameters are likely to overfit these hidden classes, we define a simple deep NCD model. This method is composed of only the essential elements necessary for the NCD problem and performs impressively well under realistic conditions. Furthermore, we find that the latent space of this method can be used to reliably estimate the number of novel classes. Additionally, we adapt two unsupervised clustering algorithms ($k$-means and Spectral Clustering) to leverage the knowledge of the known classes. Extensive experiments are conducted on 7 tabular datasets and demonstrate the effectiveness of the proposed method and hyperparameter tuning process, and show that the NCD problem can be solved without relying on knowledge from the novel classes.

Updated: 2024-06-03 08:49:54

标题: 在表格数据中发现新类的实用方法

摘要: Novel Class Discovery (NCD)问题是从已知类别的标记集中提取知识，以准确地将未标记的新类别划分为不同类别。尽管NCD最近受到了社区的关注，但通常是在计算机视觉问题下解决，且在不切实际的条件下。特别是，通常假定新类别的数量事先已知，并且有时使用它们的标签来调整超参数。依赖这些假设的方法在真实场景中不适用。在这项工作中，我们致力于在没有新类别先验知识的情况下解决表格数据中的NCD。为此，我们提出通过调整$k$-fold交叉验证过程并隐藏每个fold中的一些已知类别来调整NCD方法的超参数。由于我们发现具有过多超参数的方法很可能会过拟合这些隐藏类别，我们定义了一个简单的深度NCD模型。该方法仅包含解决NCD问题所必需的基本元素，并在实际条件下表现出色。此外，我们发现该方法的潜在空间可以可靠地估计新类别的数量。此外，我们改进了两种无监督聚类算法（$k$-means和谱聚类）以利用已知类别的知识。我们在7个表格数据集上进行了广泛的实验，展示了所提出的方法和超参数调整过程的有效性，并表明NCD问题可以在不依赖新类别的知识的情况下解决。

更新时间: 2024-06-03 08:49:54

领域: cs.LG

下载: http://arxiv.org/abs/2311.05440v3

Globally Interpretable Classifiers via Boolean Formulas with Dynamic Propositions

Interpretability and explainability are among the most important challenges of modern artificial intelligence, being mentioned even in various legislative sources. In this article, we develop a method for extracting immediately human interpretable classifiers from tabular data. The classifiers are given in the form of short Boolean formulas built with propositions that can either be directly extracted from categorical attributes or dynamically computed from numeric ones. Our method is implemented using Answer Set Programming. We investigate seven datasets and compare our results to ones obtainable by state-of-the-art classifiers for tabular data, namely, XGBoost and random forests. Over all datasets, the accuracies obtainable by our method are similar to the reference methods. The advantage of our classifiers in all cases is that they are very short and immediately human intelligible as opposed to the black-box nature of the reference methods.

Updated: 2024-06-03 08:46:17

标题: 通过具有动态命题的布尔公式实现全球可解释的分类器

摘要: 解释性和可解释性是现代人工智能中最重要的挑战之一，甚至在各种立法来源中都有提及。在本文中，我们开发了一种从表格数据中提取立即人类可解释分类器的方法。这些分类器以使用命题构建的简短布尔公式的形式给出，这些命题可以直接从分类属性中提取，也可以从数值属性中动态计算得出。我们的方法使用Answer Set Programming实现。我们研究了七个数据集，并将我们的结果与用于表格数据的最先进分类器（即XGBoost和随机森林）的结果进行比较。在所有数据集上，我们的方法获得的准确度与参考方法相似。在所有情况下，我们分类器的优势在于它们非常简短并且立即人类可理解，与参考方法的黑盒性质相反。

更新时间: 2024-06-03 08:46:17

领域: cs.LG,cs.AI,cs.LO,I.2.6; F.4.1; I.2.4

下载: http://arxiv.org/abs/2406.01114v1

SST-GCN: The Sequential based Spatio-Temporal Graph Convolutional networks for Minute-level and Road-level Traffic Accident Risk Prediction

Traffic accidents are recognized as a major social issue worldwide, causing numerous injuries and significant costs annually. Consequently, methods for predicting and preventing traffic accidents have been researched for many years. With advancements in the field of artificial intelligence, various studies have applied Machine Learning and Deep Learning techniques to traffic accident prediction. Modern traffic conditions change rapidly by the minute, and these changes vary significantly across different roads. In other words, the risk of traffic accidents changes minute by minute in various patterns for each road. Therefore, it is desirable to predict traffic accident risk at the Minute-Level and Road-Level. However, because roads have close and complex relationships with adjacent roads, research on predicting traffic accidents at the Minute-Level and Road-Level is challenging. Thus, it is essential to build a model that can reflect the spatial and temporal characteristics of roads for traffic accident prediction. Consequently, recent attempts have been made to use Graph Convolutional Networks to capture the spatial characteristics of roads and Recurrent Neural Networks to capture their temporal characteristics for predicting traffic accident risk. This paper proposes the Sequential based Spatio-Temporal Graph Convolutional Networks (SST-GCN), which combines GCN and LSTM, to predict traffic accidents at the Minute-Level and Road-Level using a road dataset constructed in Seoul, the capital of South Korea. Experiments have demonstrated that SST-GCN outperforms other state-of-the-art models in Minute-Level predictions.

Updated: 2024-06-03 08:44:05

标题: SST-GCN：基于时空图卷积网络的基于序列的分钟级和道路级交通事故风险预测

摘要: 交通事故被公认为全球性的重大社会问题，每年造成大量伤害和巨额成本。因此，多年来一直在研究预测和预防交通事故的方法。随着人工智能领域的进步，各种研究已经应用机器学习和深度学习技术来预测交通事故。现代交通条件每分钟都在快速变化，这些变化在不同道路之间差异显著。换句话说，每条道路的交通事故风险每分钟以不同的模式改变。因此，预测交通事故风险在分钟级别和道路级别是可取的。然而，由于道路与相邻道路之间有密切而复杂的关系，研究在分钟级别和道路级别预测交通事故是具有挑战性的。因此，必须建立一个能够反映道路空间和时间特征的模型，用于交通事故预测。因此，最近的尝试是使用图卷积网络来捕捉道路的空间特征，使用循环神经网络来捕捉其时间特征，以预测交通事故风险。本文提出了基于序列的时空图卷积网络（SST-GCN），结合GCN和LSTM，利用在韩国首都首尔构建的道路数据集来预测交通事故在分钟级别和道路级别发生的情况。实验证明SST-GCN在分钟级别预测方面优于其他最先进的模型。

更新时间: 2024-06-03 08:44:05

领域: cs.AI

下载: http://arxiv.org/abs/2405.18602v2

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

A popular approach for solving zero-sum games is to maintain populations of policies to approximate the Nash Equilibrium (NE). Previous studies have shown that Policy Space Response Oracle (PSRO) algorithm is an effective multi-agent reinforcement learning framework for solving such games. However, repeatedly training new policies from scratch to approximate Best Response (BR) to opponents' mixed policies at each iteration is both inefficient and costly. While some PSRO variants initialize a new policy by inheriting from past BR policies, this approach limits the exploration of new policies, especially against challenging opponents. To address this issue, we propose Fusion-PSRO, which employs policy fusion to initialize policies for better approximation to BR. By selecting high-quality base policies from meta-NE, policy fusion fuses the base policies into a new policy through model averaging. This approach allows the initialized policies to incorporate multiple expert policies, making it easier to handle difficult opponents compared to inheriting from past BR policies or initializing from scratch. Moreover, our method only modifies the policy initialization phase, allowing its application to nearly all PSRO variants without additional training overhead. Our experiments on non-transitive matrix games, Leduc Poker, and the more complex Liars Dice demonstrate that Fusion-PSRO enhances the performance of nearly all PSRO variants, achieving lower exploitability.

Updated: 2024-06-03 08:43:51

标题: Fusion-PSRO：纳什政策融合用于政策空间响应预言者

摘要: 解决零和博弈的一种流行方法是维护一组策略来逼近纳什均衡（NE）。先前的研究表明，政策空间响应甲骨文（PSRO）算法是解决此类游戏的有效多智能体强化学习框架。然而，反复训练新的策略以逼近对手在每次迭代中的混合策略的最佳响应（BR）既低效又昂贵。虽然一些PSRO变体通过继承过去的BR策略来初始化新策略，但这种方法限制了新策略的探索，特别是面对具有挑战性的对手。为了解决这个问题，我们提出了Fusion-PSRO，它利用策略融合来初始化策略，以更好地逼近BR。通过从元NE中选择高质量的基础策略，策略融合通过模型平均将基础策略融合到一个新策略中。这种方法允许初始化策略整合多个专家策略，使其比继承过去的BR策略或从头开始初始化更容易处理困难的对手。此外，我们的方法只修改了策略初始化阶段，使其可以应用于几乎所有PSRO变体，而无需额外的训练开销。我们在非传递矩阵游戏、Leduc扑克和更复杂的说谎者骰子上的实验表明，Fusion-PSRO提高了几乎所有PSRO变体的性能，实现了更低的剥削性。

更新时间: 2024-06-03 08:43:51

领域: cs.GT,cs.AI,cs.LG,cs.MA

下载: http://arxiv.org/abs/2405.21027v2

Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Sh\=ukai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Sh\=ukai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Sh\=ukai implements specific rewards to align the agent's behavior with human expectations. Sh\=ukai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Sh\=ukai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.

Updated: 2024-06-03 08:39:15

标题: 推进商用格斗游戏中的DRL代理：训练，整合和代理-人类对齐

摘要: 深度强化学习（DRL）代理已在各种游戏类型中取得了令人印象深刻的成功。然而，现有研究主要集中在优化DRL的能力，而不是解决长时间玩家互动的挑战。本文提出了一个针对格斗游戏的实用DRL代理系统，名为Sh\=ukai，已成功部署到拥有超过1亿注册用户的热门格斗游戏《火影忍者手机版》。Sh\=ukai通过量化状态来增强泛化能力，引入异质联赛训练（HELT）以实现平衡的竞争力、泛化能力和训练效率。此外，Sh\=ukai实现了特定奖励以使代理行为与人类期望保持一致。Sh\=ukai展示了其泛化能力，即使仅在其中的13%进行了训练，也能在所有角色上表现出一致的竞争力。此外，HELT表现出了显著的22%的样本效率提升。Sh\=ukai为《火影忍者手机版》中的玩家提供了一个宝贵的训练伙伴，使他们能够提升自己的能力和技能。

更新时间: 2024-06-03 08:39:15

领域: cs.AI

下载: http://arxiv.org/abs/2406.01103v1

Using EEG to investigate the effectiveness of applying ChatGPT

In recent years, the rapid development of artificial intelligence technology, especially the emergence of large language models (LLMs) such as ChatGPT, has presented significant prospects for application in the field of education. LLMs possess the capability to interpret knowledge, answer questions, and consider context, thus providing support for dialogic teaching to students. Therefore, an examination of the capacity of LLMs to effectively fulfill instructional roles, thereby facilitating student learning akin to human educators within dialogic teaching scenarios, is an exceptionally valuable research topic. This research recruited 34 undergraduate students as participants, who were randomly divided into two groups. The experimental group engaged in dialogic teaching using ChatGPT, while the control group interacted with human teachers. Both groups learned the histogram equalization unit in the information-related course "Digital Image Processing". The research findings show comparable scores between the two groups on the retention test. However, students who engaged in dialogue with ChatGPT exhibited lower performance on the transfer test. Electroencephalography data revealed that students who interacted with ChatGPT exhibited higher levels of cognitive activity, suggesting that ChatGPT could help students establish a knowledge foundation and stimulate cognitive activity. However, its strengths on promoting students. knowledge application and creativity were insignificant. Based upon the research findings, it is evident that ChatGPT cannot fully excel in fulfilling teaching tasks in the dialogue teaching in information related courses. Combining ChatGPT with traditional human teachers might be a more ideal approach. The synergistic use of both can provide students with more comprehensive learning support, thus contributing to enhancing the quality of teaching.

Updated: 2024-06-03 08:37:42

标题: 使用脑电图研究应用ChatGPT的有效性

摘要: 近年来，人工智能技术的快速发展，尤其是大语言模型（LLMs）如ChatGPT的出现，在教育领域应用方面呈现出重要前景。LLMs具有解释知识、回答问题和考虑上下文的能力，从而为学生提供对话式教学支持。因此，对LLMs有效履行教学角色的能力进行检验，从而促进学生在对话式教学场景中学习，是一个极具价值的研究课题。本研究招募了34名本科生作为参与者，随机分为两组。实验组使用ChatGPT进行对话式教学，而对照组与人类教师互动。两组学习信息相关课程《数字图像处理》中的直方图均衡单元。研究结果显示，两组在保留测试中得分相当。然而，与ChatGPT对话的学生在转移测试中表现较差。脑电图数据显示，与ChatGPT互动的学生表现出更高水平的认知活动，表明ChatGPT可以帮助学生建立知识基础和激发认知活动。然而，它在促进学生知识应用和创造力方面的优势微不足道。根据研究结果，ChatGPT在信息相关课程的对话式教学中不能充分胜任教学任务是显而易见的。将ChatGPT与传统人类教师结合可能是更理想的方法。两者的协同使用可以为学生提供更全面的学习支持，从而有助于提高教学质量。

更新时间: 2024-06-03 08:37:42

领域: cs.CY,cs.AI,physics.ed-ph

下载: http://arxiv.org/abs/2403.16687v4

Genshin: General Shield for Natural Language Processing with Large Language Models

Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains like financial fraud, phishing, etc. Current approaches mainly rely on traditional textual classification with posterior interpretable algorithms, suffering from attackers who may create versatile adversarial samples to break the system's defense, forcing users to make trade-offs between efficiency and robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), utilizing LLMs as defensive one-time plug-ins. Unlike most applications of LLMs that try to transform text into something new or structural, Genshin uses LLMs to recover text to its original state. Genshin aims to combine the generalizability of the LLM, the discrimination of the median model, and the interpretability of the simple model. Our experiments on the task of sentimental analysis and spam detection have shown fatal flaws of the current median models and exhilarating results on LLMs' recovery ability, demonstrating that Genshin is both effective and efficient. In our ablation study, we unearth several intriguing observations. Utilizing the LLM defender, a tool derived from the 4th paradigm, we have reproduced BERT's 15% optimal mask rate results in the 3rd paradigm of NLP. Additionally, when employing the LLM as a potential adversarial tool, attackers are capable of executing effective attacks that are nearly semantically lossless.

Updated: 2024-06-03 08:35:07

标题: 原文标题翻译为：Genshin：大型语言模型自然语言处理的通用护盾

摘要: 最近，像ChatGPT、Gemini或LLaMA这样的大型语言模型（LLMs）一直备受关注，展示了在无数领域取得的显著进展和泛化能力。然而，LLMs产生了更大的黑匣子，加剧了不透明性，解释性仅限于少数方法。LLMs固有的不确定性和不透明性限制了它们在金融欺诈、网络钓鱼等高风险领域的应用。目前的方法主要依赖于传统的文本分类与后验可解释算法，容易受到攻击者的攻击，可能创建多功能对抗样本来破坏系统的防御，迫使用户在效率和稳健性之间进行权衡。为解决这一问题，我们提出了一种名为Genshin（大型语言模型自然语言处理的通用防护盾）的新型级联框架，利用LLMs作为防御性一次性插件。与大多数LLMs的应用尝试将文本转换为新的或结构化内容不同，Genshin利用LLMs将文本恢复到其原始状态。Genshin旨在结合LLMs的泛化能力、中位数模型的区分能力和简单模型的可解释性。我们在情感分析和垃圾邮件检测任务上的实验显示了当前中位数模型的致命缺陷，以及LLMs的恢复能力令人振奋的结果，证明了Genshin既有效又高效。在我们的消融研究中，我们发现了几个有趣的观察结果。利用从第四范式衍生的LLM防御者工具，我们在自然语言处理的第三范式中重现了BERT的15%最佳掩码率结果。此外，当将LLM作为潜在的对抗工具时，攻击者能够执行几乎是语义无损的有效攻击。

更新时间: 2024-06-03 08:35:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.18741v2

Deep reinforcement learning for weakly coupled MDP's with continuous actions

This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greadily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlight LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.

Updated: 2024-06-03 08:34:32

标题: 深度强化学习在连续动作弱耦合MDP中的应用

摘要: 这篇论文介绍了连续动作的拉格朗日策略（LPCA），这是一种专门针对具有连续动作空间的弱耦合MDP问题设计的强化学习算法。LPCA通过在神经网络框架中引入对弱耦合MDP问题的拉格朗日松弛来解决依赖连续动作的资源约束挑战，用于Q值计算。这种方法有效地解耦了MDP，使得在资源受限环境中能够进行高效的策略学习。我们提出了两种LPCA的变体：LPCA-DE利用微分进化进行全局优化，而LPCA-Greedy则是一种基于Q值梯度逐步和贪婪选择动作的方法。通过与其他各种设置中的最新技术进行比较分析，突显了LPCA在资源分配管理和最大化奖励方面的稳健性和效率。

更新时间: 2024-06-03 08:34:32

领域: cs.LG,cs.AI,cs.MA

下载: http://arxiv.org/abs/2406.01099v1

Bayesian Exploration Networks

Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies. We show all existing model-free approaches make approximations that yield policies that can be arbitrarily Bayes-suboptimal. As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.

Updated: 2024-06-03 08:23:19

标题: 贝叶斯探索网络

摘要: 贝叶斯强化学习（RL）为在不确定性下进行序贯决策提供了一种原则性和优雅的方法。最显著的是，贝叶斯代理没有面临频繁主义方法的一个主要病理，即勘探/开发困境。然而，对于无模型方法的理论理解还不足。在本文中，我们介绍了一种新颖的贝叶斯无模型制定，并首次分析了无模型方法可以产生贝叶斯最优策略。我们表明所有现有的无模型方法都做出了近似，导致了可能是任意贝叶斯次优的策略。作为朝着无模型贝叶斯最优性的第一步，我们引入了贝叶斯探索网络（BEN），该网络使用正则化流来模拟贝尔曼算子中的混合不确定性（通过密度估计）和认知不确定性（通过变分推断）。在完全优化的极限情况下，BEN学习真正的贝叶斯最优策略，但与变分期望极大化一样，部分优化使我们的方法可行。实证结果表明，BEN在现有无模型方法失败的任务中可以学习真正的贝叶斯最优策略。

更新时间: 2024-06-03 08:23:19

领域: cs.LG

下载: http://arxiv.org/abs/2308.13049v3

On the Expressivity of Persistent Homology in Graph Learning

Persistent homology, a technique from computational topology, has recently shown strong empirical performance in the context of graph classification. Being able to capture long range graph properties via higher-order topological features, such as cycles of arbitrary length, in combination with multi-scale topological descriptors, has improved predictive performance for data sets with prominent topological structures, such as molecules. At the same time, the theoretical properties of persistent homology have not been formally assessed in this context. This paper intends to bridge the gap between computational topology and graph machine learning by providing a brief introduction to persistent homology in the context of graphs, as well as a theoretical discussion and empirical analysis of its expressivity for graph learning tasks.

Updated: 2024-06-03 08:20:31

标题: 关于持久同调在图学习中的表达能力

摘要: 持续同调性是一种来自计算拓扑学的技术，最近在图分类的背景下显示出强大的实证性能。通过能够通过高阶拓扑特征捕捉长距离图属性，如任意长度的环，结合多尺度拓扑描述符，在具有显著拓扑结构的数据集，如分子中，提高了预测性能。同时，持续同调性的理论性质在这个背景下尚未得到正式评估。本文旨在通过提供关于图中持续同调性的简要介绍，以及对其在图学习任务中的表达能力的理论讨论和实证分析，弥合计算拓扑学与图机器学习之间的差距。

更新时间: 2024-06-03 08:20:31

领域: cs.LG,math.AT,stat.ML,55N31 (Primary) 62R40, 68T09 (Secondary)

下载: http://arxiv.org/abs/2302.09826v3

Towards Efficient Replay in Federated Incremental Learning

In Federated Learning (FL), the data in each client is typically assumed fixed or static. However, data often comes in an incremental manner in real-world applications, where the data domain may increase dynamically. In this work, we study catastrophic forgetting with data heterogeneity in Federated Incremental Learning (FIL) scenarios where edge clients may lack enough storage space to retain full data. We propose to employ a simple, generic framework for FIL named Re-Fed, which can coordinate each client to cache important samples for replay. More specifically, when a new task arrives, each client first caches selected previous samples based on their global and local importance. Then, the client trains the local model with both the cached samples and the samples from the new task. Theoretically, we analyze the ability of Re-Fed to discover important samples for replay thus alleviating the catastrophic forgetting problem. Moreover, we empirically show that Re-Fed achieves competitive performance compared to state-of-the-art methods.

Updated: 2024-06-03 08:14:56

标题: 朝向在联邦增量学习中高效回放

摘要: 在联邦学习（FL）中，通常假定每个客户端中的数据是固定或静态的。然而，在现实世界的应用中，数据通常以增量方式出现，其中数据域可能动态增加。在这项工作中，我们研究了在联邦增量学习（FIL）场景中具有数据异质性的灾难性遗忘，其中边缘客户端可能缺乏足够的存储空间来保留完整数据。我们提出采用一种名为Re-Fed的简单通用框架用于FIL，该框架可以协调每个客户端缓存重要样本以进行重播。更具体地说，当新任务到达时，每个客户端首先根据它们的全局和局部重要性缓存选定的先前样本。然后，客户端使用既缓存的样本又使用来自新任务的样本来训练本地模型。从理论上讲，我们分析了Re-Fed发现重要样本以进行重播从而缓解灾难性遗忘问题的能力。此外，我们还从实证上展示了Re-Fed与最先进方法相比取得了竞争性能。

更新时间: 2024-06-03 08:14:56

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2403.05890v3

CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge

Large Language Models (LLMs) are increasingly used across various domains, from software development to cyber threat intelligence. Understanding all the different fields of cybersecurity, which includes topics such as cryptography, reverse engineering, and risk assessment, poses a challenge even for human experts. To accurately test the general knowledge of LLMs in cybersecurity, the research community needs a diverse, accurate, and up-to-date dataset. To address this gap, we present CyberMetric-80, CyberMetric-500, CyberMetric-2000, and CyberMetric-10000, which are multiple-choice Q&A benchmark datasets comprising 80, 500, 2000, and 10,000 questions respectively. By utilizing GPT-3.5 and Retrieval-Augmented Generation (RAG), we collected documents, including NIST standards, research papers, publicly accessible books, RFCs, and other publications in the cybersecurity domain, to generate questions, each with four possible answers. The results underwent several rounds of error checking and refinement. Human experts invested over 200 hours validating the questions and solutions to ensure their accuracy and relevance, and to filter out any questions unrelated to cybersecurity. We have evaluated and compared 25 state-of-the-art LLM models on the CyberMetric datasets. In addition to our primary goal of evaluating LLMs, we involved 30 human participants to solve CyberMetric-80 in a closed-book scenario. The results can serve as a reference for comparing the general cybersecurity knowledge of humans and LLMs. The findings revealed that GPT-4o, GPT-4-turbo, Mixtral-8x7B-Instruct, Falcon-180B-Chat, and GEMINI-pro 1.0 were the best-performing LLMs. Additionally, the top LLMs were more accurate than humans on CyberMetric-80, although highly experienced human experts still outperformed small models such as Llama-3-8B, Phi-2 or Gemma-7b.

Updated: 2024-06-03 08:14:45

标题: 网络度量：基于检索增强生成的基准数据集，用于评估网络安全知识中的LLMs

摘要: 大型语言模型（LLMs）越来越多地应用于各个领域，从软件开发到网络威胁情报。即使对于人类专家来说，理解包括密码学、逆向工程和风险评估等主题在内的所有不同领域的网络安全都是一项挑战。为了准确测试LLMs在网络安全领域的普通知识，研究界需要一个多样化、准确且及时更新的数据集。为了填补这一差距，我们提出了CyberMetric-80、CyberMetric-500、CyberMetric-2000和CyberMetric-10000，这些是分别包含80、500、2000和10000个问题的多选题目基准数据集。通过利用GPT-3.5和检索增强生成（RAG），我们收集了包括NIST标准、研究论文、公开可访问的书籍、RFC以及其他网络安全领域的出版物，以生成每个问题及其四个可能答案。结果经过多轮错误检查和精炼。人类专家投入了200多小时验证问题和解决方案，以确保其准确性和相关性，并过滤掉与网络安全无关的问题。我们评估并比较了25个最先进的LLM模型在CyberMetric数据集上的表现。除了我们评估LLMs的主要目标外，我们还让30名人类参与者在闭卷情况下解决CyberMetric-80。结果可以作为比较人类和LLMs的普通网络安全知识的参考。研究结果显示，GPT-4o、GPT-4-turbo、Mixtral-8x7B-Instruct、Falcon-180B-Chat和GEMINI-pro 1.0是表现最佳的LLMs。此外，顶级LLMs在CyberMetric-80上比人类更准确，尽管经验丰富的人类专家仍然优于小型模型，如Llama-3-8B、Phi-2或Gemma-7b。

更新时间: 2024-06-03 08:14:45

领域: cs.AI,cs.CR

下载: http://arxiv.org/abs/2402.07688v2

Effective Subset Selection Through The Lens of Neural Network Pruning

Having large amounts of annotated data significantly impacts the effectiveness of deep neural networks. However, the annotation task can be very expensive in some domains, such as medical data. Thus, it is important to select the data to be annotated wisely, which is known as the subset selection problem. We investigate the relationship between subset selection and neural network pruning, which is more widely studied, and establish a correspondence between them. Leveraging insights from network pruning, we propose utilizing the norm criterion of neural network features to improve subset selection methods. We empirically validate our proposed strategy on various networks and datasets, demonstrating enhanced accuracy. This shows the potential of employing pruning tools for subset selection.

Updated: 2024-06-03 08:12:32

标题: 通过神经网络剪枝的视角进行有效的子集选择

摘要: 拥有大量注释数据显著影响深度神经网络的有效性。然而，在某些领域，如医疗数据，注释任务可能非常昂贵。因此，明智地选择要注释的数据非常重要，这被称为子集选择问题。我们研究了子集选择和更广泛研究的神经网络修剪之间的关系，并建立了它们之间的对应关系。利用神经网络修剪的见解，我们提出利用神经网络特征的规范准则来改进子集选择方法。我们在各种网络和数据集上进行了实证验证，展示了增强的准确性。这显示了利用修剪工具进行子集选择的潜力。

更新时间: 2024-06-03 08:12:32

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.01086v1

Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport

Wasserstein Gradient Flow (WGF) describes the gradient dynamics of probability density within the Wasserstein space. WGF provides a promising approach for conducting optimization over the probability distributions. Numerically approximating the continuous WGF requires the time discretization method. The most well-known method for this is the JKO scheme. In this regard, previous WGF models employ the JKO scheme and parametrize transport map for each JKO step. However, this approach results in quadratic training complexity $O(K^2)$ with the number of JKO step $K$. This severely limits the scalability of WGF models. In this paper, we introduce a scalable WGF-based generative model, called Semi-dual JKO (S-JKO). Our model is based on the semi-dual form of the JKO step, derived from the equivalence between the JKO step and the Unbalanced Optimal Transport. Our approach reduces the training complexity to $O(K)$. We demonstrate that our model significantly outperforms existing WGF-based generative models, achieving FID scores of 2.62 on CIFAR-10 and 5.46 on CelebA-HQ-256, which are comparable to state-of-the-art image generative models.

Updated: 2024-06-03 08:12:13

标题: 可扩展的Wasserstein梯度流通过不平衡的最优输运用于生成建模

摘要: Wasserstein Gradient Flow (WGF)描述了Wasserstein空间中概率密度的梯度动力学。WGF为在概率分布上进行优化提供了一种有前途的方法。数值逼近连续WGF需要时间离散化方法。其中最为知名的方法是JKO方案。在此方面，先前的WGF模型采用了JKO方案，并为每个JKO步骤参数化传输映射。然而，这种方法导致了与JKO步骤数量K成二次复杂度O(K^2)。这严重限制了WGF模型的可扩展性。在本文中，我们介绍了一种可扩展的基于WGF的生成模型，称为半对偶JKO（S-JKO）。我们的模型基于JKO步骤的半对偶形式，这是从JKO步骤和不平衡最优传输之间的等价性导出的。我们的方法将训练复杂度降低到O(K)。我们证明了我们的模型明显优于现有的基于WGF的生成模型，在CIFAR-10上达到了2.62的FID分数，在CelebA-HQ-256上达到了5.46的FID分数，与最先进的图像生成模型相当。

更新时间: 2024-06-03 08:12:13

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2402.05443v3

FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation

Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up research in designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacking methods. Nevertheless, privacy-preserving mechanisms employed in these defending methods invariably lead to compromised model performances due to a fixed obfuscation applied to private data or gradients. In this article, we, therefore, propose a novel adaptive obfuscation mechanism, coined FedAdOb, to protect private data without yielding original model performances. Technically, FedAdOb utilizes passport-based adaptive obfuscation to ensure data privacy in both horizontal and vertical federated learning settings. The privacy-preserving capabilities of FedAdOb, specifically with regard to private features and labels, are theoretically proven through Theorems 1 and 2. Furthermore, extensive experimental evaluations conducted on various datasets and network architectures demonstrate the effectiveness of FedAdOb by manifesting its superior trade-off between privacy preservation and model performance, surpassing existing methods.

Updated: 2024-06-03 08:12:09

标题: FedAdOb：具有自适应混淆的隐私保护联邦深度学习

摘要: 联邦学习（FL）已经成为一种协作方法，允许多个客户共同学习一个机器学习模型，而不共享他们的私人数据。尽管在特定条件下已经证明了对隐私泄漏的担忧，但这已经引发了大量的后续研究，旨在设计强大的攻击方法和有效的防御机制，以阻止这些攻击方法。然而，在这些防御方法中采用的保护隐私机制往往会导致模型性能受损，因为对私人数据或梯度应用了固定的混淆。因此，在本文中，我们提出了一种新颖的自适应混淆机制，命名为FedAdOb，以保护私人数据而不降低原始模型的性能。从技术上讲，FedAdOb利用基于护照的自适应混淆来确保在水平和垂直联邦学习设置中的数据隐私。FedAdOb的保护隐私能力，特别是私有特征和标签，通过定理1和定理2在理论上得到证明。此外，在各种数据集和网络架构上进行的大量实验评估表明，FedAdOb通过展示其在隐私保护和模型性能之间的卓越权衡，超越了现有方法的有效性。

更新时间: 2024-06-03 08:12:09

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2406.01085v1

What Is Fairness? On the Role of Protected Attributes and Fictitious Worlds

A growing body of literature in fairness-aware machine learning (fairML) aims to mitigate machine learning (ML)-related unfairness in automated decision-making (ADM) by defining metrics that measure fairness of an ML model and by proposing methods to ensure that trained ML models achieve low scores on these metrics. However, the underlying concept of fairness, i.e., the question of what fairness is, is rarely discussed, leaving a significant gap between centuries of philosophical discussion and the recent adoption of the concept in the ML community. In this work, we try to bridge this gap by formalizing a consistent concept of fairness and by translating the philosophical considerations into a formal framework for the training and evaluation of ML models in ADM systems. We argue that fairness problems can arise even without the presence of protected attributes (PAs), and point out that fairness and predictive performance are not irreconcilable opposites, but that the latter is necessary to achieve the former. Furthermore, we argue why and how causal considerations are necessary when assessing fairness in the presence of PAs by proposing a fictitious, normatively desired (FiND) world in which PAs have no causal effects. In practice, this FiND world must be approximated by a warped world in which the causal effects of the PAs are removed from the real-world data. Finally, we achieve greater linguistic clarity in the discussion of fairML. We outline algorithms for practical applications and present illustrative experiments on COMPAS data.

Updated: 2024-06-03 08:02:04

标题: 公平是什么？关于受保护特征和虚构世界的作用

摘要: 在关于公平感知机器学习（fairML）的文献中，越来越多的研究旨在通过定义衡量机器学习（ML）模型公平性的指标，并提出方法来确保训练的ML模型在这些指标上得分较低，从而减轻自动决策（ADM）中与ML相关的不公平性。然而，公平性的基本概念，即什么是公平性的问题，很少被讨论，导致了在几个世纪的哲学讨论与ML社区最近对该概念的采用之间存在显著差距。在这项工作中，我们试图通过形式化一个一致的公平概念，将哲学考虑转化为ADM系统中ML模型的训练和评估的形式框架，来弥合这一差距。我们认为，即使没有受保护属性（PAs）的存在，公平性问题也可能出现，并指出公平性和预测性能并非不可调和的对立，而是前者实现后者的必要条件。此外，我们认为在评估存在PAs的情况下的公平性时，因果考虑是必要的，通过提出一个虚构的、规范上期望的（FiND）世界，在这个世界里PAs没有因果效应。实际上，这个FiND世界必须通过一个消除了PAs因果效应的扭曲世界来近似真实世界数据。最后，我们在fairML的讨论中实现了更大的语言清晰度。我们概述了实际应用的算法，并在COMPAS数据上呈现了说明性实验。

更新时间: 2024-06-03 08:02:04

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2205.09622v5

DMS*: Minimizing Makespan for Multi-Agent Combinatorial Path Finding

Multi-Agent Combinatorial Path Finding (MCPF) seeks collision-free paths for multiple agents from their initial to goal locations, while visiting a set of intermediate target locations in the middle of the paths. MCPF is challenging as it involves both planning collision-free paths for multiple agents and target sequencing, i.e., solving traveling salesman problems to assign targets to and find the visiting order for the agents. Recent work develops methods to address MCPF while minimizing the sum of individual arrival times at goals. Such a problem formulation may result in paths with different arrival times and lead to a long makespan, the maximum arrival time, among the agents. This paper proposes a min-max variant of MCPF, denoted as MCPF-max, that minimizes the makespan of the agents. While the existing methods (such as MS*) for MCPF can be adapted to solve MCPF-max, we further develop two new techniques based on MS* to defer the expensive target sequencing during planning to expedite the overall computation. We analyze the properties of the resulting algorithm Deferred MS* (DMS*), and test DMS* with up to 20 agents and 80 targets. We demonstrate the use of DMS* on differential-drive robots.

Updated: 2024-06-03 08:00:16

标题: DMS*: 多智能体组合路径规划的最小化完成时间

摘要: 多智能体组合路径规划(MCPF)寻求多个智能体从初始位置到目标位置的无碰撞路径，同时在路径中间访问一组中间目标位置。MCPF具有挑战性，因为它涉及为多个智能体规划无碰撞路径和目标排序，即解决旅行推销员问题以分配目标并找到智能体的访问顺序。最近的工作开发了方法来解决MCPF，同时最小化各个智能体到达目标的总时间。这种问题表述可能导致具有不同到达时间的路径，并导致智能体中的最大到达时间，即总工作时间。本文提出了MCPF的min-max变体，称为MCPF-max，该变体最小化智能体的总工作时间。虽然现有的方法(如MS*)可以被改进以解决MCPF-max，但我们进一步开发了两种基于MS*的新技术，以推迟规划中昂贵的目标排序，以加快整体计算。我们分析了所得算法Deferred MS*(DMS*)的性质，并使用多达20个智能体和80个目标对DMS*进行了测试。我们展示了在差动驱动机器人上使用DMS*的情况。

更新时间: 2024-06-03 08:00:16

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2312.06314v2

No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning

Federated learning allows several clients to train one machine learning model jointly without sharing private data, providing privacy protection. However, traditional federated learning is vulnerable to poisoning attacks, which can not only decrease the model performance, but also implant malicious backdoors. In addition, direct submission of local model parameters can also lead to the privacy leakage of the training dataset. In this paper, we aim to build a privacy-preserving and Byzantine-robust federated learning scheme to provide an environment with no vandalism (NoV) against attacks from malicious participants. Specifically, we construct a model filter for poisoned local models, protecting the global model from data and model poisoning attacks. This model filter combines zero-knowledge proofs to provide further privacy protection. Then, we adopt secret sharing to provide verifiable secure aggregation, removing malicious clients that disrupting the aggregation process. Our formal analysis proves that NoV can protect data privacy and weed out Byzantine attackers. Our experiments illustrate that NoV can effectively address data and model poisoning attacks, including PGD, and outperforms other related schemes.

Updated: 2024-06-03 07:59:10

标题: 无破坏行为：保护隐私和拜占庭鲁棒的联邦学习

摘要: 联合学习允许多个客户端共同训练一个机器学习模型，而无需共享私人数据，提供隐私保护。然而，传统的联合学习容易受到毒化攻击的影响，这不仅会降低模型性能，还可能植入恶意后门。此外，直接提交本地模型参数也可能导致训练数据集的隐私泄露。本文旨在构建一个保护隐私且具有拜占庭容错特性的联合学习方案，以提供一个无恶意破坏者的环境。具体来说，我们构建了一个用于检测毒化本地模型的模型过滤器，保护全局模型免受数据和模型毒化攻击。该模型过滤器结合了零知识证明技术，提供进一步的隐私保护。接着，我们采用秘密共享技术来提供可验证的安全聚合，排除干扰聚合过程的恶意客户端。我们的正式分析证明了无恶意破坏者（NoV）可以保护数据隐私并清除拜占庭攻击者。实验结果表明，NoV能够有效应对数据和模型毒化攻击，包括PGD，并且优于其他相关方案。

更新时间: 2024-06-03 07:59:10

领域: cs.CR,cs.DC,cs.LG

下载: http://arxiv.org/abs/2406.01080v1

Object Aware Egocentric Online Action Detection

Advancements in egocentric video datasets like Ego4D, EPIC-Kitchens, and Ego-Exo4D have enriched the study of first-person human interactions, which is crucial for applications in augmented reality and assisted living. Despite these advancements, current Online Action Detection methods, which efficiently detect actions in streaming videos, are predominantly designed for exocentric views and thus fail to capitalize on the unique perspectives inherent to egocentric videos. To address this gap, we introduce an Object-Aware Module that integrates egocentric-specific priors into existing OAD frameworks, enhancing first-person footage interpretation. Utilizing object-specific details and temporal dynamics, our module improves scene understanding in detecting actions. Validated extensively on the Epic-Kitchens 100 dataset, our work can be seamlessly integrated into existing models with minimal overhead and bring consistent performance enhancements, marking an important step forward in adapting action detection systems to egocentric video analysis.

Updated: 2024-06-03 07:58:40

标题: 意识到对象的主观在线动作检测

摘要: egocentric video datasets like Ego4D, EPIC-Kitchens, and Ego-Exo4D在第一人称互动研究方面取得了进展，这对增强现实和辅助生活应用至关重要。尽管取得了这些进展，但当前的在线动作检测方法主要设计用于外中心视角，因此无法充分利用内在于第一人称视频中独特的视角。为了填补这一空白，我们引入了一个对象感知模块，将第一人称特定的先验信息整合到现有的OAD框架中，增强第一人称镜头的解释。通过利用对象特定的细节和时间动态，我们的模块改善了在检测动作中的场景理解。我们在Epic-Kitchens 100数据集上进行了大量验证，我们的工作可以无缝地集成到现有模型中，带来持续的性能提升，标志着将动作检测系统调整到第一人称视频分析的重要一步。

更新时间: 2024-06-03 07:58:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01079v1

Estimating Canopy Height at Scale

We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.

Updated: 2024-06-03 07:53:38

标题: 在规模上估算树冠高度

摘要: 我们提出了一个基于卫星数据的全球尺度树冠高度估算框架。我们的模型利用先进的数据预处理技术，采用一种新颖的损失函数，旨在抵消地面真实高度测量中固有的地理位置不准确性，并利用航天飞机雷达地形测量任务的数据，有效地过滤出山区错误标签，增强了我们在这些地区的预测可靠性。预测和地面真实标签之间的比较结果显示，整体的MAE / RMSE为2.43 / 4.73（米），对于高度超过五米的树木为4.45 / 6.72（米），相较于现有的全球尺度地图，这显示出了显著的改进。生成的高度地图以及底层框架将促进和增强全球范围的生态分析，包括但不限于大规模森林和生物量监测。

更新时间: 2024-06-03 07:53:38

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01076v1

NUBO: A Transparent Python Package for Bayesian Optimization

NUBO, short for Newcastle University Bayesian Optimization, is a Bayesian optimization framework for optimizing expensive-to-evaluate black-box functions, such as physical experiments and computer simulators. Bayesian optimization is a cost-efficient optimization strategy that uses surrogate modeling via Gaussian processes to represent an objective function and acquisition functions to guide the selection of candidate points to approximate the global optimum of the objective function. NUBO focuses on transparency and user experience to make Bayesian optimization accessible to researchers from all disciplines. Clean and understandable code, precise references, and thorough documentation ensure transparency, while a modular and flexible design, easy-to-write syntax, and careful selection of Bayesian optimization algorithms ensure a good user experience. NUBO allows users to tailor Bayesian optimization to their problem by writing a custom optimization loop using the provided building blocks. It supports sequential single-point, parallel multi-point, and asynchronous optimization of bounded, constrained, and mixed (discrete and continuous) parameter input spaces. Only algorithms and methods extensively tested and validated to perform well are included in NUBO. This ensures that the package remains compact and does not overwhelm the user with an unnecessarily large number of options. The package is written in Python but does not require expert knowledge of Python to optimize simulators and experiments. NUBO is distributed as open-source software under the BSD 3-Clause license.

Updated: 2024-06-03 07:52:21

标题: NUBO: 一个用于贝叶斯优化的透明Python包

摘要: NUBO是Newcastle大学贝叶斯优化的缩写，是一种用于优化昂贵的黑盒函数（例如物理实验和计算机模拟器）的贝叶斯优化框架。贝叶斯优化是一种成本高效的优化策略，通过高斯过程来表示目标函数，并使用获取函数来指导选择候选点以逼近目标函数的全局最优解。NUBO专注于透明度和用户体验，使贝叶斯优化对来自各个学科的研究人员都可以使用。清晰易懂的代码、精确的参考文献和完善的文档确保透明度，而模块化和灵活的设计、易于编写的语法以及对贝叶斯优化算法的精心选择则确保了良好的用户体验。NUBO允许用户通过使用提供的构建块编写自定义优化循环来调整贝叶斯优化以解决问题。它支持有界、约束和混合（离散和连续）参数输入空间的顺序单点、并行多点和异步优化。只有经过广泛测试和验证表现良好的算法和方法才包含在NUBO中。这确保了软件包保持紧凑，并且不会用大量不必要的选项来压倒用户。该软件包使用Python编写，但不需要对Python有专业知识即可优化模拟器和实验。NUBO以BSD 3-Clause许可证的形式作为开源软件发布。

更新时间: 2024-06-03 07:52:21

领域: cs.LG,cs.MS,stat.ML

下载: http://arxiv.org/abs/2305.06709v2

Cardinality Estimation over Knowledge Graphs with Embeddings and Graph Neural Networks

Cardinality Estimation over Knowledge Graphs (KG) is crucial for query optimization, yet remains a challenging task due to the semi-structured nature and complex correlations of typical Knowledge Graphs. In this work, we propose GNCE, a novel approach that leverages knowledge graph embeddings and Graph Neural Networks (GNN) to accurately predict the cardinality of conjunctive queries. GNCE first creates semantically meaningful embeddings for all entities in the KG, which are then integrated into the given query, which is processed by a GNN to estimate the cardinality of the query. We evaluate GNCE on several KGs in terms of q-Error and demonstrate that it outperforms state-of-the-art approaches based on sampling, summaries, and (machine) learning in terms of estimation accuracy while also having lower execution time and less parameters. Additionally, we show that GNCE can inductively generalise to unseen entities, making it suitable for use in dynamic query processing scenarios. Our proposed approach has the potential to significantly improve query optimization and related applications that rely on accurate cardinality estimates of conjunctive queries.

Updated: 2024-06-03 07:51:54

标题: 使用嵌入和图神经网络进行知识图谱的基数估计

摘要: 在知识图谱（KG）上对基数进行估计对于查询优化至关重要，但由于典型知识图谱的半结构化性质和复杂关联性而仍然是一项具有挑战性的任务。在这项工作中，我们提出了一种新颖的方法GNCE，利用知识图谱嵌入和图神经网络（GNN）来准确预测联合查询的基数。GNCE首先为KG中的所有实体创建语义上有意义的嵌入，然后将这些嵌入集成到给定的查询中，并通过GNN处理查询以估计其基数。我们根据q-Error在几个知识图谱上评估了GNCE，并表明它在估计准确性方面优于基于采样、摘要和（机器）学习的最新方法，同时具有更低的执行时间和更少的参数。此外，我们展示了GNCE可以归纳地泛化到未见实体，使其适用于动态查询处理场景。我们提出的方法有望显著改进查询优化和依赖于联合查询准确基数估计的相关应用。

更新时间: 2024-06-03 07:51:54

领域: cs.DB,cs.AI,cs.LG

下载: http://arxiv.org/abs/2303.01140v2

Hierarchical Tree-structured Knowledge Graph For Academic Insight Survey

Research surveys have always posed a challenge for beginner researchers who lack of research training. These researchers struggle to understand the directions within their research topic, and the discovery of new research findings within a short time. One way to provide intuitive assistance to beginner researchers is by offering relevant knowledge graphs(KG) and recommending related academic papers. However, existing navigation knowledge graphs primarily rely on keywords in the research field and often fail to present the logical hierarchy among multiple related papers clearly. Moreover, most recommendation systems for academic papers simply rely on high text similarity, which can leave researchers confused as to why a particular article is being recommended. They may lack of grasp important information about the insight connection between "Issue resolved" and "Issue finding" that they hope to obtain. To address these issues, this study aims to support research insight surveys for beginner researchers by establishing a hierarchical tree-structured knowledge graph that reflects the inheritance insight of research topics and the relevance insight among the academic papers.

Updated: 2024-06-03 07:48:19

标题: 层次树状知识图谱用于学术洞察调查

摘要: 研究调查一直是缺乏研究训练的初学者研究人员面临的挑战。这些研究人员往往难以理解他们研究主题内的方向，并在短时间内发现新的研究发现。为初学者研究人员提供直观帮助的一种方式是提供相关的知识图（KG）并推荐相关的学术论文。然而，现有的导航知识图主要依赖于研究领域中的关键词，往往无法清晰地呈现多个相关论文之间的逻辑层次结构。此外，大多数学术论文推荐系统仅仅依赖于高文本相似性，这可能会让研究人员困惑为什么推荐了特定的文章。他们可能缺乏对“问题解决”和“问题发现”之间的重要信息连接的把握，而这正是他们希望获得的。为了解决这些问题，本研究旨在通过建立一个反映研究主题继承洞察和学术论文之间相关性洞察的分层树状知识图，支持初学者研究人员的研究洞察调查。

更新时间: 2024-06-03 07:48:19

领域: cs.DL,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.04854v3

Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity based Pruning

The emergence of deep and large-scale spiking neural networks (SNNs) exhibiting high performance across diverse complex datasets has led to a need for compressing network models due to the presence of a significant number of redundant structural units, aiming to more effectively leverage their low-power consumption and biological interpretability advantages. Currently, most model compression techniques for SNNs are based on unstructured pruning of individual connections, which requires specific hardware support. Hence, we propose a structured pruning approach based on the activity levels of convolutional kernels named Spiking Channel Activity-based (SCA) network pruning framework. Inspired by synaptic plasticity mechanisms, our method dynamically adjusts the network's structure by pruning and regenerating convolutional kernels during training, enhancing the model's adaptation to the current target task. While maintaining model performance, this approach refines the network architecture, ultimately reducing computational load and accelerating the inference process. This indicates that structured dynamic sparse learning methods can better facilitate the application of deep SNNs in low-power and high-efficiency scenarios.

Updated: 2024-06-03 07:44:37

标题: 朝着利用基于脉冲活动的修剪构建高效深度脉冲神经网络

摘要: 最近出现的深层和大规模脉冲神经网络（SNNs）展现出在各种复杂数据集上高性能的趋势，这导致了对网络模型进行压缩的需求，因为存在大量冗余的结构单元，旨在更有效地利用其低功耗和生物解释性优势。目前，大多数用于SNNs的模型压缩技术基于对个体连接的非结构化剪枝，这需要特定的硬件支持。因此，我们提出了一种基于卷积核活动水平的结构化剪枝方法，命名为脉冲通道活动基础（SCA）网络剪枝框架。受到突触可塑性机制的启发，我们的方法通过在训练过程中对卷积核进行剪枝和再生成，动态调整网络的结构，增强模型对当前目标任务的适应性。在保持模型性能的同时，这种方法完善了网络架构，最终减少了计算负载并加速了推理过程。这表明，结构化动态稀疏学习方法可以更好地促进在低功耗和高效率场景中应用深层SNNs。

更新时间: 2024-06-03 07:44:37

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2406.01072v1

KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language Models

The lottery ticket hypothesis posits the existence of ``winning tickets'' within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. Our key idea is to use Kolmogorov-Smirnov Test to analyze the distribution shift of parameters before and after fine-tuning. We further theoretically prove that KS-Lottery can find the certified winning tickets in the embedding layer, fine-tuning on the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other parameter-efficient tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters for fine-tuning while achieving the comparable performance as full fine-tuning LLM. Surprisingly, we find that fine-tuning 18 tokens' embedding of LLaMA suffices to reach the fine-tuning translation performance~\footnote{https://github.com/CONE-MT/KS-Lottery.}.

Updated: 2024-06-03 07:35:25

标题: KS-Lottery：为多语言模型寻找认证彩票票据

摘要: 抽奖票假设假设在随机初始化的神经网络中存在“中奖票”。在微调场景中，LLMs存在中奖票吗？我们如何找到这样的中奖票？在本文中，我们提出了KS-Lottery，一种用于识别在多语言微调中高效的LLM参数的小子集的方法。我们的关键思想是使用Kolmogorov-Smirnov检验来分析微调前后参数的分布变化。我们进一步在理论上证明，KS-Lottery可以在嵌入层中找到认证的中奖票，对找到的参数进行微调可以保证性能与完全微调一样好。将KS-Lottery与其他参数高效调整算法在翻译任务上进行比较，实验结果显示，KS-Lottery在微调时找到了一个更小的参数集，同时达到了与完全微调LLM相当的性能。令人惊讶的是，我们发现微调LLaMA的18个令牌的嵌入就足以达到微调翻译的性能。

更新时间: 2024-06-03 07:35:25

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.02801v2

SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses current state-of-the-art methods and is on par with the biggest foundation model MOIRAI while having significantly fewer parameters. The code is available at https://github.com/romilbert/samformer.

Updated: 2024-06-03 07:34:37

标题: SAMformer：通过锐度感知最小化和通道注意力释放变压器在时间序列预测中的潜力

摘要: 基于Transformer的架构在自然语言处理和计算机视觉中取得了突破性的表现，然而它们在多变量长期预测方面仍然不如更简单的线性基准。为了更好地理解这一现象，我们从研究一个玩具线性预测问题开始，我们展示了尽管Transformer具有很高的表达能力，但它们无法收敛到真正的解决方案。我们进一步确定了Transformer的注意力机制是导致其泛化能力低的原因。基于这一洞见，我们提出了一个浅层轻量级Transformer模型，当采用锐度感知优化进行优化时，成功地避开了坏的局部最小值。我们在实证中证明了这一结果适用于所有常用的真实世界多变量时间序列数据集。特别是，SAMformer超越了当前的最先进方法，并与最大的基础模型MOIRAI不相上下，同时参数数量显著较少。代码可在https://github.com/romilbert/samformer上找到。

更新时间: 2024-06-03 07:34:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.10198v3

Causal prompting model-based offline reinforcement learning

Model-based offline Reinforcement Learning (RL) allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations. However, applying model-based offline RL to online systems presents challenges, primarily due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems. To tackle these issues, we introduce the Causal Prompting Reinforcement Learning (CPRL) framework, designed for highly suboptimal and resource-constrained online scenarios. The initial phase of CPRL involves the introduction of the Hidden-Parameter Block Causal Prompting Dynamic (Hip-BCPD) to model environmental dynamics. This approach utilises invariant causal prompts and aligns hidden parameters to generalise to new and diverse online users. In the subsequent phase, a single policy is trained to address multiple tasks through the amalgamation of reusable skills, circumventing the need for training from scratch. Experiments conducted across datasets with varying levels of noise, including simulation-based and real-world offline datasets from the Dnurse APP, demonstrate that our proposed method can make robust decisions in out-of-distribution and noisy environments, outperforming contemporary algorithms. Additionally, we separately verify the contributions of Hip-BCPDs and the skill-reuse strategy to the robustness of performance. We further analyse the visualised structure of Hip-BCPD and the interpretability of sub-skills. We released our source code and the first ever real-world medical dataset for precise medical decision-making tasks.

Updated: 2024-06-03 07:28:57

标题: 因果提示模型驱动的离线强化学习

摘要: 基于模型的离线强化学习（RL）允许智能体充分利用预先收集的数据集，而无需额外或不道德的探索。然而，将基于模型的离线RL应用于在线系统存在挑战，主要是由于在线系统生成的数据集具有高度次优（噪声填充）和多样化的特性。为了解决这些问题，我们引入了因果提示强化学习（CPRL）框架，专为高度次优和资源受限的在线场景设计。CPRL的初始阶段涉及引入隐藏参数块因果提示动态（Hip-BCPD）来建模环境动态。该方法利用不变的因果提示并对齐隐藏参数以泛化到新的和多样化的在线用户。在随后的阶段，通过整合可重用技能训练单个策略来解决多个任务，避免了从头开始训练的需要。在包括来自Dnurse APP的模拟和真实世界离线数据集在内的具有不同噪声水平的数据集上进行的实验表明，我们提出的方法能够在分布外和嘈杂的环境中做出稳健的决策，优于当代算法。此外，我们分别验证了Hip-BCPD和技能重用策略对性能稳健性的贡献。我们进一步分析了Hip-BCPD的可视化结构和子技能的可解释性。我们发布了我们的源代码和用于精确医疗决策任务的首个真实世界医疗数据集。

更新时间: 2024-06-03 07:28:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01065v1

Spiking mode-based neural networks

Spiking neural networks play an important role in brain-like neuromorphic computations and in studying working mechanisms of neural circuits. One drawback of training a large scale spiking neural network is that updating all weights is quite expensive. Furthermore, after training, all information related to the computational task is hidden into the weight matrix, prohibiting us from a transparent understanding of circuit mechanisms. Therefore, in this work, we address these challenges by proposing a spiking mode-based training protocol, where the recurrent weight matrix is explained as a Hopfield-like multiplication of three matrices: input, output modes and a score matrix. The first advantage is that the weight is interpreted by input and output modes and their associated scores characterizing the importance of each decomposition term. The number of modes is thus adjustable, allowing more degrees of freedom for modeling the experimental data. This significantly reduces the training cost because of significantly reduced space complexity for learning. Training spiking networks is thus carried out in the mode-score space. The second advantage is that one can project the high dimensional neural activity (filtered spike train) in the state space onto the mode space which is typically of a low dimension, e.g., a few modes are sufficient to capture the shape of the underlying neural manifolds. We successfully apply our framework in two computational tasks -- digit classification and selective sensory integration tasks. Our method accelerate the training of spiking neural networks by a Hopfield-like decomposition, and moreover this training leads to low-dimensional attractor structures of high-dimensional neural dynamics.

Updated: 2024-06-03 07:27:04

标题: 基于尖峰模式的神经网络

摘要: 尖峰神经网络在类似大脑的神经形态计算和研究神经回路的工作机制中扮演着重要角色。训练大规模尖峰神经网络的一个缺点是更新所有权重成本很高。此外，在训练之后，与计算任务相关的所有信息都隐藏在权重矩阵中，阻碍了我们对电路机制的透明理解。因此，在这项工作中，我们通过提出基于尖峰模式的训练协议来解决这些挑战，其中循环权重矩阵被解释为三个矩阵的Hopfield样式乘积：输入、输出模式和一个评分矩阵。第一个优势是权重由输入和输出模式及其关联的得分解释，表征每个分解项的重要性。因此，模式的数量是可调整的，允许更多的自由度来建模实验数据。这显著降低了训练成本，因为学习的空间复杂度显著减少。尖峰网络的训练因此在模式-得分空间中进行。第二个优势是可以将高维神经活动（滤波尖峰列车）在状态空间中投影到通常是低维的模式空间，例如，少数模式足以捕捉基础神经流形的形状。我们成功地将我们的框架应用于两个计算任务--数字分类和选择性感觉整合任务。我们的方法通过Hopfield样式分解加速了尖峰神经网络的训练，而且这种训练导致了高维神经动态的低维吸引子结构。

更新时间: 2024-06-03 07:27:04

领域: q-bio.NC,cond-mat.dis-nn,cs.AI,cs.NE

下载: http://arxiv.org/abs/2310.14621v2

SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters

This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data. This technique involves downsampling the original sequences to focus on cross-period trend prediction, effectively extracting periodic features while minimizing the model's complexity and parameter count. Based on this technique, the SparseTSF model uses fewer than *1k* parameters to achieve competitive or superior performance compared to state-of-the-art models. Furthermore, SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data. The code is publicly available at this repository: https://github.com/lss-1138/SparseTSF.

Updated: 2024-06-03 07:13:37

标题: SparseTSF：使用1k参数建模长期时间序列预测

摘要: 本文介绍了SparseTSF，这是一个新颖且极其轻量级的模型，用于长期时间序列预测(LTSF)，旨在解决使用最少计算资源来建模复杂时间依赖性的挑战。SparseTSF的核心是Cross-Period Sparse Forecasting技术，该技术通过解耦时间序列数据中的周期性和趋势，简化了预测任务。该技术涉及将原始序列进行下采样，集中于跨周期的趋势预测，有效提取周期性特征同时最小化模型的复杂性和参数数量。基于这种技术，SparseTSF模型使用少于1k个参数来实现与最先进模型相当或更优越的性能。此外，SparseTSF展示了出色的泛化能力，使其非常适用于计算资源有限、样本稀少或数据质量低的情况。该代码可在以下存储库中公开获取：https://github.com/lss-1138/SparseTSF。

更新时间: 2024-06-03 07:13:37

领域: cs.LG

下载: http://arxiv.org/abs/2405.00946v2

Virtual avatar generation models as world navigators

We introduce SABR-CLIMB, a novel video model simulating human movement in rock climbing environments using a virtual avatar. Our diffusion transformer predicts the sample instead of noise in each diffusion step and ingests entire videos to output complete motion sequences. By leveraging a large proprietary dataset, NAV-22M, and substantial computational resources, we showcase a proof of concept for a system to train general-purpose virtual avatars for complex tasks in robotics, sports, and healthcare.

Updated: 2024-06-03 07:10:15

标题: 虚拟化身生成模型作为世界导航者

摘要: 我们介绍了SABR-CLIMB，一种新颖的视频模型，使用虚拟化身模拟人类在攀岩环境中的运动。我们的扩散变换器在每个扩散步骤中预测样本而不是噪音，并摄入整个视频以输出完整的动作序列。通过利用庞大的专有数据集NAV-22M和大量的计算资源，我们展示了一个用于训练通用虚拟化身执行复杂任务的系统的概念验证。这些任务涉及机器人技术、体育以及医疗保健领域。

更新时间: 2024-06-03 07:10:15

领域: cs.CV,cs.AI,cs.HC,cs.LG,cs.RO

下载: http://arxiv.org/abs/2406.01056v1

Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model

Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models also display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of ``decision shortcuts'' that hinder their generalization capabilities. In this work, we find that the CLIP model possesses a rich set of features, encompassing both \textit{desired invariant causal features} and \textit{undesired decision shortcuts}. Moreover, the underperformance of CLIP on downstream tasks originates from its inability to effectively utilize pre-trained features in accordance with specific task requirements. To address this challenge, we propose a simple yet effective method, Spurious Feature Eraser (SEraser), to alleviate the decision shortcuts by erasing the spurious features. Specifically, we introduce a test-time prompt tuning paradigm that optimizes a learnable prompt, thereby compelling the model to exploit invariant features while disregarding decision shortcuts during the inference phase. The proposed method effectively alleviates excessive dependence on potentially misleading spurious information. We conduct comparative analysis of the proposed method against various approaches which validates the significant superiority.

Updated: 2024-06-03 07:09:39

标题: 虚假特征擦除器：稳定视觉-语言基础模型的测试时适应

摘要: 视觉-语言基础模型在大量的图像文本配对数据上表现出了显著的成功，因此在各种下游任务中具有可扩展性。然而，当应用于下游任务，如细粒度图像分类时，这些模型也显示出明显的局限性，这是由于“决策捷径”阻碍了它们的泛化能力。在这项工作中，我们发现CLIP模型拥有丰富的特征集，包括既期望的不变因果特征，也包括不希望的决策捷径。此外，CLIP在下游任务上的表现不佳源于其无法根据特定任务要求有效利用预训练特征。为了解决这一挑战，我们提出了一种简单而有效的方法，即虚假特征擦除器（SEraser），通过擦除虚假特征来减轻决策捷径。具体来说，我们引入了一种测试时提示调整范式，优化可学习提示，从而迫使模型在推理阶段利用不变特征，而忽略决策捷径。所提出的方法有效地减轻了对潜在误导性虚假信息的过度依赖。我们对所提出的方法与各种方法进行了比较分析，验证了显著的优越性。

更新时间: 2024-06-03 07:09:39

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.00376v2

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

Accent plays a significant role in speech communication, influencing one's capability to understand as well as conveying a person's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. It has the ability to synthesize a selected speaker's voice, which is converted to any desired target accent. Our thorough experiments validate the effectiveness of the proposed framework using both objective and subjective evaluations. The results also show remarkable performance in terms of the ability to manipulate accents in the synthesized speech and provide a promising avenue for future accented TTS research.

Updated: 2024-06-03 07:01:54

标题: 使用条件变分自动编码器进行带口音的文本转语音合成

摘要: 口音在语音交流中起着重要作用，影响了一个人理解和表达身份的能力。本文介绍了一种基于条件变分自动编码器的口音文本到语音（TTS）合成的新颖有效框架。它具有合成特定说话者的声音，并将其转换为任何所需的目标口音的能力。我们进行了彻底的实验证实了所提出框架的有效性，使用客观和主观评估。结果还显示，在合成语音中的口音操控能力方面表现出色，并为未来口音TTS研究提供了一个有前景的途径。

更新时间: 2024-06-03 07:01:54

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2211.03316v2

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

We investigate parameter-efficient fine-tuning (PEFT) methods that can provide good accuracy under limited computational and memory budgets in the context of large language models (LLMs). We present a new PEFT method called Robust Adaptation (RoSA) inspired by robust principal component analysis that jointly trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full-fine-tuning (FFT) solution. Across a series of challenging generative tasks such as grade-school math and SQL query generation, which require fine-tuning for good performance, we show that RoSA outperforms LoRA, pure sparse fine-tuning, and alternative hybrid methods at the same parameter budget, and can even recover the performance of FFT on some tasks. We provide system support for RoSA to complement the training algorithm, specifically in the form of sparse GPU kernels which enable memory- and computationally-efficient training, and show that it is also compatible with low-precision base weights, resulting in the first joint representation combining quantization, low-rank and sparse approximations. Our code is available at https://github.com/IST-DASLab/RoSA.

Updated: 2024-06-03 06:59:31

标题: RoSA：通过稳健调整实现准确的参数高效微调

摘要: 我们调查了在大型语言模型（LLMs）的背景下，可以在有限的计算和内存预算下提供良好准确性的参数高效微调（PEFT）方法。我们提出了一种新的PEFT方法，称为Robust Adaptation（RoSA），灵感来自于鲁棒主成分分析，它在一组固定的预训练权重之上联合训练低秩和高度稀疏的组件，以有效地近似完全微调（FFT）解决方案的性能。在一系列具有挑战性的生成任务中，例如小学数学和SQL查询生成，这些任务需要微调才能获得良好的性能，我们展示了RoSA在相同参数预算下优于LoRA、纯稀疏微调和替代混合方法，并且甚至可以恢复某些任务上FFT的性能。我们为RoSA提供了系统支持，以补充训练算法，具体是以稀疏GPU核心的形式，这使得训练既节省内存又节省计算资源，并且显示它也与低精度基础权重兼容，从而实现了第一个联合表示，结合了量化、低秩和稀疏近似。我们的代码可以在https://github.com/IST-DASLab/RoSA上找到。

更新时间: 2024-06-03 06:59:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.04679v7

An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization but trying not to affect users' experience for cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with a discounted price to users. For this type of deferrable jobs, users are allowed to submit jobs that will run for a specific uninterrupted duration in a flexible range of time in the future with a great discount. With these deferrable jobs to be scheduled under the remaining capacity after deploying those on-demand jobs, it remains a challenge to achieve high resource utilization and meanwhile shorten the waiting time for users as much as possible in an online manner. In this paper, we propose an online deferrable job scheduling method called \textit{Online Scheduling for DEferrable jobs in Cloud} (\OSDEC{}), where a deep reinforcement learning model is adopted to learn the scheduling policy, and several auxiliary tasks are utilized to provide better state representations and improve the performance of the model. With the integrated reinforcement learning framework, the proposed method can well plan the deployment schedule and achieve a short waiting time for users while maintaining a high resource utilization for the platform. The proposed method is validated on a public dataset and shows superior performance.

Updated: 2024-06-03 06:55:26

标题: 一种用于云计算中可推迟工作负载在线调度的高级强化学习框架

摘要: 云计算平台中，高效利用资源与提供完美用户体验通常存在冲突。为了增加资源利用率但又不影响用户体验，人们投入了大量精力。为了更好地利用整个平台上分散的剩余计算资源，对可推迟的作业提供了折扣价格给用户。对于这种可推迟的作业，用户可以在未来的某个灵活的时间范围内提交将连续运行一定时间的作业，并获得较大的折扣。在将这些可推迟的作业安排到部署了按需作业后的剩余容量中，仍然存在一个挑战，即在在线方式下实现高资源利用率的同时尽可能缩短用户的等待时间。在本文中，我们提出了一种在线可推迟作业调度方法，名为“云中可推迟作业的在线调度（OSDEC）”，采用深度强化学习模型来学习调度策略，并利用几个辅助任务提供更好的状态表示并提高模型的性能。通过集成强化学习框架，所提出的方法可以很好地规划部署计划，为用户实现短等待时间的同时保持平台的高资源利用率。所提出的方法在一个公共数据集上进行了验证，并表现出优越的性能。

更新时间: 2024-06-03 06:55:26

领域: cs.DC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.01047v1

Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs

Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automated Event Extraction, introducing a new method to address hallucination by decomposing the task into Event Detection and Event Argument Extraction. Moreover, the proposed method integrates dynamic schema-aware augmented retrieval examples into prompts tailored for each specific inquiry, thereby extending and adapting advanced prompting techniques such as Retrieval-Augmented Generation. Evaluation findings on prominent event extraction benchmarks and results from a synthesized benchmark illustrate the method's superior performance compared to baseline approaches.

Updated: 2024-06-03 06:55:10

标题: 分解，丰富和提取！使用LLMs的基于模式的事件提取

摘要: 大型语言模型（LLMs）展示了在处理自然语言数据方面的显著能力，承诺从多样的文本来源中高效地提取知识，以增强情境意识并支持决策制定。然而，由于它们容易产生幻觉，导致内容在语境上不准确，因此引起了关注。本文侧重于利用LLMs进行自动事件提取，引入一种新方法来解决幻觉问题，将任务分解为事件检测和事件参数提取。此外，提出的方法将动态模式感知增强检索示例集成到为每个具体查询量身定制的提示中，从而扩展并适应先进的提示技术，如检索增强生成。在突出的事件提取基准上的评估结果以及来自合成基准的结果展示了该方法相对于基准方法的卓越性能。

更新时间: 2024-06-03 06:55:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.01045v1

Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024)

The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was: AI in Action. Six key topics included (i) an overview of prior and ongoing efforts by the AI task force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.

Updated: 2024-06-03 06:54:38

标题: 核医学人工智能行动：贝塞斯达报告（AI峰会2024）

摘要: 第二届SNMMI人工智能（AI）峰会由SNMMI AI任务组织，在2024年2月29日至3月1日在马里兰州贝塞斯达举行。峰会主题是：AI在行动中。聚集各种社区成员和利益相关者，继续之前成功的2022年AI峰会，六个关键主题包括（i）AI任务组之前和正在进行的工作概述，（ii）计算核医学新兴需求和工具，（iii）大型语言和生成模型的新领域，（iv）定义核医学中使用AI的价值主张，（v）开放科学包括数据和模型存储库的努力，以及（vi）报销和资金问题。本文总结了主要的努力、发现、挑战和下一步。

更新时间: 2024-06-03 06:54:38

领域: physics.med-ph,cs.AI

下载: http://arxiv.org/abs/2406.01044v1

On Prompt-Driven Safeguarding for Large Language Models

Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) against queries with harmful intents. However, the underlying working mechanisms of safety prompts have not been unraveled yet, restricting the possibility of automatically optimizing them to improve LLM safety. In this work, we investigate how LLMs' behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation. We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction, in which models become more prone to refusing to provide assistance, even when the queries are harmless. On the other hand, LLMs are naturally capable of distinguishing harmful and harmless queries without safety prompts. Inspired by these findings, we propose a method for safety prompt optimization, namely DRO (Directed Representation Optimization). Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness. Experiments with eight LLMs on out-of-domain and jailbreak benchmarks demonstrate that DRO remarkably improves the safeguarding performance of human-crafted safety prompts, without compromising the models' general performance.

Updated: 2024-06-03 06:52:58

标题: 关于大型语言模型的即时驱动保护

摘要: 在保护大型语言模型（LLMs）免受恶意查询的常见做法是在模型输入中添加安全提示。然而，安全提示的工作机制尚未被揭示，这限制了自动优化它们以提高LLM安全性的可能性。在这项工作中，我们从模型表示的角度研究了LLMs行为（即遵守或拒绝用户查询）如何受到安全提示的影响。我们发现在表示空间中，输入查询通常会被安全提示朝着“更多拒绝”的方向移动，这使得模型更容易拒绝提供帮助，即使查询是无害的。另一方面，LLMs在没有安全提示的情况下自然能够区分有害和无害的查询。受到这些发现的启发，我们提出了一种安全提示优化方法，即DRO（Directed Representation Optimization）。将安全提示视为连续可训练的嵌入，DRO学习如何移动查询的表示沿着或相反于拒绝方向，具体取决于它们的有害性。在超出领域和越狱基准测试中，对八个LLMs进行的实验表明，DRO显著提高了人工制作的安全提示的保护性能，而不会影响模型的通用性能。

更新时间: 2024-06-03 06:52:58

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2401.18018v4

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations

Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations, but previous methods are unable to refine the structured states with rewards due to a lack of efficiency. Accessibility also remains to be an issue, as extensive domain knowledge is required to interpret symbolic policies. In this paper, we present a framework for learning structured states and symbolic policies jointly, whose key idea is to distill vision foundation models into a scalable perception module and refine it during policy learning. Moreover, we design a pipeline to generate language explanations for policies and decisions using large language models. In experiments on nine Atari tasks, we verify the efficacy of our approach, and we also present explanations for policies and decisions.

Updated: 2024-06-03 06:50:51

标题: 洞察：具有语言解释的端到端神经符号视觉强化学习

摘要: 神经符号强化学习（NS-RL）已经成为一种有前途的可解释性决策制定范式，其特点是符号政策的可解释性。NS-RL涉及具有视觉观察的任务的结构化状态表示，但是先前的方法由于缺乏效率而无法通过奖励来细化结构化状态。可访问性也仍然是一个问题，因为需要广泛的领域知识来解释符号政策。在本文中，我们提出了一个学习结构化状态和符号政策联合的框架，其关键思想是将视觉基础模型提炼为可扩展的感知模块，并在政策学习过程中进行改进。此外，我们设计了一个管道来使用大型语言模型生成政策和决策的语言解释。在九个Atari任务的实验中，我们验证了我们方法的有效性，并给出了政策和决策的解释。

更新时间: 2024-06-03 06:50:51

领域: cs.AI

下载: http://arxiv.org/abs/2403.12451v3

Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search

In code search, the Generation-Augmented Retrieval (GAR) framework, which generates exemplar code snippets to augment queries, has emerged as a promising strategy to address the principal challenge of modality misalignment between code snippets and natural language queries, particularly with the demonstrated code generation capabilities of Large Language Models (LLMs). Nevertheless, our preliminary investigations indicate that the improvements conferred by such an LLM-augmented framework are somewhat constrained. This limitation could potentially be ascribed to the fact that the generated codes, albeit functionally accurate, frequently display a pronounced stylistic deviation from the ground truth code in the codebase. In this paper, we extend the foundational GAR framework and propose a simple yet effective method that additionally Rewrites the Code (ReCo) within the codebase for style normalization. Experimental results demonstrate that ReCo significantly boosts retrieval accuracy across sparse (up to 35.7%), zero-shot dense (up to 27.6%), and fine-tuned dense (up to 23.6%) retrieval settings in diverse search scenarios. To further elucidate the advantages of ReCo and stimulate research in code style normalization, we introduce Code Style Similarity, the first metric tailored to quantify stylistic similarities in code. Notably, our empirical findings reveal the inadequacy of existing metrics in capturing stylistic nuances. The source code and data are available at \url{https://github.com/Alex-HaochenLi/ReCo}.

Updated: 2024-06-03 06:50:26

标题: 重写代码：一种用于大型语言模型增强代码搜索的简单方法

摘要: 在代码搜索中，生成增强检索（GAR）框架已经成为一种有前途的策略，它生成示例代码片段来增强查询，以解决代码片段和自然语言查询之间的主要挑战，特别是在大型语言模型（LLMs）展示出代码生成能力的情况下的模态不匹配。然而，我们的初步调查表明，LLM增强框架所赋予的改进在一定程度上受到限制。这种限制可能归因于生成的代码，尽管在功能上准确，但在代码库中与真实代码显示出明显的风格偏差。在本文中，我们扩展了基础GAR框架，并提出了一种简单而有效的方法，另外在代码库中重写代码（ReCo）以进行风格规范化。实验结果表明，ReCo在各种搜索情境中显著提高了检索准确度，包括稀疏（最高达35.7%）、零样本稠密（最高达27.6%）和微调稠密（最高达23.6%）检索设置。为了进一步阐明ReCo的优势并促进代码风格规范化的研究，我们引入了代码风格相似度，这是第一个量化代码风格相似性的度量标准。值得注意的是，我们的实证研究结果揭示了现有度量标准无法捕捉风格细微差异的不足。源代码和数据可在\url{https://github.com/Alex-HaochenLi/ReCo}上找到。

更新时间: 2024-06-03 06:50:26

领域: cs.SE,cs.CL,cs.IR,cs.LG

下载: http://arxiv.org/abs/2401.04514v2

Balanced Data Sampling for Language Model Training with Clustering

Data plays a fundamental role in the training of Large Language Models (LLMs). While attention has been paid to the collection and composition of datasets, determining the data sampling strategy in training remains an open question. Most LLMs are trained with a simple strategy, random sampling. However, this sampling strategy ignores the unbalanced nature of training data distribution, which can be sub-optimal. In this paper, we propose ClusterClip Sampling to balance the text distribution of training data for better model training. Specifically, ClusterClip Sampling utilizes data clustering to reflect the data distribution of the training set and balances the common samples and rare samples during training based on the cluster results. A repetition clip operation is introduced to mitigate the overfitting issue led by samples from certain clusters. Extensive experiments validate the effectiveness of ClusterClip Sampling, which outperforms random sampling and other cluster-based sampling variants under various training datasets and large language models.

Updated: 2024-06-03 06:48:34

标题: 使用聚类进行语言模型训练的平衡数据采样

摘要: 数据在大型语言模型（LLMs）的训练中起着基础性作用。尽管人们已经开始关注数据集的收集和组成，但在训练过程中确定数据采样策略仍然是一个未解决的问题。大多数LLMs使用简单的随机抽样策略进行训练。然而，这种抽样策略忽视了训练数据分布的不平衡性，这可能不是最佳选择。在本文中，我们提出了ClusterClip Sampling方法，以平衡训练数据的文本分布，从而实现更好的模型训练。具体来说，ClusterClip Sampling利用数据聚类来反映训练集的数据分布，并根据聚类结果在训练过程中平衡常见样本和稀有样本。引入重复剪辑操作以缓解由某些聚类中的样本导致的过拟合问题。大量实验证实了ClusterClip Sampling的有效性，在各种训练数据集和大型语言模型下，它优于随机抽样和其他基于聚类的抽样变体。

更新时间: 2024-06-03 06:48:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.14526v2

Synthetic Data Generation for 3D Myocardium Deformation Analysis

Accurate analysis of 3D myocardium deformation using high-resolution computerized tomography (CT) datasets with ground truth (GT) annotations is crucial for advancing cardiovascular imaging research. However, the scarcity of such datasets poses a significant challenge for developing robust myocardium deformation analysis models. To address this, we propose a novel approach to synthetic data generation for enriching cardiovascular imaging datasets. We introduce a synthetic data generation method, enriched with crucial GT 3D optical flow annotations. We outline the data preparation from a cardiac four-dimensional (4D) CT scan, selection of parameters, and the subsequent creation of synthetic data from the same or other sources of 3D cardiac CT data for training. Our work contributes to overcoming the limitations imposed by the scarcity of high-resolution CT datasets with precise annotations, thereby facilitating the development of accurate and reliable myocardium deformation analysis algorithms for clinical applications and diagnostics. Our code is available at: http://www.github.com/shaharzuler/cardio_volume_skewer

Updated: 2024-06-03 06:40:53

标题: 合成数据生成用于3D心肌变形分析

摘要: 使用具有地面真实（GT）注释的高分辨率计算机断层扫描（CT）数据集准确分析3D心肌变形对于推进心血管影像研究至关重要。然而，这类数据集的稀缺性对于开发强大的心肌变形分析模型构成重大挑战。为了解决这个问题，我们提出了一种新颖的方法来合成数据以丰富心血管影像数据集。我们介绍了一种合成数据生成方法，其中包含关键的GT 3D光流注释。我们概述了从心脏四维（4D）CT扫描中准备数据，参数选择，以及从相同或其他来源的3D心脏CT数据创建合成数据以供训练。我们的工作有助于克服高分辨率CT数据集稀缺性所带来的限制，从而促进准确可靠的心肌变形分析算法的发展，用于临床应用和诊断。我们的代码可以在以下网址找到：http://www.github.com/shaharzuler/cardio_volume_skewer

更新时间: 2024-06-03 06:40:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.01040v1

LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.

Updated: 2024-06-03 06:33:51

标题: LLM和GNN是互补的：提炼LLM用于多模态图学习

摘要: 最近在图神经网络（GNNs）方面取得的进展极大地增强了对复杂分子结构进行性质预测的能力。然而，分子数据不仅包括图结构，还包括文本和视觉信息，而GNNs并不擅长处理这些信息。为了弥补这一差距，我们提出了一个创新的框架，利用多模态分子数据从大型语言模型（LLMs）中提取洞察力。我们引入了GALLON（从大型语言模型蒸馏学习图），这是一个通过将多模态知识蒸馏到统一的多层感知器（MLP）中来协同LLMs和GNNs的能力的框架。这种方法将分子的丰富文本和视觉数据与GNNs的结构分析能力整合在一起。广泛的实验表明，我们蒸馏的MLP模型显著提高了分子性质预测的准确性和效率。

更新时间: 2024-06-03 06:33:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01032v1

Illicit Promotion on Twitter

In this paper, we present an extensive study of the promotion of illicit goods and services on Twitter, a popular online social network(OSN). This study is made possible through the design and implementation of multiple novel tools for detecting and analyzing illicit promotion activities as well as their underlying campaigns. As the results, we observe that illicit promotion is prevalent on Twitter, along with noticeable existence on other three popular OSNs including Youtube, Facebook, and TikTok. Particularly, 12 million distinct posts of illicit promotion (PIPs) have been observed on the Twitter platform, which are widely distributed in 5 major natural languages and 10 categories of illicit goods and services, e.g., drugs, data leakage, gambling, and weapon sales. What are also observed are 580K Twitter accounts publishing PIPs as well as 37K distinct instant messaging (IM) accounts that are embedded in PIPs and serve as next hops of communication, which strongly indicates that the campaigns underpinning PIPs are also of a large scale. Also, an arms race between Twitter and illicit promotion operators is also observed. On one hand, Twitter is observed to conduct content moderation in a continuous manner and almost 80% PIPs will get gradually unpublished within six months since posted. However, in the meantime, miscreants adopt various evasion tactics to masquerade their PIPs, which renders more than 90% PIPs keeping hidden from the detection radar for two months or longer.

Updated: 2024-06-03 06:24:40

标题: 推特上的非法宣传

摘要: 在这篇论文中，我们展示了对推广非法商品和服务在Twitter上进行了广泛研究，这是一个流行的在线社交网络(OSN)。通过设计和实施多种新颖工具来检测和分析非法推广活动及其潜在的宣传活动，这项研究得以实现。结果显示，在Twitter上非法推广普遍存在，同时在其他三个流行的OSN平台，包括Youtube、Facebook和TikTok上也有明显存在。具体来说，在Twitter平台上观察到了1200万个独特的非法推广帖子(PIPs)，这些帖子广泛分布在5种主要自然语言和10种非法商品和服务类别中，如毒品、数据泄露、赌博和武器销售等。同时还观察到了58万个在Twitter上发布PIP的账户，以及3.7万个嵌入在PIP中的独立即时消息(IM)账户，这些IM账户作为通信的下一个跳跃点，强烈表明支撑PIP的宣传活动也是规模庞大的。另外，还观察到了Twitter和非法推广运营商之间的一场军备竞赛。一方面，Twitter持续进行内容调整，几乎80%的PIP将在发布后六个月内逐渐下架。然而与此同时，不法分子采用各种规避策略来掩饰他们的PIP，导致超过90%的PIP在两个月或更长时间内都能成功躲避检测雷达。

更新时间: 2024-06-03 06:24:40

领域: cs.CR,cs.SI

下载: http://arxiv.org/abs/2404.07797v2

Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models

Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this phenomenon, we investigate the degeneration behavior of RALMs and theoretically decompose it into four categories. Further analysis based on our decomposition reveals that the innate difference in knowledge sources and the unpredictable degeneration of the reader model contribute most to the inconsistency. Drawing from our analysis, we introduce Ensemble of Retrievers (EoR), a trainable framework that can adaptively retrieve from different knowledge sources and effectively decrease unpredictable reader errors. Our experiments on Open Domain Question Answering show that EoR substantially improves performance over the RALM with a single retriever by considerably reducing inconsistent behaviors.

Updated: 2024-06-03 06:20:18

标题: 解开并减轻在检索增强的大型语言模型中的检索不一致性

摘要: 尽管检索增强型大型语言模型（RALM）在事实性方面表现出优越性，但它们并不总能在性能上 consistently outperform原始的无检索语言模型（LM）。我们的实验揭示了这种示例级性能不一致性不仅存在于检索增强和无检索LM之间，而且存在于不同的检索器之间。为了理解这一现象，我们研究了RALM的退化行为，并在理论上将其分解为四类。根据我们的分解进一步分析显示，知识来源的固有差异和读者模型的不可预测的退化最大程度地导致了不一致性。借鉴我们的分析，我们引入了检索器集成（EoR），这是一个可训练的框架，可以自适应地从不同的知识来源中检索，并有效减少不可预测的读者错误。我们在开放领域问答方面的实验表明，EoR通过显著减少不一致行为极大地提高了性能，超过了仅有一个检索器的RALM。

更新时间: 2024-06-03 06:20:18

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.20680v2

Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge

The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this paper, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model of the form $S_{h+1} = f(S_h, A_h) + W_h$, where $f$ represents the underlying system dynamics, and $W_h$ are unknown disturbances independent of states and actions. In the setting of finite episodic Markov decision processes with $S$ states, $A$ actions, and episode length $H$, we present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{T})$ regret under perfect knowledge of $f$, where $T$ is the total number of interactions with the system. This is in contrast to the typical $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{SAT})$ regret for existing Q-learning methods. Further, if only a noisy estimate $\hat{f}$ of $f$ is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of state and action spaces. The sub-optimality gap depends on the approximation error $\hat{f}-f$, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.

Updated: 2024-06-03 06:17:33

标题: 具有部分动态知识的样本高效强化学习

摘要: 在线强化学习的样本复杂性问题通常在文献中进行研究，而不考虑可能加速学习过程的任何系统动态的部分知识。本文研究了在线Q-learning方法的样本复杂性，当系统动态的一些先验知识可用或可以有效地学习时。我们关注按照形式为$S_{h+1} = f(S_h, A_h) + W_h$的加性干扰模型演变的系统，在这个模型中，$f$代表基础系统动态，而$W_h$是与状态和动作无关的未知干扰。在具有$S$个状态，$A$个动作和长度为$H$的有限周期马尔可夫决策过程设置中，我们提出了一种乐观的Q-learning算法，在完全了解$f$的情况下实现了$\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{T})$的后悔，其中$T$是与系统的总交互次数。这与现有Q-learning方法的典型$\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{SAT})$后悔形成对比。此外，如果只有对$f$的一个噪声估计$\hat{f}$可用，我们的方法可以在独立于状态和动作空间的基数的样本数量中学习近似最优策略。次优间隙取决于近似误差$\hat{f}-f$，以及对应最优值函数的Lipschitz常数。我们的方法不需要建模转移概率，并且具有与无模型方法相同的内存复杂性。

更新时间: 2024-06-03 06:17:33

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2312.12558v3

How Vocabulary Sharing Facilitates Multilingualism in LLaMA?

Large Language Models (LLMs), often show strong performance on English tasks, while exhibiting limitations on other languages. What is an LLM's multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective by conducting an exhaustive analysis across 101 languages. Through the investigation of the performance gap before and after embedding fine-tuning, we discovered four distinct quadrants. By delving into each quadrant we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs based on these attributes of each quadrant~\footnote{\url{https://github.com/CONE-MT/Vocabulary-Sharing-Facilitates-Multilingualism}.}.

Updated: 2024-06-03 06:11:06

标题: 如何词汇共享促进LLaMA中的多语言能力？

摘要: 大型语言模型（LLMs）在英语任务上通常表现出色，但在其他语言上存在局限性。当LLM仅在某些语言上接受训练时，它的多语言能力是什么？其基本机制仍不清楚。本研究从词汇共享的角度对LLMs的多语言能力进行了考察，通过对101种语言进行全面分析。通过在嵌入微调之前和之后进行性能差距的调查，我们发现了四个不同的象限。通过深入研究每个象限，我们为调整这些语言提供了可行的和高效的指导方针。大量实验证明，现有的LLMs具有超出我们预期的多语言能力，并且基于每个象限的属性，我们可以显著提高LLMs的多语言性能。

更新时间: 2024-06-03 06:11:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.09071v2

Poisoning Attacks and Defenses in Recommender Systems: A Survey

Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats through the lens of an attacker, offering fresh insights into their mechanics and impacts. Concretely, we detail a systematic pipeline that encompasses four stages of a poisoning attack: setting attack goals, assessing attacker capabilities, analyzing victim architecture, and implementing poisoning strategies. The pipeline not only aligns with various attack tactics but also serves as a comprehensive taxonomy to pinpoint focuses of distinct poisoning attacks. Correspondingly, we further classify defensive strategies into two main categories: poisoning data filtering and robust training from the defender's perspective. Finally, we highlight existing limitations and suggest innovative directions for further exploration in this field.

Updated: 2024-06-03 06:08:02

标题: 在推荐系统中的毒害攻击与防御：一项调查

摘要: 现代推荐系统（RS）极大地提升了数字平台用户体验，然而它们面临来自毒化攻击的重大威胁。这些攻击旨在通过注入恶意数据或干预模型训练来操纵推荐结果，以获取不道德的利益，并利用RS中的漏洞。本调查通过攻击者的视角审视这些威胁，为它们的机制和影响提供新的见解。具体而言，我们详细介绍了一个系统化流程，包括四个阶段的毒化攻击：设定攻击目标、评估攻击者能力、分析受害者架构和实施毒化策略。这个流程不仅与各种攻击策略相一致，还作为一个全面的分类法，以准确定位不同毒化攻击的焦点。相应地，我们进一步将防御策略分为两个主要类别：毒化数据过滤和从防御者角度进行强化训练。最后，我们强调现有的局限性，并提出在这一领域进一步探索的创新方向。

更新时间: 2024-06-03 06:08:02

领域: cs.CR,cs.IR

下载: http://arxiv.org/abs/2406.01022v1

Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents

Emergent cooperation among self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, na\"ive reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging class of opponent-shaping methods have demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, they rely on higher-order derivatives through the predicted learning step of other agents or learning meta-game dynamics, which in turn rely on stringent assumptions over opponent learning rules or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of an opponent's actions on their returns. This approach effectively seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without attempting to directly shape policy updates. We show that Reciprocators can be used to promote cooperation in a variety of temporally extended social dilemmas during simultaneous learning.

Updated: 2024-06-03 06:07:27

标题: 相互回报激励促进自私利益代理合作

摘要: 自私个体之间的紧急合作是自然界中普遍存在的现象，但在人工智能代理之间的互动中仍然难以捉摸。相反，天真的强化学习算法通常会在即使是最简单的社会困境中也会收敛到帕累托支配的结果。一种新兴的对手塑造方法已经展示了通过影响其他代理的学习能力来达到亲社会的结果。然而，它们依赖于通过其他代理的预测学习步骤或学习元游戏动态的高阶导数，这又依赖于对手学习规则或指数样本复杂性的严格假设。为了提供一种学习规则不可知和样本效率的替代方案，我们引入了回报者，这是一种受内在动机激励的强化学习代理，以回报对手的行动对他们的回报产生影响。这种方法有效地试图通过增加对其他代理的回报（相对于回报者）以及在有害行动之后减少回报来修改其他代理的Q值，引导它们朝着互利的行动前进，而不是直接塑造政策更新。我们展示了回报者可以在同时学习期间在各种时间延长的社会困境中促进合作。

更新时间: 2024-06-03 06:07:27

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2406.01641v1

Decomposable Submodular Maximization in Federated Setting

Submodular functions, as well as the sub-class of decomposable submodular functions, and their optimization appear in a wide range of applications in machine learning, recommendation systems, and welfare maximization. However, optimization of decomposable submodular functions with millions of component functions is computationally prohibitive. Furthermore, the component functions may be private (they might represent user preference function, for example) and cannot be widely shared. To address these issues, we propose a {\em federated optimization} setting for decomposable submodular optimization. In this setting, clients have their own preference functions, and a weighted sum of these preferences needs to be maximized. We implement the popular {\em continuous greedy} algorithm in this setting where clients take parallel small local steps towards the local solution and then the local changes are aggregated at a central server. To address the large number of clients, the aggregation is performed only on a subsampled set. Further, the aggregation is performed only intermittently between stretches of parallel local steps, which reduces communication cost significantly. We show that our federated algorithm is guaranteed to provide a good approximate solution, even in the presence of above cost-cutting measures. Finally, we show how the federated setting can be incorporated in solving fundamental discrete submodular optimization problems such as Maximum Coverage and Facility Location.

Updated: 2024-06-03 06:05:29

标题: 在联邦设置中的可分解子模块最大化

摘要: 子模函数以及可分解子模函数的子类，以及它们的优化在机器学习、推荐系统和福利最大化等各种应用中出现。然而，优化具有数百万个组件函数的可分解子模函数在计算上是不可行的。此外，这些组件函数可能是私有的（例如，它们可能代表用户喜好函数），不能广泛共享。为了解决这些问题，我们提出了一种用于可分解子模函数优化的“联邦优化”设置。在这种设置中，客户端拥有自己的偏好函数，这些偏好的加权和需要最大化。我们在这种设置中实现了流行的“连续贪心”算法，其中客户端向本地解决方案采取并行小步骤，然后将本地更改聚合在中央服务器上。为了解决大量客户端的问题，聚合仅在子采样集上执行。此外，聚合仅在并行本地步骤之间的间歇性期间执行，这显著降低了通信成本。我们展示了我们的联邦算法保证提供良好的近似解决方案，即使存在上述降低成本的措施。最后，我们展示了如何将联邦设置纳入解决基本的离散子模优化问题，如最大覆盖和设施位置。

更新时间: 2024-06-03 06:05:29

领域: cs.DS,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2402.00138v2

KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Automatic evaluation methods for large language models (LLMs) are hindered by data contamination, leading to inflated assessments of their effectiveness. Existing strategies, which aim to detect contaminated texts, focus on quantifying contamination status instead of accurately gauging model performance. In this paper, we introduce KIEval, a Knowledge-grounded Interactive Evaluation framework, which incorporates an LLM-powered "interactor" role for the first time to accomplish a dynamic contamination-resilient evaluation. Starting with a question in a conventional LLM benchmark involving domain-specific knowledge, KIEval utilizes dynamically generated, multi-round, and knowledge-focused dialogues to determine whether a model's response is merely a recall of benchmark answers or demonstrates a deep comprehension to apply knowledge in more complex conversations. Extensive experiments on seven leading LLMs across five datasets validate KIEval's effectiveness and generalization. We also reveal that data contamination brings no contribution or even negative effect to models' real-world applicability and understanding, and existing contamination detection methods for LLMs can only identify contamination in pre-training but not during supervised fine-tuning.

Updated: 2024-06-03 06:02:39

标题: KIEval：面向大型语言模型的知识驱动交互式评估框架

摘要: 大型语言模型（LLMs）的自动评估方法受到数据污染的影响，导致它们的有效性评估被夸大。现有的策略旨在检测受污染的文本，但重点是量化污染状况，而不是准确评估模型性能。本文介绍了KIEval，一个基于知识的交互式评估框架，首次引入了一个由LLM驱动的“交互者”角色，以实现动态的抗污染评估。从涉及领域特定知识的常规LLM基准测试中提出一个问题开始，KIEval利用动态生成的、多轮的、以知识为重点的对话，来确定模型的响应是否仅仅是对基准答案的回忆，还是展示了深刻理解，能够在更复杂的对话中应用知识。对五个数据集上七种领先的LLM进行的大量实验验证了KIEval的有效性和泛化性。我们还发现，数据污染对模型在现实世界中的适用性和理解没有任何贡献，甚至可能产生负面影响，并且现有的LLM污染检测方法只能在预训练时识别污染，而在监督微调期间无法识别。

更新时间: 2024-06-03 06:02:39

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.15043v2

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm. To address the issue, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. We leverage an image captioning model to measure image-to-text alignment and guide the diffusion model to revisit ignored tokens. A novel attribute concentration module is also proposed to address the attribute binding problem. Without any image or human preference data, we use only 20K text prompts to fine-tune SDXL to obtain CoMat-SDXL. Extensive experiments show that CoMat-SDXL significantly outperforms the baseline model SDXL in two text-to-image alignment benchmarks and achieves start-of-the-art performance.

Updated: 2024-06-03 06:02:34

标题: CoMat：将文本到图像扩散模型与图像到文本概念匹配进行对齐

摘要: 扩散模型在文本到图像生成领域取得了巨大成功。然而，缓解文本提示和图像之间的不一致仍然具有挑战性。导致不一致的根本原因尚未得到充分调查。我们观察到，不一致是由于令牌注意力激活不足引起的。我们进一步将这一现象归因于扩散模型对条件利用不足，这是由其训练范式引起的。为了解决这个问题，我们提出了CoMat，一种端到端的扩散模型微调策略，具有图像到文本概念匹配机制。我们利用图像字幕模型来衡量图像到文本的对齐，并引导扩散模型重新审视被忽视的令牌。还提出了一种新颖的属性集中模块来解决属性绑定问题。在没有图像或人类偏好数据的情况下，我们仅使用20K个文本提示对SDXL进行微调，以获得CoMat-SDXL。大量实验表明，CoMat-SDXL在两个文本到图像对齐基准测试中明显优于基线模型SDXL，并实现了最新的性能。

更新时间: 2024-06-03 06:02:34

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.03653v2

Attention-based Iterative Decomposition for Tensor Product Representation

In recent research, Tensor Product Representation (TPR) is applied for the systematic generalization task of deep neural networks by learning the compositional structure of data. However, such prior works show limited performance in discovering and representing the symbolic structure from unseen test data because their decomposition to the structural representations was incomplete. In this work, we propose an Attention-based Iterative Decomposition (AID) module designed to enhance the decomposition operations for the structured representations encoded from the sequential input data with TPR. Our AID can be easily adapted to any TPR-based model and provides enhanced systematic decomposition through a competitive attention mechanism between input features and structured representations. In our experiments, AID shows effectiveness by significantly improving the performance of TPR-based prior works on the series of systematic generalization tasks. Moreover, in the quantitative and qualitative evaluations, AID produces more compositional and well-bound structural representations than other works.

Updated: 2024-06-03 05:46:52

标题: 基于注意力机制的张量积表示的迭代分解

摘要: 在最近的研究中，张量乘积表示（TPR）被应用于通过学习数据的组合结构来进行深度神经网络的系统泛化任务。然而，先前的研究表明，这种方法在发现和表示来自未见测试数据的符号结构方面表现有限，因为它们对结构表示的分解是不完整的。在本文中，我们提出了一个基于注意力的迭代分解（AID）模块，旨在增强从使用TPR编码的顺序输入数据中获得的结构表示的分解操作。我们的AID可以轻松适应任何基于TPR的模型，并通过输入特征和结构表示之间的竞争性注意力机制提供增强的系统分解。在我们的实验中，AID通过显著改进TPR先前研究在一系列系统泛化任务上的表现来显示其有效性。此外，在定量和定性评估中，AID产生比其他方法更具组合性和良好结构的表示。

更新时间: 2024-06-03 05:46:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.01012v1

Multi-Object Tracking based on Imaging Radar 3D Object Detection

Effective tracking of surrounding traffic participants allows for an accurate state estimation as a necessary ingredient for prediction of future behavior and therefore adequate planning of the ego vehicle trajectory. One approach for detecting and tracking surrounding traffic participants is the combination of a learning based object detector with a classical tracking algorithm. Learning based object detectors have been shown to work adequately on lidar and camera data, while learning based object detectors using standard radar data input have proven to be inferior. Recently, with the improvements to radar sensor technology in the form of imaging radars, the object detection performance on radar was greatly improved but is still limited compared to lidar sensors due to the sparsity of the radar point cloud. This presents a unique challenge for the task of multi-object tracking. The tracking algorithm must overcome the limited detection quality while generating consistent tracks. To this end, a comparison between different multi-object tracking methods on imaging radar data is required to investigate its potential for downstream tasks. The work at hand compares multiple approaches and analyzes their limitations when applied to imaging radar data. Furthermore, enhancements to the presented approaches in the form of probabilistic association algorithms are considered for this task.

Updated: 2024-06-03 05:46:23

标题: 基于成像雷达3D目标检测的多目标跟踪

摘要: 有效跟踪周围交通参与者可以实现准确的状态估计，这是预测未来行为和因此充分规划自车轨迹的必要因素。一种检测和跟踪周围交通参与者的方法是将基于学习的目标检测器与经典跟踪算法结合起来。已经证明，基于学习的目标检测器在激光雷达和摄像头数据上能够很好地工作，而使用标准雷达数据输入的基于学习的目标检测器则被证明效果较差。最近，随着成像雷达技术的改进，雷达上的目标检测性能得到了极大的提升，但由于雷达点云的稀疏性，与激光雷达传感器相比仍然存在局限性。这为多目标跟踪任务提出了独特的挑战。跟踪算法必须克服有限的检测质量，同时生成一致的轨迹。为此，有必要对成像雷达数据上的不同多目标跟踪方法进行比较，以探索其在下游任务中的潜力。本文比较了多种方法，并分析了它们在应用于成像雷达数据时的局限性。此外，还考虑了概率关联算法的改进，以用于此任务。

更新时间: 2024-06-03 05:46:23

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.01011v1

Model Editing by Standard Fine-Tuning

Standard fine-tuning is considered not as effective as specialized methods for model editing due to its comparatively poor performance. However, it is simple, agnostic to the architectural details of the model being edited, and able to leverage advances in standard training techniques with no additional work (e.g., black-box PEFT for computational efficiency), making it an appealing choice for a model editor. In this work, we show that standard fine-tuning alone can yield competitive model editing performance with two minor modifications. First, we optimize the conditional likelihood rather than the full likelihood. Second, in addition to the typical practice of training on randomly paraphrased edit prompts to encourage generalization, we also train on random or similar unedited facts to encourage locality. Our experiments on the ZsRE and CounterFact datasets demonstrate that these simple modifications allow standard fine-tuning to match or outperform highly specialized editors in terms of edit score.

Updated: 2024-06-03 05:39:10

标题: 标准微调的模型编辑

摘要: 标准微调被认为不如专门的编辑方法效果好，因为其性能相对较差。然而，它简单、对被编辑模型的架构细节不加偏见，并且能够利用标准训练技术的进展而不需要额外的工作（例如，黑盒PEFT以提高计算效率），使其成为模型编辑器的吸引人选择。在这项工作中，我们展示了仅通过两个小修改，标准微调就可以获得具有竞争力的模型编辑性能。首先，我们优化条件似然而不是完全似然。其次，除了训练随机重述的编辑提示以促进泛化的典型做法外，我们还训练随机或相似的未编辑事实以促进局部性。我们在ZsRE和CounterFact数据集上的实验表明，这些简单的修改使得标准微调能够在编辑得分方面与高度专门的编辑器匹敌甚至超越。

更新时间: 2024-06-03 05:39:10

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.11078v3

SemCoder: Training Code Language Models with Comprehensive Semantics

Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean code corpus of fully executable samples with functional descriptions and execution tracing. We propose training Code LLMs to write code and represent and reason about execution behaviors using natural language, mimicking human verbal debugging. This approach led to the development of SemCoder, a Code LLM with only 6.7B parameters, which shows competitive performance with GPT-3.5-turbo on code generation and execution reasoning tasks. SemCoder achieves 81.1% on HumanEval (GPT-3.5-turbo: 76.8%) and 54.5% on CRUXEval-I (GPT-3.5-turbo: 50.3%). We also study the effectiveness of SemCoder's monologue-style execution reasoning compared to concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying learned semantics to improve Code LLMs' debugging and self-refining capabilities.

Updated: 2024-06-03 05:36:57

标题: SemCoder: 用综合语义训练代码语言模型

摘要: 大型语言模型（Code LLMs）在任务如代码补全方面表现出色，但往往会忽略更深层次的语义，如执行效果和动态状态。本文旨在弥合Code LLMs对静态文本数据的依赖与对复杂任务如调试和程序修复所需彻底语义理解之间的差距。我们引入了一种新颖的策略，通过训练Code LLMs具有全面的语义，包括高级功能描述、单个语句的局部执行效果以及整体输入/输出行为，从而将静态代码文本与动态执行状态联系起来。我们首先收集了PyX，一个干净的可执行样本代码语料库，具有功能描述和执行跟踪。我们提出训练Code LLMs编写代码，并使用自然语言表示和推理执行行为，模仿人类口头调试。这种方法导致了SemCoder的开发，一个只有6.7B参数的Code LLM，在代码生成和执行推理任务上显示出与GPT-3.5-turbo竞争力的表现。SemCoder在HumanEval上达到81.1%（GPT-3.5-turbo：76.8%）和CRUXEval-I上达到54.5%（GPT-3.5-turbo：50.3%）。我们还研究了SemCoder的独白式执行推理与具体的草稿板推理相比的有效性，显示出我们的方法更加平滑地整合了多个维度的语义。最后，我们展示了将学到的语义应用于改进Code LLMs的调试和自我完善能力的潜力。

更新时间: 2024-06-03 05:36:57

领域: cs.CL,cs.AI,cs.SE

下载: http://arxiv.org/abs/2406.01006v1

Data Contamination Calibration for Black-box LLMs

The rapid advancements of Large Language Models (LLMs) tightly associate with the expansion of the training data size. However, the unchecked ultra-large-scale training sets introduce a series of potential risks like data contamination, i.e. the benchmark data is used for training. In this work, we propose a holistic method named Polarized Augment Calibration (PAC) along with a new to-be-released dataset to detect the contaminated data and diminish the contamination effect. PAC extends the popular MIA (Membership Inference Attack) -- from machine learning community -- by forming a more global target at detecting training data to Clarify invisible training data. As a pioneering work, PAC is very much plug-and-play that can be integrated with most (if not all) current white- and black-box LLMs. By extensive experiments, PAC outperforms existing methods by at least 4.5%, towards data contamination detection on more 4 dataset formats, with more than 10 base LLMs. Besides, our application in real-world scenarios highlights the prominent presence of contamination and related issues.

Updated: 2024-06-03 05:21:54

标题: 数据污染校准对于黑盒LLMs的翻译

摘要: 大型语言模型（LLMs）的快速进展与训练数据规模的扩大密切相关。然而，未经检查的超大规模训练集引入了一系列潜在风险，如数据污染，即使用基准数据进行训练。在这项工作中，我们提出了一种名为Polarized Augment Calibration（PAC）的整体方法，以及一个即将发布的新数据集，用于检测受污染的数据并减少污染效应。PAC通过对机器学习社区中流行的MIA（成员推断攻击）进行扩展，形成了一个更全局的目标，以检测训练数据以澄清不可见的训练数据。作为一项开创性工作，PAC非常易于使用，可以与大多数（如果不是全部）当前的白盒和黑盒LLMs集成。通过大量实验，PAC在至少4.5％以上的性能方面超越了现有方法，针对4种以上的数据集格式，使用了超过10个基本LLMs进行数据污染检测。此外，我们在现实场景中的应用突显了污染及相关问题的突出存在。

更新时间: 2024-06-03 05:21:54

领域: cs.LG

下载: http://arxiv.org/abs/2405.11930v2

Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients

Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, where private training data can be reconstructed by analyzing the gradients of the models shared in training. Previous attacks established that such reconstructions are possible using gradients from all parameters in the entire models. However, we hypothesize that most of the involved modules, or even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component with 0.54% parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy on gradients during training offers limited protection against the novel vulnerability of data disclosure.

Updated: 2024-06-03 05:15:04

标题: 透过树林看到森林：来自部分变压器梯度的数据泄漏

摘要: 最近的研究表明，分布式机器学习容易受到梯度反转攻击的影响，私人训练数据可以通过分析在训练中共享的模型的梯度来重建。先前的攻击已经证实，使用整个模型中所有参数的梯度可以进行这种重建。然而，我们假设大多数涉及的模块，甚至它们的子模块，都存在训练数据泄露的风险，并且我们在各种语言模型的中间层验证了这种漏洞。我们的广泛实验证明，来自单个Transformer层的梯度，甚至是具有0.54％参数的单个线性组件，都容易泄露训练数据。此外，我们还表明，在训练过程中对梯度应用差分隐私只能提供有限的保护，无法防止数据泄露的新漏洞。

更新时间: 2024-06-03 05:15:04

领域: cs.LG,cs.CL,cs.CR,I.2.7; I.2.11

下载: http://arxiv.org/abs/2406.00999v1

Improving out-of-distribution generalization in graphs via hierarchical semantic environments

Out-of-distribution (OOD) generalization in the graph domain is challenging due to complex distribution shifts and a lack of environmental contexts. Recent methods attempt to enhance graph OOD generalization by generating flat environments. However, such flat environments come with inherent limitations to capture more complex data distributions. Considering the DrugOOD dataset, which contains diverse training environments (e.g., scaffold, size, etc.), flat contexts cannot sufficiently address its high heterogeneity. Thus, a new challenge is posed to generate more semantically enriched environments to enhance graph invariant learning for handling distribution shifts. In this paper, we propose a novel approach to generate hierarchical semantic environments for each graph. Firstly, given an input graph, we explicitly extract variant subgraphs from the input graph to generate proxy predictions on local environments. Then, stochastic attention mechanisms are employed to re-extract the subgraphs for regenerating global environments in a hierarchical manner. In addition, we introduce a new learning objective that guides our model to learn the diversity of environments within the same hierarchy while maintaining consistency across different hierarchies. This approach enables our model to consider the relationships between environments and facilitates robust graph invariant learning. Extensive experiments on real-world graph data have demonstrated the effectiveness of our framework. Particularly, in the challenging dataset DrugOOD, our method achieves up to 1.29% and 2.83% improvement over the best baselines on IC50 and EC50 prediction tasks, respectively.

Updated: 2024-06-03 05:05:24

标题: 通过分层语义环境改善图中的超范围泛化

摘要: 图领域中的分布外泛化具有挑战性，因为存在复杂的分布转变和环境背景的缺乏。最近的方法试图通过生成平坦环境来增强图的分布外泛化能力。然而，这种平坦环境带有固有的限制，不能很好地捕捉更复杂的数据分布。考虑到DrugOOD数据集，其中包含多样的训练环境（例如，骨架、大小等），平坦环境无法充分解决其高异质性。因此，提出了一个新的挑战，即生成更具语义丰富的环境，以增强图的不变学习，以处理分布转变。在本文中，我们提出了一种为每个图生成分层语义环境的新方法。首先，给定一个输入图，我们明确地从输入图中提取变体子图，以在局部环境上生成代理预测。然后，采用随机注意机制以分层方式重新提取子图，以重新生成全局环境。此外，我们引入了一个新的学习目标，指导我们的模型在同一层次内学习环境的多样性，同时在不同层次之间保持一致性。这种方法使我们的模型能够考虑环境之间的关系，并促进强大的图不变学习。对真实世界图数据的广泛实验表明了我们框架的有效性。特别是在具有挑战性的DrugOOD数据集中，我们的方法在IC50和EC50预测任务上分别比最佳基线提高了1.29%和2.83%。

更新时间: 2024-06-03 05:05:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.01773v2

Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them indiscriminately, resulting in a degradation in the detection accuracy of the model. To this end, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the "useful impact" of lexical bias and eliminates the "misleading impact". Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied on several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.

Updated: 2024-06-03 04:34:30

标题: 提取精华，舍弃糟粕！通过反事实因果效应进行有毒语言检测的去偏见化

摘要: 目前毒性语言检测（TLD）的方法通常依赖于特定的标记来进行决策，这使它们受到词汇偏见的影响，导致性能和泛化能力较差。词汇偏见对理解毒性有着“有用”和“误导性”影响。不幸的是，当前的去偏方法通常没有区分这些影响，而是不加选择地消除它们，导致模型检测准确性下降。因此，我们提出了一个反事实因果去偏框架（CCDF）来减轻TLD中的词汇偏见。它保留了词汇偏见的“有用影响”并消除了“误导性影响”。具体来说，我们首先从因果视角表示原始句子和有偏标记对决策的总效应。然后进行反事实推断，将词汇偏见的直接因果效应从总效应中排除。实证评估表明，整合CCDF的去偏TLD模型在准确性和公平性方面均比几种基准模型表现出更先进的性能。我们的模型的泛化能力优于当前的去偏模型对于带外数据的表现。

更新时间: 2024-06-03 04:34:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.00983v1

Do pretrained Transformers Learn In-Context by Gradient Descent?

The emergence of In-Context Learning (ICL) in LLMs remains a remarkable phenomenon that is partially understood. To explain ICL, recent studies have created theoretical connections to Gradient Descent (GD). We ask, do such connections hold up in actual pre-trained language models? We highlight the limiting assumptions in prior works that make their setup considerably different from the practical setup in which language models are trained. For example, their experimental verification uses \emph{ICL objective} (training models explicitly for ICL), which differs from the emergent ICL in the wild. Furthermore, the theoretical hand-constructed weights used in these studies have properties that don't match those of real LLMs. We also look for evidence in real models. We observe that ICL and GD have different sensitivity to the order in which they observe demonstrations. Finally, we probe and compare the ICL vs. GD hypothesis in a natural setting. We conduct comprehensive empirical analyses on language models pre-trained on natural data (LLaMa-7B). Our comparisons of three performance metrics highlight the inconsistent behavior of ICL and GD as a function of various factors such as datasets, models, and the number of demonstrations. We observe that ICL and GD modify the output distribution of language models differently. These results indicate that \emph{the equivalence between ICL and GD remains an open hypothesis} and calls for further studies.

Updated: 2024-06-03 04:18:11

标题: 预训练的Transformer通过梯度下降学习上下文吗？

摘要: In-Context Learning (ICL)在预训练语言模型（LLM）中的出现仍然是一个部分理解的显著现象。为了解释ICL，最近的研究建立了与梯度下降（GD）的理论联系。我们问，这些联系在实际的预训练语言模型中是否成立？我们强调先前研究中存在的限制性假设，这使得它们的设置与语言模型训练的实际设置有很大的不同。例如，他们的实验验证使用ICL目标（明确为ICL训练模型），这与野外出现的ICL不同。此外，这些研究中使用的理论手工构建的权重具有与真实LLM不匹配的属性。我们还寻找真实模型中的证据。我们观察到ICL和GD对观察演示的顺序具有不同的敏感性。最后，我们在自然环境中探讨和比较ICL与GD的假设。我们对在自然数据（LLaMa-7B）上预训练的语言模型进行了全面的实证分析。我们对三个性能指标的比较突出了ICL和GD作为各种因素的函数时的不一致行为，如数据集、模型和演示数量。我们观察到ICL和GD以不同方式修改语言模型的输出分布。这些结果表明“ICL与GD之间的等价性仍然是一个开放的假设”，并呼吁进一步研究。

更新时间: 2024-06-03 04:18:11

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.08540v5

Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale

Popular guidance for denoising diffusion probabilistic model (DDPM) linearly combines distinct conditional models together to provide enhanced control over samples. However, this approach overlooks nonlinear effects that become significant when guidance scale is large. To address this issue, we propose characteristic guidance, a guidance method that provides first-principle non-linear correction for classifier-free guidance. Such correction forces the guided DDPMs to respect the Fokker-Planck (FP) equation of diffusion process, in a way that is training-free and compatible with existing sampling methods. Experiments show that characteristic guidance enhances semantic characteristics of prompts and mitigate irregularities in image generation, proving effective in diverse applications ranging from simulating magnet phase transitions to latent space sampling.

Updated: 2024-06-03 04:17:49

标题: 特征引导：大尺度引导下扩散模型的非线性校正

摘要: 流行的去噪扩散概率模型（DDPM）指导线性地将不同的条件模型组合在一起，以提供对样本的增强控制。然而，这种方法忽视了非线性效应，当指导尺度很大时，非线性效应变得显著。为解决这一问题，我们提出了特征指导，这是一种指导方法，为无分类器指导提供第一原理的非线性校正。这种校正强制引导的DDPM遵守扩散过程的福克-普朗克（FP）方程，方式是无需训练的，并且与现有的采样方法兼容。实验证明，特征指导增强了提示的语义特征，减轻了图像生成中的不规则性，在从模拟磁相转变到潜在空间采样等各种应用中表现出了有效性。

更新时间: 2024-06-03 04:17:49

领域: cs.CV,cs.AI,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2312.07586v5

Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model

Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input resolution poses two main challenges: 1) It extends the context length required by the language model, leading to inefficiencies and hitting the model's context limit; 2) It increases the complexity of visual features, necessitating more training data or more complex architecture. We introduce Dragonfly, a new LMM architecture that enhances fine-grained visual understanding and reasoning about image regions to address these challenges. Dragonfly employs two key strategies: multi-resolution visual encoding and zoom-in patch selection. These strategies allow the model to process high-resolution images efficiently while maintaining reasonable context length. Our experiments on eight popular benchmarks demonstrate that Dragonfly achieves competitive or better performance compared to other architectures, highlighting the effectiveness of our design. Additionally, we finetuned Dragonfly on biomedical instructions, achieving state-of-the-art results on multiple biomedical tasks requiring fine-grained visual understanding, including 92.3% accuracy on the Path-VQA dataset (compared to 83.3% for Med-Gemini) and the highest reported results on biomedical image captioning. To support model training, we curated a visual instruction-tuning dataset with 5.5 million image-instruction samples in the general domain and 1.4 million samples in the biomedical domain. We also conducted ablation studies to characterize the impact of various architectural designs and image resolutions, providing insights for future research on visual instruction alignment. The codebase and model are available at https://github.com/togethercomputer/Dragonfly.

Updated: 2024-06-03 04:17:12

标题: 蜻蜓：多分辨率放大强化大型视觉-语言模型

摘要: 最近对大型多模式模型（LMMs）的最新进展表明，更高的图像分辨率增强了对图像细节的细致理解，这对于视觉常识推理和分析生物医学图像等任务至关重要。然而，增加输入分辨率会带来两个主要挑战：1）它延长了语言模型所需的上下文长度，导致效率低下并达到模型的上下文限制；2）它增加了视觉特征的复杂性，需要更多的训练数据或更复杂的架构。我们引入了Dragonfly，一种新的LMM架构，通过增强对图像区域的细致视觉理解和推理来解决这些挑战。Dragonfly采用了两个关键策略：多分辨率视觉编码和放大补丁选择。这些策略使模型能够高效处理高分辨率图像，同时保持合理的上下文长度。我们在八个流行的基准测试上进行的实验表明，与其他架构相比，Dragonfly实现了竞争力或更好的性能，突显了我们设计的有效性。此外，我们在生物医学说明书上对Dragonfly进行了微调，在需要细致视觉理解的多个生物医学任务上取得了最新成果，包括在Path-VQA数据集上达到92.3%的准确率（而Med-Gemini为83.3%），并在生物医学图像字幕中报告了最高的结果。为了支持模型训练，我们整理了一个通用领域中包含550万图像说明样本和生物医学领域中包含140万样本的视觉说明调整数据集。我们还进行了消融研究，以描述各种架构设计和图像分辨率的影响，为未来关于视觉说明对齐的研究提供见解。代码库和模型可在https://github.com/togethercomputer/Dragonfly上获得。

更新时间: 2024-06-03 04:17:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.00977v1

Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost

Retriever Augmented Generation (RAG) systems have become pivotal in enhancing the capabilities of language models by incorporating external knowledge retrieval mechanisms. However, a significant challenge in deploying these systems in industry applications is the detection and mitigation of hallucinations: instances where the model generates information that is not grounded in the retrieved context. Addressing this issue is crucial for ensuring the reliability and accuracy of responses generated by large language models (LLMs) in diverse industry settings. Current hallucination detection techniques fail to deliver accuracy, low latency, and low cost simultaneously. We introduce Luna: a DeBERTA-large (440M) encoder, finetuned for hallucination detection in RAG settings. We demonstrate that Luna outperforms GPT-3.5 and commercial evaluation frameworks on the hallucination detection task, with 97% and 96% reduction in cost and latency, respectively. Luna is lightweight and generalizes across multiple industry verticals and out-of-domain data, making it an ideal candidate for industry LLM applications.

Updated: 2024-06-03 04:14:21

标题: 卢娜：一种评估基础模型，以高准确性和低成本捕捉语言模型幻觉

摘要: 检索增强生成（RAG）系统已成为通过整合外部知识检索机制来增强语言模型能力的关键因素。然而，在将这些系统部署到工业应用中的一个重要挑战是检测和减轻幻觉：即模型生成不基于检索上下文的信息的情况。解决这个问题对于确保大型语言模型（LLMs）在不同行业环境中生成的响应的可靠性和准确性至关重要。当前的幻觉检测技术无法同时提供准确性、低延迟和低成本。我们介绍了Luna：一个在RAG设置中针对幻觉检测进行微调的DeBERTA-large（440M）编码器。我们展示了Luna在幻觉检测任务上优于GPT-3.5和商业评估框架，分别降低了97%和96%的成本和延迟。Luna轻量且能够泛化到多个行业垂直领域和域外数据，使其成为工业LLM应用的理想选择。

更新时间: 2024-06-03 04:14:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.00975v1

SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

To address the increasing complexity and frequency of cybersecurity incidents emphasized by the recent cybersecurity threat reports with over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability of large language models (LLMs) in handling complex tasks, in this paper, we introduce a framework to benchmark, elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events (SEvenLLM). Specifically, we create a high-quality bilingual instruction corpus by crawling cybersecurity raw text from cybersecurity websites to overcome the lack of effective data for information extraction. Then, we design a pipeline to auto-select tasks from the tasks pool and convert the raw text into supervised corpora comprised of question and response. The instruction dataset SEvenLLM-Instruct is used to train cybersecurity LLMs with the multi-task learning objective (27 well-designed tasks) for augmenting the analysis of cybersecurity events. Extensive experiments in our curated benchmark (SEvenLLM-bench) demonstrate that SEvenLLM performs more sophisticated threat analysis and fortifies defenses against the evolving landscape of cyber threats.

Updated: 2024-06-03 04:04:52

标题: SEvenLLM：在网络威胁情报中对大型语言模型的能力进行基准测试、激发和增强

摘要: 为了解决最近的网络安全威胁报告强调的网络安全事件日益复杂和频繁的问题，网络威胁情报（CTI）在现代网络安全领域中发挥着关键作用，提供了理解和应对不断演变的网络威胁性质所需的洞察力。受大型语言模型（LLMs）在处理复杂任务方面的强大能力的启发，本文介绍了一个框架，用于评估、唤起和提高LLMs对安全事件（SEvenLLM）的网络安全事件分析和响应能力。具体而言，我们通过从网络安全网站抓取网络安全原始文本来创建高质量的双语指导语料库，以克服信息提取的有效数据不足问题。然后，我们设计了一个管道，从任务池中自动选择任务，并将原始文本转换为由问题和响应组成的受监督的语料库。指导数据集SEvenLLM-Instruct用于通过多任务学习目标（27个精心设计的任务）培训网络安全LLMs，以增强对网络安全事件的分析。我们精心策划的基准测试中进行的广泛实验（SEvenLLM-bench）表明，SEvenLLM执行了更复杂的威胁分析，并加强了对不断演变的网络威胁形势的防御。

更新时间: 2024-06-03 04:04:52

领域: cs.CR

下载: http://arxiv.org/abs/2405.03446v2

MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations

A wide range of (multivariate) temporal (1D) and spatial (2D) data analysis tasks, such as grouping vehicle sensor trajectories, can be formulated as clustering with given metric constraints. Existing metric-constrained clustering algorithms overlook the rich correlation between feature similarity and metric distance, i.e., metric autocorrelation. The model-based variations of these clustering algorithms (e.g. TICC and STICC) achieve SOTA performance, yet suffer from computational instability and complexity by using a metric-constrained Expectation-Maximization procedure. In order to address these two problems, we propose a novel clustering algorithm, MC-GTA (Model-based Clustering via Goodness-of-fit Tests with Autocorrelations). Its objective is only composed of pairwise weighted sums of feature similarity terms (square Wasserstein-2 distance) and metric autocorrelation terms (a novel multivariate generalization of classic semivariogram). We show that MC-GTA is effectively minimizing the total hinge loss for intra-cluster observation pairs not passing goodness-of-fit tests, i.e., statistically not originating from the same distribution. Experiments on 1D/2D synthetic and real-world datasets demonstrate that MC-GTA successfully incorporates metric autocorrelation. It outperforms strong baselines by large margins (up to 14.3% in ARI and 32.1% in NMI) with faster and stabler optimization (>10x speedup).

Updated: 2024-06-03 03:53:16

标题: MC-GTA: 利用自相关性进行度量约束的基于模型的聚类方法与拟合优度检验

摘要: 一种广泛的（多变量）时间（1D）和空间（2D）数据分析任务，如对车辆传感器轨迹进行分组，可以被表述为在给定度量约束下的聚类。现有的度量约束聚类算法忽视了特征相似性与度量距离之间的丰富相关性，即度量自相关性。这些聚类算法的基于模型的变体（例如TICC和STICC）实现了SOTA性能，但由于使用度量约束的期望最大化过程而遭受计算不稳定性和复杂性。为了解决这两个问题，我们提出了一种新颖的聚类算法，MC-GTA（通过自相关性的拟合度检验进行基于模型的聚类）。它的目标仅由特征相似性项（平方Wasserstein-2距离的成对加权和）和度量自相关性项（经典半变异函数的新颖多变量泛化）组成。我们展示了MC-GTA有效地最小化了不通过拟合度检验的簇内观测对的总铰链损失，即在统计上不源自同一分布。对1D/2D合成和真实世界数据集的实验表明，MC-GTA成功地整合了度量自相关性。它以更快，更稳定的优化速度（> 10倍加速）大幅优于强基线（ARI最多提高14.3％，NMI最多提高32.1％）。

更新时间: 2024-06-03 03:53:16

领域: cs.LG,cs.AI,stat.AP

下载: http://arxiv.org/abs/2405.18395v2

Continual Learning: Forget-free Winning Subnetworks for Video Representations

Inspired by the Lottery Ticket Hypothesis (LTH), which highlights the existence of efficient subnetworks within larger, dense networks, a high-performing Winning Subnetwork (WSN) in terms of task performance under appropriate sparsity conditions is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios. In Few-Shot Class Incremental Learning (FSCIL), a variation of WSN referred to as the Soft subnetwork (SoftNet) is designed to prevent overfitting when the data samples are scarce. Furthermore, the sparse reuse of WSN weights is considered for Video Incremental Learning (VIL). The use of Fourier Subneural Operator (FSO) within WSN is considered. It enables compact encoding of videos and identifies reusable subnetworks across varying bandwidths. We have integrated FSO into different architectural frameworks for continual learning, including VIL, TIL, and FSCIL. Our comprehensive experiments demonstrate FSO's effectiveness, significantly improving task performance at various convolutional representational levels. Specifically, FSO enhances higher-layer performance in TIL and FSCIL and lower-layer performance in VIL.

Updated: 2024-06-03 03:51:38

标题: 持续学习：视频表示中无遗忘的胜出子网络

摘要: 受到抽奖票假设（LTH）的启发，该假设强调在更大、更密集的网络中存在有效的子网络，针对各种持续学习任务考虑了在适当的稀疏条件下表现出色的获奖子网络（WSN）。它利用来自密集网络的现有权重，在任务增量学习（TIL）和任务不可知增量学习（TaIL）场景中实现高效学习。在少样本类增量学习（FSCIL）中，设计了一种称为Soft子网络（SoftNet）的WSN变体，用于在数据样本稀缺时防止过拟合。此外，考虑了WSN权重的稀疏重用，用于视频增量学习（VIL）。考虑了在WSN中使用傅里叶子神经运算器（FSO），它实现了视频的紧凑编码，并识别了在不同带宽范围内可重用的子网络。我们已将FSO集成到不同的持续学习架构中，包括VIL、TIL和FSCIL。我们的综合实验表明FSO的有效性，显著提高了在各种卷积表征水平上的任务性能。具体来说，FSO提升了TIL和FSCIL中的高层性能，以及VIL中的低层性能。

更新时间: 2024-06-03 03:51:38

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2312.11973v4

Guaranteeing Data Privacy in Federated Unlearning with Dynamic User Participation

Federated Unlearning (FU) is gaining prominence for its capacity to eliminate influences of Federated Learning (FL) users' data from trained global FL models. A straightforward FU method involves removing the unlearned users and subsequently retraining a new global FL model from scratch with all remaining users, a process that leads to considerable overhead. To enhance unlearning efficiency, a widely adopted strategy employs clustering, dividing FL users into clusters, with each cluster maintaining its own FL model. The final inference is then determined by aggregating the majority vote from the inferences of these sub-models. This method confines unlearning processes to individual clusters for removing a user, thereby enhancing unlearning efficiency by eliminating the need for participation from all remaining users. However, current clustering-based FU schemes mainly concentrate on refining clustering to boost unlearning efficiency but overlook the potential information leakage from FL users' gradients, a privacy concern that has been extensively studied. Typically, integrating secure aggregation (SecAgg) schemes within each cluster can facilitate a privacy-preserving FU. Nevertheless, crafting a clustering methodology that seamlessly incorporates SecAgg schemes is challenging, particularly in scenarios involving adversarial users and dynamic users. In this connection, we systematically explore the integration of SecAgg protocols within the most widely used federated unlearning scheme, which is based on clustering, to establish a privacy-preserving FU framework, aimed at ensuring privacy while effectively managing dynamic user participation. Comprehensive theoretical assessments and experimental results show that our proposed scheme achieves comparable unlearning effectiveness, alongside offering improved privacy protection and resilience in the face of varying user participation.

Updated: 2024-06-03 03:39:07

标题: 确保在动态用户参与的联邦去学习中的数据隐私

摘要: 联邦遗忘（FU）因其消除联邦学习（FL）用户数据对已训练全局FL模型的影响能力而日益受到关注。一个直接的FU方法涉及删除未学习的用户，然后从头开始重新训练一个新的全局FL模型，所有剩余用户参与，这个过程会导致相当大的开销。为了增强遗忘效率，一个广泛采用的策略是使用聚类，将FL用户分成簇，每个簇维护自己的FL模型。最终的推断是通过汇总这些子模型的推断的多数投票来确定的。这种方法将遗忘过程限制在单个簇中以删除一个用户，从而通过消除所有剩余用户的参与需求来增强遗忘效率。然而，当前基于聚类的FU方案主要集中于改进聚类以提高遗忘效率，却忽视了FL用户渐变可能泄露信息的潜在隐私问题，这是一个已经被广泛研究的隐私问题。通常，在每个簇中集成安全聚合（SecAgg）方案可以促进隐私保护的FU。然而，在每个簇中构建一个无缝整合SecAgg方案的聚类方法具有挑战性，尤其是在涉及对抗用户和动态用户的情况下。在这方面，我们系统地探讨了在基于聚类的最广泛使用的联邦遗忘方案中整合SecAgg协议，以建立一个注重隐私保护的FU框架，旨在确保隐私同时有效地管理动态用户参与。全面的理论评估和实验结果表明，我们提出的方案实现了可比的遗忘效果，同时提供了改善的隐私保护和在用户参与变化的情况下的韧性。

更新时间: 2024-06-03 03:39:07

领域: cs.CR

下载: http://arxiv.org/abs/2406.00966v1

LLark: A Multimodal Instruction-Following Language Model for Music

Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for \emph{music} understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLark, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, reasoning), we show that LLark matches or outperforms existing baselines in music understanding, and that humans show a high degree of agreement with its responses in captioning and reasoning tasks. LLark is trained entirely from open-source music data and models, and we make our training code available along with the release of this paper. Additional results and audio examples are at https://bit.ly/llark, and our source code is available at https://github.com/spotify-research/llark .

Updated: 2024-06-03 03:35:01

标题: LLark：用于音乐的多模态指令跟随语言模型

摘要: 音乐具有独特而复杂的结构，对于专家人类和现有的人工智能系统都具有挑战性，并且相对于其他形式的音频，音乐提出了独特的挑战。我们提出了LLark，一种针对音乐理解的指导调整多模态模型。我们详细介绍了我们创建数据集的过程，其中包括增强各种开源音乐数据集的注释并将它们转换为统一的指导调整格式。我们为LLark提出了一种多模态架构，将一个预训练的音乐生成模型与一个预训练的语言模型集成。在对三种类型任务（音乐理解、字幕生成、推理）的评估中，我们展示LLark在音乐理解方面与现有基线相匹配或表现更好，并且人类在字幕生成和推理任务中对其回应显示出高度一致性。LLark完全是通过开源音乐数据和模型进行训练的，我们将我们的训练代码与本文发布一起提供。更多结果和音频示例请访问https://bit.ly/llark，我们的源代码可在https://github.com/spotify-research/llark找到。

更新时间: 2024-06-03 03:35:01

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2310.07160v3

Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration

Discrimination and calibration represent two important properties of survival analysis, with the former assessing the model's ability to accurately rank subjects and the latter evaluating the alignment of predicted outcomes with actual events. With their distinct nature, it is hard for survival models to simultaneously optimize both of them especially as many previous results found improving calibration tends to diminish discrimination performance. This paper introduces a novel approach utilizing conformal regression that can improve a model's calibration without degrading discrimination. We provide theoretical guarantees for the above claim, and rigorously validate the efficiency of our approach across 11 real-world datasets, showcasing its practical applicability and robustness in diverse scenarios.

Updated: 2024-06-03 03:32:56

标题: 调整后的生存分布：一种增加校准性的通用后处理方法

摘要: 歧视和校准代表生存分析的两个重要属性，前者评估模型准确排名受试者的能力，后者评估预测结果与实际事件的一致性。由于它们具有不同的特性，生存模型很难同时优化这两个属性，尤其是许多先前的研究结果表明，改善校准往往会降低歧视性能。本文介绍了一种利用符合回归的新方法，可以提高模型的校准性，而不降低歧视性能。我们为上述说法提供了理论保证，并在11个真实世界数据集上严格验证了我们方法的效率，展示了它在不同场景下的实际适用性和稳健性。

更新时间: 2024-06-03 03:32:56

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.07374v2

The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

Due to their unprecedented ability to process and respond to various types of data, Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI). As these advanced generative models increasingly form collaborative networks for complex tasks, the integrity and security of these systems are crucial. Our paper, ``The Wolf Within'', explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content. Unlike direct harmful output generation for MLLMs, our research demonstrates how a single MLLM agent can be subtly influenced to generate prompts that, in turn, induce other MLLM agents in the society to output malicious content. Our findings reveal that, an MLLM agent, when manipulated to produce specific prompts or instructions, can effectively ``infect'' other agents within a society of MLLMs. This infection leads to the generation and circulation of harmful outputs, such as dangerous instructions or misinformation, across the society. We also show the transferability of these indirectly generated prompts, highlighting their possibility in propagating malice through inter-agent communication. This research provides a critical insight into a new dimension of threat posed by MLLMs, where a single agent can act as a catalyst for widespread malevolent influence. Our work underscores the urgent need for developing robust mechanisms to detect and mitigate such covert manipulations within MLLM societies, ensuring their safe and ethical utilization in societal applications.

Updated: 2024-06-03 03:29:07

标题: 《内心的狼：通过一个MLLM操作者向MLLM社会秘密注入恶意》

摘要: 由于它们处理和响应各种类型数据的能力空前，多模态大型语言模型（MLLMs）不断定义人工通用智能（AGI）的新边界。随着这些先进的生成模型越来越多地形成复杂任务的协作网络，这些系统的完整性和安全性至关重要。我们的论文《内心的狼》探讨了MLLM社会中的一种新型漏洞 - 恶意内容的间接传播。与直接为MLLMs生成有害输出不同，我们的研究表明，一个单一的MLLM代理可以被微妙地影响，产生相应的提示，从而诱使社会中的其他MLLM代理输出恶意内容。我们的研究发现，当操纵一个MLLM代理产生特定提示或指令时，可以有效地“感染”MLLM社会中的其他代理。这种感染导致了有害输出的生成和传播，例如危险指令或错误信息，在整个社会中传播。我们还展示了这些间接生成的提示的可转移性，突出它们通过代理间通信传播恶意的可能性。这项研究提供了对MLLMs所构成的威胁新维度的重要洞察力，其中一个单一代理可以充当广泛恶意影响的催化剂。我们的工作强调了迫切需要开发强大机制来检测和缓解MLLM社会中这种隐蔽操纵，确保它们在社会应用中的安全和道德利用。

更新时间: 2024-06-03 03:29:07

领域: cs.CR,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2402.14859v2

Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests an additional asymmetry of the valley beyond the flat and sharp ones, yet without thoroughly examining its causes or implications. Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise for 1D visualization. Our major observation shows that the {\it degree of sign consistency} between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical insights from the aspects of ReLU activation and softmax function could explain the interesting phenomenon. Our discovery propels novel understanding and applications in the scenario of Model Fusion: (1) the efficacy of interpolating separate models significantly correlates with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as an innovative approach for model parameter alignment.

Updated: 2024-06-03 03:26:59

标题: 探索和利用深度神经网络的不对称谷。

摘要: 探索损失景观可以揭示深度神经网络（DNNs）固有原则。最近的研究表明，除了平坦和陡峭之外，山谷还存在另一种不对称性，但尚未彻底研究其原因或影响。我们的研究系统地探讨了影响DNN山谷对称性的因素，包括（1）影响收敛点的数据集、网络架构、初始化和超参数；以及（2）用于1D可视化的噪声的大小和方向。我们的主要观察结果显示，噪声与收敛点之间的符号一致性程度是山谷对称性的关键指标。从ReLU激活和softmax函数的角度可以解释这一有趣的现象。我们的发现推动了对模型融合场景中的新理解和应用：（1）插值分离模型的有效性与它们的符号一致性比率显著相关，（2）在联邦学习期间实施符号对齐出现为模型参数对齐的创新方法。

更新时间: 2024-06-03 03:26:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.12489v2

Tackling the Unlimited Staleness in Federated Learning with Intertwined Data and Device Heterogeneities

The efficiency of Federated Learning (FL) is often affected by both data and device heterogeneities. Data heterogeneity is defined as the heterogeneity of data distributions on different clients. Device heterogeneity is defined as the clients' variant latencies in uploading their local model updates due to heterogeneous conditions of local hardware resources, and causes the problem of staleness when being addressed by asynchronous FL. Traditional schemes of tackling the impact of staleness consider data and device heterogeneities as two separate and independent aspects in FL, but this assumption is unrealistic in many practical FL scenarios where data and device heterogeneities are intertwined. In these cases, traditional schemes of weighted aggregation in FL have been proved to be ineffective, and a better approach is to convert a stale model update into a non-stale one. In this paper, we present a new FL framework that leverages the gradient inversion technique for such conversion, hence efficiently tackling unlimited staleness in clients' model updates. Our basic idea is to use gradient inversion to get estimations of clients' local training data from their uploaded stale model updates, and use these estimations to compute non-stale client model updates. In this way, we address the problem of possible data quality drop when using gradient inversion, while still preserving the clients' local data privacy. We compared our approach with the existing FL strategies on mainstream datasets and models, and experiment results demonstrate that when tackling unlimited staleness, our approach can significantly improve the trained model accuracy by up to 20% and speed up the FL training progress by up to 35%.

Updated: 2024-06-03 03:13:35

标题: 解决联邦学习中无限陈旧性的方法：利用数据和设备异质性交织

摘要: 联邦学习（FL）的效率通常受到数据和设备异质性的影响。数据异质性定义为不同客户端上数据分布的异质性。设备异质性定义为客户端上传本地模型更新的变体延迟，这是由于本地硬件资源的异质条件导致的，并且在异步FL中解决时会导致陈旧性问题。传统的处理陈旧性影响的方法认为在FL中数据和设备异质性是两个独立的方面，但在许多实际的FL场景中，这种假设是不现实的，因为数据和设备的异质性是交织在一起的。在这些情况下，传统的FL加权聚合方案已被证明是无效的，更好的方法是将陈旧的模型更新转换为非陈旧的模型更新。在本文中，我们提出了一种新的FL框架，利用梯度反转技术进行转换，从而有效地处理客户端模型更新中的无限陈旧性。我们的基本思想是使用梯度反转从客户端上传的陈旧模型更新中获取客户端本地训练数据的估计，并使用这些估计来计算非陈旧的客户端模型更新。通过这种方式，我们解决了使用梯度反转时可能出现的数据质量下降问题，同时仍保护了客户端的本地数据隐私。我们将我们的方法与现有的FL策略在主流数据集和模型上进行了比较，实验结果表明，在处理无限陈旧性时，我们的方法可以将训练模型的准确性显著提高高达20％，并且可以加快FL训练进度高达35％。

更新时间: 2024-06-03 03:13:35

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2309.13536v2

Stochastic Optimal Control for Diffusion Bridges in Function Spaces

Recent advancements in diffusion models and diffusion bridges primarily focus on finite-dimensional spaces, yet many real-world problems necessitate operations in infinite-dimensional function spaces for more natural and interpretable formulations. In this paper, we present a theory of stochastic optimal control (SOC) tailored to infinite-dimensional spaces, aiming to extend diffusion-based algorithms to function spaces. Specifically, we demonstrate how Doob's $h$-transform, the fundamental tool for constructing diffusion bridges, can be derived from the SOC perspective and expanded to infinite dimensions. This expansion presents a challenge, as infinite-dimensional spaces typically lack closed-form densities. Leveraging our theory, we establish that solving the optimal control problem with a specific objective function choice is equivalent to learning diffusion-based generative models. We propose two applications: (1) learning bridges between two infinite-dimensional distributions and (2) generative models for sampling from an infinite-dimensional distribution. Our approach proves effective for diverse problems involving continuous function space representations, such as resolution-free images, time-series data, and probability density functions.

Updated: 2024-06-03 03:11:45

标题: 在函数空间中扩散桥的随机最优控制

摘要: 最近对扩散模型和扩散桥的研究主要集中在有限维空间，然而许多现实世界问题需要在无限维函数空间中进行操作，以获得更自然和可解释的表达。本文提出了一种适用于无限维空间的随机最优控制（SOC）理论，旨在将基于扩散的算法扩展到函数空间。具体来说，我们展示了如何从SOC的角度推导出Doob的h-变换，这是构建扩散桥的基本工具，并将其扩展到无限维空间。这种扩展提出了一个挑战，因为无限维空间通常缺乏封闭形式的密度。利用我们的理论，我们建立了解决具有特定目标函数选择的最优控制问题等效于学习基于扩散的生成模型。我们提出了两个应用：（1）学习两个无限维分布之间的桥梁和（2）从无限维分布中抽样的生成模型。我们的方法在涉及连续函数空间表示的各种问题中表现出有效性，例如无分辨率图像、时间序列数据和概率密度函数。

更新时间: 2024-06-03 03:11:45

领域: cs.LG

下载: http://arxiv.org/abs/2405.20630v2

Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification

Various machine learning approaches have gained significant popularity for the automated classification of educational text to identify indicators of learning engagement -- i.e. learning engagement classification (LEC). LEC can offer comprehensive insights into human learning processes, attracting significant interest from diverse research communities, including Natural Language Processing (NLP), Learning Analytics, and Educational Data Mining. Recently, Large Language Models (LLMs), such as ChatGPT, have demonstrated remarkable performance in various NLP tasks. However, their comprehensive evaluation and improvement approaches in LEC tasks have not been thoroughly investigated. In this study, we propose the Annotation Guidelines-based Knowledge Augmentation (AGKA) approach to improve LLMs. AGKA employs GPT 4.0 to retrieve label definition knowledge from annotation guidelines, and then applies the random under-sampler to select a few typical examples. Subsequently, we conduct a systematic evaluation benchmark of LEC, which includes six LEC datasets covering behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). The study results demonstrate that AGKA can enhance non-fine-tuned LLMs, particularly GPT 4.0 and Llama 3 70B. GPT 4.0 with AGKA few-shot outperforms full-shot fine-tuned models such as BERT and RoBERTa on simple binary classification datasets. However, GPT 4.0 lags in multi-class tasks that require a deep understanding of complex semantic information. Notably, Llama 3 70B with AGKA is a promising combination based on open-source LLM, because its performance is on par with closed-source GPT 4.0 with AGKA. In addition, LLMs struggle to distinguish between labels with similar names in multi-class classification.

Updated: 2024-06-03 03:09:01

标题: 基于注释指南的知识增强：朝着增强大型语言模型用于教育文本分类的方向

摘要: 各种机器学习方法在教育文本的自动分类中获得了显著的流行度，以识别学习参与的指标，即学习参与分类（LEC）。LEC可以提供关于人类学习过程的全面洞察，吸引了来自自然语言处理（NLP）、学习分析和教育数据挖掘等多样研究领域的重要兴趣。最近，大型语言模型（LLMs），如ChatGPT，在各种NLP任务中展现出卓越的性能。然而，它们在LEC任务中的全面评估和改进方法尚未得到深入研究。在本研究中，我们提出了基于注释指南的知识增强（AGKA）方法来改进LLMs。AGKA利用GPT 4.0从注释指南中检索标签定义知识，然后应用随机下采样器选择几个典型示例。随后，我们进行了LEC的系统评估基准，包括六个LEC数据集，涵盖行为分类（问题和紧急程度）、情感分类（二进制和认知情感）和认知分类（观点和认知存在）。研究结果表明，AGKA可以增强未经微调的LLMs，特别是GPT 4.0和Llama 3 70B。带有AGKA的GPT 4.0在简单的二元分类数据集上胜过全量微调模型，如BERT和RoBERTa。然而，在需要对复杂语义信息有深入理解的多类任务中，GPT 4.0落后。值得注意的是，带有AGKA的Llama 3 70B是一种有前途的组合，基于开源LLM，因为其性能与闭源GPT 4.0带AGKA的性能相当。此外，LLMs在多类分类中很难区分名称相似的标签。

更新时间: 2024-06-03 03:09:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.00954v1

FIFO-Diffusion: Generating Infinite Videos from Text without Training

We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword as the frames near the tail can take advantage of cleaner ones by forward reference but such a strategy induces the discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. Practically, FIFO-Diffusion consumes a constant amount of memory regardless of the target video length given a baseline model, while well-suited for parallel inference on multiple GPUs. We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines. Generated video samples and source codes are available at our project page.

Updated: 2024-06-03 03:04:12

标题: FIFO-Diffusion：无需训练即可从文本生成无限视频

摘要: 我们提出了一种基于预训练扩散模型的新型推断技术，用于文本条件视频生成。我们的方法，称为FIFO-Diffusion，从概念上讲能够生成无限长的视频而无需额外训练。这是通过迭代执行对角去噪来实现的，该过程同时处理一个队列中具有逐渐增加噪声水平的一系列连续帧；我们的方法从头部出列一个完全去噪的帧，同时在尾部入列一个新的随机噪声帧。然而，对角去噪是一把双刃剑，因为接近尾部的帧可以通过前向引用利用更干净的帧，但这种策略会引起训练和推断之间的差异。因此，我们引入了潜在分区来减少训练和推断之间的差距，并引入了前瞻去噪来利用前向引用的好处。从实际上讲，FIFO-Diffusion在给定基线模型的情况下，无论目标视频长度如何，都会消耗恒定的内存，同时非常适合在多个GPU上进行并行推断。我们在现有的文本到视频生成基线上展示了所提出方法的有希望的结果和有效性。生成的视频样本和源代码可在我们的项目页面上找到。

更新时间: 2024-06-03 03:04:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.11473v2

On the Inherent Privacy Properties of Discrete Denoising Diffusion Models

Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into how the privacy loss of each point correlates with the dataset's distribution. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, O(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, O(\frac{1}{s\epsilon}))$-pDP of the DDM during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.

Updated: 2024-06-03 03:02:54

标题: 关于离散去噪扩散模型固有的隐私特性

摘要: 隐私问题导致合成数据集的创建激增，扩散模型作为一种有前途的途径出现。尽管先前的研究对这些模型进行了实证评估，但在提供其保护隐私能力的数学特征方面存在空白。为了解决这一问题，我们提出了对离散扩散模型（DDMs）中固有的隐私保护进行开创性理论探索，用于生成离散数据集。我们的框架侧重于每个实例的差分隐私（pDP），阐明了在给定训练数据集中每个数据点的潜在隐私泄漏，为我们提供了有关每个点的隐私损失如何与数据集分布相关联的见解。我们的界限还表明，使用$s$大小的数据点进行训练会导致从$(\epsilon，O(\frac{1}{s^2\epsilon}))$-pDP到$(\epsilon，O(\frac{1}{s\epsilon}))$-pDP的隐私泄漏激增，这发生在从纯噪声到合成干净数据阶段的过渡中，扩散系数的快速衰减增强了隐私保证。最后，我们在合成和真实数据集上通过实证验证我们的理论发现。

更新时间: 2024-06-03 03:02:54

领域: cs.LG

下载: http://arxiv.org/abs/2310.15524v3

Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs due to noisy or incorrect retrieved texts. This suggests that RAG possesses a duality including both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical explanation for the duality in RAG. The benefit and detriment within this duality remain a black box that cannot be quantified or compared in an explainable manner. This paper takes the first step in theoretically giving the essential explanation of benefit and detriment in RAG by: (1) decoupling and formalizing them from RAG prediction, (2) approximating the gap between their values by representation similarity and (3) establishing the trade-off mechanism between them, to make them explainable, quantifiable, and comparable. We demonstrate that the distribution difference between retrieved texts and LLMs' knowledge acts as double-edged sword, bringing both benefit and detriment. We also prove that the actual effect of RAG can be predicted at token level. Based on our theory, we propose a practical novel method, X-RAG, which achieves collaborative generation between pure LLM and RAG at token level to preserve benefit and avoid detriment. Experiments in real-world tasks based on LLMs including OPT, LLaMA-2, and Mistral show the effectiveness of our method and support our theoretical results.

Updated: 2024-06-03 02:56:14

标题: 揭示检索增强生成的二重性：理论分析与实际解决方案

摘要: 检索增强生成（RAG）利用检索到的文本来增强大型语言模型（LLMs）。然而，研究表明RAG并不总是有效的，甚至可能会误导LLMs，这是由于嘈杂或不正确的检索文本。这表明RAG具有一种包含利益和损害的二重性。尽管许多现有方法试图解决这个问题，但它们缺乏对RAG中二重性的理论解释。这种二重性中的利益和损害仍然是一个无法量化或以可解释的方式进行比较的黑匣子。本文首次在理论上解释RAG中利益和损害的基本原因：（1）从RAG预测中解耦并形式化它们，（2）通过表示相似性来近似它们之间的差距，（3）建立它们之间的权衡机制，使它们成为可解释的、可量化的和可比较的。我们证明检索文本与LLMs知识之间的分布差异起着双刃剑作用，既带来利益又带来损害。我们还证明RAG的实际效果可以在令牌级别上预测。基于我们的理论，我们提出了一种实用的新方法X-RAG，在令牌级别实现了纯LLM和RAG之间的协作生成，以保留利益并避免损害。基于LLMs的OPT、LLaMA-2和Mistral等真实世界任务的实验显示了我们方法的有效性，并支持我们的理论结果。

更新时间: 2024-06-03 02:56:14

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.00944v1

State Space Models on Temporal Graphs: A First-Principles Study

Over the past few years, research on deep graph learning has shifted from static graphs to temporal graphs in response to real-world complex systems that exhibit dynamic behaviors. In practice, temporal graphs are formalized as an ordered sequence of static graph snapshots observed at discrete time points. Sequence models such as RNNs or Transformers have long been the predominant backbone networks for modeling such temporal graphs. Yet, despite the promising results, RNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Recently, state space models (SSMs), which are framed as discretized representations of an underlying continuous-time linear dynamical system, have garnered substantial attention and achieved breakthrough advancements in independent sequence modeling. In this work, we undertake a principled investigation that extends SSM theory to temporal graphs by integrating structural information into the online approximation objective via the adoption of a Laplacian regularization term. The emergent continuous-time system introduces novel algorithmic challenges, thereby necessitating our development of GraphSSM, a graph state space model for modeling the dynamics of temporal graphs. Extensive experimental results demonstrate the effectiveness of our GraphSSM framework across various temporal graph benchmarks.

Updated: 2024-06-03 02:56:11

标题: 时间图上的状态空间模型：一项第一原理研究

摘要: 在过去几年中，深度图学习的研究已经从静态图转向时间图，以响应展现动态行为的现实复杂系统。在实践中，时间图被形式化为在离散时间点观察到的静态图快照的有序序列。诸如RNN或Transformer之类的序列模型长期以来一直是建模这种时间图的主要骨干网络。然而，尽管有着令人期待的结果，RNN很难处理长距离依赖关系，而transformers则受到二次计算复杂性的负担。最近，被构建为基础连续时间线性动态系统的离散化表示的状态空间模型（SSMs）引起了相当大的关注，并在独立序列建模方面取得了突破性进展。在这项工作中，我们进行了一项原则性研究，将SSM理论延伸到时间图，通过采用拉普拉斯正则项将结构信息集成到在线近似目标中。新出现的连续时间系统引入了新颖的算法挑战，从而需要我们开发GraphSSM，一种用于建模时间图动态的图状态空间模型。广泛的实验结果展示了我们的GraphSSM框架在各种时间图基准上的有效性。

更新时间: 2024-06-03 02:56:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.00943v1

Proof-of-Learning with Incentive Security

Most concurrent blockchain systems rely heavily on the Proof-of-Work (PoW) or Proof-of-Stake (PoS) mechanisms for decentralized consensus and security assurance. However, the substantial energy expenditure stemming from computationally intensive yet meaningless tasks has raised considerable concerns surrounding traditional PoW approaches, The PoS mechanism, while free of energy consumption, is subject to security and economic issues. Addressing these issues, the paradigm of Proof-of-Useful-Work (PoUW) seeks to employ challenges of practical significance as PoW, thereby imbuing energy consumption with tangible value. While previous efforts in Proof of Learning (PoL) explored the utilization of deep learning model training SGD tasks as PoUW challenges, recent research has revealed its vulnerabilities to adversarial attacks and the theoretical hardness in crafting a byzantine-secure PoL mechanism. In this paper, we introduce the concept of incentive-security that incentivizes rational provers to behave honestly for their best interest, bypassing the existing hardness to design a PoL mechanism with computational efficiency, a provable incentive-security guarantee and controllable difficulty. Particularly, our work is secure against two attacks to the recent work of Jia et al. [2021], and also improves the computational overhead from $\Theta(1)$ to $O(\frac{\log E}{E})$. Furthermore, while most recent research assumes trusted problem providers and verifiers, our design also guarantees frontend incentive-security even when problem providers are untrusted, and verifier incentive-security that bypasses the Verifier's Dilemma. By incorporating ML training into blockchain consensus mechanisms with provable guarantees, our research not only proposes an eco-friendly solution to blockchain systems, but also provides a proposal for a completely decentralized computing power market in the new AI age.

Updated: 2024-06-03 02:51:46

标题: 学习证明与激励安全性

摘要: 大多数并发区块链系统在分散共识和安全保障方面严重依赖工作量证明（PoW）或权益证明（PoS）机制。然而，由于计算密集但毫无意义的任务造成的巨大能源消耗引起了对传统PoW方法的重大担忧。PoS机制虽然不消耗能源，但存在安全和经济问题。为了解决这些问题，Proof-of-Useful-Work（PoUW）范式旨在将具有实际意义的挑战作为PoW，从而赋予能源消耗实际价值。虽然以往的Proof of Learning（PoL）工作探讨了将深度学习模型训练SGD任务作为PoUW挑战的利用，但最近的研究揭示了其容易受到对抗性攻击和设计拜占庭安全PoL机制的理论困难。本文引入了激励安全的概念，激励理性的证明者为了自身利益而诚实行事，绕过了设计PoL机制时的现有困难，具有计算效率、可证明的激励安全保证和可控的难度。特别地，我们的工作针对贾等人最新研究的两种攻击进行了安全防护，并将计算开销从$\Theta(1)$改进到$O(\frac{\log E}{E})$。此外，尽管大多数最近的研究假设问题提供者和验证者是可信的，我们的设计还确保了即使问题提供者不受信任，也能保证前端激励安全，同时绕过验证者困境的验证者激励安全。通过将机器学习训练纳入具有可证明保障的区块链共识机制，我们的研究不仅提出了区块链系统的环保解决方案，还为新人工智能时代提供了一个完全去中心化的计算能力市场的提案。

更新时间: 2024-06-03 02:51:46

领域: cs.CR,cs.AI,cs.ET,cs.GT,cs.LG

下载: http://arxiv.org/abs/2404.09005v4

A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW that is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the L\'evy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

Updated: 2024-06-03 02:50:35

标题: 一个用于比较分布的新的稳健的基于部分$p$-Wasserstein的度量方法

摘要: 二次Wasserstein距离对分布之间微小几何差异敏感，使其成为一种非常强大的不相似度度量。然而，由于这种敏感性，一个小的异常值可能导致两个相似分布之间的二次Wasserstein距离显著增加。类似地，采样差异可能导致在$\mathbb{R}^2$中n个样本上的经验二次Wasserstein距离以$n^{-1/4}$的速率收敛到真实距离，这比1-Wasserstein距离的$n^{-1/2}$收敛速率慢得多。我们引入了一个由$k \ge 0$参数化的新距离族，称为$k$-RPW，它基于计算部分二次Wasserstein距离。我们证明了：(1) $k$-RPW满足度量性质，(2) $k$-RPW对小的异常值具有鲁棒性，同时保留了二次Wasserstein距离对微小几何差异的敏感性，以及(3)当$k$为常数时，在$\mathbb{R}^2$中n个样本上的经验分布的$k$-RPW距离以$n^{-1/3}$的速率收敛到真实距离，这比二次Wasserstein距离的$n^{-1/4}$收敛速度更快。通过使用部分$p$-Wasserstein距离，我们将我们的距离扩展到任意$p \in [1,\infty]$。通过适当设置参数$k$或$p$，我们可以将我们的距离简化为总变差、$p$-Wasserstein和L\'evy-Prokhorov距离。实验表明，在嘈杂的真实世界数据集上的图像检索任务中，我们的距离函数相比于1-Wasserstein、2-Wasserstein和TV距离能够实现更高的准确度。

更新时间: 2024-06-03 02:50:35

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.03664v2

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel guidance method that only requires forward evaluation of rule functions that can work with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.

Updated: 2024-06-03 02:47:27

标题: 使用非可微分规则引导扩散生成符号音乐

摘要: 我们研究了符号音乐生成的问题（例如生成钢琴卷谱），技术重点放在不可微分的规则指导上。音乐规则通常以符号形式表达在音符特征上，如音符密度或和弦进行，其中许多是不可微分的，这在使用它们进行引导扩散时构成挑战。我们提出了一种新颖的指导方法\oursfull（简称\ours），该方法仅需要对规则函数进行前向评估，可以与预训练的扩散模型插拔式地工作，从而首次实现了对不可微分规则的无训练指导。此外，我们引入了一种用于具有高时间分辨率的符号音乐生成的潜在扩散架构，可以与SCG以插拔方式组合。与符号音乐生成中的标准强基线相比，该框架展示了在音乐质量和基于规则的可控性方面的显著进展，在各种情景下优于当前最先进的生成器。有关详细演示、代码和模型检查点，请访问我们的项目网站：https://scg-rule-guided-music.github.io/。

更新时间: 2024-06-03 02:47:27

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2402.14285v3

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

Parameter quantization for Large Language Models (LLMs) has attracted increasing attentions recently in reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of a parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, a dynamic outlier reservation scheme is developed to retain some parameters in their original float point precision, in trade off of boosted model performance. Experiments on various mainstream open source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve the state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code is available at https://github.com/fayuge/CLAQ.

Updated: 2024-06-03 02:46:53

标题: CLAQ：推动LLMs的低比特后训练量化的极限。

摘要: Parameter quantization对于大型语言模型（LLMs）最近在减少内存成本和提高计算效率方面引起了越来越多的关注。早期的方法已被广泛采用。然而，现有方法在低位（如2至3位）情况下性能不佳。本文介绍了一种新颖而有效的列级适应性权重量化（CLAQ）框架，通过引入三种不同类型的适应性策略进行LLM量化。首先，提出了一种基于K-Means聚类的算法，允许动态生成参数矩阵每列的量化中心。其次，设计了一种基于异常值引导的自适应精度搜索策略，可以动态分配不同列的不同位宽。最后，开发了一种动态异常值保留方案，以保留一些参数在其原始浮点精度下，以换取提升模型性能。在包括LLaMA-1、LLaMA-2和Yi在内的各种主流开源LLMs上的实验表明，我们的方法在不同位设置下取得了最先进的结果，尤其是在极低位的情况下。代码可在https://github.com/fayuge/CLAQ上找到。

更新时间: 2024-06-03 02:46:53

领域: cs.LG

下载: http://arxiv.org/abs/2405.17233v2

Representing Molecules as Random Walks Over Interpretable Grammars

Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability.

Updated: 2024-06-03 02:43:24

标题: 用可解释的语法将分子表示为随机行走

摘要: 近年来，分子发现领域的研究主要集中在小型、类药物分子上，这导致许多同样重要的材料设计应用缺乏足够的技术支持。这些应用通常依赖于更复杂的分子结构，设计精心、使用已知亚结构较少。我们提出了一种数据高效且可解释的模型，用于以图形语法的形式表示和推理这些分子，明确描述具有图案的分层设计空间作为设计基础。我们提出了一种新颖的表示形式，即对设计空间中的随机漫步，既有助于分子生成，又有助于属性预测。我们证明了在性能、效率和预测分子的合成可行性方面，我们的方法相对于现有方法具有明显优势，并提供了对该方法的化学可解释性的详细见解。

更新时间: 2024-06-03 02:43:24

领域: cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2403.08147v3

Achieving $\tilde{O}(1/ε)$ Sample Complexity for Constrained Markov Decision Process

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are given finite resources and a MDP with unknown transition probabilities. At each stage, we take an action, collecting a reward and consuming some resources, all assumed to be unknown and need to be learned over time. In this work, we take the first step towards deriving optimal problem-dependent guarantees for the CMDP problems. We derive a logarithmic regret bound, which translates into a $O(\frac{1}{\Delta\cdot\eps}\cdot\log^2(1/\eps))$ sample complexity bound, with $\Delta$ being a problem-dependent parameter, yet independent of $\eps$. Our sample complexity bound improves upon the state-of-art $O(1/\eps^2)$ sample complexity for CMDP problems established in the previous literature, in terms of the dependency on $\eps$. To achieve this advance, we develop a new framework for analyzing CMDP problems. To be specific, our algorithm operates in the primal space and we resolve the primal LP for the CMDP problem at each period in an online manner, with \textit{adaptive} remaining resource capacities. The key elements of our algorithm are: i) a characterization of the instance hardness via LP basis, ii) an eliminating procedure that identifies one optimal basis of the primal LP, and; iii) a resolving procedure that is adaptive to the remaining resources and sticks to the characterized optimal basis.

Updated: 2024-06-03 02:37:28

标题: 实现受限马尔可夫决策过程的 $\tilde{O}(1/ε)$ 样本复杂度

摘要: 我们考虑受限马尔可夫决策过程（CMDP）的强化学习问题，在顺序学习和决策中满足安全性或资源约束起着核心作用。在这个问题中，我们拥有有限资源和一个未知转移概率的MDP。在每个阶段，我们采取一个行动，收集奖励并消耗一些资源，所有这些都被假定为未知且需要随时间学习。在这项工作中，我们迈出了为CMDP问题推导最优问题相关保证的第一步。我们推导出一个对数遗憾界限，这转化为一个$O(\frac{1}{\Delta\cdot\eps}\cdot\log^2(1/\eps))$的样本复杂度界限，其中$\Delta$是一个问题相关参数，但独立于$\eps$。我们的样本复杂度界限改进了先前文献中建立的CMDP问题的$O(1/\eps^2)$样本复杂度，就$\eps$的依赖性而言。为了实现这一进展，我们开发了一个新的分析CMDP问题的框架。具体来说，我们的算法在原始空间中运行，并以在线方式解决每个周期的CMDP问题的原始LP，具有\textit{自适应}剩余资源容量。我们算法的关键元素是：i) 通过LP基础对实例难度进行表征，ii) 一个识别原始LP的一个最优基础的消除过程，和；iii) 一个对剩余资源自适应且坚持表征最优基础的解决过程。

更新时间: 2024-06-03 02:37:28

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2402.16324v2

Variational Schrödinger Diffusion Models

Schr\"odinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the costly implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize the forward score functions (variational scores) of SB and restore simulation-free properties in training backward scores. We propose the variational Schr\"odinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. Theoretically, we use stochastic approximation to prove the convergence of the variational scores and show the convergence of the adaptively generated samples based on the optimal variational scores. Empirically, we test the algorithm in simulated examples and observe that VSDM is efficient in generations of anisotropic shapes and yields straighter sample trajectories compared to the single-variate diffusion. We also verify the scalability of the algorithm in real-world data and achieve competitive unconditional generation performance in CIFAR10 and conditional generation in time series modeling. Notably, VSDM no longer depends on warm-up initializations and has become tuning-friendly in training large-scale experiments.

Updated: 2024-06-03 02:36:51

标题: 变分薛定谔扩散模型

摘要: 薛定谔桥（SB）已成为优化扩散模型中的交通计划的首选方法。然而，SB需要估计难以处理的前向得分函数，从而不可避免地导致基于模拟轨迹的昂贵的隐式训练损失。为了提高可扩展性并保留高效的交通计划，我们利用变分推断来线性化SB的前向得分函数（变分得分），并在训练后向得分时恢复无需模拟的特性。我们提出了变分薛定谔扩散模型（VSDM），其中前向过程是多元扩散，变分得分被自适应优化以实现高效传输。从理论上讲，我们使用随机逼近来证明变分得分的收敛性，并展示基于最优变分得分生成的样本的收敛性。在实证方面，我们在模拟示例中测试了该算法，并观察到VSDM在生成各向异性形状方面效率高，并且相对于单变量扩散，产生更直的样本轨迹。我们还验证了该算法在真实数据中的可扩展性，并在CIFAR10数据集中取得了竞争力强的无条件生成性能和在时间序列建模中的条件生成性能。值得注意的是，VSDM不再依赖于预热初始化，并且在训练大规模实验中变得更易调整。

更新时间: 2024-06-03 02:36:51

领域: cs.LG

下载: http://arxiv.org/abs/2405.04795v2

KTO: Model Alignment as Prospect Theoretic Optimization

Kahneman & Tversky's $\textit{prospect theory}$ tells us that humans perceive random variables in a biased but well-defined manner (1992); for example, humans are famously loss-averse. We show that objectives for aligning LLMs with human feedback implicitly incorporate many of these biases -- the success of these objectives (e.g., DPO) over cross-entropy minimization can partly be ascribed to them belonging to a family of loss functions that we call $\textit{human-aware losses}$ (HALOs). However, the utility functions these methods attribute to humans still differ from those in the prospect theory literature. Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do. We call this approach KTO, and it matches or exceeds the performance of preference-based methods at scales from 1B to 30B, despite only learning from a binary signal of whether an output is desirable. More broadly, our work suggests that there is no one HALO that is universally superior; the best loss depends on the inductive biases most appropriate for a given setting, an oft-overlooked consideration.

Updated: 2024-06-03 02:36:09

标题: KTO：作为前景理论优化的模型对齐

摘要: 康纳曼和特沃斯基的“前景理论”告诉我们，人类以有偏见但明确定义的方式感知随机变量（1992年）；例如，人类以著名的风险规避方式处理损失。我们展示了将LLMs与人类反馈对齐的目标暗含了许多这些偏见 -- 这些目标的成功（例如DPO相对于交叉熵最小化）部分归因于它们属于我们称之为“人类感知损失”（HALOs）家族的损失函数。然而，这些方法归因于人类的效用函数仍与前景理论文献中的不同。利用康纳曼-特沃斯基人类效用模型，我们提出了一种直接最大化生成效用而不是最大化偏好对数似然的HALO。我们将这种方法称为KTO，并且在从1B到30B的规模上与基于偏好的方法的性能相匹配或超越，尽管它仅从输出是否可取的二进制信号中学习。更广泛地说，我们的工作表明没有一种普遍优越的HALO；最佳损失取决于对于特定环境最合适的归纳偏见，这是一个经常被忽视的考虑因素。

更新时间: 2024-06-03 02:36:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.01306v2

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as "hallucinations." Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to enhance the generation process, leading to more accurate and contextually appropriate responses. Despite its benefits, RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web. In this paper, we propose \TrojRAG{} to identify the vulnerabilities and attacks on retrieval parts (RAG database) and their indirect attacks on generative parts (LLMs). Specifically, we identify that poisoning several customized content passages could achieve a retrieval backdoor, where the retrieval works well for clean queries but always returns customized poisoned adversarial queries. Triggers and poisoned passages can be highly customized to implement various attacks. For example, a trigger could be a semantic group like "The Republican Party, Donald Trump, etc." Adversarial passages can be tailored to different contents, not only linked to the triggers but also used to indirectly attack generative LLMs without modifying them. These attacks can include denial-of-service attacks on RAG and semantic steering attacks on LLM generations conditioned by the triggers. Our experiments demonstrate that by just poisoning 10 adversarial passages can induce 98.2\% success rate to retrieve the adversarial passages. Then, these passages can increase the reject ratio of RAG-based GPT-4 from 0.01\% to 74.6\% or increase the rate of negative responses from 0.22\% to 72\% for targeted queries.

Updated: 2024-06-03 02:25:33

标题: BadRAG：识别大型语言模型检索增强生成中的漏洞

摘要: 大型语言模型（LLMs）受过时信息和生成不正确数据的倾向的限制，通常称为“幻觉”。检索增强生成（RAG）通过结合基于检索的方法和生成模型的优势来解决这些限制。这种方法涉及从一个大型、最新的数据集中检索相关信息，并利用它来增强生成过程，从而产生更准确和上下文适当的响应。尽管具有益处，RAG为LLMs引入了一个新的攻击面，特别是因为RAG数据库通常是从公共数据源（如网络）获取的。在本文中，我们提出了\TrojRAG{}来识别检索部分（RAG数据库）的漏洞和攻击，以及它们对生成部分（LLMs）的间接攻击。具体来说，我们发现污染几个定制内容段落可以实现检索后门，其中检索对于干净查询效果良好，但始终返回定制的毒化对抗查询。触发器和受感染的段落可以高度定制以实施各种攻击。例如，触发器可以是像“共和党、唐纳德·特朗普等”这样的语义组。对抗性段落可以针对不同内容进行定制，不仅与触发器相关联，还可用于间接攻击生成LLMs而无需修改它们。这些攻击可能包括对RAG的拒绝服务攻击以及由触发器引导的LLM生成的语义操纵攻击。我们的实验表明，仅污染10个对抗性段落就可以导致98.2%的成功率来检索这些对抗性段落。然后，这些段落可以将基于RAG的GPT-4的拒绝比例从0.01%增加至74.6%，或将针对性查询的负面响应率从0.22%增加至72%。

更新时间: 2024-06-03 02:25:33

领域: cs.CR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.00083v1

A Synergistic Approach In Network Intrusion Detection By Neurosymbolic AI

The prevailing approaches in Network Intrusion Detection Systems (NIDS) are often hampered by issues such as high resource consumption, significant computational demands, and poor interpretability. Furthermore, these systems generally struggle to identify novel, rapidly changing cyber threats. This paper delves into the potential of incorporating Neurosymbolic Artificial Intelligence (NSAI) into NIDS, combining deep learning's data-driven strengths with symbolic AI's logical reasoning to tackle the dynamic challenges in cybersecurity, which also includes detailed NSAI techniques introduction for cyber professionals to explore the potential strengths of NSAI in NIDS. The inclusion of NSAI in NIDS marks potential advancements in both the detection and interpretation of intricate network threats, benefiting from the robust pattern recognition of neural networks and the interpretive prowess of symbolic reasoning. By analyzing network traffic data types and machine learning architectures, we illustrate NSAI's distinctive capability to offer more profound insights into network behavior, thereby improving both detection performance and the adaptability of the system. This merging of technologies not only enhances the functionality of traditional NIDS but also sets the stage for future developments in building more resilient, interpretable, and dynamic defense mechanisms against advanced cyber threats. The continued progress in this area is poised to transform NIDS into a system that is both responsive to known threats and anticipatory of emerging, unseen ones.

Updated: 2024-06-03 02:24:01

标题: 《一种神经符号人工智能在网络入侵检测中的协同方法》

摘要: 目前网络入侵检测系统（NIDS）中流行的方法经常受到诸如高资源消耗、显著的计算需求和较差的可解释性等问题的阻碍。此外，这些系统通常难以识别新型、快速变化的网络威胁。本文探讨了将神经符号人工智能（NSAI）纳入NIDS的潜力，将深度学习的数据驱动优势与符号人工智能的逻辑推理相结合，以应对网络安全领域的动态挑战。同时，本文还介绍了NSAI技术，供网络安全专业人员探索NSAI在NIDS中的潜在优势。将NSAI纳入NIDS标志着在检测和解释复杂网络威胁方面的潜在进展，从神经网络的强大模式识别和符号推理的解释能力中受益。通过分析网络流量数据类型和机器学习架构，我们展示了NSAI提供更深刻洞见网络行为的独特能力，从而提高了检测性能和系统的适应性。这种技术的融合不仅增强了传统NIDS的功能性，还为构建更具弹性、可解释性和动态的防御机制打下了基础。在这一领域的持续进展有望将NIDS转变为一个既对已知威胁做出响应，又能预见到新出现的未知威胁的系统。

更新时间: 2024-06-03 02:24:01

领域: cs.CR,cs.AI,cs.SC

下载: http://arxiv.org/abs/2406.00938v1

Policy Dispersion in Non-Markovian Environment

Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and action. However, a reward sometimes depends on the history of states and actions, which may result in the decision process in a non-Markovian environment. In such environments, agents receive rewards via temporally-extended behaviors sparsely, and the learned policies may be similar. This leads the agents acquired with similar policies generally overfit to the given task and can not quickly adapt to perturbations of environments. To resolve this problem, this paper tries to learn the diverse policies from the history of state-action pairs under a non-Markovian environment, in which a policy dispersion scheme is designed for seeking diverse policy representation. Specifically, we first adopt a transformer-based method to learn policy embeddings. Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies. Finally, we prove that if the dispersion matrix is positive definite, the dispersed embeddings can effectively enlarge the disagreements across policies, yielding a diverse expression for the original policy embedding distribution. Experimental results show that this dispersion scheme can obtain more expressive diverse policies, which then derive more robust performance than recent learning baselines under various learning environments.

Updated: 2024-06-03 02:18:44

标题: 非马尔可夫环境中的政策分散程度

摘要: 马尔可夫决策过程（MDP）提供了一个数学框架，用于制定强化学习中代理的学习过程。MDP受到马尔可夫假设的限制，即奖励仅取决于当前状态和动作。然而，有时奖励取决于状态和动作的历史，这可能导致在非马尔可夫环境中的决策过程。在这种环境中，代理通过时间延伸的行为稀疏地接收奖励，并且学习的策略可能相似。这导致获得相似策略的代理通常对给定任务过度拟合，并且无法快速适应环境的扰动。为了解决这个问题，本文尝试在非马尔可夫环境下从状态-动作对的历史中学习多样化策略，其中设计了一个策略分散方案以寻求多样化的策略表示。具体地，我们首先采用基于transformer的方法来学习策略嵌入。然后，我们堆叠策略嵌入以构建一个分散矩阵，以诱导一组多样化的策略。最后，我们证明如果分散矩阵是正定的，分散的嵌入可以有效地扩大策略之间的分歧，产生原始策略嵌入分布的多样化表达。实验结果表明，这种分散方案可以获得更具表现力的多样化策略，从而在各种学习环境下实现更强大的性能，胜过最近的学习基线。

更新时间: 2024-06-03 02:18:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2302.14509v2

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adjusts the relation label adaptively according to the quality of the detected objects. By the relation smoothing, the model is trained according to the continuous curriculum that focuses on object detection task at the beginning of training and performs multi-task learning as the object detection performance gradually improves. Furthermore, we propose a connectivity prediction task that predicts whether a relation exists between object pairs as an auxiliary task of the relation extraction. We demonstrate the effectiveness and efficiency of our method for the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr.

Updated: 2024-06-03 02:15:03

标题: EGTR：从Transformer中提取图形用于场景图生成

摘要: 场景图生成（SGG）是一项具有挑战性的任务，涉及检测物体并预测物体之间的关系。在开发了DETR之后，基于单阶段物体检测器的单阶段SGG模型得到了积极研究。然而，为了预测物体之间的关系，使用了复杂的建模，并且在物体检测器的多头自注意力中学习的物体查询之间的内在关系被忽视了。我们提出了一种轻量级的单阶段SGG模型，从DETR解码器的多头自注意力层中学习的各种关系中提取关系图。通过充分利用自注意力的副产品，可以利用浅层关系提取头有效地提取关系图。考虑到关系提取任务对物体检测任务的依赖性，我们提出了一种新颖的关系平滑技术，根据检测到的物体的质量自适应地调整关系标签。通过关系平滑，模型根据连续的课程进行训练，从而在训练开始时专注于物体检测任务，并随着物体检测性能逐渐提高而进行多任务学习。此外，我们提出了一项连接性预测任务，作为关系提取的辅助任务，用于预测物体对之间是否存在关系。我们展示了我们的方法对Visual Genome和Open Image V6数据集的有效性和效率。我们的代码可以在https://github.com/naver-ai/egtr 上公开获取。

更新时间: 2024-06-03 02:15:03

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.02072v4

BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion

Breast cancer is a significant health concern affecting millions of women worldwide. Accurate survival risk stratification plays a crucial role in guiding personalised treatment decisions and improving patient outcomes. Here we present BioFusionNet, a deep learning framework that fuses image-derived features with genetic and clinical data to obtain a holistic profile and achieve survival risk stratification of ER+ breast cancer patients. We employ multiple self-supervised feature extractors (DINO and MoCoV3) pretrained on histopathological patches to capture detailed image features. These features are then fused by a variational autoencoder and fed to a self-attention network generating patient-level features. A co-dual-cross-attention mechanism combines the histopathological features with genetic data, enabling the model to capture the interplay between them. Additionally, clinical data is incorporated using a feed-forward network, further enhancing predictive performance and achieving comprehensive multimodal feature integration. Furthermore, we introduce a weighted Cox loss function, specifically designed to handle imbalanced survival data, which is a common challenge. Our model achieves a mean concordance index of 0.77 and a time-dependent area under the curve of 0.84, outperforming state-of-the-art methods. It predicts risk (high versus low) with prognostic significance for overall survival in univariate analysis (HR=2.99, 95% CI: 1.88--4.78, p<0.005), and maintains independent significance in multivariate analysis incorporating standard clinicopathological variables (HR=2.91, 95\% CI: 1.80--4.68, p<0.005).

Updated: 2024-06-03 02:14:12

标题: BioFusionNet: 基于深度学习的多特征和多模态数据融合在ER+乳腺癌中的生存风险分层

摘要: 乳腺癌是全球影响数百万妇女的重大健康问题。准确的生存风险分层在指导个性化治疗决策和改善患者预后方面发挥着至关重要的作用。在这里，我们提出了BioFusionNet，这是一个深度学习框架，将图像衍生特征与遗传和临床数据融合在一起，获得全面的概况，并实现ER+乳腺癌患者的生存风险分层。我们采用了多个自监督特征提取器（DINO和MoCoV3），它们在组织病理学片段上进行了预训练，以捕获详细的图像特征。然后，这些特征通过变分自动编码器融合，并馈送到自我关注网络生成患者级特征。一个共同的双交叉关注机制将组织病理学特征与遗传数据结合起来，使模型能够捕捉它们之间的相互作用。此外，临床数据使用前馈网络进行整合，进一步提高了预测性能，并实现了全面的多模式特征整合。此外，我们引入了一种加权Cox损失函数，专门设计用于处理不平衡的生存数据，这是一个常见的挑战。我们的模型实现了0.77的平均协调指数和0.84的时间依赖曲线下面积，超过了最先进的方法。它在单变量分析中预测了风险（高与低）对整体生存具有预后意义（HR=2.99，95% CI：1.88-4.78，p<0.005），并在包含标准临床病理变量的多变量分析中保持独立的显著性（HR=2.91，95\% CI：1.80-4.68，p<0.005）。

更新时间: 2024-06-03 02:14:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2402.10717v2

E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the stringent requirements typically imposed by high-end commercial GPUs for performing image editing with diffusion models. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, necessitating costly training efforts to obtain models for various concepts. In this work, we introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient? To achieve this goal, we propose a series of innovative techniques. First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch. Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time. Extensive experiments show that we can efficiently empower GANs with the ability to perform real-time high-quality image editing on mobile devices with remarkably reduced training and storage costs for each concept.

Updated: 2024-06-03 02:09:38

标题: E$^{2}$GAN：高效训练用于图像到图像翻译的高效GAN

摘要: 一种非常有前途的方法，用于实现灵活的实时设备端图像编辑，是利用数据蒸馏，通过利用大规模文本到图像扩散模型生成配对数据集，用于训练生成对抗网络（GANs）。这种方法明显减轻了高端商用GPU通常施加的严格要求，用于使用扩散模型进行图像编辑。然而，与文本到图像扩散模型不同，每个蒸馏的GAN专门针对特定的图像编辑任务，需要昂贵的训练工作来获得各种概念的模型。在这项工作中，我们介绍并解决了一个新颖的研究方向：蒸馏GANs从扩散模型的过程能否变得更加高效？为了实现这一目标，我们提出了一系列创新技术。首先，我们构建了一个具有广义特征的基本GAN模型，通过微调适应不同概念，消除了从头开始训练的需要。其次，我们确定了基本GAN模型中关键层，并采用低秩适应（LoRA）与一个简单但有效的秩搜索过程，而不是对整个基本模型进行微调。第三，我们调查了微调所需的最小数据量，进一步减少了整体训练时间。大量实验证明，我们可以有效地赋予GANs在移动设备上执行实时高质量图像编辑的能力，同时减少每个概念的训练和存储成本。

更新时间: 2024-06-03 02:09:38

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.06127v2

GraphAny: A Foundation Model for Node Classification on Any Graph

Foundation models that can perform inference on any new task without requiring specific training have revolutionized machine learning in vision and language applications. However, applications involving graph-structured data remain a tough nut for foundation models, due to challenges in the unique feature- and label spaces associated with each graph. Traditional graph ML models such as graph neural networks (GNNs) trained on graphs cannot perform inference on a new graph with feature and label spaces different from the training ones. Furthermore, existing models learn functions specific to the training graph and cannot generalize to new graphs. In this work, we tackle these two challenges with a new foundational architecture for inductive node classification named GraphAny. GraphAny models inference on a new graph as an analytical solution to a LinearGNN, thereby solving the first challenge. To solve the second challenge, we learn attention scores for each node to fuse the predictions of multiple LinearGNNs. Specifically, the attention module is carefully parameterized as a function of the entropy-normalized distance-features between multiple LinearGNNs predictions to ensure generalization to new graphs. Empirically, GraphAny trained on the Wisconsin dataset with only 120 labeled nodes can effectively generalize to 30 new graphs with an average accuracy of 67.26\% in an inductive manner, surpassing GCN and GAT trained in the supervised regime, as well as other inductive baselines.

Updated: 2024-06-03 02:08:54

标题: GraphAny：一种用于任何图上节点分类的基础模型

摘要: 基于图结构数据的应用一直是基础模型的一个难题，因为每个图都有独特的特征和标签空间，传统的图机器学习模型如图神经网络（GNNs）在图上训练时无法对具有不同特征和标签空间的新图进行推断。本文提出了一种名为GraphAny的归纳节点分类的新基础架构，通过将对新图的推断建模为LinearGNN的解析解来解决第一个挑战。为了解决第二个挑战，我们学习了每个节点的注意力分数，以融合多个LinearGNN的预测。实验结果表明，仅在威斯康星数据集上训练的GraphAny可以有效地在归纳方式下推广到30个新图，平均准确率为67.26％，优于在监督模式下训练的GCN和GAT，以及其他归纳基线模型。

更新时间: 2024-06-03 02:08:54

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2405.20445v2

Consistency of semi-supervised learning, stochastic tug-of-war games, and the p-Laplacian

In this paper we give a broad overview of the intersection of partial differential equations (PDEs) and graph-based semi-supervised learning. The overview is focused on a large body of recent work on PDE continuum limits of graph-based learning, which have been used to prove well-posedness of semi-supervised learning algorithms in the large data limit. We highlight some interesting research directions revolving around consistency of graph-based semi-supervised learning, and present some new results on the consistency of $p$-Laplacian semi-supervised learning using the stochastic tug-of-war game interpretation of the $p$-Laplacian. We also present the results of some numerical experiments that illustrate our results and suggest directions for future work.

Updated: 2024-06-03 01:55:52

标题: 半监督学习的一致性、随机拉锯游戏和p-Laplacian

摘要: 在这篇论文中，我们对偏微分方程（PDEs）和基于图的半监督学习的交叉领域进行了广泛的概述。概述侧重于最近大量关于PDE连续极限在基于图的学习中的应用的研究成果，这些成果已被用来证明在大数据极限下半监督学习算法的良好性质。我们强调了围绕基于图的半监督学习一致性的一些有趣研究方向，并介绍了使用随机拉锯战游戏解释$p$-Laplacian的一致性的新结果。我们还展示了一些数值实验的结果，这些结果说明了我们的研究，并为未来工作提供了方向。

更新时间: 2024-06-03 01:55:52

领域: math.ST,cs.LG,cs.NA,math.AP,math.NA,math.PR,stat.TH,91A05, 68T05, 68Q32, 35D40, 35J60, 65N06

下载: http://arxiv.org/abs/2401.07463v2

DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton

This paper introduces the retrieval-augmented large language model with Definite Finite Automaton (DFA-RAG), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). Traditional LLMs face challenges in generating regulated and compliant responses in special scenarios with predetermined response guidelines, like emotional support and customer service. Our framework addresses these challenges by embedding a Definite Finite Automaton (DFA), learned from training dialogues, within the LLM. This structured approach acts as a semantic router which enables the LLM to adhere to a deterministic response pathway. The routing is achieved by the retrieval-augmentation generation (RAG) strategy, which carefully selects dialogue examples aligned with the current conversational context. The advantages of DFA-RAG include an interpretable structure through human-readable DFA, context-aware retrieval for responses in conversations, and plug-and-play compatibility with existing LLMs. Extensive benchmarks validate DFA-RAG's effectiveness, indicating its potential as a valuable contribution to the conversational agent.

Updated: 2024-06-03 01:40:46

标题: DFA-RAG：具有确定有限自动机的大型语言模型的会话语义路由器

摘要: 这篇论文介绍了一种检索增强的大型语言模型与确定性有限自动机（DFA-RAG），这是一个设计用来增强对话系统能力的新框架，使用大型语言模型（LLMs）。传统的LLMs在生成受监管和合规响应上面临挑战，特别是在具有预定响应指南的特殊场景中，比如情感支持和客户服务。我们的框架通过将从训练对话中学习的确定性有限自动机（DFA）嵌入到LLM中来解决这些挑战。这种结构化方法充当语义路由器，使LLM能够遵循确定性的响应路径。路由是通过检索增强生成（RAG）策略实现的，该策略精心选择与当前对话上下文对齐的对话示例。DFA-RAG的优势包括通过可读的DFA实现可解释的结构，用于对话中的响应的上下文感知检索，以及与现有LLMs的即插即用兼容性。广泛的基准测试验证了DFA-RAG的有效性，表明它对对话系统有潜在的有价值贡献。

更新时间: 2024-06-03 01:40:46

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.04411v2

Understanding Sample Generation Strategies for Learning Heuristic Functions in Classical Planning

We study the problem of learning good heuristic functions for classical planning tasks with neural networks based on samples represented by states with their cost-to-goal estimates. The heuristic function is learned for a state space and goal condition with the number of samples limited to a fraction of the size of the state space, and must generalize well for all states of the state space with the same goal condition. Our main goal is to better understand the influence of sample generation strategies on the performance of a greedy best-first heuristic search (GBFS) guided by a learned heuristic function. In a set of controlled experiments, we find that two main factors determine the quality of the learned heuristic: the algorithm used to generate the sample set and how close the sample estimates to the perfect cost-to-goal are. These two factors are dependent: having perfect cost-to-goal estimates is insufficient if the samples are not well distributed across the state space. We also study other effects, such as adding samples with high-value estimates. Based on our findings, we propose practical strategies to improve the quality of learned heuristics: three strategies that aim to generate more representative states and two strategies that improve the cost-to-goal estimates. Our practical strategies result in a learned heuristic that, when guiding a GBFS algorithm, increases by more than 30% the mean coverage compared to a baseline learned heuristic.

Updated: 2024-06-03 01:24:38

标题: 理解古典规划中学习启发式函数的样本生成策略

摘要: 我们研究了使用基于神经网络的样本学习经典规划任务的良好启发函数的问题，这些样本由带有其成本到目标估计的状态表示。启发函数针对一个状态空间和目标条件进行学习，样本数量限制为状态空间大小的一小部分，并且必须很好地泛化到具有相同目标条件的状态空间的所有状态。我们的主要目标是更好地理解样本生成策略对由学习启发函数引导的贪婪最佳优先启发式搜索（GBFS）性能的影响。在一系列受控实验中，我们发现两个主要因素决定了学习启发函数的质量：用于生成样本集的算法以及样本估计与完美成本到目标的接近程度。这两个因素是相关的：如果样本在状态空间中分布不均匀，即使具有完美的成本到目标估计也是不足的。我们还研究了其他影响，比如添加具有高价值估计的样本。基于我们的发现，我们提出了改进学习启发式函数质量的实际策略：三种旨在生成更具代表性的状态的策略以及两种改进成本到目标估计的策略。我们的实际策略导致学习的启发函数在引导GBFS算法时，与基线学习启发函数相比，平均覆盖率增加了30%以上。

更新时间: 2024-06-03 01:24:38

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2211.13316v3

The Heterogeneous Productivity Effects of Generative AI

We analyse the individual productivity effects of Italy's ban on ChatGPT, a generative pretrained transformer chatbot. We compile data on the daily coding output quantity and quality of over 36,000 GitHub users in Italy and other European countries and combine these data with the sudden announcement of the ban in a difference-in-differences framework. Among the affected users in Italy, we find a short-term increase in output quantity and quality for less experienced users and a decrease in productivity on more routine tasks for experienced users.

Updated: 2024-06-03 01:21:01

标题: 生成式人工智能的异质生产力影响

摘要: 我们分析了意大利对ChatGPT进行禁令的个人生产力影响。我们收集了意大利和其他欧洲国家超过36,000名GitHub用户的日常编码产出数量和质量数据，并将这些数据与禁令突然宣布的时间结合在一起，采用差异中的差异框架进行分析。在受影响的意大利用户中，我们发现经验较少的用户在短期内产出数量和质量有所提高，而经验丰富的用户在更常规任务上的生产力下降。

更新时间: 2024-06-03 01:21:01

领域: econ.GN,cs.AI,q-fin.EC

下载: http://arxiv.org/abs/2403.01964v2

SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning

Large Language Models (LLMs) have highlighted the necessity of effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims at removing undesired data influences and associated model capabilities without compromising utility out of the scope of unlearning. While interest in studying LLM unlearning is growing,the impact of the optimizer choice for LLM unlearning remains under-explored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between {second-order optimization} and influence unlearning (a classical approach using influence functions to update the model for data influence removal). This insight propels us to develop a second-order unlearning framework, termed SOUL, built upon the second-order clipped stochastic optimization (Sophia)-based LLM training method. SOUL extends the static, one-shot model update using influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, suggesting the promise of second-order optimization in providing a scalable and easily implementable solution for LLM unlearning.

Updated: 2024-06-03 01:10:53

标题: SOUL：释放LLM反学习的二阶优化力量

摘要: 大型语言模型（LLMs）已经突显了有效的遗忘机制对于遵守数据法规和道德AI实践的必要性。LLM遗忘旨在去除不希望的数据影响和相关模型能力，同时不损害遗忘范围之外的实用性。虽然对LLM遗忘的研究兴趣日益增长，但优化器选择对LLM遗忘的影响仍未得到充分探讨。在这项工作中，我们首次揭示了LLM遗忘中优化器选择的重要性，建立了{二阶优化}和影响遗忘之间的明确联系（一种使用影响函数更新模型以去除数据影响的传统方法）。这一见解促使我们开发了一个二阶遗忘框架，称为SOUL，基于二阶裁剪随机优化（Sophia）的LLM训练方法。SOUL将静态、一次性模型更新扩展为动态、迭代的遗忘过程。我们的广泛实验表明，SOUL在各种遗忘任务、模型和指标上始终优于传统的一阶方法，表明二阶优化在提供LLM遗忘可扩展且易实现的解决方案方面具有潜力。

更新时间: 2024-06-03 01:10:53

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2404.18239v3

Assessing the Adversarial Security of Perceptual Hashing Algorithms

Perceptual hashing algorithms (PHAs) are utilized extensively for identifying illegal online content. Given their crucial role in sensitive applications, understanding their security strengths and weaknesses is critical. This paper compares three major PHAs deployed widely in practice: PhotoDNA, PDQ, and NeuralHash, and assesses their robustness against three typical attacks: normal image editing attacks, malicious adversarial attacks, and hash inversion attacks. Contrary to prevailing studies, this paper reveals that these PHAs exhibit resilience to black-box adversarial attacks when realistic constraints regarding the distortion and query budget are applied, attributed to the unique property of random hash variations. Moreover, this paper illustrates that original images can be reconstructed from the hash bits, raising significant privacy concerns. By comprehensively exposing their security vulnerabilities, this paper contributes to the ongoing efforts aimed at enhancing the security of PHAs for effective deployment.

Updated: 2024-06-03 01:04:50

标题: 评估感知哈希算法的对抗安全性

摘要: 感知哈希算法（PHAs）被广泛应用于识别非法在线内容。鉴于它们在敏感应用中的关键作用，了解它们的安全优势和劣势至关重要。本文比较了实践中广泛部署的三种主要PHAs：PhotoDNA、PDQ和NeuralHash，并评估它们对三种典型攻击的鲁棒性：正常图像编辑攻击、恶意对抗性攻击和哈希反演攻击。与现有研究相反，本文揭示了这些PHAs在应用了关于扭曲和查询预算的现实约束时对黑盒对抗性攻击表现出的韧性，这归因于随机哈希变化的独特属性。此外，本文说明了原始图像可以从哈希位中重建，引发了重大的隐私问题。通过全面揭示它们的安全漏洞，本文有助于不断努力以增强PHAs的安全性，以便有效部署。

更新时间: 2024-06-03 01:04:50

领域: cs.CR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.00918v1

Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy. Beyond the privacy considerations of using public data, we further question the utility of this paradigm. We scrutinize whether existing machine learning benchmarks are appropriate for measuring the ability of pretrained models to generalize to sensitive domains, which may be poorly represented in public Web data. Finally, we notice that pretraining has been especially impactful for the largest available models -- models sufficiently large to prohibit end users running them on their own devices. Thus, deploying such models today could be a net loss for privacy, as it would require (private) data to be outsourced to a more compute-powerful third party. We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.

Updated: 2024-06-03 01:03:49

标题: 位置：考虑具有大规模公共预训练的差分隐私学习

摘要: 不同ially private 机器学习的性能可以通过利用在大型公共数据集上预训练的非私有模型的迁移学习能力显著提升。我们对这种方法进行了批判性审查。我们主要质疑是否应将大规模网络抓取数据集的使用视为保护差分隐私。我们警告称，将在网络数据上预训练的这些模型公开为“私有”可能会导致伤害，并破坏公众对差分隐私作为隐私有意义定义的信任。除了使用公共数据的隐私考虑外，我们进一步质疑这种范式的实用性。我们审查现有的机器学习基准是否适合衡量预训练模型推广到可能在公共网络数据中得到较差代表的敏感领域的能力。最后，我们注意到，预训练对于最大的可用模型特别有影响力——这些模型足够大，以至于终端用户无法在自己的设备上运行它们。因此，如今部署这样的模型可能对隐私造成净损失，因为这将要求（私有）数据被外包给更有计算能力的第三方。我们最后讨论了隐私学习领域未来可能的发展方向，随着公共预训练变得越来越受欢迎和强大。

更新时间: 2024-06-03 01:03:49

领域: cs.LG,cs.CR,stat.ML

下载: http://arxiv.org/abs/2212.06470v2

Provably Stable Feature Rankings with SHAP and LIME

Feature attributions are ubiquitous tools for understanding the predictions of machine learning models. However, the calculation of popular methods for scoring input variables such as SHAP and LIME suffers from high instability due to random sampling. Leveraging ideas from multiple hypothesis testing, we devise attribution methods that ensure the most important features are ranked correctly with high probability. Given SHAP estimates from KernelSHAP or Shapley Sampling, we demonstrate how to retrospectively verify the number of stable rankings. Further, we introduce efficient sampling algorithms for SHAP and LIME that guarantee the $K$ highest-ranked features have the proper ordering. Finally, we show how to adapt these local feature attribution methods for the global importance setting.

Updated: 2024-06-03 00:49:43

标题: 通过SHAP和LIME可证明稳定的特征排名

摘要: 特征归因是用于理解机器学习模型预测的普遍工具。然而，对于评分输入变量的流行方法（如SHAP和LIME）的计算由于随机抽样而导致高度不稳定。借鉴多重假设检验的思想，我们设计了确保最重要特征被正确排名的归因方法，且具有高概率。给定来自KernelSHAP或Shapley Sampling的SHAP估计，我们展示了如何回顾地验证稳定排名的数量。此外，我们引入了SHAP和LIME的高效抽样算法，确保$K$个最高排名的特征具有正确的顺序。最后，我们展示了如何将这些局部特征归因方法调整为全局重要性设置。

更新时间: 2024-06-03 00:49:43

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2401.15800v2

Wasserstein gradient flow for optimal probability measure decomposition

We examine the infinite-dimensional optimization problem of finding a decomposition of a probability measure into K probability sub-measures to minimize specific loss functions inspired by applications in clustering and user grouping. We analytically explore the structures of the support of optimal sub-measures and introduce algorithms based on Wasserstein gradient flow, demonstrating their convergence. Numerical results illustrate the implementability of our algorithms and provide further insights.

Updated: 2024-06-03 00:47:32

标题: Wasserstein梯度流用于最佳概率测度分解

摘要: 我们研究了将概率测度分解为K个概率子测度的无限维优化问题，以最小化受到聚类和用户分组应用启发的特定损失函数。我们在分析中探索了最优子测度支持的结构，并引入基于Wasserstein梯度流的算法，证明了它们的收敛性。数值结果展示了我们算法的可实现性，并提供了进一步的见解。

更新时间: 2024-06-03 00:47:32

领域: math.OC,cs.AI

下载: http://arxiv.org/abs/2406.00914v1

Communication-Efficient Gradient Descent-Accent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

Distributed and federated learning algorithms and techniques associated primarily with minimization problems. However, with the increase of minimax optimization and variational inequality problems in machine learning, the necessity of designing efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-accent algorithms with provable improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.

Updated: 2024-06-03 00:32:53

标题: 通信高效的梯度下降-加速方法用于分布式变分不等式：统一分析和本地更新

摘要: 分布式和联合学习算法主要与最小化问题相关。然而，随着机器学习中极小化优化和变分不等式问题的增加，设计高效的分布式/联合学习方法来解决这些问题的必要性变得更加明显。在本文中，我们为分布式变分不等式问题（VIPs）提供了一种通信高效的本地训练方法的统一收敛分析。我们的方法基于对随机估计的一般关键假设，这使我们能够在一个单一框架下提出和分析几种新颖的本地训练算法，用于解决一类结构化的非单调VIPs。我们提出了第一个具有可证明改进通信复杂性的本地梯度下降-腾挪算法，用于解决异构数据上的分布式变分不等式。当设置专门化为最小化或极小化优化问题时，该通用算法框架可以恢复最先进的算法及其尖锐的收敛保证。最后，我们展示了所提出算法在解决联合极小化优化问题时与最先进方法相比的优异性能。

更新时间: 2024-06-03 00:32:53

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2306.05100v2

TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment

The widespread adoption of scalable mobile sensing has led to large amounts of time series data for real-world applications. A fundamental application is multivariate time series forecasting (MTSF), which aims to predict future time series values based on historical observations. Existing MTSF methods suffer from limited parameterization and small-scale training data. Recently, Large language models (LLMs) have been introduced in time series, which achieve promising forecasting performance but incur heavy computational costs. To solve these challenges, we propose TimeCMA, an LLM-empowered framework for time series forecasting with cross-modality alignment. We design a dual-modality encoding module with two branches, where the time series encoding branch extracts relatively low-quality yet pure embeddings of time series through an inverted Transformer. In addition, the LLM-empowered encoding branch wraps the same time series as prompts to obtain high-quality yet entangled prompt embeddings via a Pre-trained LLM. Then, we design a cross-modality alignment module to retrieve high-quality and pure time series embeddings from the prompt embeddings. Moreover, we develop a time series forecasting module to decode the aligned embeddings while capturing dependencies among multiple variables for forecasting. Notably, we tailor the prompt to encode sufficient temporal information into a last token and design the last token embedding storage to reduce computational costs. Extensive experiments on real data offer insight into the accuracy and efficiency of the proposed framework.

Updated: 2024-06-03 00:27:29

标题: TimeCMA: 通过交叉模态对齐实现基于LLM的时间序列预测

摘要: 可扩展移动传感技术的广泛应用导致了大量用于现实世界应用的时间序列数据。一个基础应用是多变量时间序列预测（MTSF），旨在基于历史观测来预测未来时间序列值。现有的MTSF方法存在参数化有限和规模小的训练数据的问题。最近，时间序列中引入了大型语言模型（LLMs），这些模型取得了有希望的预测性能，但需要大量的计算成本。为了解决这些挑战，我们提出了TimeCMA，这是一个基于LLM的框架，用于具有跨模态对齐的时间序列预测。我们设计了一个双模态编码模块，其中时间序列编码分支通过反向Transformer提取相对低质量但纯净的时间序列嵌入。此外，LLM增强的编码分支将相同的时间序列作为提示，通过预训练的LLM获得高质量但纠缠在一起的提示嵌入。然后，我们设计了一个跨模态对齐模块，从提示嵌入中检索高质量和纯净的时间序列嵌入。此外，我们开发了一个时间序列预测模块，用于解码对齐的嵌入，同时捕捉多个变量之间的依赖关系进行预测。值得注意的是，我们定制了提示以将足够的时间信息编码到最后一个标记中，并设计了最后一个标记嵌入存储以降低计算成本。对真实数据的广泛实验提供了对所提出框架的准确性和效率的见解。

更新时间: 2024-06-03 00:27:29

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.01638v1

A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models

Antibodies are crucial proteins produced by the immune system to eliminate harmful foreign substances and have become pivotal therapeutic agents for treating human diseases. To accelerate the discovery of antibody therapeutics, there is growing interest in constructing language models using antibody sequences. However, the applicability of pre-trained language models for antibody discovery has not been thoroughly evaluated due to the scarcity of labeled datasets. To overcome these limitations, we introduce AVIDa-SARS-CoV-2, a dataset featuring the antigen-variable domain of heavy chain of heavy chain antibody (VHH) interactions obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. Furthermore, we release VHHCorpus-2M, a pre-training dataset for antibody language models, containing over two million VHH sequences. We report benchmark results for predicting SARS-CoV-2-VHH binding using VHHBERT pre-trained on VHHCorpus-2M and existing general protein and antibody-specific pre-trained language models. These results confirm that AVIDa-SARS-CoV-2 provides valuable benchmarks for evaluating the representation capabilities of antibody language models for binding prediction, thereby facilitating the development of AI-driven antibody discovery. The datasets are available at https://datasets.cognanous.com.

Updated: 2024-06-03 00:17:05

标题: 一个SARS-CoV-2相互作用数据集和VHH序列语料库，用于抗体语言模型

摘要: 抗体是免疫系统产生的关键蛋白质，用于消除有害的外来物质，并已成为治疗人类疾病的关键治疗药物。为加速抗体治疗药物的发现，人们越来越关注利用抗体序列构建语言模型。然而，由于标记数据集稀缺，预训练语言模型在抗体发现中的适用性尚未得到充分评估。为了克服这些限制，我们介绍了AVIDa-SARS-CoV-2，这是一个数据集，包括从两只被严重急性呼吸综合征冠状病毒2（SARS-CoV-2）刺突蛋白免疫的羊驼中获得的重链抗体（VHH）的抗原可变域相互作用。AVIDa-SARS-CoV-2包含二元标签，指示各种VHH序列与12种SARS-CoV-2突变体（如Delta和Omicron变体）的结合或非结合。此外，我们发布了VHHCorpus-2M，一个用于抗体语言模型的预训练数据集，包含超过两百万个VHH序列。我们报告了使用在VHHCorpus-2M上预训练的VHHBERT和现有的一般蛋白质和抗体特定预训练语言模型进行SARS-CoV-2-VHH结合预测的基准结果。这些结果证实了AVIDa-SARS-CoV-2为评估抗体语言模型的表示能力提供了有价值的基准，从而促进了基于人工智能的抗体发现的发展。这些数据集可在https://datasets.cognanous.com上获得。

更新时间: 2024-06-03 00:17:05

领域: cs.LG,q-bio.GN

下载: http://arxiv.org/abs/2405.18749v2