Arxiv Day: Article

Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers

Legged locomotion has recently achieved remarkable success with the progress of machine learning techniques, especially deep reinforcement learning (RL). Controllers employing neural networks have demonstrated empirical and qualitative robustness against real-world uncertainties, including sensor noise and external perturbations. However, formally investigating the vulnerabilities of these locomotion controllers remains a challenge. This difficulty arises from the requirement to pinpoint vulnerabilities across a long-tailed distribution within a high-dimensional, temporally sequential space. As a first step towards quantitative verification, we propose a computational method that leverages sequential adversarial attacks to identify weaknesses in learned locomotion controllers. Our research demonstrates that even state-of-the-art robust controllers can fail significantly under well-designed, low-magnitude adversarial sequences. Through experiments in simulation and on the real robot, we validate our approach's effectiveness, and we illustrate how the results it generates can be used to robustify the original policy and offer valuable insights into the safety of these black-box policies. Project page: https://fanshi14.github.io/me/rss24.html

Updated: 2024-05-30 23:54:53

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2405.12424v2

On the Connection Between Non-negative Matrix Factorization and Latent Dirichlet Allocation

Non-negative matrix factorization with the generalized Kullback-Leibler divergence (NMF) and latent Dirichlet allocation (LDA) are two popular approaches for dimensionality reduction of non-negative data. Here, we show that NMF with $\ell_1$ normalization constraints on the columns of both matrices of the decomposition and a Dirichlet prior on the columns of one matrix is equivalent to LDA. To show this, we demonstrate that explicitly accounting for the scaling ambiguity of NMF by adding $\ell_1$ normalization constraints to the optimization problem allows a joint update of both matrices in the widely used multiplicative updates (MU) algorithm. When both of the matrices are normalized, the joint MU algorithm leads to probabilistic latent semantic analysis (PLSA), which is LDA without a Dirichlet prior. Our approach of deriving joint updates for NMF also reveals that a Lasso penalty on one matrix together with an $\ell_1$ normalization constraint on the other matrix is insufficient to induce any sparsity.
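
The scaling ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's joint multiplicative-update algorithm; it only shows that the columns of one factor can be rescaled to unit $\ell_1$ norm without changing the product, provided the other factor absorbs the scale. The helper names `matmul` and `l1_normalize_columns` and the toy matrices are assumptions made for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def l1_normalize_columns(W, H):
    """Rescale W so each column sums to 1 and push the scale into the rows of H.

    For non-negative W (m x k) and H (k x n), the product is unchanged because
    W H = (W D^-1)(D H) for any positive diagonal matrix D.
    """
    col_sums = [sum(col) for col in zip(*W)]
    W_norm = [[w / s for w, s in zip(row, col_sums)] for row in W]
    H_scaled = [[s * h for h in row] for row, s in zip(H, col_sums)]
    return W_norm, H_scaled


# Toy non-negative factors (hypothetical values).
W = [[1.0, 2.0], [3.0, 6.0]]
H = [[0.5, 1.0, 2.0], [1.0, 0.0, 4.0]]
W_norm, H_scaled = l1_normalize_columns(W, H)
```

Here `matmul(W, H)` and `matmul(W_norm, H_scaled)` agree entry by entry, which is why a normalization constraint has to be stated explicitly before the two factors can be compared with a probabilistic model.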

Updated: 2024-05-30 23:54:17

Categories: cs.LG,stat.ML,62H30, 15A23, 62H22, 62F15, 68W40

Download: http://arxiv.org/abs/2405.20542v1

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected by the domain composition of the data being pruned. We demonstrate that for multiple dataset compositions, perplexity-based pruning of pretraining data can significantly improve downstream task performance: pruning based on perplexities computed with a 125 million parameter model improves the average performance on downstream tasks of a 3 billion parameter model by up to 2.04 and achieves up to a $1.45\times$ reduction in pretraining steps to reach commensurate baseline performance. Furthermore, we demonstrate that such perplexity-based data pruning also yields downstream performance gains in the over-trained and data-constrained regimes.
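
As a rough sketch of what perplexity-based pruning involves, the snippet below computes perplexity from per-token log-probabilities and keeps a fraction of documents ranked by that score. The function names, the toy log-probabilities, and the choice of which end of the ranking to keep are all assumptions for illustration; the abstract does not specify the selection rule.

```python
import math


def perplexity(token_logprobs):
    """Perplexity is the exponential of the negative mean per-token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prune_by_perplexity(samples, keep_fraction, lowest=True):
    """Keep a fraction of documents ranked by reference-model perplexity.

    `samples` maps a document id to its per-token natural-log probabilities
    under the small reference model. Which end of the ranking to keep is left
    to the caller (an assumption of this sketch).
    """
    ranked = sorted(
        samples,
        key=lambda doc_id: perplexity(samples[doc_id]),
        reverse=not lowest,
    )
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Hypothetical documents with constant per-token probabilities.
samples = {
    "doc_a": [math.log(0.5)] * 4,    # perplexity 2
    "doc_b": [math.log(0.25)] * 4,   # perplexity 4
    "doc_c": [math.log(0.125)] * 4,  # perplexity 8
    "doc_d": [math.log(0.1)] * 4,    # perplexity 10
}
kept = prune_by_perplexity(samples, keep_fraction=0.5)
```

In practice the log-probabilities would come from a forward pass of the reference model over each document; only the ranking and selection step is sketched here.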

Updated: 2024-05-30 23:50:20

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2405.20541v1

Fully Unconstrained Online Learning

We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$. Importantly, this matches the optimal bound $G\|w_\star\|\sqrt{T}$ available with such knowledge (up to logarithmic factors), unless either $\|w_\star\|$ or $G$ is so large that even $G\|w_\star\|\sqrt{T}$ is roughly linear in $T$. Thus, it matches the optimal bound in all cases in which one can achieve sublinear regret, which arguably covers most "interesting" scenarios.

Updated: 2024-05-30 23:41:01

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2405.20540v1

SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications -- making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving their inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy -- guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.

Updated: 2024-05-30 23:31:25

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2405.20539v1

An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration

In out-of-distribution (OOD) generalization tasks, fine-tuning pre-trained models has become a prevalent strategy. Different from most prior work that has focused on advancing learning algorithms, we systematically examined how pre-trained model size, pre-training dataset size, and training strategies impact generalization and uncertainty calibration on downstream tasks. We evaluated 100 models across diverse pre-trained model sizes, five pre-training datasets, and five data augmentations through extensive experiments on four distribution shift datasets totaling over 120,000 GPU hours. Our results demonstrate the significant impact of pre-trained model selection, with optimal choices substantially improving OOD accuracy over algorithm improvement alone. We find larger models and bigger pre-training data improve OOD performance and calibration, in contrast to some prior studies that found modern deep networks to calibrate worse than classical shallow models. Our work underscores the overlooked importance of pre-trained model selection for out-of-distribution generalization and calibration.

Updated: 2024-05-30 23:30:02

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2307.08187v3

Q-learning as a monotone scheme

Stability issues with reinforcement learning methods persist. To better understand some of these stability and convergence issues involving deep reinforcement learning methods, we examine a simple linear quadratic example. We interpret the convergence criterion of exact Q-learning in the sense of a monotone scheme and discuss consequences of function approximation on monotonicity properties.
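
To make the notion of monotonicity concrete, here is a small sketch on a finite, tabular problem rather than the linear quadratic example the abstract studies. It checks that the exact Bellman optimality operator preserves pointwise ordering of Q-functions. The toy MDP, the discount factor, and all names are hypothetical.

```python
GAMMA = 0.9

# Hypothetical deterministic MDP: (state, action) -> (reward, next_state).
MDP = {
    (0, 0): (1.0, 0),
    (0, 1): (0.0, 1),
    (1, 0): (2.0, 0),
    (1, 1): (0.5, 1),
}
ACTIONS = (0, 1)


def bellman(Q):
    """Apply the Bellman optimality operator T to a tabular Q-function."""
    return {
        (s, a): r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
        for (s, a), (r, s_next) in MDP.items()
    }


def dominates(Q_hi, Q_lo):
    """True when Q_hi >= Q_lo at every state-action pair."""
    return all(Q_hi[key] >= Q_lo[key] for key in Q_lo)


Q_lo = {key: 0.0 for key in MDP}
Q_hi = {key: 1.0 + key[0] for key in MDP}  # pointwise larger than Q_lo

# Monotonicity: Q_lo <= Q_hi implies T(Q_lo) <= T(Q_hi).
order_preserved = dominates(bellman(Q_hi), bellman(Q_lo))
```

With exact tabular values this ordering always holds; the abstract's point is that function approximation can affect such monotonicity properties, which this sketch does not model.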

Updated: 2024-05-30 23:22:36

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20538v1

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs). While coding data is known to boost reasoning abilities during LLM pretraining, its role in activating internal reasoning capacities during IFT remains understudied. This paper investigates a key question: How does coding data impact LLMs' reasoning capacities during the IFT stage? To explore this, we thoroughly examine the impact of coding data across different coding data proportions, model families, sizes, and reasoning domains, from various perspectives. Specifically, we create three IFT datasets with increasing coding data proportions, fine-tune six LLM backbones across different families and scales on these datasets, evaluate the tuned models' performance across twelve tasks in three reasoning domains, and analyze the outcomes from three broad-to-granular perspectives: overall, domain-level, and task-specific. Our holistic analysis provides valuable insights in each perspective. First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales. Moreover, the effect of coding data varies among different domains but shows consistent trends across model families and scales within each domain. Additionally, coding data generally yields comparable task-specific benefits across different model families, with the optimal coding data proportions in IFT datasets being task-specific.

Updated: 2024-05-30 23:20:25

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.20535v1

Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning

An exciting and promising frontier for Deep Reinforcement Learning (DRL) is its application to real-world robotic systems. While modern DRL approaches have achieved remarkable successes in many robotic scenarios (including mobile robotics, surgical assistance, and autonomous driving), unpredictable and non-stationary environments can pose critical challenges to such methods. These features can significantly undermine fundamental requirements for a successful training process, such as the Markovian properties of the transition model. To address this challenge, we propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and DRL. In more detail, we show that our benchmarking environment is problematic even for state-of-the-art DRL approaches that may struggle to generate reliable policies in terms of generalization power and safety. Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques (such as curriculum learning and learnable hyperparameters). Our extensive empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results. Our simulation environment and training baselines are freely available to facilitate further research on this open problem and encourage collaboration in the field.

Updated: 2024-05-30 23:20:23

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.20534v1

AI-enabled prediction of NMR spectroscopy: Deducing 2-D NMR of carbohydrate

In the dynamic field of nuclear magnetic resonance (NMR) spectroscopy, artificial intelligence (AI) has ushered in a transformative era for molecular studies. AI-driven NMR prediction, powered by advanced machine learning and predictive algorithms, has fundamentally reshaped the interpretation of NMR spectra. This innovation empowers us to forecast spectral patterns swiftly and accurately across a broad spectrum of molecular structures. Furthermore, the advent of generative modeling offers a groundbreaking approach, making it feasible to make informed predictions of 2D NMR from chemical language (such as SMILES, IUPAC Name). Our method mirrors the multifaceted nature of NMR imaging experiments, producing 2D NMRs for the same molecule based on different conditions, such as solvents and temperatures. Our methodology is versatile, catering to monosaccharide-derived small molecules, oligosaccharides, and large polysaccharides. A deeper exploration of the discrepancies in these predictions can provide insights into the influence of elements such as functional groups, repeating units, and the modification of the monomers on the outcomes. Given the complex nature involved in the generation of 2D NMRs, our objective is to fully leverage the potential of AI to enhance the precision, efficiency, and comprehensibility of NMR spectral analysis, ultimately advancing both the field of NMR spectroscopy and the broader realm of molecular research.

Updated: 2024-05-30 23:18:46

Categories: cs.LG,cs.AI,physics.chem-ph

Download: http://arxiv.org/abs/2403.11353v3

Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation

Labeling errors in datasets are common, if not systematic, in practice. They naturally arise in a variety of contexts: human labeling, noisy labeling, and weak labeling (e.g., in image classification), for example. This presents a persistent and pervasive stress on machine learning practice. In particular, neural network (NN) architectures can withstand minor amounts of dataset imperfection with traditional countermeasures such as regularization, data augmentation, and batch normalization. However, major dataset imperfections often prove insurmountable. We propose and study the implementation of Rockafellian Relaxation (RR), a new loss reweighting, architecture-independent methodology, for neural network training. Experiments indicate RR can enhance standard neural network methods to achieve robust performance across classification tasks in computer vision and natural language processing (sentiment analysis). We find that RR can mitigate the effects of dataset corruption due to both (heavy) labeling error and/or adversarial perturbation, demonstrating effectiveness across a variety of data domains and machine learning tasks.

Updated: 2024-05-30 23:13:01

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20531v1

Hypothesis Search: Inductive Reasoning with Language Models

Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which robustly generalize to novel scenarios. Recent work evaluates large language models (LLMs) on inductive reasoning tasks by directly prompting them, yielding "in-context learning." This works well for straightforward inductive tasks but performs poorly on complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be verified by running on observed examples and generalized to novel inputs. To reduce the hypothesis search space, we explore steps to filter the set of hypotheses to implement: we either ask the LLM to summarize them into a smaller set of hypotheses or ask human annotators to select a subset. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, string transformation dataset SyGuS, and list transformation dataset List Functions. On a random 100-problem subset of ARC, our automated pipeline using LLM summaries achieves 30% accuracy, outperforming the direct prompting baseline (accuracy of 17%). With the minimal human input of selecting from LLM-generated candidates, performance is boosted to 33%. Our ablations show that both abstract hypothesis generation and concrete program representations benefit LLMs on inductive reasoning tasks.
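
The verification step described in the abstract, running candidate programs on observed examples and applying the survivors to new inputs, can be sketched as follows. The candidate programs are hard-coded lambdas standing in for LLM-generated code, and every name and example here is hypothetical.

```python
# Hypothetical candidates keyed by their natural-language hypothesis. In the
# described pipeline an LLM would propose the hypotheses and write the programs.
CANDIDATES = {
    "reverse the list": lambda xs: xs[::-1],
    "sort ascending": lambda xs: sorted(xs),
    "drop the first element": lambda xs: xs[1:],
}


def consistent(program, examples):
    """True when the program reproduces every observed input/output pair."""
    try:
        return all(program(inp) == out for inp, out in examples)
    except Exception:
        # A crashing candidate is treated as failing verification.
        return False


def search(candidates, examples):
    """Return the hypotheses whose programs pass all observed examples."""
    return [name for name, prog in candidates.items() if consistent(prog, examples)]


train_examples = [([3, 1, 2], [1, 2, 3]), ([5, 4], [4, 5])]
survivors = search(CANDIDATES, train_examples)

# Generalize to a novel input with the first surviving program.
prediction = CANDIDATES[survivors[0]]([9, 7, 8])
```

Because verification is just execution against the examples, a wrong hypothesis is rejected without any further model call, which is one practical benefit of representing hypotheses as programs.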

Updated: 2024-05-30 23:10:00

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2309.05660v2

An Automatic Question Usability Evaluation Toolkit

Evaluating multiple-choice questions (MCQs) involves either labor intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs. By harnessing the latest in large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET effectively pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and the human assessment of MCQ quality. Then we evaluate SAQUET on a diverse dataset of MCQs across the five domains of Chemistry, Statistics, Computer Science, Humanities, and Healthcare, showing how it effectively distinguishes between flawed and flawless questions, providing a level of analysis beyond what is achievable with traditional metrics. With an accuracy rate of over 94% in detecting the presence of flaws identified by human evaluators, our findings emphasize the limitations of existing evaluation methods and showcase potential in improving the quality of educational assessments.

Updated: 2024-05-30 23:04:53

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.20529v1

Towards Ontology-Enhanced Representation Learning for Large Language Models

Taking advantage of the widespread use of ontologies to organise and harmonize knowledge across several distinct domains, this paper proposes a novel approach to improve an embedding-Large Language Model (embedding-LLM) of interest by infusing the knowledge formalized by a reference ontology: ontological knowledge infusion aims at boosting the ability of the considered LLM to effectively model the knowledge domain described by the infused ontology. The linguistic information (i.e. concept synonyms and descriptions) and structural information (i.e. is-a relations) formalized by the ontology are utilized to compile a comprehensive set of concept definitions, with the assistance of a powerful generative LLM (i.e. GPT-3.5-turbo). These concept definitions are then employed to fine-tune the target embedding-LLM using a contrastive learning framework. To demonstrate and evaluate the proposed approach, we utilize the biomedical disease ontology MONDO. The results show that embedding-LLMs enhanced by ontological disease knowledge exhibit an improved capability to effectively evaluate the similarity of in-domain sentences from biomedical documents mentioning diseases, without compromising their out-of-domain performance.

Updated: 2024-05-30 23:01:10

Categories: cs.CL,cs.AI,68T50,I.2.7; I.2.6

Download: http://arxiv.org/abs/2405.20527v1

Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions

Knowledge Components (KCs) linked to assessments enhance the measurement of student learning, enrich analytics, and facilitate adaptivity. However, generating and linking KCs to assessment items requires significant effort and domain-specific knowledge. To streamline this process for higher-education courses, we employed GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning. We analyzed discrepancies between the KCs generated by the Large Language Model (LLM) and those made by humans through evaluation from three domain experts in each subject area. This evaluation aimed to determine whether, in instances of non-matching KCs, evaluators showed a preference for the LLM-generated KCs over their human-created counterparts. We also developed an ontology induction algorithm to cluster questions that assess similar KCs based on their content. Our most effective LLM strategy accurately matched KCs for 56% of Chemistry and 35% of E-Learning MCQs, with even higher success when considering the top five KC suggestions. Human evaluators favored LLM-generated KCs, choosing them over human-assigned ones approximately two-thirds of the time, a preference that was statistically significant across both domains. Our clustering algorithm successfully grouped questions by their underlying KCs without needing explicit labels or contextual information. This research advances the automation of KC generation and classification for assessment items, alleviating the need for student data or predefined KC labels.

Updated: 2024-05-30 22:57:49

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.20526v1

SoK: Public Blockchain Sharding

Blockchain's decentralization, transparency, and tamper-resistance properties have facilitated the system's use in various application fields. However, the low throughput and high confirmation latency hinder the widespread adoption of Blockchain. Many solutions have been proposed to address these issues, including first-layer solutions (or on-chain solutions) and second-layer solutions (or off-chain solutions). Among the proposed solutions, the blockchain sharding system is the most scalable one, where the nodes in the network are divided into several groups. The nodes in different shards work in parallel to validate the transactions and add them to the blocks, and in such a way, the throughput increases significantly. However, previous works have not adequately summarized the latest achievements in blockchain sharding, nor have they fully showcased its state-of-the-art. Our study provides a systemization of knowledge of public blockchain sharding, including the core components of sharding systems, challenges, limitations, and mechanisms of the latest sharding protocols. We also compare their performance and discuss current constraints and future research directions.
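
The basic idea of dividing nodes into groups can be sketched with a deterministic hash-based assignment. This is only an illustration under stated assumptions: the abstract does not say how any particular protocol assigns nodes, and the function names and node identifiers below are hypothetical.

```python
import hashlib
from collections import defaultdict


def shard_of(node_id, num_shards):
    """Map a node identifier to a shard index using a SHA-256 digest."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_shards(node_ids, num_shards):
    """Group node identifiers by shard index; each node lands in exactly one shard."""
    shards = defaultdict(list)
    for node_id in node_ids:
        shards[shard_of(node_id, num_shards)].append(node_id)
    return dict(shards)


nodes = [f"node-{i}" for i in range(12)]  # hypothetical node identifiers
shards = assign_shards(nodes, num_shards=3)
```

Each shard can then validate its own transactions in parallel, which is where the throughput gain described in the abstract comes from. A fixed hash like this one is predictable, so it should be read as a placeholder for whatever assignment mechanism a given protocol actually uses.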

Updated: 2024-05-30 22:38:40

Categories: cs.CR

Download: http://arxiv.org/abs/2405.20521v1

Position: Graph Foundation Models are Already Here

Graph Foundation Models (GFMs) are emerging as a significant research topic in the graph domain, aiming to develop graph models trained on extensive and diverse data to enhance their applicability across various tasks and domains. Developing GFMs presents unique challenges over traditional Graph Neural Networks (GNNs), which are typically trained from scratch for specific tasks on particular datasets. The primary challenge in constructing GFMs lies in effectively leveraging vast and diverse graph data to achieve positive transfer. Drawing inspiration from existing foundation models in the CV and NLP domains, we propose a novel perspective for the GFM development by advocating for a "graph vocabulary", in which the basic transferable units underlying graphs encode the invariance on graphs. We ground the graph vocabulary construction from essential aspects including network analysis, expressiveness, and stability. Such a vocabulary perspective can potentially advance the future GFM design in line with the neural scaling laws. All relevant resources with GFM design can be found here.

Updated: 2024-05-30 22:38:39

Categories: cs.LG

Download: http://arxiv.org/abs/2402.02216v3

Diffusion On Syntax Trees For Program Synthesis

Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Similar to image diffusion models, our method also inverts "noise" applied to syntax trees. Rather than generating code sequentially, we iteratively edit it while preserving syntactic validity, which makes it easy to combine this neural model with search. We apply our approach to inverse graphics tasks, where our model learns to convert images into programs that produce those images. Combined with search, our model is able to write graphics programs, see the execution result, and debug them to meet the required specifications. We additionally show how our system can write graphics programs for hand-drawn sketches.
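
The property that an edit on a syntax tree keeps the program syntactically valid can be illustrated with Python's standard `ast` module. This is a stand-in for the paper's setting, which uses context-free grammars for graphics programs and a learned model to choose edits; the `BumpConstants` transformer, the `circle` and `square` names, and the fixed edit rule are all hypothetical.

```python
import ast


class BumpConstants(ast.NodeTransformer):
    """Toy tree edit: replace every integer literal n with n + delta."""

    def __init__(self, delta):
        self.delta = delta

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value + self.delta), node)
        return node


def edit_program(source, delta):
    """Parse an expression, apply the tree edit, and emit source text again."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(BumpConstants(delta).visit(tree))
    return ast.unparse(tree)  # requires Python 3.9 or newer


edited = edit_program("circle(3) + square(4)", delta=1)
```

Because the edit replaces one node with another node of the same kind, the result always parses, which is what makes it straightforward to run each intermediate program and feed the output back into a search procedure.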

Updated: 2024-05-30 22:31:16

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20519v1

Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now

Generative models can produce impressively realistic images. This paper demonstrates that generated images have geometric features different from those of real images. We build a set of collections of generated images, prequalified to fool simple, signal-based classifiers into believing they are real. We then show that prequalified generated images can be identified reliably by classifiers that only look at geometric properties. We use three such classifiers. All three classifiers are denied access to image pixels, and look only at derived geometric features. The first classifier looks at the perspective field of the image, the second looks at lines detected in the image, and the third looks at relations between detected objects and shadows. Our procedure detects generated images more reliably than SOTA local signal based detectors, for images from a number of distinct generators. Saliency maps suggest that the classifiers can identify geometric problems reliably. We conclude that current generators cannot reliably reproduce geometric properties of real images.

Updated: 2024-05-30 22:22:54

Categories: cs.CV,cs.AI,cs.GR,cs.LG

Download: http://arxiv.org/abs/2311.17138v2

WaveCastNet: An AI-enabled Wavefield Forecasting Framework for Earthquake Early Warning

Large earthquakes can be destructive and quickly wreak havoc on a landscape. To mitigate immediate threats, early warning systems have been developed to alert residents, emergency responders, and critical infrastructure operators seconds to a minute before seismic waves arrive. These warnings provide time to take precautions and prevent damage. The success of these systems relies on fast, accurate predictions of ground motion intensities, which is challenging due to the complex physics of earthquakes, wave propagation, and their intricate spatial and temporal interactions. To improve early warning, we propose a novel AI-enabled framework, WaveCastNet, for forecasting ground motions from large earthquakes. WaveCastNet integrates a novel convolutional Long Expressive Memory (ConvLEM) model into a sequence-to-sequence (seq2seq) forecasting framework to model long-term dependencies and multi-scale patterns in both space and time. Because WaveCastNet shares weights across spatial and temporal dimensions, it requires fewer parameters than more resource-intensive models such as transformers, which in turn reduces inference time. WaveCastNet also generalizes better than transformer-based models to different seismic scenarios, including rarer and more critical situations with higher-magnitude earthquakes. Our results using simulated data from the San Francisco Bay Area demonstrate the capability to rapidly predict the intensity and timing of destructive ground motions. Importantly, our proposed approach does not require estimating earthquake magnitudes and epicenters, which are prone to errors using conventional approaches; nor does it require empirical ground motion models, which fail to capture strongly heterogeneous wave propagation effects.

Updated: 2024-05-30 22:18:16

Categories: cs.LG,physics.geo-ph

Download: http://arxiv.org/abs/2405.20516v1

Deep Modeling of Non-Gaussian Aleatoric Uncertainty

Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems.

Updated: 2024-05-30 22:13:17

Categories: cs.LG,cs.AI,cs.CV,cs.RO

Download: http://arxiv.org/abs/2405.20513v1

How Multilingual Are Large Language Models Fine-Tuned for Translation?

A new paradigm for machine translation has recently emerged: fine-tuning large language models (LLMs) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data (Xu et al., 2024a; Alves et al., 2024). However, it remains unclear whether this paradigm can enable massively multilingual machine translation or whether it requires fine-tuning dedicated models for a small number of language pairs. How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English? To address these questions, we conduct an extensive empirical evaluation of the translation quality of the TOWER family of language models (Alves et al., 2024) on 132 translation tasks from the multi-parallel FLORES-200 data. We find that translation fine-tuning improves translation quality even for zero-shot languages on average, but that the impact is uneven depending on the language pairs involved. These results call for further research to effectively enable massively multilingual translation with LLMs.

Updated: 2024-05-30 22:08:20

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.20512v1

SPOT: Text Source Prediction from Originality Score Thresholding

The wide acceptance of large language models (LLMs) has unlocked new applications and social risks. Popular countermeasures aim at detecting misinformation and usually involve domain-specific models trained to recognize the relevance of any information. Instead of evaluating the validity of the information, we propose to investigate LLM-generated text from the perspective of trust. In this study, we define trust as the ability to know whether an input text was generated by an LLM or a human. To do so, we design SPOT, an efficient method that classifies the source of any standalone text input based on an originality score. This score is derived from the prediction of a given LLM to detect other LLMs. We empirically demonstrate the robustness of the method to the architecture, training data, evaluation data, task, and compression of modern LLMs.

Updated: 2024-05-30 21:51:01

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.20505v1

The Road Less Scheduled

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available (https://github.com/facebookresearch/schedule_free).

Updated: 2024-05-30 21:50:15

Categories: cs.LG,cs.AI,math.OC,stat.ML

Download: http://arxiv.org/abs/2405.15682v2

FCOM: A Federated Collaborative Online Monitoring Framework via Representation Learning

Online learning has demonstrated notable potential to dynamically allocate limited resources to monitor a large population of processes, effectively balancing the exploitation of processes yielding high rewards, and the exploration of uncertain processes. However, most online learning algorithms were designed under 1) a centralized setting that requires data sharing across processes to obtain an accurate prediction or 2) a homogeneity assumption that estimates a single global model from the decentralized data. To facilitate the online learning of heterogeneous processes from the decentralized data, we propose a federated collaborative online monitoring method, which captures the latent representative models inherent in the population through representation learning and designs a novel federated collaborative UCB algorithm to estimate the representative models from sequentially observed decentralized data. The efficiency of our method is illustrated through theoretical analysis, simulation studies, and decentralized cognitive degradation monitoring in Alzheimer's disease.
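The exploitation/exploration balance mentioned above is the core of upper-confidence-bound (UCB) methods. The sketch below is plain single-agent UCB1 on a handful of monitored processes ("arms"). It is not the federated collaborative UCB algorithm of the paper, whose details the abstract does not give. The function name `ucb1`, the exploration constant, and the fixed rewards in the demo are assumptions for illustration.

```python
import math


def ucb1(pull, n_arms, rounds, c=2.0):
    """Standard UCB1: pick the arm with the highest mean + confidence bonus.

    pull(arm) returns the observed reward for monitoring that arm once.
    Returns (pull counts per arm, empirical mean reward per arm).
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the bonus term
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


if __name__ == "__main__":
    rewards = [0.2, 0.5, 0.8]  # hypothetical fixed reward per process
    counts, means = ucb1(lambda a: rewards[a], n_arms=3, rounds=2000)
    print("pull counts:", counts)
    print("mean rewards:", means)
```

The bonus term shrinks as an arm is pulled more often, so rarely observed processes keep being revisited while most of the monitoring budget goes to the high-reward ones.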

Updated: 2024-05-30 21:49:14

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20504v1

Optimizing cnn-Bigru performance: Mish activation and comparative analysis with Relu

Deep learning is currently extensively employed across a range of research domains. The continuous advancements in deep learning techniques contribute to solving intricate challenges. Activation functions (AFs) are fundamental components within neural networks, enabling them to capture complex patterns and relationships in the data. By introducing non-linearities, AFs empower neural networks to model and adapt to the diverse and nuanced nature of real-world data, enhancing their ability to make accurate predictions across various tasks. In the context of intrusion detection, Mish, a recent AF, was implemented in the CNN-BiGRU model and evaluated on three datasets: ASNM-TUN, ASNM-CDX, and HOGZILLA. The comparison with the Rectified Linear Unit (ReLU), a widely used AF, revealed that Mish outperforms ReLU across the evaluated datasets. This study illuminates the effectiveness of AFs in elevating the performance of intrusion detection systems.
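For readers unfamiliar with the two activation functions being compared, a minimal scalar sketch follows. It uses the standard definitions, Mish(x) = x * tanh(softplus(x)) and ReLU(x) = max(x, 0). The helper names are illustrative, and nothing here reproduces the paper's CNN-BiGRU model or its datasets.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Smooth and non-monotonic; slightly negative for negative inputs.
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    # Zero for all negative inputs, identity otherwise.
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  mish={mish(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, Mish passes a small negative signal for negative inputs and is differentiable everywhere, which is the usual motivation for trying it as a drop-in replacement.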

Updated: 2024-05-30 21:48:56

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2405.20503v1

ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane

The ability to shop independently, especially in grocery stores, is important for maintaining a high quality of life. This can be particularly challenging for people with visual impairments (PVI). Stores carry thousands of products, with approximately 30,000 new products introduced each year in the US market alone, presenting a challenge even for modern computer vision solutions. Through this work, we present a proof-of-concept socially assistive robotic system we call ShelfHelp, and propose novel technical solutions for enhancing instrumented canes, traditionally meant for navigation tasks, with additional capability within the domain of shopping. ShelfHelp includes a novel visual product locator algorithm designed for use in grocery stores and a novel planner that autonomously issues verbal manipulation guidance commands to guide the user during product retrieval. Through a human subjects study, we show the system's success in locating desired products and providing effective manipulation guidance to retrieve them with novice users. We compare two autonomous verbal guidance modes that achieve performance comparable to a human assistance baseline, and we present encouraging findings that validate our system's efficiency and effectiveness, supported by positive subjective metrics including competence, intelligence, and ease of use.

Updated: 2024-05-30 21:42:54

Categories: cs.RO,cs.AI,cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2405.20501v1

Hybrid Reinforcement Learning Framework for Mixed-Variable Problems

Optimization problems characterized by both discrete and continuous variables are common across various disciplines, presenting unique challenges due to their complex solution landscapes and the difficulty of navigating mixed-variable spaces effectively. To address these challenges, we introduce a hybrid Reinforcement Learning (RL) framework that synergizes RL for discrete variable selection with Bayesian Optimization for continuous variable adjustment. This framework stands out by its strategic integration of RL and continuous optimization techniques, enabling it to dynamically adapt to the problem's mixed-variable nature. By employing RL for exploring discrete decision spaces and Bayesian Optimization to refine continuous parameters, our approach not only demonstrates flexibility but also enhances optimization performance. Our experiments on synthetic functions and real-world machine learning hyperparameter tuning tasks reveal that our method consistently outperforms traditional RL, random search, and standalone Bayesian optimization in terms of effectiveness and efficiency.

Updated: 2024-05-30 21:42:33

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2405.20500v1

Transfer Q Star: Principled Decoding for LLM Alignment

Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{\pi_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.
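Alignment via decoding can be pictured as reweighting the reference model's next-token distribution by an estimate of long-term value. The sketch below assumes the common KL-regularized form, in which each token's reference probability is multiplied by exp(alpha * Q) and the result is renormalized. The token names, the Q estimates, and the function name are hypothetical placeholders. How Transfer $Q^*$ actually obtains its value estimates is not shown here.

```python
import math


def guided_next_token_distribution(ref_probs, q_values, alpha):
    """Reweight reference next-token probabilities by exp(alpha * Q).

    ref_probs: dict token -> probability under the reference (SFT) model.
    q_values:  dict token -> estimated value of emitting that token.
    alpha:     strength of the reward guidance; 0 recovers the reference.
    """
    weights = {
        tok: p * math.exp(alpha * q_values[tok]) for tok, p in ref_probs.items()
    }
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


if __name__ == "__main__":
    ref = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical reference probabilities
    q = {"a": 0.0, "b": 1.0, "c": 2.0}     # hypothetical value estimates
    for alpha in (0.0, 0.5, 2.0):
        print(alpha, guided_next_token_distribution(ref, q, alpha))
```

Here `alpha` plays the role of a deviation-control hyperparameter: at 0 the reference distribution is returned unchanged, and larger values move probability mass toward tokens with higher estimated value.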

Updated: 2024-05-30 21:36:12

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.20495v1

Slight Corruption in Pre-training Data Makes Better Diffusion Models

Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audio, and video. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets often inevitably contain corrupted pairs where conditions do not accurately describe the data. This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs. We synthetically corrupt ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs. Our empirical findings reveal that various types of slight corruption in pre-training can significantly enhance the quality, diversity, and fidelity of the generated images across different DMs, both during pre-training and downstream adaptation stages. Theoretically, we consider a Gaussian mixture model and prove that slight corruption in the condition leads to higher entropy and a reduced 2-Wasserstein distance to the ground truth of the data distribution generated by the corruptly trained DMs. Inspired by our analysis, we propose a simple method to improve the training of DMs on practical datasets by adding condition embedding perturbations (CEP). CEP significantly improves the performance of various DMs in both pre-training and downstream tasks. We hope that our study provides new insights into understanding the data and pre-training processes of DMs.
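The idea of perturbing condition embeddings can be sketched in a few lines. This is only an illustration of adding small random noise to a condition vector before it is fed to the model during training. The Gaussian noise, the `scale` value, and the function name are assumptions; the abstract does not specify the noise distribution or magnitude used by CEP.

```python
import random


def perturb_condition(embedding, scale=0.1, rng=None):
    """Return a copy of a condition embedding with small Gaussian noise added.

    embedding: list of floats (e.g. a class or text embedding).
    scale:     standard deviation of the perturbation (assumed value).
    rng:       optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    return [value + rng.gauss(0.0, scale) for value in embedding]


if __name__ == "__main__":
    clean = [0.25, -1.0, 0.5, 0.0]           # hypothetical condition embedding
    noisy = perturb_condition(clean, scale=0.1, rng=random.Random(0))
    print("clean:", clean)
    print("noisy:", noisy)
```

In a training loop, such a perturbation would be applied to the condition only, leaving the paired data sample untouched.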

Updated: 2024-05-30 21:35:48

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20494v1

Probabilities of Causation for Continuous and Vector Variables

Probabilities of causation (PoC) are valuable concepts for explainable artificial intelligence and practical decision-making. PoC were originally defined for scalar binary variables. In this paper, we extend the concept of PoC to continuous treatment and outcome variables, and further generalize PoC to capture causal effects between multiple treatments and multiple outcomes. In addition, we consider PoC for a sub-population and PoC with multi-hypothetical terms to capture more sophisticated counterfactual information useful for decision-making. We provide a nonparametric identification theorem for each type of PoC we introduce. Finally, we illustrate the application of our results on a real-world dataset about education.

Updated: 2024-05-30 21:22:26

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20487v1

Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning

As a multitude of capable machine learning (ML) models become widely available in forms such as open-source software and public APIs, central questions remain regarding their use in real-world applications, especially in high-stakes decision-making. Is there always one best model that should be used? When are the models likely to be error-prone? Should a black-box or interpretable model be used? In this work, we develop a prescriptive methodology to address these key questions, introducing a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble, along with a parameterized option to reject making a prediction. We base our methods on learning globally optimized prescriptive trees. Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs. By learning policies over different feature spaces, including the model outputs, our approach works with both structured and unstructured datasets. We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data. We demonstrate that our approach provides strong performance against baseline methods while yielding insights that help answer critical questions about which models to use, and when.

Updated: 2024-05-30 21:21:33

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20486v1

Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves impressive utility in many applications, its adoption to enable personalized generative models introduces new security risks. In this work, we propose new attack surfaces for an adversary to compromise a victim's RAG system by injecting a single malicious document into its knowledge database. We design Phantom, a general two-step attack framework against RAG-augmented LLMs. The first step involves crafting a poisoned document designed to be retrieved by the RAG system within the top-k results only when an adversarial trigger, a specific sequence of words acting as a backdoor, is present in the victim's queries. In the second step, a specially crafted adversarial string within the poisoned document triggers various adversarial attacks in the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama.

Updated: 2024-05-30 21:19:24

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.20485v1

Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate

The denoising diffusion model has recently emerged as a powerful generative technique that converts noise into data. While there are many studies providing theoretical guarantees for diffusion processes based on discretized stochastic differential equations (D-SDE), many generative samplers in real applications directly employ a discrete-time (DT) diffusion process. However, there are very few studies analyzing these DT processes; for example, convergence for DT diffusion processes has been obtained only for distributions with bounded support. In this paper, we establish the convergence guarantee for substantially larger classes of distributions under DT diffusion processes and further improve the convergence rate for distributions with bounded support. In particular, we first establish the convergence rates for both smooth and general (possibly non-smooth) distributions having a finite second moment. We then specialize our results to a number of interesting classes of distributions with explicit parameter dependencies, including distributions with Lipschitz scores, Gaussian mixture distributions, and any distribution with early stopping. We further propose a novel accelerated sampler and show that it improves the convergence rates of the corresponding regular sampler by orders of magnitude with respect to all system parameters. Our study features a novel analytical technique that constructs a tilting factor representation of the convergence error and exploits Tweedie's formula for handling Taylor expansion power terms.

Updated: 2024-05-30 21:18:01

Categories: cs.LG,eess.SP,stat.ML

Download: http://arxiv.org/abs/2402.13901v2

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

The discovery of "jailbreaks" that bypass the safety filters of Large Language Models (LLMs), and of harmful responses, has encouraged the community to implement safety measures. One major safety measure is to proactively test the LLMs with jailbreaks prior to release. Such testing therefore requires a method that can generate jailbreaks at scale and efficiently. In this paper, we follow a novel yet intuitive strategy to generate jailbreaks in the style of human-written ones. We propose a role-playing system that assigns four different roles to the user LLMs to collaborate on new jailbreaks. Furthermore, we collect existing jailbreaks and split them into different independent characteristics using clustering frequency and semantic patterns sentence by sentence. We organize these characteristics into a knowledge graph, making them more accessible and easier to retrieve. Our system of different roles leverages this knowledge graph to generate new jailbreaks, which have proved effective in inducing LLMs to generate unethical or guideline-violating responses. In addition, we pioneer a setting in our system that automatically follows government-issued guidelines to generate jailbreaks that test whether LLMs follow the guidelines accordingly. We refer to our system as GUARD (Guideline Upholding through Adaptive Role-play Diagnostics). We have empirically validated the effectiveness of GUARD on three cutting-edge open-source LLMs (Vicuna-13B, LongChat-7B, and Llama-2-7B), as well as a widely used commercial LLM (ChatGPT). Moreover, our work extends to the realm of vision language models (MiniGPT-v2 and Gemini Vision Pro), showcasing GUARD's versatility and contributing valuable insights for the development of safer, more reliable LLM-based applications across diverse modalities.

Updated: 2024-05-30 21:14:26

Categories: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2402.03299v4

Hiding Your Awful Online Choices Made More Efficient and Secure: A New Privacy-Aware Recommender System

Recommender systems are an integral part of online platforms that recommend new content to users with similar interests. However, they demand a considerable amount of user activity data which, if not adequately protected, constitutes a critical threat to user privacy. Privacy-aware recommender systems protect such sensitive user data while maintaining recommendation accuracy similar to that of traditional non-private recommender systems. However, current privacy-aware recommender systems suffer from a significant trade-off between privacy and computational efficiency. For instance, it is well known that architectures that rely purely on cryptographic primitives offer the most robust privacy guarantees; however, they suffer from substantial computational and network overhead. Thus, it is crucial to improve this trade-off for better performance. This paper presents a novel privacy-aware recommender system that combines privacy-aware machine learning algorithms, for practical scalability and efficiency, with cryptographic primitives like Homomorphic Encryption and Multi-Party Computation, without assumptions like a trusted party or secure hardware, for solid privacy guarantees. Experiments on standard benchmark datasets show that our approach results in time and memory gains of three orders of magnitude compared to using cryptographic primitives alone to construct a privacy-aware recommender system. Furthermore, for the first time our method makes it feasible to compute private recommendations for datasets containing 100 million entries, even on memory-constrained low-power SOC (System on Chip) devices.

Updated: 2024-05-30 21:08:42

Categories: cs.CR

Download: http://arxiv.org/abs/2405.20483v1

Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Many causal systems, such as biological processes in cells, can only be observed indirectly via measurements such as gene expression. Causal representation learning -- the task of correctly mapping low-level observations to latent causal variables -- could advance scientific understanding by enabling inference of latent variables such as pathway activation. In this paper, we develop methods for inferring latent variables from multiple related datasets (environments) and tasks. As a running example, we consider the task of predicting a phenotype from gene expression, where we often collect data from multiple cell types or organisms that are related in known ways. The key insight is that the mapping from latent variables driven by gene expression to the phenotype of interest changes sparsely across closely related environments. To model sparse changes, we introduce Tree-Based Regularization (TBR), an objective that both minimizes prediction error and regularizes closely related environments to learn similar predictors. We prove that, under assumptions about the degree of sparse changes, TBR identifies the true latent variables up to some simple transformations. We evaluate the theory empirically with both simulations and ground-truth gene expression data. We find that TBR recovers the latent causal variables better than related methods across these settings, even in settings that violate some assumptions of the theory.

Updated: 2024-05-30 21:08:14

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2405.20482v1

Simulator-Free Visual Domain Randomization via Video Games

Domain randomization is an effective computer vision technique for improving transferability of vision models across visually distinct domains exhibiting similar content. Existing approaches, however, rely extensively on tweaking complex and specialized simulation engines that are difficult to construct, subsequently affecting their feasibility and scalability. This paper introduces BehAVE, a video understanding framework that uniquely leverages the plethora of existing commercial video games for domain randomization, without requiring access to their simulation engines. Under BehAVE (1) the inherent rich visual diversity of video games acts as the source of randomization and (2) player behavior -- represented semantically via textual descriptions of actions -- guides the *alignment* of videos with similar content. We test BehAVE on 25 games of the first-person shooter (FPS) genre across various video and text foundation models and we report its robustness for domain randomization. BehAVE successfully aligns player behavioral patterns and is able to zero-shot transfer them to multiple unseen FPS games when trained on just one FPS game. In a more challenging setting, BehAVE manages to improve the zero-shot transferability of foundation models to unseen FPS games (up to 22%) even when trained on a game of a different genre (Minecraft). Code and dataset can be found at https://github.com/nrasajski/BehAVE.

Updated: 2024-05-30 21:04:36

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2402.01335v2

Identification and Estimation of Conditional Average Partial Causal Effects via Instrumental Variable

There has been considerable recent interest in estimating heterogeneous causal effects. In this paper, we study conditional average partial causal effects (CAPCE) to reveal the heterogeneity of causal effects with continuous treatment. We provide conditions for identifying CAPCE in an instrumental variable setting. Notably, CAPCE is identifiable under a weaker assumption than required by a commonly used measure for estimating heterogeneous causal effects of continuous treatment. We develop three families of CAPCE estimators: sieve, parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze their statistical properties. We illustrate the proposed CAPCE estimators on synthetic and real-world data.

Updated: 2024-05-30 21:01:44

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2401.11130v2

EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens

Masked Video Autoencoder (MVA) approaches have demonstrated their potential by significantly outperforming previous video representation learning methods. However, because of random masking strategies, they waste an excessive amount of computation and memory predicting uninformative tokens/frames (e.g., requiring over 16 nodes with 128 NVIDIA A100 GPUs). To resolve this issue, we exploit the unequal information density among the patches in videos and propose EVEREST, a surprisingly efficient MVA approach for video representation learning that finds tokens containing rich motion features and discards uninformative ones during both pre-training and fine-tuning. We further present an information-intensive frame selection strategy that allows the model to focus on informative and causal frames with minimal redundancy. Our method significantly reduces the computation and memory requirements of MVA, enabling pre-training and fine-tuning on a single machine with 8 GPUs while achieving performance comparable to computation- and memory-heavy baselines on multiple benchmarks and the uncurated Ego4D dataset. We hope that our work contributes to reducing the barrier to further research on video understanding.

Updated: 2024-05-30 20:58:39

Categories: cs.CV, cs.LG

Download: http://arxiv.org/abs/2211.10636v5

Towards Socially and Morally Aware RL agent: Reward Design With LLM

When we design and deploy a Reinforcement Learning (RL) agent, the reward function motivates the agent to achieve an objective. An incorrect or incomplete specification of the objective can result in behavior that does not align with human values, failing to adhere to social and moral norms that are ambiguous and context dependent, and can cause undesired outcomes such as negative side effects and unsafe exploration. Previous work has manually defined reward functions to avoid negative side effects, used human oversight for safe exploration, or used foundation models as planning tools. This work studies the ability to leverage Large Language Models' (LLMs') understanding of morality and social norms in safe-exploration-augmented RL methods. It evaluates the language model's results against human feedback and demonstrates the language model's capability to serve as a direct reward signal.

Updated: 2024-05-30 20:40:30

Categories: cs.AI

Download: http://arxiv.org/abs/2401.12459v2

The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable levels. Addressing this, our work aims to strike a delicate balance between computational efficiency and model accuracy, a persisting challenge in the field. We introduce a novel method that employs core subset selection for reweighting, effectively optimizing both computational time and model performance. By focusing on a strategically selected coreset, our approach offers a robust representation, as it efficiently minimizes the influence of outliers. The re-calibrated weights are then mapped back to and propagated across the entire dataset. Our experimental results substantiate the effectiveness of this approach, underscoring its potential as a scalable and precise solution for model training.

Updated: 2024-05-30 20:39:59

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2403.12166v3

Extending the Massive Text Embedding Benchmark to French

In recent years, numerous embedding models have been made available and widely used for various NLP tasks. Choosing a model that performs well on several tasks in English has been largely simplified by the Massive Text Embedding Benchmark (MTEB), but extensions to other languages remain challenging. This is why we expand MTEB to propose the first massive benchmark of sentence embeddings for French. Not only do we gather 22 existing datasets in an easy-to-use interface, but we also create three new French datasets for a global evaluation over 8 different tasks. We perform a large-scale comparison with 46 carefully selected embedding models, conduct comprehensive statistical tests, and analyze the correlation between model performance and many of the models' characteristics. We find that even though no model is the best on all tasks, large multilingual models pre-trained on sentence similarity perform particularly well. Our work comes with open-source code, new datasets and a public leaderboard.

Updated: 2024-05-30 20:34:37

Categories: cs.CL, cs.IR, cs.LG

Download: http://arxiv.org/abs/2405.20468v1

Performance of NPG in Countable State-Space Average-Cost RL

We consider policy optimization methods in reinforcement learning settings where the state space is arbitrarily large, or even countably infinite. The motivation arises from control problems in communication networks, matching markets, and other queueing systems. We consider Natural Policy Gradient (NPG), which is a popular algorithm for finite state spaces. Under reasonable assumptions, we derive a performance bound for NPG that is independent of the size of the state space, provided the error in policy evaluation is within a factor of the true value function. We obtain this result by establishing new policy-independent bounds on the solution to Poisson's equation, i.e., the relative value function, and by combining these bounds with previously known connections between MDPs and learning from experts.
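
The link to learning from experts mentioned in this abstract is easiest to see in the tabular softmax case, where an NPG step is known to reduce to a multiplicative-weights update in each state. The following is a minimal sketch under that assumption, not the paper's algorithm for countable state spaces: the function name `npg_step`, the step size `eta`, and the cost estimates in `q_hat` are all hypothetical and chosen only for illustration.

```python
import math

def npg_step(policy, q_hat, eta):
    """One tabular NPG step for cost minimization (illustrative sketch).

    policy: dict mapping state -> list of action probabilities
    q_hat:  dict mapping state -> list of estimated relative action costs
    eta:    step size (hypothetical name)
    """
    new_policy = {}
    for s, probs in policy.items():
        # Multiplicative-weights update: lower estimated cost gives higher weight.
        # Subtracting the minimum cost keeps the exponentials numerically stable.
        base = min(q_hat[s])
        weights = [p * math.exp(-eta * (q - base)) for p, q in zip(probs, q_hat[s])]
        total = sum(weights)
        new_policy[s] = [w / total for w in weights]
    return new_policy

# Two states, two actions, uniform initial policy (made-up numbers).
policy = {0: [0.5, 0.5], 1: [0.5, 0.5]}
q_hat = {0: [1.0, 2.0], 1: [3.0, 0.5]}
updated = npg_step(policy, q_hat, eta=1.0)
```

Each state runs its own experts-style update over actions, which is why bounds on the relative value function translate into regret-type performance bounds.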

Updated: 2024-05-30 20:29:52

Categories: cs.LG, math.OC

Download: http://arxiv.org/abs/2405.20467v1

Variance reduction techniques for stochastic proximal point algorithms

In the context of finite-sum minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as are their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms since they are more stable with respect to the choice of the stepsize, but their variance-reduced versions are not as well studied as the gradient ones. In this work, we propose the first unified study of variance reduction techniques for stochastic proximal point algorithms. We introduce a generic stochastic proximal algorithm that can be specialized to give the proximal versions of SVRG, SAGA, and some of their variants for smooth and convex functions. We provide several convergence results for the iterates and the objective function values. In addition, under the Polyak-Łojasiewicz (PL) condition, we obtain linear convergence rates for the iterates and the function values. Our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts, especially regarding stability with respect to the choice of the stepsize for difficult problems.
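
The stepsize-stability claim can be illustrated on least squares, where the proximal step has a closed form. This is a minimal sketch of the plain stochastic proximal point iteration, without variance reduction and not the paper's algorithm. It assumes component functions f_i(x) = 0.5 * (a_i . x - b_i)^2, for which the proximal step is x - gamma * (a_i . x - b_i) / (1 + gamma * ||a_i||^2) * a_i. All names and data below are hypothetical.

```python
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sgd_step(x, a, b, gamma):
    # Explicit gradient step on f(x) = 0.5 * (a.x - b)^2.
    r = dot(a, x) - b
    return [xi - gamma * r * ai for xi, ai in zip(x, a)]

def prox_step(x, a, b, gamma):
    # Implicit (proximal) step: the residual is damped by 1 / (1 + gamma * ||a||^2).
    r = (dot(a, x) - b) / (1.0 + gamma * dot(a, a))
    return [xi - gamma * r * ai for xi, ai in zip(x, a)]

def run(step, data, gamma, iters, seed=0):
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(iters):
        a, b = rng.choice(data)  # sample one component function uniformly
        x = step(x, a, b, gamma)
    return x

# Consistent linear system with known solution x_star (made-up data).
x_star = [1.0, -2.0]
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
data = [(a, dot(a, x_star)) for a in rows]

def error(x):
    return max(abs(xi - si) for xi, si in zip(x, x_star))

# Same deliberately large stepsize for both methods.
x_prox = run(prox_step, data, gamma=10.0, iters=200)
x_sgd = run(sgd_step, data, gamma=10.0, iters=20)
```

With this stepsize the explicit step multiplies the error along the sampled direction by a factor of magnitude greater than one, so `error(x_sgd)` grows, whereas the proximal step always contracts it and `error(x_prox)` becomes small.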

Updated: 2024-05-30 20:26:54

Categories: math.OC, cs.LG

Download: http://arxiv.org/abs/2308.09310v2

ENTIRe-ID: An Extensive and Diverse Dataset for Person Re-Identification

The growing importance of person reidentification in computer vision has highlighted the need for more extensive and diverse datasets. In response, we introduce the ENTIRe-ID dataset, an extensive collection comprising over 4.45 million images from 37 different cameras in varied environments. This dataset is uniquely designed to tackle the challenges of domain variability and model generalization, areas where existing datasets for person re-identification have fallen short. The ENTIRe-ID dataset stands out for its coverage of a wide array of real-world scenarios, encompassing various lighting conditions, angles of view, and diverse human activities. This design ensures a realistic and robust training platform for ReID models. The ENTIRe-ID dataset is publicly available at https://serdaryildiz.github.io/ENTIRe-ID

Updated: 2024-05-30 20:26:47

Categories: cs.CV, cs.AI, cs.LG

Download: http://arxiv.org/abs/2405.20465v1

Scaling Laws for the Value of Individual Data Points in Machine Learning

Recent works have shown that machine learning models improve at a predictable rate with the total amount of training data, leading to scaling laws that describe the relationship between error and dataset size. These scaling laws can help design a model's training dataset, but they typically take an aggregate view of the data by only considering the dataset's size. We introduce a new perspective by investigating scaling behavior for the value of individual data points: we find that a data point's contribution to a model's performance shrinks predictably with the size of the dataset in a log-linear manner. Interestingly, there is significant variability in the scaling exponent among different data points, indicating that certain points are more valuable in small datasets while others are relatively more useful as part of large datasets. We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes. We further propose a maximum likelihood estimator and an amortized estimator to efficiently learn the individualized scaling behaviors from a small number of noisy observations per data point. Using our estimators, we provide insights into factors that influence the scaling behavior of different data points. Finally, we demonstrate applications of the individualized scaling laws to data valuation and data subset selection. Overall, our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
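
One way to read "log-linear" here is that the logarithm of a point's contribution is linear in the logarithm of the dataset size, i.e. contribution ≈ c * k^(-alpha) for dataset size k. Under that assumption, the per-point exponent can be sketched with ordinary least squares in log-log space. This is a simple stand-in, not the maximum likelihood or amortized estimator the abstract proposes; the function name and the synthetic values are hypothetical, and the sketch assumes strictly positive contribution magnitudes.

```python
import math

def fit_log_linear(sizes, contributions):
    """Fit log(contribution) = log(c) - alpha * log(size) by least squares."""
    xs = [math.log(k) for k in sizes]
    ys = [math.log(v) for v in contributions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (c, alpha)

sizes = [100, 200, 400, 800, 1600]
# Synthetic, noise-free contributions generated with c = 2.0 and alpha = 1.5
# (illustrative values only, not results from the paper).
contributions = [2.0 * k ** -1.5 for k in sizes]
c_hat, alpha_hat = fit_log_linear(sizes, contributions)
```

A larger fitted `alpha_hat` means the point loses value faster as the dataset grows, which matches the abstract's remark that some points matter more in small datasets.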

Updated: 2024-05-30 20:10:24

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20456v1

Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-broken effect can be mitigated by separating states in the fine-tuning stage to optimize the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when too few steps are invested in its alignment state, leading to downgraded alignment performance. By statistical analysis, we show that the excess drift towards consensus could be a probable reason for the instability. To remedy this issue, we propose Lazy(i) safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by the convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa.

Updated: 2024-05-30 20:03:37

Categories: cs.LG

Download: http://arxiv.org/abs/2405.18641v2

Learning Optimal Graph Filters for Clustering of Attributed Graphs

Many real-world systems can be represented as graphs where the different entities in the system are represented by nodes and their interactions by edges. An important task in studying large datasets with graphical structure is graph clustering. While there has been a lot of work on graph clustering using the connectivity between the nodes, many real-world networks also have node attributes. Clustering attributed graphs requires joint modeling of graph structure and node attributes. Recent work has focused on combining these two complementary sources of information through graph convolutional networks and graph filtering. However, these methods are mostly limited to lowpass filtering and do not explicitly learn the filter parameters for the clustering task. In this paper, we introduce a graph signal processing based approach, where we learn the parameters of Finite Impulse Response (FIR) and Autoregressive Moving Average (ARMA) graph filters optimized for clustering. The proposed approach is formulated as a two-step iterative optimization problem, focusing on learning interpretable graph filters that are optimal for the given data and that maximize the separation between different clusters. The proposed approach is evaluated on attributed networks and compared to state-of-the-art methods.
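
For readers unfamiliar with the term, an FIR graph filter applies a polynomial in a graph shift operator to a node signal: y = sum_k h_k S^k x. The sketch below assumes the combinatorial Laplacian as the shift operator and uses fixed, hand-picked coefficients, whereas the paper learns the coefficients for clustering. All names and values are hypothetical.

```python
def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def laplacian(adj):
    # Combinatorial Laplacian L = D - A for an undirected graph without self-loops.
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def fir_graph_filter(shift, coeffs, signal):
    """Return sum_k coeffs[k] * (shift^k applied to signal)."""
    out = [0.0] * len(signal)
    power = list(signal)  # shift^0 applied to the signal
    for h in coeffs:
        out = [o + h * p for o, p in zip(out, power)]
        power = mat_vec(shift, power)  # advance to the next power of the shift
    return out

# Path graph on 3 nodes: 0 - 1 - 2
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
L = laplacian(adj)

# Filter I - 0.5 * L: a constant (smooth) signal passes through unchanged.
smooth = fir_graph_filter(L, [1.0, -0.5], [1.0, 1.0, 1.0])       # [1.0, 1.0, 1.0]
# An oscillating signal is reshaped by the same filter.
oscillating = fir_graph_filter(L, [1.0, -0.5], [1.0, -1.0, 1.0])  # [0.0, 1.0, 0.0]
```

Learning the coefficients, rather than fixing a lowpass response as here, is what lets a filter adapt to the cluster structure of a given attributed graph.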

Updated: 2024-05-30 20:01:41

Categories: cs.LG, cs.SI, eess.SP

Download: http://arxiv.org/abs/2211.04634v2

Understanding Encoder-Decoder Structures in Machine Learning Using Information Measures

We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance (predictive expressiveness) could be lost, using the cross-entropy risk, when a given encoder-decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder-decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder-decoder design, where necessary and sufficient conditions are established to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.

Updated: 2024-05-30 19:58:01

Categories: cs.LG, cs.IT, math.IT, stat.ML

Download: http://arxiv.org/abs/2405.20452v1

Statistical Properties of Robust Satisficing

The Robust Satisficing (RS) model is an emerging approach to robust optimization, offering streamlined procedures and robust generalization across various applications. However, the statistical theory of RS remains unexplored in the literature. This paper fills in the gap by comprehensively analyzing the theoretical properties of the RS model. Notably, the RS structure offers a more straightforward path to deriving statistical guarantees compared to the seminal Distributionally Robust Optimization (DRO), resulting in a richer set of results. In particular, we establish two-sided confidence intervals for the optimal loss without the need to solve a minimax optimization problem explicitly. We further provide finite-sample generalization error bounds for the RS optimizer. Importantly, our results extend to scenarios involving distribution shifts, where discrepancies exist between the sampling and target distributions. Our numerical experiments show that the RS model consistently outperforms the baseline empirical risk minimization in small-sample regimes and under distribution shifts. Furthermore, compared to the DRO model, the RS model exhibits lower sensitivity to hyperparameter tuning, highlighting its practicability for robustness considerations.

Updated: 2024-05-30 19:57:28

Categories: stat.ML, cs.LG, math.OC

Download: http://arxiv.org/abs/2405.20451v1

Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings

The advancement of transformer neural networks has significantly elevated the capabilities of sentence similarity models, but they struggle with highly discriminative tasks and produce sub-optimal representations of important documents like scientific literature. With the increased reliance on retrieval augmentation and search, representing diverse documents as concise and descriptive vectors is crucial. This paper improves upon the vector embeddings of scientific literature by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We apply a novel Mixture of Experts (MoE) extension pipeline to pretrained BERT models, where every multi-layer perceptron section is enlarged and copied into multiple distinct experts. Our MoE variants perform well over $N$ scientific domains with $N$ dedicated experts, whereas standard BERT models excel in only one domain. Notably, extending just a single transformer block to MoE captures 85% of the benefit seen from full MoE extension at every layer. This holds promise for versatile and efficient One-Size-Fits-All transformer networks for numerically representing diverse inputs. Our methodology marks significant advancements in representing scientific text and holds promise for enhancing vector database search and compilation.

Updated: 2024-05-30 19:55:58

Categories: cs.LG, cs.AI, cs.CL

Download: http://arxiv.org/abs/2401.15713v2

Decentralized AI: Permissionless LLM Inference on POKT Network

POKT Network's decentralized Remote Procedure Call (RPC) infrastructure, surpassing 740 billion requests since launching on MainNet in 2020, is well-positioned to extend into providing AI inference services with minimal design or implementation modifications. This litepaper illustrates how the network's open-source and permissionless design aligns incentives among model researchers, hardware operators, API providers and users whom we term model Sources, Suppliers, Gateways and Applications respectively. Through its Relay Mining algorithm, POKT creates a transparent marketplace where costs and earnings directly reflect cryptographically verified usage. This decentralized framework offers large model AI researchers a new avenue to disseminate their work and generate revenue without the complexities of maintaining infrastructure or building end-user products. Supply scales naturally with demand, as evidenced in recent years and the protocol's free market dynamics. POKT Gateways facilitate network growth, evolution, adoption, and quality by acting as application-facing load balancers, providing value-added features without managing LLM nodes directly. This vertically decoupled network, battle tested over several years, is set up to accelerate the adoption, operation, innovation and financialization of open-source models. It is the first mature permissionless network whose quality of service competes with centralized entities set up to provide application grade inference.

Updated: 2024-05-30 19:50:07

Categories: cs.DC, cs.AI

Download: http://arxiv.org/abs/2405.20450v1

Algorithmic Fairness in Performative Policy Learning: Escaping the Impossibility of Group Fairness

In many prediction problems, the predictive model affects the distribution of the prediction target. This phenomenon is known as performativity and is often caused by the behavior of individuals with vested interests in the outcome of the predictive model. Although performativity is generally problematic because it manifests as distribution shifts, we develop algorithmic fairness practices that leverage performativity to achieve stronger group fairness guarantees in social classification problems (compared to what is achievable in non-performative settings). In particular, we leverage the policymaker's ability to steer the population to remedy inequities in the long term. A crucial benefit of this approach is that it is possible to resolve the incompatibilities between conflicting group fairness definitions.

Updated: 2024-05-30 19:46:47

Categories: stat.ML,cs.CY,cs.LG

Download: http://arxiv.org/abs/2405.20447v1

Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) systems have shown great promise in natural language processing. However, their reliance on data stored in a retrieval database, which may contain proprietary or sensitive information, introduces new privacy concerns. Specifically, an attacker may be able to infer whether a certain text passage appears in the retrieval database by observing the outputs of the RAG system, an attack known as a Membership Inference Attack (MIA). Despite the significance of this threat, MIAs against RAG systems have remained under-explored. This study addresses this gap by introducing an efficient and easy-to-use method for conducting MIA against RAG systems. We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models, showing that the membership of a document in the retrieval database can be efficiently determined through the creation of an appropriate prompt in both black-box and gray-box settings. Our findings highlight the importance of implementing security countermeasures in deployed RAG systems to protect the privacy and security of retrieval databases.

Updated: 2024-05-30 19:46:36

Categories: cs.CR,cs.AI,cs.LG,I.2; K.6.5

Download: http://arxiv.org/abs/2405.20446v1

Efficient Prompt Optimization Through the Lens of Best Arm Identification

The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection method. In particular, the cost incurred during the selection (e.g., accessing the LLM and evaluating the responses) is rarely explicitly considered. To overcome this limitation, this work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint. TRIPLE is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB); thus, it is capable of leveraging the rich toolbox from BAI-FB systematically and also incorporating unique characteristics of prompt optimization. Extensive experiments on multiple well-adopted tasks using various LLMs demonstrate the remarkable performance improvement of TRIPLE over baselines while satisfying the limited budget constraints. As an extension, variants of TRIPLE are proposed to efficiently select examples for few-shot prompts, also achieving superior empirical performance.
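
To make the fixed-budget framing concrete, the following is a minimal sketch of sequential halving, a standard BAI-FB routine, applied to a pool of candidate prompts. It is an illustration under stated assumptions and not the TRIPLE implementation: the function name, the `evaluate(candidate, rng)` callback (which would wrap one LLM query and return a score), and the even split of the budget across rounds are all hypothetical choices.

```python
import math
import random


def sequential_halving(candidates, evaluate, budget, seed=0):
    """Pick one candidate under a fixed evaluation budget (sequential halving).

    `evaluate(candidate, rng)` is a hypothetical scoring call that returns a
    (possibly noisy) score; in prompt selection it would wrap one LLM query.
    Candidates must be hashable (for example, prompt strings).
    Returns the selected candidate and the number of evaluations spent.
    """
    rng = random.Random(seed)
    survivors = list(candidates)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    per_round = budget // rounds  # split the budget evenly across rounds
    spent = 0
    while len(survivors) > 1:
        # At least one pull per survivor, so a very small budget can be exceeded.
        pulls = max(1, per_round // len(survivors))
        means = {}
        for c in survivors:
            scores = [evaluate(c, rng) for _ in range(pulls)]
            spent += pulls
            means[c] = sum(scores) / pulls
        # Keep the better-scoring half (rounded up) for the next round.
        survivors.sort(key=lambda c: means[c], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0], spent
```

Each round spends roughly the same share of the budget on fewer candidates, so the evaluations concentrate on the prompts that are still plausible winners.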

Updated: 2024-05-30 19:40:21

Categories: stat.ML,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.09723v3

Differentially Private Data Generation with Missing Data

Although several works succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the problems of DP synthetic data with missing values and propose three effective adaptive strategies that significantly improve the utility of the synthetic data on four real-world datasets with different types and levels of missing data and privacy requirements. We also identify the relationship between privacy impact for the complete ground truth data and incomplete data for these DP synthetic data generation algorithms. We model the missing mechanisms as a sampling process to obtain tighter upper bounds for the privacy guarantees to the ground truth data. Overall, this study contributes to a better understanding of the challenges and opportunities for using private synthetic data generation algorithms in the presence of missing data.

Updated: 2024-05-30 19:38:24

Categories: cs.DB,cs.CR

Download: http://arxiv.org/abs/2310.11548v2

SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory

Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also caused lower confidence due to problems like hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLM performance in realistic cybersecurity scenarios. SECURE includes six datasets focused on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, provides insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations for improving LLM reliability as cyber advisory tools.

Updated: 2024-05-30 19:35:06

Categories: cs.CR,cs.AI,cs.HC

Download: http://arxiv.org/abs/2405.20441v1

LIA: Privacy-Preserving Data Quality Evaluation in Federated Learning Using a Lazy Influence Approximation

In Federated Learning, it is crucial to handle low-quality, corrupted, or malicious data. However, traditional data valuation methods are not suitable due to privacy concerns. To address this, we propose a simple yet effective approach that utilizes a new influence approximation called "lazy influence" to filter and score data while preserving privacy. To do this, each participant uses their own data to estimate the influence of another participant's batch and sends a differentially private obfuscated score to the central coordinator. Our method has been shown to successfully filter out biased and corrupted data in various simulated and real-world settings, achieving a recall rate of over $90\%$ (sometimes up to $100\%$) while maintaining strong differential privacy guarantees with $\varepsilon \leq 1$.
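
One well-known way to send an obfuscated score under differential privacy is randomized response on a binary vote. The sketch below assumes, purely for illustration, that each participant reduces its influence estimate to a +1 or -1 vote before reporting; the abstract does not specify the mechanism, so the function names and the binary-vote simplification are hypothetical.

```python
import math
import random


def randomized_response(vote, epsilon, rng):
    """Report a binary vote (+1 or -1) under epsilon-differential privacy.

    The true vote is kept with probability e^eps / (1 + e^eps) and flipped
    otherwise.
    """
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return vote if rng.random() < keep else -vote


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the average true vote from the noisy reports.

    Each report has expectation vote * (2 * keep - 1), so dividing the
    sample mean by that factor removes the bias. Requires epsilon > 0.
    """
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return (sum(reports) / len(reports)) / (2.0 * keep - 1.0)
```

A coordinator that only sees the noisy reports can still rank batches by `debiased_mean`, while any single report reveals little about the participant who sent it.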

Updated: 2024-05-30 19:33:40

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2205.11518v3

Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD). The originally-proposed motivation behind SAM was to bias neural networks towards flatter minima that are believed to generalize better. However, recent studies have shown conflicting evidence on the relationship between flatness and generalization, suggesting that flatness does not fully explain SAM's success. Sidestepping this debate, we identify an orthogonal effect of SAM that is beneficial out-of-distribution: we argue that SAM implicitly balances the quality of diverse features. SAM achieves this effect by adaptively suppressing well-learned features, which gives the remaining features an opportunity to be learned. We show that this mechanism is beneficial in datasets that contain redundant or spurious features where SGD falls for the simplicity bias and would not otherwise learn all available features. Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed.
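
For readers unfamiliar with the optimizer itself, the standard SAM update can be sketched in a few lines of NumPy. This shows only the generic two-gradient step (ascend to a nearby worst-case point, then descend using the gradient found there); it does not reproduce the feature-balancing analysis of the paper. The function name, the `grad_fn` callback, and the default `lr` and `rho` values are illustrative assumptions.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization update for parameters `w`.

    `grad_fn(w)` returns the loss gradient at `w`; `lr` and `rho` are
    illustrative values, not settings taken from the paper.
    """
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return w  # stationary point: no ascent direction to perturb along
    eps = rho * g / norm          # ascent direction, scaled to radius rho
    g_sharp = grad_fn(w + eps)    # gradient at the perturbed weights
    return w - lr * g_sharp       # descend using the perturbed gradient
```

For the quadratic loss 0.5 * ||w||^2, whose gradient is w itself, calling `sam_step(np.array([3.0, 4.0]), lambda w: w)` perturbs the weights to [3.03, 4.04] and then steps to [2.697, 3.596].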

Updated: 2024-05-30 19:32:56

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20439v1

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

Most reinforcement learning practitioners evaluate their policies with online Monte Carlo estimators for either hyperparameter tuning or testing different algorithmic design choices, where the policy is repeatedly executed in the environment to get the average outcome. Such massive interactions with the environment are prohibitive in many scenarios. In this paper, we propose novel methods that improve the data efficiency of online Monte Carlo estimators while maintaining their unbiasedness. We first propose a tailored closed-form behavior policy that provably reduces the variance of an online Monte Carlo estimator. We then design efficient algorithms to learn this closed-form behavior policy from previously collected offline data. Theoretical analysis is provided to characterize how the behavior policy learning error affects the amount of reduced variance. Compared with previous works, our method achieves better empirical performance in a broader set of environments, with fewer requirements for offline data.
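
The idea that a tailored behavior policy can lower variance without introducing bias is easiest to see in a one-step (bandit) problem. The sketch below uses the classic importance-sampling result that, for deterministic positive rewards, sampling actions in proportion to pi(a) * |r(a)| yields a zero-variance estimator. This is a textbook illustration under those assumptions, not the closed-form policy derived in the paper; all names and numbers are hypothetical.

```python
import numpy as np


def is_estimates(pi, mu, rewards, n, seed=0):
    """Draw n actions from behavior policy `mu` and return the per-sample
    importance-weighted rewards, whose mean estimates the value of `pi`."""
    rng = np.random.default_rng(seed)
    actions = rng.choice(len(mu), size=n, p=mu)
    return (pi[actions] / mu[actions]) * rewards[actions]


pi = np.array([0.5, 0.3, 0.2])        # target policy (hypothetical)
rewards = np.array([1.0, 4.0, 10.0])  # deterministic, positive rewards
true_value = float(pi @ rewards)      # 0.5 + 1.2 + 2.0 = 3.7

# Ordinary on-policy Monte Carlo: the behavior policy equals the target.
on_policy = is_estimates(pi, pi, rewards, n=1000)

# Tailored behavior policy: sample actions in proportion to pi(a) * |r(a)|.
mu_star = pi * np.abs(rewards) / np.sum(pi * np.abs(rewards))
tailored = is_estimates(pi, mu_star, rewards, n=1000)
```

Under `mu_star` every weighted sample equals 3.7, so a single sample already recovers the true value, whereas the on-policy samples vary between 1, 4, and 10. With stochastic rewards or multi-step returns the variance cannot in general be driven to zero, which is where a learned approximation from offline data becomes relevant.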

Updated: 2024-05-30 19:32:53

Categories: cs.LG

Download: http://arxiv.org/abs/2301.13734v4

Deep Learning for Computing Convergence Rates of Markov Chains

Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in (Qu et al., 2023), we introduce the Contractive Drift Equation (CDE), the solution of which leads to an explicit convergence bound. Second, we develop an efficient neural-network-based CDE solver. Equipped with these two components, DCDC solves the CDE and converts the solution into a convergence bound. We analyze the sample complexity of the algorithm and further demonstrate the effectiveness of the DCDC by generating convergence bounds for realistic Markov chains arising from stochastic processing networks as well as constant step-size stochastic optimization.

Updated: 2024-05-30 19:26:51

Categories: cs.LG,math.PR,stat.ML

Download: http://arxiv.org/abs/2405.20435v1

Privacy Issues in Large Language Models: A Survey

This is the first survey of the active area of AI research that focuses on privacy issues in Large Language Models (LLMs). Specifically, we focus on work that red-teams models to highlight privacy risks, attempts to build privacy into the training or inference process, enables efficient data deletion from trained models to comply with existing privacy regulations, and tries to mitigate copyright issues. Our focus is on summarizing technical research that develops algorithms, proves theorems, and runs empirical evaluations. While there is an extensive body of legal and policy work addressing these challenges from a different angle, that is not the focus of our survey. Nevertheless, these works, along with recent legal developments, do inform how these technical problems are formalized, and so we discuss them briefly in Section 1. While we have made our best effort to include all the relevant work, due to the fast-moving nature of this research we may have missed some recent work. If we have missed some of your work, please contact us, as we will attempt to keep this survey relatively up to date. We are maintaining a repository with the list of papers covered in this survey and any relevant code that was publicly available at https://github.com/safr-ml-lab/survey-llm.

Updated: 2024-05-30 19:26:05

Categories: cs.AI

Download: http://arxiv.org/abs/2312.06717v4

Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions

While humans increasingly rely on large language models (LLMs), these models are susceptible to generating inaccurate or false information, also known as "hallucinations". Technical advancements have been made in algorithms that detect hallucinated content by assessing the factuality of the model's responses and attributing sections of those responses to specific source documents. However, there is limited research on how to effectively communicate this information to users in ways that will help them appropriately calibrate their trust toward LLMs. To address this issue, we conducted a scenario-based study (N=104) to systematically compare the impact of various design strategies for communicating factuality and source attribution on participants' ratings of trust, preferences, and ease in validating response accuracy. Our findings reveal that participants preferred a design in which phrases within a response were color-coded based on the computed factuality scores. Additionally, participants increased their trust ratings when relevant sections of the source material were highlighted or responses were annotated with reference numbers corresponding to those sources, compared to when they received no annotation in the source material. Our study offers practical design guidelines to facilitate human-LLM collaboration, and it promotes a new human role of carefully evaluating and taking responsibility for the use of LLM outputs.

Updated: 2024-05-30 19:23:14

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2405.20434v1

Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective

Federated Learning (FL) is a promising paradigm that offers significant advancements in privacy-preserving, decentralized machine learning by enabling collaborative training of models across distributed devices without centralizing data. However, the practical deployment of FL systems faces a significant bottleneck: the communication overhead caused by frequently exchanging large model updates between numerous devices and a central server. This communication inefficiency can hinder training speed, model performance, and the overall feasibility of real-world FL applications. In this survey, we investigate various strategies and advancements made in communication-efficient FL, highlighting their impact and potential to overcome the communication challenges inherent in FL systems. Specifically, we define measures for communication efficiency, analyze sources of communication inefficiency in FL systems, and provide a taxonomy and comprehensive review of state-of-the-art communication-efficient FL methods. Additionally, we discuss promising future research directions for enhancing the communication efficiency of FL systems. By addressing the communication bottleneck, FL can be effectively applied and enable scalable and practical deployment across diverse applications that require privacy-preserving, decentralized machine learning, such as IoT, healthcare, or finance.

Updated: 2024-05-30 19:21:33

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.20431v1

Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting

The increased availability of medical data has significantly impacted healthcare by enabling the application of machine / deep learning approaches in various instances. However, medical datasets are usually small and scattered across multiple providers, suffer from high class-imbalance, and are subject to stringent data privacy constraints. In this paper, the application of a data regularization algorithm, suitable for learning under high class-imbalance, in a federated learning setting is proposed. Specifically, the goal of the proposed method is to enhance model performance for cardiovascular disease prediction by tackling the class-imbalance that typically characterizes datasets used for this purpose, as well as by leveraging patient data available in different nodes of a federated ecosystem without compromising their privacy and enabling more resource sensitive allocation. The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients, achieving improved performance. Meanwhile, its robustness under various hyperparameter settings, as well as its ability to adapt to different resource allocation scenarios, is verified.

Updated: 2024-05-30 19:15:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.20430v1

Dirichlet Flow Matching with Applications to DNA Sequence Design

Discrete diffusion or flow models could enable faster and more controllable sequence generation than autoregressive models. We show that naïve linear flow matching on the simplex is insufficient toward this goal since it suffers from discontinuities in the training target and further pathologies. To overcome this, we develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet distributions as probability paths. In this framework, we derive a connection between the mixtures' scores and the flow's vector field that allows for classifier and classifier-free guidance. Further, we provide distilled Dirichlet flow matching, which enables one-step sequence generation with minimal performance hits, resulting in $O(L)$ speedups compared to autoregressive models. On complex DNA sequence generation tasks, we demonstrate superior performance compared to all baselines in distributional metrics and in achieving desired design targets for generated sequences. Finally, we show that our classifier-free guidance approach improves unconditional generation and is effective for generating DNA that satisfies design targets. Code is available at https://github.com/HannesStark/dirichlet-flow-matching.

Updated: 2024-05-30 19:09:41

Categories: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2402.05841v2

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that enables LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with pass@1 rates of 12.5% and 87.7%, respectively, far exceeding the previous state-of-the-art achieved with non-interactive LMs. Finally, we provide insight on how the design of the ACI can impact agents' behavior and performance.

Updated: 2024-05-30 19:09:01

Categories: cs.SE,cs.AI,cs.CL,cs.HC,cs.LG

Download: http://arxiv.org/abs/2405.15793v2

Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design

A significant amount of protein function requires binding small molecules, including enzymatic catalysis. As such, designing binding pockets for small molecules has several impactful applications ranging from drug synthesis to energy storage. Towards this goal, we first develop HarmonicFlow, an improved generative process over 3D protein-ligand binding structures based on our self-conditioned flow matching objective. FlowSite extends this flow model to jointly generate a protein pocket's discrete residue types and the molecule's binding 3D structure. We show that HarmonicFlow improves upon state-of-the-art generative processes for docking in simplicity, generality, and average sample quality in pocket-level docking. Enabled by this structure modeling, FlowSite designs binding sites substantially better than baseline approaches.

Updated: 2024-05-30 19:04:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2310.05764v4

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Efficient and versatile Out-of-Distribution (OOD) detection is essential for the safe deployment of AI yet remains challenging for existing algorithms. Inspired by Neural Collapse, we discover that features of in-distribution (ID) samples cluster closer to the weight vectors compared to features of OOD samples. In addition, we reveal that ID features tend to expand in space to structure a simplex Equiangular Tight Framework, which nicely explains the prevalent observation that ID features reside further from the origin than OOD features. Taking both insights from Neural Collapse into consideration, we propose to leverage feature proximity to weight vectors for OOD detection and further complement this perspective by using feature norms to filter OOD samples. Extensive experiments on off-the-shelf models demonstrate the efficiency and effectiveness of our method across diverse classification tasks and model architectures, enhancing the generalization capability of OOD detection.
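
The two signals described above, proximity of a feature to the classifier weight vectors and the feature norm, can be combined into a single score. The sketch below is one hypothetical way to do so (maximum cosine similarity plus a scaled L2 norm); the exact proximity measure, norm, and weighting used in the paper are not stated in the abstract, so the function name and the `alpha` parameter are assumptions.

```python
import numpy as np


def ood_score(features, class_weights, alpha=0.01):
    """Higher score = more likely in-distribution (hypothetical combination).

    features: (N, D) penultimate-layer features.
    class_weights: (C, D) rows of the final linear layer.
    alpha: illustrative trade-off between the proximity and norm terms.
    """
    f_norm = np.linalg.norm(features, axis=1, keepdims=True)
    w_norm = np.linalg.norm(class_weights, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit_f = features / np.maximum(f_norm, 1e-12)
    unit_w = class_weights / np.maximum(w_norm, 1e-12)
    cosine = unit_f @ unit_w.T           # (N, C) similarity to each class
    proximity = cosine.max(axis=1)       # closeness to nearest weight vector
    return proximity + alpha * f_norm[:, 0]
```

A sample would then be flagged as OOD when its score falls below a threshold chosen on held-out in-distribution data.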

Updated: 2024-05-30 18:59:12

Categories: cs.LG,eess.IV

Download: http://arxiv.org/abs/2311.01479v5

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

Large Multimodal Models (LMMs) have shown remarkable progress in the field of medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that state-of-the-art models, when subjected to simple probing evaluation, perform worse than random guessing on medical diagnosis questions. To address this critical evaluation problem, we introduce the Probing Evaluation for Medical Diagnosis (ProbMed) dataset to rigorously assess LMM performance in medical imaging through probing evaluation and procedural diagnosis. Particularly, probing evaluation features pairing original questions with negation questions with hallucinated attributes, while procedural diagnosis requires reasoning across various diagnostic dimensions for each image, including modality recognition, organ identification, clinical findings, abnormalities, and positional grounding. Our evaluation reveals that top-performing models like GPT-4V and Gemini Pro perform worse than random guessing on specialized diagnostic questions, indicating significant limitations in handling fine-grained medical inquiries. Besides, models like LLaVA-Med struggle even with more general questions, and results from CheXagent demonstrate the transferability of expertise across different modalities of the same organ, showing that specialized domain knowledge is still crucial for improving performance. This study underscores the urgent need for more robust evaluation to ensure the reliability of LMMs in critical fields like medical diagnosis, and current LMMs are still far from applicable to those fields.

Updated: 2024-05-30 18:56:01

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20421v1

Back to the Basics on Predicting Transfer Performance

In the evolving landscape of deep learning, selecting the best pre-trained models from a growing number of choices is a challenge. Transferability scorers aim to alleviate this problem, but their recent proliferation, ironically, poses the challenge of their own assessment. In this work, we propose both robust benchmark guidelines for transferability scorers and a well-founded technique to combine multiple scorers, which we show consistently improves their results. We extensively evaluate 13 scorers from the literature across 11 datasets, comprising generalist, fine-grained, and medical imaging datasets. We show that few scorers match the predictive performance of the simple raw metric of models on ImageNet, and that all predictors suffer on medical datasets. Our results highlight the potential of combining different information sources for reliably predicting transferability across varied domains.

Updated: 2024-05-30 18:55:50

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.20420v1

Enhancing Antibiotic Stewardship using a Natural Language Approach for Better Feature Representation

The rapid emergence of antibiotic-resistant bacteria is recognized as a global healthcare crisis, undermining the efficacy of life-saving antibiotics. This crisis is driven by the improper use and overuse of antibiotics, which escalates bacterial resistance. In response, this study explores the use of clinical decision support systems, enhanced through the integration of electronic health records (EHRs), to improve antibiotic stewardship. However, EHR systems present numerous data-level challenges, complicating the effective synthesis and utilization of data. In this work, we transform EHR data into a serialized textual representation and employ pretrained foundation models to demonstrate how this enhanced feature representation can aid in antibiotic susceptibility predictions. Our results suggest that this text representation, combined with foundation models, provides a valuable tool to increase interpretability and support antibiotic stewardship efforts.

Updated: 2024-05-30 18:53:53

Categories: cs.LG, cs.AI, cs.CL

Download: http://arxiv.org/abs/2405.20419v1

BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition

In real-world scenarios like traffic and energy, massive time-series data with missing values and noise are widely observed, and are often sampled irregularly. While many imputation methods have been proposed, most of them work with a local horizon, which means models are trained by splitting the long sequence into batches of fit-sized patches. This local horizon can make models ignore global trends or periodic patterns. More importantly, almost all methods assume the observations are sampled at regular time stamps, and fail to handle complex irregularly sampled time series arising from different applications. Thirdly, most existing methods are learned in an offline manner, so they are not suitable for many applications with fast-arriving streaming data. To overcome these limitations, we propose BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition. We treat the multivariate time series as the weighted combination of groups of low-rank temporal factors with different patterns. We apply a group of Gaussian Processes (GPs) with different kernels as functional priors to fit the factors. For computational efficiency, we further convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE), and develop a scalable algorithm for online inference. The proposed method can not only handle imputation over arbitrary time stamps, but also offer uncertainty quantification and interpretability for the downstream application. We evaluate our method on both synthetic and real-world datasets. We release the code at https://github.com/xuangu-fang/BayOTIDE
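To make the GP-to-state-space idea concrete, here is a minimal sketch for a single scalar factor. It assumes a Matern-1/2 (Ornstein-Uhlenbeck) kernel, whose equivalent SDE has an exact discretization between arbitrary time stamps; the kernel choice, the function name `ou_kalman_filter`, and all parameter values are illustrative assumptions, not the paper's model, which combines groups of factors with different kernels.

```python
import math

def ou_kalman_filter(times, observations, lengthscale, signal_var, noise_var):
    """Online filtering under an Ornstein-Uhlenbeck (Matern-1/2) state-space prior.

    `observations[i]` is None where the value is missing; time stamps may be
    irregular. Returns one filtered (mean, variance) pair per time stamp.
    """
    mean, var = 0.0, signal_var  # stationary prior of the OU process
    prev_t = None
    filtered = []
    for t, y in zip(times, observations):
        if prev_t is not None:
            # Exact transition of the OU SDE over a gap of any length.
            a = math.exp(-(t - prev_t) / lengthscale)
            mean = a * mean
            var = a * a * var + signal_var * (1.0 - a * a)
        if y is not None:
            # Standard scalar Kalman update with Gaussian observation noise.
            gain = var / (var + noise_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        filtered.append((mean, var))
        prev_t = t
    return filtered

# Hypothetical irregularly sampled series with two missing values.
times = [0.0, 0.3, 1.5, 1.6]
obs = [1.0, None, 0.2, None]
filtered = ou_kalman_filter(times, obs, lengthscale=1.0, signal_var=1.0, noise_var=0.01)
```

At time stamps with a missing value the filter only runs the prediction step, so the returned mean is the imputed value and the returned variance its uncertainty, and each new observation costs constant time.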

Updated: 2024-05-30 18:50:32

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2308.14906v3

Digital Inheritance in Web3: A Case Study of Soulbound Tokens and the Social Recovery Pallet within the Polkadot and Kusama Ecosystems

In recent years, discussions centered around digital inheritance have increased among social media users and across blockchain ecosystems. As a result, digital assets such as social media content, cryptocurrencies, and non-fungible tokens have become increasingly valuable and widespread, leading to the need for clear and secure mechanisms for transferring these assets upon the testator's death or incapacitation. This study proposes a framework for digital inheritance using soulbound tokens and the social recovery pallet as a use case in the Polkadot and Kusama blockchain networks. The findings discussed within this study suggest that, while soulbound tokens and the social recovery pallet offer a promising solution for creating a digital inheritance plan, they also raise important considerations for testators, digital executors, and developers. While further research is needed to fully understand the potential impacts and risks of other technologies such as artificial intelligence and quantum computing, this study provides a primer for users to begin planning a digital inheritance strategy and for developers to develop a more intuitive solution.

Updated: 2024-05-30 18:50:09

Categories: cs.CR

Download: http://arxiv.org/abs/2301.11074v2

Graph as Point Set

Graph is a fundamental data structure to model interconnections between entities. Set, on the contrary, stores independent elements. To learn graph representations, current Graph Neural Networks (GNNs) primarily use message passing to encode the interconnections. In contrast, this paper introduces a novel graph-to-set conversion method that bijectively transforms interconnected nodes into a set of independent points and then uses a set encoder to learn the graph representation. This conversion method holds dual significance. Firstly, it enables using set encoders to learn from graphs, thereby significantly expanding the design space of GNNs. Secondly, for Transformer, a specific set encoder, we provide a novel and principled approach to inject graph information losslessly, different from all the heuristic structural/positional encoding methods adopted in previous graph transformers. To demonstrate the effectiveness of our approach, we introduce Point Set Transformer (PST), a transformer architecture that accepts a point set converted from a graph as input. Theoretically, PST exhibits superior expressivity for both short-range substructure counting and long-range shortest path distance tasks compared to existing GNNs. Extensive experiments further validate PST's outstanding real-world performance. Besides Transformer, we also devise a Deepset-based set encoder, which achieves performance comparable to representative GNNs, affirming the versatility of our graph-to-set method.

Updated: 2024-05-30 18:44:49

Categories: cs.LG

Download: http://arxiv.org/abs/2405.02795v2

Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages

This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented with sample translations, suggest the model can improve accessibility to sacred texts. It effectively handles the distinctive biblical lexicon and structure, thus bridging the linguistic divide. The study also discusses the model's limitations and suggests pathways for future enhancements, focusing on expanding access to sacred literature across linguistic boundaries.

Updated: 2024-05-30 18:42:45

Categories: cs.CL, cs.LG, I.2.7

Download: http://arxiv.org/abs/2405.13350v2

The Impact of Ontology on the Prediction of Cardiovascular Disease Compared to Machine Learning Algorithms

Cardiovascular disease is one of the chronic diseases that is on the rise. The complications occur when cardiovascular disease is not discovered early and correctly diagnosed at the right time. Various machine learning approaches, including ontology-based Machine Learning techniques, have lately played an essential role in medical science by building an automated system that can identify heart illness. This paper compares and reviews the most prominent machine learning algorithms, as well as ontology-based Machine Learning classification. Random Forest, Logistic regression, Decision Tree, Naive Bayes, k-Nearest Neighbours, Artificial Neural Network, and Support Vector Machine were among the classification methods explored. The dataset used consists of 70000 instances and can be downloaded from the Kaggle website. The findings are assessed using performance measures generated from the confusion matrix, such as F-Measure, Accuracy, Recall, and Precision. The results showed that the ontology outperformed all the machine learning algorithms.
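For reference, the four measures named above follow directly from the confusion-matrix counts. The sketch below uses the standard binary definitions; the function name `binary_metrics` and the counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```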

Updated: 2024-05-30 18:40:27

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20414v1

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Large Language Models (LLMs) are typically harmless but remain vulnerable to carefully crafted prompts known as "jailbreaks", which can bypass protective measures and induce harmful behavior. Recent advancements in LLMs have incorporated moderation guardrails that can filter outputs, which trigger processing errors for certain malicious questions. Existing red-teaming benchmarks often neglect to include questions that trigger moderation guardrails, making it difficult to evaluate jailbreak effectiveness. To address this issue, we introduce JAMBench, a harmful behavior benchmark designed to trigger and evaluate moderation guardrails. JAMBench involves 160 manually crafted instructions covering four major risk categories at multiple severity levels. Furthermore, we propose a jailbreak method, JAM (Jailbreak Against Moderation), designed to attack moderation guardrails using jailbreak prefixes to bypass input-level filters and a fine-tuned shadow model functionally equivalent to the guardrail model to generate cipher characters to bypass output-level filters. Our extensive experiments on four LLMs demonstrate that JAM achieves higher jailbreak success ($\sim 19.88\times$) and lower filtered-out rates ($\sim 1/6\times$) than baselines.

Updated: 2024-05-30 18:38:36

Categories: cs.CR, cs.CL, cs.CV, cs.LG

Download: http://arxiv.org/abs/2405.20413v1

Audio2Rig: Artist-oriented deep learning tool for facial animation

Creating realistic or stylized facial and lip sync animation is a tedious task. It requires a lot of time and skill to sync the lips with audio and convey the right emotion on the character's face. To allow animators to spend more time on the artistic and creative part of the animation, we present Audio2Rig: a new deep learning based tool leveraging previously animated sequences of a show, to generate facial and lip sync rig animation from an audio file. Based in Maya, it learns from any production rig without any adjustment and generates high quality and stylized animations which mimic the style of the show. Audio2Rig fits in the animator workflow: since it generates keys on the rig controllers, the animation can be easily retaken. The method is based on 3 neural network modules which can learn an arbitrary number of controllers. Hence, different configurations can be created for specific parts of the face (such as the tongue, lips or eyes). With Audio2Rig, animators can also pick different emotions and adjust their intensities to experiment or customize the output, and have high level controls on the keyframes setting. Our method shows excellent results, generating fine animation details while respecting the show style. Finally, as the training relies on the studio data and is done internally, it ensures data privacy and prevents copyright infringement.

Updated: 2024-05-30 18:37:21

Categories: cs.GR, cs.LG

Download: http://arxiv.org/abs/2405.20412v1

SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought

Expressive speech-to-speech translation (S2ST) is a key research topic in seamless communication, which focuses on the preservation of semantics and speaker vocal style in translated speech. Early works synthesized speaker style aligned speech in order to directly learn the mapping from speech to target speech spectrogram. Without reliance on style aligned data, recent studies leverage the advances of language modeling (LM) and build cascaded LMs on semantic and acoustic tokens. This work proposes SeamlessExpressiveLM, a single speech language model for expressive S2ST. We decompose the complex source-to-target speech mapping into intermediate generation steps with chain-of-thought prompting. The model is first guided to translate target semantic content and then transfer the speaker style to multi-stream acoustic units. Evaluated on Spanish-to-English and Hungarian-to-English translations, SeamlessExpressiveLM outperforms cascaded LMs in both semantic quality and style transfer, meanwhile achieving better parameter efficiency.

Updated: 2024-05-30 18:28:31

Categories: cs.CL, cs.AI, cs.SD, eess.AS

Download: http://arxiv.org/abs/2405.20410v1

Robust Explainer Recommendation for Time Series Classification

Time series classification is a task which deals with temporal sequences, a prevalent data type common in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explainability has been growing as explanation is key to understanding the data and the model better. Recently, a great variety of techniques have been proposed and adapted for time series to provide explanation in the form of saliency maps, where the importance of each data point in the time series is quantified with a numerical value. However, the saliency maps can and often do disagree, so it is unclear which one to use. This paper provides a novel framework to quantitatively evaluate and rank explanation methods for time series classification. We show how to robustly evaluate the informativeness of a given explanation method (i.e., relevance for the classification task), and how to compare explanations side-by-side. The goal is to recommend the best explainer for a given time series classification dataset. We propose AMEE, a Model-Agnostic Explanation Evaluation framework, for recommending saliency-based explanations for time series classification. In this approach, data perturbation is added to the input time series guided by each explanation. Our results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy, which can be used to evaluate each explanation. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This novel approach allows us to recommend the best explainer among a set of different explainers, including random and oracle explainers. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of time series datasets, as well as a real-world case study with known expert ground truth.
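The perturbation idea can be sketched in a few lines: mask the time steps an explanation ranks highest and measure how much accuracy drops. This is a simplified illustration with a single perturbation type and a single classifier, whereas the abstract describes aggregating over several of each. The toy classifier, the data, and every function name here are hypothetical.

```python
def perturb_top_k(series, saliency, k, fill_value=0.0):
    """Replace the k most salient time steps of `series` with `fill_value`."""
    ranked = sorted(range(len(series)), key=lambda i: saliency[i], reverse=True)
    top = set(ranked[:k])
    return [fill_value if i in top else x for i, x in enumerate(series)]

def accuracy(classifier, dataset, labels):
    """Fraction of series for which the classifier predicts the given label."""
    correct = sum(classifier(x) == y for x, y in zip(dataset, labels))
    return correct / len(dataset)

def accuracy_drop(classifier, dataset, labels, saliencies, k):
    """Accuracy lost after perturbing each series where its saliency is highest."""
    perturbed = [perturb_top_k(x, s, k) for x, s in zip(dataset, saliencies)]
    return accuracy(classifier, dataset, labels) - accuracy(classifier, perturbed, labels)

# Hypothetical toy setup: the class depends only on time step 2.
def toy_classifier(series):
    return 1 if series[2] > 0.5 else 0

data = [
    [0.1, 0.2, 1.0, 0.3],
    [0.4, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.9, 0.1],
    [0.2, 0.5, 0.1, 0.4],
]
labels = [1, 0, 1, 0]
informative = [[0.0, 0.0, 1.0, 0.0]] * len(data)    # highlights the discriminative step
uninformative = [[1.0, 0.0, 0.0, 0.0]] * len(data)  # highlights an irrelevant step

drop_informative = accuracy_drop(toy_classifier, data, labels, informative, k=1)
drop_uninformative = accuracy_drop(toy_classifier, data, labels, uninformative, k=1)
```

In this toy case the explanation that points at the discriminative step causes the larger accuracy drop, which is the signal a ranking of explainers would build on.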

Updated: 2024-05-30 18:26:06

Categories: cs.LG

Download: http://arxiv.org/abs/2306.05501v4

Private Mean Estimation with Person-Level Differential Privacy

We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d }{ \alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clipping.
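The noisy-clipped-mean approach mentioned above can be sketched at the person level as follows: average each person's samples, clip that average to an $\ell_2$ ball, average across people, and add Gaussian noise. This is only an illustration of the general recipe, not the paper's estimator. It assumes the ball is centred at the origin, and `noise_std` is a free parameter here; a private estimator would calibrate it to the sensitivity (at most `2 * radius / n` when one person's data is replaced) and to the privacy parameters.

```python
import math
import random

def clip_to_ball(vec, radius):
    """Project `vec` onto the l2 ball of the given radius centred at the origin."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= radius or norm == 0.0:
        return list(vec)
    scale = radius / norm
    return [v * scale for v in vec]

def person_level_noisy_clipped_mean(people, radius, noise_std, rng=None):
    """`people` is a list of persons, each a list of d-dimensional samples."""
    rng = rng or random.Random()
    d = len(people[0][0])
    clipped = []
    for samples in people:
        m = len(samples)
        # One vector per person, so a person's influence is bounded after clipping.
        person_mean = [sum(s[j] for s in samples) / m for j in range(d)]
        clipped.append(clip_to_ball(person_mean, radius))
    n = len(clipped)
    mean = [sum(c[j] for c in clipped) / n for j in range(d)]
    # Gaussian noise per coordinate; noise_std is NOT calibrated in this sketch.
    return [mu + rng.gauss(0.0, noise_std) for mu in mean]

# Hypothetical data: two people with different numbers of 2-dimensional samples.
people = [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 10.0]]]
noiseless = person_level_noisy_clipped_mean(people, radius=2.0, noise_std=0.0)
```

Clipping bounds how far any one person can move the average, at the price of the bias that the abstract says needs a new argument to control.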

Updated: 2024-05-30 18:20:35

Categories: cs.DS, cs.CR, cs.IT, cs.LG, math.IT, stat.ML

Download: http://arxiv.org/abs/2405.20405v1

Random Linear Projections Loss for Hyperplane-Based Optimization in Neural Networks

Advancing loss function design is pivotal for optimizing neural network training and performance. This work introduces Random Linear Projections (RLP) loss, a novel approach that enhances training efficiency by leveraging geometric relationships within the data. Distinct from traditional loss functions that target minimizing pointwise errors, RLP loss operates by minimizing the distance between sets of hyperplanes connecting fixed-size subsets of feature-prediction pairs and feature-label pairs. Our empirical evaluations, conducted across benchmark datasets and synthetic examples, demonstrate that neural networks trained with RLP loss outperform those trained with traditional loss functions, achieving improved performance with fewer data samples, and exhibiting greater robustness to additive noise. We provide theoretical analysis supporting our empirical findings.
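As a rough illustration of comparing hyperplanes rather than pointwise errors, the sketch below handles one-dimensional features, where the hyperplane through a two-point subset is a line. It fits one line through the feature-prediction pairs and one through the feature-label pairs, then penalizes the squared difference of their coefficients. The subset size, the coefficient-space distance, and the function names are assumptions made for this sketch; the abstract does not specify the paper's exact formulation.

```python
import random

def line_through(p, q):
    """Slope and intercept of the line through two points with distinct x values."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def rlp_style_loss(xs, preds, labels, num_pairs, rng):
    """Mean squared coefficient gap between prediction lines and label lines."""
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(xs)), 2)  # random fixed-size subset (size 2)
        slope_p, icpt_p = line_through((xs[i], preds[i]), (xs[j], preds[j]))
        slope_l, icpt_l = line_through((xs[i], labels[i]), (xs[j], labels[j]))
        total += (slope_p - slope_l) ** 2 + (icpt_p - icpt_l) ** 2
    return total / num_pairs

# Hypothetical data: labels on the line y = 2x, predictions offset by a constant.
xs = [0.0, 1.0, 2.0, 3.0]
labels = [0.0, 2.0, 4.0, 6.0]
shifted_preds = [y + 1.0 for y in labels]

loss_exact = rlp_style_loss(xs, labels, labels, num_pairs=20, rng=random.Random(0))
loss_shifted = rlp_style_loss(xs, shifted_preds, labels, num_pairs=20, rng=random.Random(0))
```

A constant offset leaves every slope unchanged and shifts every intercept by the same amount, so the sketch reports a loss equal to the squared offset.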

Updated: 2024-05-30 18:17:57

Categories: cs.LG

Download: http://arxiv.org/abs/2311.12356v3

XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution

Large Language Models (LLMs) have demonstrated impressive performance in complex text generation tasks. However, the contribution of the input prompt to the generated content still remains obscure to humans, underscoring the necessity of elucidating and explaining the causality between input and output pairs. Existing works for providing prompt-specific explanation often confine model output to be classification or next-word prediction. The few initial attempts to explain the entire language generation often treat input prompt texts independently, ignoring their combinatorial effects on the follow-up generation. In this study, we introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt, which aims to explain how a few prompt texts collaboratively influence the LLM's complete generation. Particularly, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the causal input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both faithfulness and efficiency of our framework.

Updated: 2024-05-30 18:16:41

Categories: cs.CL, cs.LG

Download: http://arxiv.org/abs/2405.20404v1

Fast leave-one-cluster-out cross-validation by clustered Network Information Criteria (NICc)

This paper introduces a clustered estimator of the Network Information Criterion (NICc) to approximate leave-one-cluster-out cross-validated deviance, which can be used as an alternative to cluster-based cross-validation when modeling clustered data. Stone proved that the Akaike Information Criterion (AIC) is asymptotically equivalent to leave-one-observation-out cross-validation if the parametric model is true. Ripley pointed out that the Network Information Criterion (NIC) derived in Stone's proof is a better approximation to leave-one-observation-out cross-validation when the model is not true. For clustered data, we derived a clustered estimator of NIC, referred to as NICc, by substituting the Fisher information matrix in NIC with its estimator that adjusts for clustering. This adjustment imposes a larger penalty in NICc than the unclustered estimator of NIC when modeling clustered data, thereby preventing overfitting more effectively. In a simulation study and an empirical example, we used linear and logistic regression to model clustered data with Gaussian or binomial response, respectively. We showed that NICc is a better approximation to leave-one-cluster-out deviance and prevents overfitting more effectively than AIC and Bayesian Information Criterion (BIC). NICc leads to more accurate model selection, as determined by cluster-based cross-validation, compared to AIC and BIC.

Updated: 2024-05-30 18:10:02

Categories: stat.ME, cs.LG, stat.CO, stat.ML

Download: http://arxiv.org/abs/2405.20400v1

Gransformer: Transformer-based Graph Generation

Transformers have become widely used in various tasks, such as natural language processing and machine vision. This paper proposes Gransformer, an algorithm based on Transformer for generating graphs. We modify the Transformer encoder to exploit the structural information of the given graph. The attention mechanism is adapted to consider the presence or absence of edges between each pair of nodes. We also introduce a graph-based familiarity measure between node pairs that applies to both the attention and the positional encoding. This measure of familiarity is based on message-passing algorithms and contains structural information about the graph. Also, this measure is autoregressive, which allows our model to acquire the necessary conditional probabilities in a single forward pass. In the output layer, we also use a masked autoencoder for density estimation to efficiently model the sequential generation of dependent edges connected to each node. In addition, we propose a technique to prevent the model from generating isolated nodes without connection to preceding nodes by using BFS node orderings. We evaluate this method using synthetic and real-world datasets and compare it with related ones, including recurrent models and graph convolutional networks. Experimental results show that the proposed method performs comparably to these methods.
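The property that makes BFS orderings useful here is easy to check: in a breadth-first ordering of a connected graph, every node after the first is adjacent to at least one earlier node. The sketch below only demonstrates that property on a hypothetical toy graph; how Gransformer uses the ordering during generation is not described in the abstract, and the helper names are illustrative.

```python
from collections import deque

def bfs_order(adjacency, start):
    """Return the nodes of a connected graph in breadth-first order from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in sorted(adjacency[node]):  # sorted for a deterministic order
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

def has_earlier_neighbor(adjacency, order):
    """Check that every node after the first is adjacent to some earlier node."""
    position = {node: i for i, node in enumerate(order)}
    return all(
        any(position[nbr] < position[node] for nbr in adjacency[node])
        for node in order[1:]
    )

# Hypothetical toy graph: a path 0-1-2 with an extra branch 1-3.
graph = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
order = bfs_order(graph, start=0)
bfs_ok = has_earlier_neighbor(graph, order)
arbitrary_ok = has_earlier_neighbor(graph, [0, 2, 1, 3])
```

The arbitrary ordering fails the check because node 2 appears before its only neighbor, which is the situation a generator restricted to BFS orderings never has to model.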

Updated: 2024-05-30 18:08:00

Categories: cs.LG

Download: http://arxiv.org/abs/2203.13655v3

Explainable Data-driven Modeling of Adsorption Energy in Heterogeneous Catalysis

The increasing popularity of machine learning (ML) in catalysis has spurred interest in leveraging these techniques to enhance catalyst design. Our study aims to bridge the gap between physics-based studies and data-driven methodologies by integrating ML techniques with eXplainable AI (XAI). Specifically, we employ two XAI techniques: Post-hoc XAI analysis and Symbolic Regression. These techniques help us unravel the correlation between adsorption energy and the properties of the adsorbate-catalyst system. Leveraging a large dataset such as the Open Catalyst Dataset (OC20), we employ a combination of shallow ML techniques and XAI methodologies. Our investigation involves utilizing multiple shallow machine learning techniques to predict adsorption energy, followed by post-hoc analysis for feature importance, inter-feature correlations, and the influence of various feature values on the prediction of adsorption energy. The post-hoc analysis reveals that adsorbate properties exert a greater influence than catalyst properties in our dataset. The top five features based on higher Shapley values are adsorbate electronegativity, the number of adsorbate atoms, catalyst electronegativity, effective coordination number, and the sum of atomic numbers of the adsorbate molecule. There is a positive correlation between catalyst and adsorbate electronegativity with the prediction of adsorption energy. Additionally, symbolic regression yields results consistent with SHAP analysis. It deduces a mathematical relationship indicating that the square of the catalyst electronegativity is directly proportional to the adsorption energy. These consistent correlations resemble those derived from physics-based equations in previous research. Our work establishes a robust framework that integrates ML techniques with XAI, leveraging large datasets like OC20 to enhance catalyst design through model explainability.

Updated: 2024-05-30 18:06:14

Categories: cs.LG, physics.chem-ph

Download: http://arxiv.org/abs/2405.20397v1

Quantitative Convergences of Lie Group Momentum Optimizers

Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure-preserving time discretizations can then turn these dynamics into optimization algorithms. This article investigates two types of discretization, Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly proposed. Their convergence rates are explicitly quantified under $L$-smoothness and local strong convexity assumptions. Lie NAG-SC provides acceleration over the momentumless case, i.e. Riemannian gradient descent, but Lie Heavy-Ball does not. When compared to existing accelerated optimizers for general manifolds, both Lie Heavy-Ball and Lie NAG-SC are computationally cheaper and easier to implement, thanks to their utilization of group structure. Only a gradient oracle and the exponential map are required, not the logarithm map or parallel transport, which are computationally costly.

Updated: 2024-05-30 18:01:14

Categories: cs.LG, cs.NA, math.NA, math.OC, stat.ML

Download: http://arxiv.org/abs/2405.20390v1

Designing an Evaluation Framework for Large Language Models in Astronomy Research

Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy researchers interact with LLMs. We deploy a Slack chatbot that can answer queries from users via Retrieval-Augmented Generation (RAG); these responses are grounded in astronomy papers from arXiv. We record and anonymize user questions and chatbot answers, user upvotes and downvotes to LLM responses, user feedback to the LLM, and retrieved documents and similarity scores with the query. Our data collection method will enable future dynamic evaluations of LLM tools for astronomy.

Updated: 2024-05-30 18:00:21

Categories: astro-ph.IM,cs.AI,cs.HC,cs.IR

Download: http://arxiv.org/abs/2405.20389v1

Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice

Rydberg atom array experiments have demonstrated the ability to act as powerful quantum simulators, preparing strongly-correlated phases of matter which are challenging to study with conventional computer simulations. A key direction has been the implementation of interactions on frustrated geometries, in an effort to prepare exotic many-body states such as spin liquids and glasses. In this paper, we apply two-dimensional recurrent neural network (RNN) wave functions to study the ground states of Rydberg atom arrays on the kagome lattice. We implement an annealing scheme to find the RNN variational parameters in regions of the phase diagram where exotic phases may occur, corresponding to rough optimization landscapes. For Rydberg atom array Hamiltonians studied previously on the kagome lattice, our RNN ground states show no evidence of exotic spin liquid or emergent glassy behavior. In the latter case, we argue that the presence of a non-zero Edwards-Anderson order parameter is an artifact of the long autocorrelation times experienced with quantum Monte Carlo simulations. This result emphasizes the utility of autoregressive models, such as RNNs, to explore Rydberg atom array physics on frustrated lattices and beyond.

Updated: 2024-05-30 18:00:06

Categories: cond-mat.quant-gas,cond-mat.dis-nn,cond-mat.str-el,cs.LG,quant-ph

Download: http://arxiv.org/abs/2405.20384v1

Gradient Inversion of Federated Diffusion Models

Diffusion models are becoming the de facto generative models, generating exceptionally high-resolution image data. Training effective diffusion models requires massive real data, which is privately owned by distributed parties. Each data party can collaboratively train diffusion models in a federated learning manner by sharing gradients instead of the raw data. In this paper, we study the privacy leakage risk of gradient inversion attacks. First, we design a two-phase fusion optimization, GIDM, to leverage the well-trained generative model itself as prior knowledge to constrain the inversion search (latent) space, followed by pixel-wise fine-tuning. GIDM is shown to be able to reconstruct images almost identical to the original ones. Considering a more privacy-preserving training scenario, we then argue that locally initialized private training noise $\epsilon$ and sampling step $t$ may raise additional challenges for the inversion attack. To solve this, we propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data, $\epsilon$ and $t$. Our extensive evaluation results demonstrate the vulnerability of sharing gradients for data protection of diffusion models; even high-resolution images can be reconstructed with high quality.

Updated: 2024-05-30 18:00:03

Categories: cs.AI,cs.CR,cs.CV

Download: http://arxiv.org/abs/2405.20380v1

Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from long per-case optimization time and inconsistency issues. Recent works address the problem and generate better 3D results either by finetuning a multi-view diffusion model or training a fast feed-forward model. However, they still lack intricate textures and complex geometries due to inconsistency and limited generated resolution. To simultaneously achieve high fidelity, consistency, and efficiency in single image-to-3D, we propose a novel framework Unique3D that includes a multi-view diffusion model with a corresponding normal diffusion model to generate multi-view images with their normal maps, a multi-level upscale process to progressively improve the resolution of generated orthographic multi-views, as well as an instant and consistent mesh reconstruction algorithm called ISOMER, which fully integrates the color and geometric priors into mesh results. Extensive experiments demonstrate that our Unique3D significantly outperforms other image-to-3D baselines in terms of geometric and textural details.

Updated: 2024-05-30 17:59:54

Categories: cs.CV,cs.GR,cs.LG,I.2.10

Download: http://arxiv.org/abs/2405.20343v1

Learning 3D Robotics Perception using Inductive Priors

Recent advances in deep learning have led to a data-centric intelligence, i.e., artificially intelligent models that can ingest large amounts of data and excel at digital tasks such as text-to-image generation, machine-human conversation, and image recognition. This thesis covers the topic of learning with structured inductive bias and priors to design approaches and algorithms unlocking the potential of principle-centric intelligence. Prior knowledge (priors for short), often available in terms of past experience as well as assumptions of how the world works, helps the autonomous agent generalize better and adapt its behavior based on past experience. In this thesis, I demonstrate the use of prior knowledge in three different robotics perception problems: (1) object-centric 3D reconstruction, (2) vision and language for decision-making, and (3) 3D scene understanding. To solve these challenging problems, I propose various sources of prior knowledge, including (1) geometry and appearance priors from synthetic data, (2) modularity and semantic map priors, and (3) semantic, structural, and contextual priors. I study these priors for solving robotics 3D perception tasks and propose ways to efficiently encode them in deep learning models. Some priors are used to warm-start the network for transfer learning; others are used as hard constraints to restrict the action space of robotics agents. While classical techniques are brittle and fail to generalize to unseen scenarios, and data-centric approaches require a large amount of labeled data, this thesis aims to build intelligent agents that require very little real-world data, or data acquired only from simulation, to generalize to highly dynamic and cluttered environments in novel simulations (i.e. sim2sim) or real-world unseen environments (i.e. sim2real) for a holistic scene understanding of the 3D world.

Updated: 2024-05-30 17:59:51

Categories: cs.CV,cs.AI,cs.RO

Download: http://arxiv.org/abs/2405.20364v1

From Zero to Hero: Cold-Start Anomaly Detection

When first deploying an anomaly detection system, e.g., to detect out-of-scope queries in chatbots, there are no observed data, making data-driven approaches ineffective. Zero-shot anomaly detection methods offer a solution to such "cold-start" cases, but unfortunately they are often not accurate enough. This paper studies the realistic but underexplored cold-start setting where an anomaly detection model is initialized using zero-shot guidance, but subsequently receives a small number of contaminated observations (namely, that may include anomalies). The goal is to make efficient use of both the zero-shot guidance and the observations. We propose ColdFusion, a method that effectively adapts the zero-shot anomaly detector to contaminated observations. To support future development of this new setting, we propose an evaluation suite consisting of evaluation protocols and metrics.

Updated: 2024-05-30 17:59:51

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2405.20341v1

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

Understanding the evolution of 3D scenes is important for effective autonomous driving. While conventional methods model scene development with the motion of individual instances, world models emerge as a generative framework to describe the general scene dynamics. However, most existing methods adopt an autoregressive framework to perform next-token prediction, which suffers from inefficiency in modeling long-term temporal evolutions. To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving. We employ a 4D scene tokenizer to obtain compact discrete spatial-temporal representations for 4D occupancy input and achieve high-quality reconstruction for long-sequence occupancy videos. We then learn a diffusion transformer on the spatial-temporal representations and generate 4D occupancy conditioned on a trajectory prompt. We conduct extensive experiments on the widely used nuScenes dataset with Occ3D occupancy annotations. OccSora can generate 16s-videos with authentic 3D layout and temporal consistency, demonstrating its ability to understand the spatial and temporal distributions of driving scenes. With trajectory-aware 4D generation, OccSora has the potential to serve as a world simulator for the decision-making of autonomous driving. Code is available at: https://github.com/wzzheng/OccSora.

Updated: 2024-05-30 17:59:42

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.20337v1

CoSy: Evaluating Textual Explanations of Neurons

A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While various methods exist to connect neurons to textual descriptions of human-understandable concepts, evaluating the quality of these explanation methods presents a major challenge in the field due to a lack of unified, general-purpose quantitative evaluation. In this work, we introduce CoSy (Concept Synthesis) -- a novel, architecture-agnostic framework to evaluate the quality of textual explanations for latent neurons. Given textual explanations, our proposed framework leverages a generative model conditioned on textual input to create data points representing the textual explanation. Then, the neuron's response to these explanation data points is compared with the response to control data points, providing a quality estimate of the given explanation. We ensure the reliability of our proposed framework in a series of meta-evaluation experiments and demonstrate practical value through insights from benchmarking various concept-based textual explanation methods for Computer Vision tasks, showing that tested explanation methods significantly differ in quality.

Updated: 2024-05-30 17:59:04

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.20331v1
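
The comparison step described in the CoSy abstract above (a neuron's responses to explanation data points versus its responses to control data points) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the abstract does not say which statistic is used, so the AUC-style score and the mean activation gap below are assumptions, and the activation values are made up for illustration.

```python
from typing import Sequence


def auc_score(explanation_acts: Sequence[float], control_acts: Sequence[float]) -> float:
    """Probability that a random explanation activation exceeds a random
    control activation, counting ties as one half (assumed metric)."""
    wins = 0.0
    for a in explanation_acts:
        for c in control_acts:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(explanation_acts) * len(control_acts))


def mean_activation_gap(explanation_acts: Sequence[float], control_acts: Sequence[float]) -> float:
    """Difference between mean explanation and mean control activation (assumed metric)."""
    mean_expl = sum(explanation_acts) / len(explanation_acts)
    mean_ctrl = sum(control_acts) / len(control_acts)
    return mean_expl - mean_ctrl


# Hypothetical activations of one neuron on generated "explanation" images
# and on unrelated control images.
explanation_acts = [2.1, 1.8, 2.5, 0.9]
control_acts = [0.2, 1.0, 0.4, 0.7]

score = auc_score(explanation_acts, control_acts)
gap = mean_activation_gap(explanation_acts, control_acts)
print(f"AUC-style score: {score:.4f}, mean gap: {gap:.4f}")
```

A score near 0.5 would mean the neuron does not separate explanation inputs from control inputs, while a score near 1.0 would suggest the textual explanation describes what the neuron responds to.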

4DHands: Reconstructing Interactive Hands in 4D with Transformers

In this paper, we introduce 4DHands, a robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a transformer-based architecture with novel tokenization and feature fusion strategies. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a Spatio-temporal Interaction Reasoning (SIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://4dhands.github.io.

Updated: 2024-05-30 17:59:02

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2405.20330v1

Don't drop your samples! Coherence-aware training benefits Conditional diffusion

Conditional diffusion models are powerful generative models that can leverage various types of conditional information, such as class labels, segmentation masks, or text captions. However, in many real-world scenarios, conditional information may be noisy or unreliable due to human annotation errors or weak alignment. In this paper, we propose the Coherence-Aware Diffusion (CAD), a novel method that integrates coherence in conditional information into diffusion models, allowing them to learn from noisy annotations without discarding data. We assume that each data point has an associated coherence score that reflects the quality of the conditional information. We then condition the diffusion model on both the conditional information and the coherence score. In this way, the model learns to ignore or discount the conditioning when the coherence is low. We show that CAD is theoretically sound and empirically effective on various conditional generation tasks. Moreover, we show that leveraging coherence generates realistic and diverse samples that respect conditional information better than models trained on cleaned datasets where samples with low coherence have been discarded.

Updated: 2024-05-30 17:57:26

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.20324v1

$\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements for effective reconstruction, limiting their applications for in-the-wild scenarios. To facilitate efficient 3D scene reconstruction without costly annotations, we propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency. We represent each scene with 3D Gaussians to preserve the explicitness and further accompany them with a spatial-temporal field network to compactly model the 4D dynamics. We conduct extensive experiments on the challenging Waymo-Open dataset to evaluate the effectiveness of our method. Our $\textit{S}^3$Gaussian demonstrates the ability to decompose static and dynamic scenes and achieves the best performance without using 3D annotations. Code is available at: https://github.com/nnanhuang/S3Gaussian/.

Updated: 2024-05-30 17:57:08

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.20323v1

Vision-based Manipulation from Single Human Video with Open-World Object Graphs

We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos. We investigate the problem of imitating robot manipulation from a single human video in the open-world setting, where a robot must learn to manipulate novel objects from one video demonstration. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video and deriving a policy that conditions on the extracted plan. Our method enables the robot to learn from videos captured by daily mobile devices such as an iPad and generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate our method on both short-horizon and long-horizon tasks, demonstrating the efficacy of ORION in learning from a single human video in the open world. Videos can be found in the project website https://ut-austin-rpl.github.io/ORION-release.

Updated: 2024-05-30 17:56:54

Categories: cs.RO,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.20321v1

Improving the Training of Rectified Flows

Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 72% in the 1 NFE setting on CIFAR-10. On ImageNet 64$\times$64, our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp.

Updated: 2024-05-30 17:56:04

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20320v1
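
To illustrate why straight ODE paths matter in the low-NFE setting discussed in the rectified-flow abstract above, here is a small sketch. It assumes the commonly used rectified-flow formulation (straight-line interpolation between a noise sample and a data sample, with the velocity target equal to their difference) and a plain Euler sampler. The two velocity fields are hypothetical toy examples, not learned models, and none of this reproduces the paper's training techniques.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight line between noise x0 and data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the velocity field along the straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity: Callable[[Vector, float], Vector], x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1; costs num_steps function evaluations."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_field(x: Vector, t: float) -> Vector:
    """Toy field with constant velocity: trajectories are straight lines."""
    return [1.0, -2.0]


def curved_field(x: Vector, t: float) -> Vector:
    """Toy field whose second component changes over time: trajectories bend."""
    return [1.0, 2.0 * t]


start = [0.0, 0.0]
straight_1 = euler_sample(straight_field, start, num_steps=1)
curved_1 = euler_sample(curved_field, start, num_steps=1)
curved_4 = euler_sample(curved_field, start, num_steps=4)
print(straight_1, curved_1, curved_4)
```

For the straight field a single Euler step already lands on the exact endpoint, whereas for the curved field (exact endpoint `[1.0, 1.0]`) one step leaves a large truncation error that only shrinks as more steps are used.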

ParSEL: Parameterized Shape Editing with Language

The ability to edit 3D assets from natural language presents a compelling paradigm to aid in the democratization of 3D content creation. However, while natural language is often effective at communicating general intent, it is poorly suited for specifying precise manipulation. To address this gap, we introduce ParSEL, a system that enables controllable editing of high-quality 3D assets from natural language. Given a segmented 3D mesh and an editing request, ParSEL produces a parameterized editing program. Adjusting the program parameters allows users to explore shape variations with a precise control over the magnitudes of edits. To infer editing programs which align with an input edit request, we leverage the abilities of large-language models (LLMs). However, while we find that LLMs excel at identifying initial edit operations, they often fail to infer complete editing programs, and produce outputs that violate shape semantics. To overcome this issue, we introduce Analytical Edit Propagation (AEP), an algorithm which extends a seed edit with additional operations until a complete editing program has been formed. Unlike prior methods, AEP searches for analytical editing operations compatible with a range of possible user edits through the integration of computer algebra systems for geometric analysis. Experimentally we demonstrate ParSEL's effectiveness in enabling controllable editing of 3D objects through natural language requests over alternative system designs.

Updated: 2024-05-30 17:55:46

Categories: cs.CV,cs.AI,cs.GR,cs.HC,cs.SC

Download: http://arxiv.org/abs/2405.20319v1

CausalQuest: Collecting Natural Causal Questions for AI Agents

Humans have an innate drive to seek out causality. Whether fuelled by curiosity or specific goals, we constantly question why things happen, how they are interconnected, and many other related phenomena. To develop AI agents capable of addressing this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. Unfortunately, existing datasets either contain only artificially-crafted questions that do not reflect real AI usage scenarios or have limited coverage of questions from specific sources. To address this gap, we present CausalQuest, a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants. We formalize the definition of causal questions and establish a taxonomy for finer-grained classification. Through a combined effort of human annotators and large language models (LLMs), we carefully label the dataset. We find that 42% of the questions humans ask are indeed causal, with the majority seeking to understand the causes behind given effects. Using this dataset, we train efficient classifiers (up to 2.85B parameters) for the binary task of identifying causal questions, achieving high performance with F1 scores of up to 0.877. We conclude with a rich set of future research directions that can build upon our data and models.

Updated: 2024-05-30 17:55:28

Categories: cs.CL,cs.AI,cs.CC,cs.LG

Download: http://arxiv.org/abs/2405.20318v1
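
The CausalQuest abstract above reports F1 scores for the binary task of identifying causal questions. As a reminder of how that metric is computed, here is a short sketch using the standard definition of F1 as the harmonic mean of precision and recall. The confusion counts are hypothetical and are not taken from the paper; the helper assumes non-zero denominators.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive ("causal") class."""
    precision = tp / (tp + fp)  # fraction of predicted-causal questions that are causal
    recall = tp / (tp + fn)     # fraction of causal questions that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 causal questions correctly flagged, 10 non-causal
# questions wrongly flagged, 20 causal questions missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```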

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

In this paper, we introduce an improved approach to speculative decoding aimed at enhancing the efficiency of serving large language models. Our method capitalizes on the strengths of two established techniques: the classic two-model speculative decoding approach, and the more recent single-model approach, Medusa. Drawing inspiration from Medusa, our approach adopts a single-model strategy for speculative decoding. However, our method distinguishes itself by employing a single, lightweight draft head with a recurrent dependency design, akin in essence to the small draft model used in classic speculative decoding, but without the complexities of the full transformer architecture. Because of the recurrent dependency, we can use beam search to swiftly filter out undesired candidates with the draft head. The outcome is a method that retains the simplicity of single-model design while avoiding the need, as in Medusa, to create a data-dependent tree attention structure solely for inference. We empirically demonstrate the effectiveness of the proposed method on several popular open source language models, along with a comprehensive analysis of the trade-offs involved in adopting this approach.

Updated: 2024-05-30 17:55:19

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2403.09919v3

ANAH: Analytical Annotation of Hallucinations in Large Language Models

Reducing the `$\textit{hallucination}$' problem of Large Language Models (LLMs) is crucial for their wide application. A comprehensive and fine-grained measurement of hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate in the answer and use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments on studying generative and discriminative annotators and show that, although current open-source LLMs have difficulties in fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibit better generalization ability on unseen questions.

Updated: 2024-05-30 17:54:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.20315v1

Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

In this work, we study statistical learning with dependent ($\beta$-mixing) data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$ where $\Psi_p$ is the norm $\|f\|_{\Psi_p} \triangleq \sup_{m\geq 1} m^{-1/p} \|f\|_{L^m} $ for some $p\in [2,\infty]$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent data. Absent any realizability assumption, typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$ -- that is, $\mathscr{F}$ is a weakly sub-Gaussian class: $\|f\|_{\Psi_p} \lesssim \|f\|_{L^2}^\eta$ for some $\eta\in (0,1]$ -- the empirical risk minimizer achieves a rate that only depends on the complexity of the class and second order statistics in its leading term. Our result holds whether the problem is realizable or not and we refer to this as a \emph{near mixing-free rate}, since direct dependence on mixing is relegated to an additive higher order term. We arrive at our result by combining the above notion of a weakly sub-Gaussian class with mixed tail generic chaining. This combination allows us to compute sharp, instance-optimal rates for a wide range of problems. Examples that satisfy our framework include sub-Gaussian linear regression, more general smoothly parameterized function classes, finite hypothesis classes, and bounded smoothness classes.

Updated: 2024-05-30 17:54:02

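Title: Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

Abstract: In this work, we study statistical learning with dependent ($\beta$-mixing) data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$ where $\Psi_p$ is the norm $\|f\|_{\Psi_p} \triangleq \sup_{m\geq 1} m^{-1/p} \|f\|_{L^m}$ for some $p\in [2,\infty]$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent data. Absent any realizability assumption, typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$ -- that is, $\mathscr{F}$ is a weakly sub-Gaussian class: $\|f\|_{\Psi_p} \lesssim \|f\|_{L^2}^\eta$ for some $\eta\in (0,1]$ -- the empirical risk minimizer achieves a rate that only depends on the complexity of the class and second order statistics in its leading term. Our result holds whether the problem is realizable or not and we refer to this as a \emph{near mixing-free rate}, since direct dependence on mixing is relegated to an additive higher order term. We arrive at our result by combining the above notion of a weakly sub-Gaussian class with mixed tail generic chaining. This combination allows us to compute sharp, instance-optimal rates for a wide range of problems. Examples that satisfy our framework include sub-Gaussian linear regression, more general smoothly parameterized function classes, finite hypothesis classes, and bounded smoothness classes.

Updated: 2024-05-30 17:54:02

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.05928v2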

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FoldFlow-2 presents substantial new architectural features over the previous FoldFlow family of models including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer based decoder. To increase diversity and novelty of generated samples -- crucial for de-novo drug design -- we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works, containing both known proteins in PDB and high-quality synthetic structures achieved through filtering. We further demonstrate the ability to align FoldFlow-2 to arbitrary rewards, e.g. increasing secondary structures diversity, by introducing a Reinforced Finetuning (ReFT) objective. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in terms of unconditional generation across all metrics including designability, diversity, and novelty across all protein lengths, as well as exhibiting generalization on the task of equilibrium conformation sampling. Finally, we demonstrate that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.

Updated: 2024-05-30 17:53:50

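Title: Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Abstract: Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FoldFlow-2 presents substantial new architectural features over the previous FoldFlow family of models, including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer based decoder. To increase the diversity and novelty of generated samples -- crucial for de-novo drug design -- we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than the PDB datasets of prior works, containing both known proteins in PDB and high-quality synthetic structures obtained through filtering. We further demonstrate the ability to align FoldFlow-2 to arbitrary rewards, e.g. increasing secondary structure diversity, by introducing a Reinforced Finetuning (ReFT) objective. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in unconditional generation on all metrics, including designability, diversity, and novelty, across all protein lengths, and that it generalizes on the task of equilibrium conformation sampling. Finally, we demonstrate that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.

Updated: 2024-05-30 17:53:50

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2405.20313v1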

Large Language Models Can Self-Improve At Web Agent Tasks

Training models to act as agents that can effectively navigate and perform actions in a complex environment, such as a web browser, has typically been challenging due to lack of training data. Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion, purely guided by natural language instructions as prompts. Recent research has also demonstrated LLMs have the capability to exceed their base performance through self-improvement, i.e. fine-tuning on data generated by the model itself. In this work, we explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. In WebArena, an agent must autonomously navigate and perform actions on web pages to achieve a specified objective. We explore fine-tuning on three distinct synthetic training data mixtures and achieve a 31\% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure. We additionally contribute novel evaluation metrics for assessing the performance, robustness, capabilities, and quality of trajectories of our fine-tuned agent models to a greater degree than simple, aggregate-level benchmark scores currently used to measure self-improvement.

Updated: 2024-05-30 17:52:36

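Title: Large Language Models Can Self-Improve At Web Agent Tasks

Abstract: Training models to act as agents that can effectively navigate and perform actions in a complex environment, such as a web browser, has typically been challenging due to a lack of training data. Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion, guided purely by natural language instructions as prompts. Recent research has also demonstrated that LLMs can exceed their base performance through self-improvement, i.e. fine-tuning on data generated by the model itself. In this work, we explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. In WebArena, an agent must autonomously navigate and perform actions on web pages to achieve a specified objective. We explore fine-tuning on three distinct synthetic training data mixtures and achieve a 31\% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure. We additionally contribute novel evaluation metrics that assess the performance, robustness, capabilities, and trajectory quality of our fine-tuned agent models in more depth than the simple, aggregate-level benchmark scores currently used to measure self-improvement.

Updated: 2024-05-30 17:52:36

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.20309v1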

Contextual Position Encoding: Learning to Count What's Important

The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.
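The counting idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the gate for each earlier token is a sigmoid of a query-key dot product, and that a token's contextual position is the sum of gates between it and the current token. The function names `gate` and `contextual_positions` are hypothetical.

```python
import math


def gate(query, key):
    """Soft decision on whether a token should be counted (assumed: sigmoid of q.k)."""
    score = sum(q * k for q, k in zip(query, key))
    return 1.0 / (1.0 + math.exp(-score))


def contextual_positions(gates_row):
    """Contextual position of every earlier token relative to the current one.

    gates_row[j] is the gate of the current token toward token j (j <= current index).
    The position of token j is the sum of the gates from j up to the current token,
    so only tokens whose gate is close to 1 advance the count.
    """
    positions = [0.0] * len(gates_row)
    running = 0.0
    for j in range(len(gates_row) - 1, -1, -1):
        running += gates_row[j]
        positions[j] = running
    return positions
```

With hard gates such as `[1, 0, 1, 1]`, the second token is skipped by the count, which is how a model could address "the i-th counted token" rather than "the i-th token". With fractional gates the positions are non-integer, so a real model would need some way to embed fractional positions; that step is omitted here.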

Updated: 2024-05-30 17:51:53

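Title: Contextual Position Encoding: Learning to Count What's Important

Abstract: The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.

Updated: 2024-05-30 17:51:53

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.18719v2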

Group Robust Preference Optimization in Reward-free RLHF

Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimize a single preference model, thus not being robust to unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to align LLMs to individual groups' preferences robustly. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy which maximizes the worst-case group performance. To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO using diverse group-based global opinion data, we significantly improved performance for the worst-performing groups, reduced loss imbalances across groups, and improved probability accuracies compared to non-robust baselines.
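The adaptive group weighting mentioned above can be illustrated with a multiplicative-weights update, a standard device in worst-case (group-robust) optimization. This is a sketch under that assumption; the abstract does not state the exact update rule GRPO uses, and `update_group_weights` and `step_size` are hypothetical names.

```python
import math


def update_group_weights(weights, group_losses, step_size=1.0):
    """Shift probability mass toward groups with higher loss.

    weights: current group weights (non-negative, summing to 1).
    group_losses: current loss of each group, in the same order.
    Returns new weights that still sum to 1.
    """
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def weighted_loss(weights, group_losses):
    """Objective a policy update would minimize under the current group weights."""
    return sum(w * loss for w, loss in zip(weights, group_losses))
```

Applying the update repeatedly makes the weight of a group grow with its cumulative loss, so the weighted objective increasingly tracks the worst-performing group.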

Updated: 2024-05-30 17:50:04

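Title: Group Robust Preference Optimization in Reward-free RLHF

Abstract: Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimize a single preference model, and are therefore not robust to the unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to align LLMs to individual groups' preferences robustly. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy which maximizes the worst-case group performance. To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO using diverse group-based global opinion data, we significantly improved performance for the worst-performing groups, reduced loss imbalances across groups, and improved probability accuracies compared to non-robust baselines.

Updated: 2024-05-30 17:50:04

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.20304v1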

You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism

Scaled Dot Product Attention (SDPA) is the backbone of many modern deep-learning models. It is so versatile that it has been used in natural language, vision, and multi-modal domains with very little change compared to its original formulation. This paper discusses why the current formulation is inefficient by delving into the mathematical details of the attention mechanism. We propose three improvements to mitigate these inefficiencies, thereby introducing three enhanced attention mechanisms: Optimised, Efficient, and Super Attention. Optimised and Efficient Attention have one and two fewer matrix multiplications per head, respectively, and 25% and 50% fewer parameters, respectively, than standard SDPA, but perform similarly to standard SDPA in both vision and natural language tasks. They can be used in all applications where SDPA is used while offering smaller model sizes and faster training and inference without noticeable loss in performance. Super Attention introduces a new linear transformation on the values, transforming them from the left. It outperforms standard SDPA on vision and natural language tasks by up to 17% while having one fewer matrix multiplication per head and 25% fewer parameters than standard SDPA. Consequently, it is also faster than standard SDPA. Super Attention is ideal in applications where the attention layer's context length is fixed, such as Vision Transformers. In addition to providing mathematical reasoning, we evaluate the presented attention mechanisms on several datasets including MNIST, CIFAR100, ImageNet, IMDB Movie Reviews, and Amazon Reviews datasets, as well as combined Europarl and Anki English-Spanish datasets for neural machine translation.
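For reference, the standard SDPA baseline that the abstract compares against can be written out directly. The sketch below covers only that baseline, softmax(QK^T / sqrt(d)) V, using plain Python lists; it does not attempt the Optimised, Efficient, or Super Attention variants, whose details are not given in the abstract. The function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Standard SDPA: each query attends over all keys and mixes the values.

    queries: list of d-dimensional vectors; keys: list of d-dimensional vectors;
    values: one vector per key. Returns one output vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs
```

In a full attention layer the queries, keys, and values are themselves produced by learned linear projections of the input, followed by an output projection; those projection matrices are where the parameter and multiplication counts discussed in the abstract come from.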

Updated: 2024-05-30 17:46:22

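Title: You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism

Abstract: Scaled Dot Product Attention (SDPA) is the backbone of many modern deep-learning models. It is so versatile that it has been used in natural language, vision, and multi-modal domains with very little change compared to its original formulation. This paper discusses why the current formulation is inefficient by delving into the mathematical details of the attention mechanism. We propose three improvements to mitigate these inefficiencies, thereby introducing three enhanced attention mechanisms: Optimised, Efficient, and Super Attention. Optimised and Efficient Attention have one and two fewer matrix multiplications per head, respectively, and 25% and 50% fewer parameters, respectively, than standard SDPA, but perform similarly to standard SDPA in both vision and natural language tasks. They can be used in all applications where SDPA is used while offering smaller model sizes and faster training and inference without noticeable loss in performance. Super Attention introduces a new linear transformation on the values, transforming them from the left. It outperforms standard SDPA on vision and natural language tasks by up to 17% while having one fewer matrix multiplication per head and 25% fewer parameters than standard SDPA. Consequently, it is also faster than standard SDPA. Super Attention is ideal in applications where the attention layer's context length is fixed, such as Vision Transformers. In addition to providing mathematical reasoning, we evaluate the presented attention mechanisms on several datasets including MNIST, CIFAR100, ImageNet, IMDB Movie Reviews, and Amazon Reviews, as well as combined Europarl and Anki English-Spanish datasets for neural machine translation.

Updated: 2024-05-30 17:46:22

Categories: cs.LG,cs.AI,cs.CL,cs.CV,68T07 (Primary) 68T45, 68T50, 68T10, 15A03, 15A04 (Secondary),I.2.6; I.2.7; I.2.10; I.4.0; I.5.0; I.7.0

Download: http://arxiv.org/abs/2403.01643v2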

How (not) to Build Quantum PKE in Minicrypt

The seminal work by Impagliazzo and Rudich (STOC'89) demonstrated the impossibility of constructing classical public key encryption (PKE) from one-way functions (OWF) in a black-box manner. However, the question remains: can quantum PKE (QPKE) be constructed from quantumly secure OWF? A recent line of work has shown that it is indeed possible to build QPKE from OWF, but with one caveat -- they rely on quantum public keys, which cannot be authenticated and reused. In this work, we re-examine the possibility of perfect complete QPKE in the quantum random oracle model (QROM), where OWF exists. Our first main result: QPKE with classical public keys, secret keys and ciphertext, does not exist in the QROM, if the key generation only makes classical queries. Therefore, a necessary condition for constructing such QPKE from OWF is to have the key generation classically ``un-simulatable''. Previous discussions (Austrin et al. CRYPTO'22) on the impossibility of QPKE from OWF rely on a seemingly strong conjecture. Our work makes a significant step towards a complete and unconditional quantization of Impagliazzo and Rudich's results. Our second main result extends to QPKE with quantum public keys. The second main result: QPKE with quantum public keys, classical secret keys and ciphertext, does not exist in the QROM, if the key generation only makes classical queries and the quantum public key is either pure or ``efficiently clonable''. The result is tight due to all existing QPKEs constructions. Our result further gives evidence on why existing QPKEs lose reusability. To achieve these results, we use a novel argument based on conditional mutual information and quantum Markov chain by Fawzi and Renner (Communications in Mathematical Physics). We believe the techniques used in the work will find other usefulness in separations in quantum cryptography/complexity.

Updated: 2024-05-30 17:44:03

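Title: How (not) to Build Quantum PKE in Minicrypt

Abstract: The seminal work by Impagliazzo and Rudich (STOC'89) demonstrated the impossibility of constructing classical public key encryption (PKE) from one-way functions (OWF) in a black-box manner. However, the question remains: can quantum PKE (QPKE) be constructed from quantumly secure OWF? A recent line of work has shown that it is indeed possible to build QPKE from OWF, but with one caveat -- they rely on quantum public keys, which cannot be authenticated and reused. In this work, we re-examine the possibility of perfectly complete QPKE in the quantum random oracle model (QROM), where OWF exists. Our first main result: QPKE with classical public keys, secret keys and ciphertext does not exist in the QROM if the key generation only makes classical queries. Therefore, a necessary condition for constructing such QPKE from OWF is to have the key generation classically "un-simulatable". Previous discussions (Austrin et al. CRYPTO'22) on the impossibility of QPKE from OWF rely on a seemingly strong conjecture. Our work makes a significant step towards a complete and unconditional quantization of Impagliazzo and Rudich's results. Our second main result extends to QPKE with quantum public keys: QPKE with quantum public keys, classical secret keys and ciphertext does not exist in the QROM if the key generation only makes classical queries and the quantum public key is either pure or "efficiently clonable". The result is tight in light of all existing QPKE constructions. Our result further gives evidence on why existing QPKEs lose reusability. To achieve these results, we use a novel argument based on conditional mutual information and quantum Markov chains by Fawzi and Renner (Communications in Mathematical Physics). We believe the techniques used in this work will find further use in separations in quantum cryptography/complexity.

Updated: 2024-05-30 17:44:03

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2405.20295v1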

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, without poisoned data, unlearning models with clean data and then learning a pruning mask have contributed to backdoor defense. Additionally, vanilla fine-tuning with those clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally induces back the backdoor effect. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and find two interesting observations in the backdoored model: 1) the weight changes between poison and clean unlearning are positively correlated, making it possible for us to identify the backdoored-related neurons without using poisoned data; 2) the neurons of the backdoored model are more active (i.e., larger changes in gradient norm) than those in the clean model, suggesting the need to suppress the gradient norm during fine-tuning. Then, we propose an effective two-stage defense method. In the first stage, an efficient Neuron Weight Change (NWC)-based Backdoor Reinitialization is proposed based on observation 1). In the second stage, based on observation 2), we design an Activeness-Aware Fine-Tuning to replace the vanilla fine-tuning. Extensive experiments, involving eight backdoor attacks on three benchmark datasets, demonstrate the superior performance of our proposed method compared to recent state-of-the-art backdoor defense approaches.

Updated: 2024-05-30 17:41:32

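Title: Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

Abstract: The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, without poisoned data, unlearning models with clean data and then learning a pruning mask have contributed to backdoor defense. Additionally, vanilla fine-tuning with those clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally brings the backdoor effect back. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and make two interesting observations in the backdoored model: 1) the weight changes between poison and clean unlearning are positively correlated, making it possible for us to identify the backdoor-related neurons without using poisoned data; 2) the neurons of the backdoored model are more active (i.e., larger changes in gradient norm) than those in the clean model, suggesting the need to suppress the gradient norm during fine-tuning. Then, we propose an effective two-stage defense method. In the first stage, an efficient Neuron Weight Change (NWC)-based Backdoor Reinitialization is proposed based on observation 1). In the second stage, based on observation 2), we design an Activeness-Aware Fine-Tuning to replace the vanilla fine-tuning. Extensive experiments, involving eight backdoor attacks on three benchmark datasets, demonstrate the superior performance of our proposed method compared to recent state-of-the-art backdoor defense approaches.

Updated: 2024-05-30 17:41:32

Categories: cs.CR,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.20291v1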

DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Controllable music generation methods are critical for human-centered AI-based music creation, but are currently limited by speed, quality, and control design trade-offs. Diffusion Inference-Time T-optimization (DITTO), in particular, offers state-of-the-art results, but is over 10x slower than real-time, limiting practical use. We propose Distilled Diffusion Inference-Time T-Optimization (or DITTO-2), a new method to speed up inference-time optimization-based control and unlock faster-than-real-time generation for a wide variety of applications such as music inpainting, outpainting, intensity, melody, and musical structure control. Our method works by (1) distilling a pre-trained diffusion model for fast sampling via an efficient, modified consistency or consistency trajectory distillation process, (2) performing inference-time optimization using our distilled model with one-step sampling as an efficient surrogate optimization task, and (3) running a final multi-step sampling generation (decoding) using our estimated noise latents for best-quality, fast, controllable generation. Through thorough evaluation, we find our method not only speeds up generation by 10-20x, but simultaneously improves control adherence and generation quality. Furthermore, we apply our approach to a new application of maximizing text adherence (CLAP score) and show we can convert an unconditional diffusion model without text inputs into a model that yields state-of-the-art text control. Sound examples can be found at https://ditto-music.github.io/ditto2/.

Updated: 2024-05-30 17:40:11

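Title: DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Abstract: Controllable music generation methods are critical for human-centered AI-based music creation, but are currently limited by speed, quality, and control design trade-offs. Diffusion Inference-Time T-optimization (DITTO), in particular, offers state-of-the-art results, but is over 10x slower than real-time, limiting practical use. We propose Distilled Diffusion Inference-Time T-Optimization (or DITTO-2), a new method to speed up inference-time optimization-based control and unlock faster-than-real-time generation for a wide variety of applications such as music inpainting, outpainting, intensity, melody, and musical structure control. Our method works by (1) distilling a pre-trained diffusion model for fast sampling via an efficient, modified consistency or consistency trajectory distillation process, (2) performing inference-time optimization using our distilled model with one-step sampling as an efficient surrogate optimization task, and (3) running a final multi-step sampling generation (decoding) using our estimated noise latents for best-quality, fast, controllable generation. Through thorough evaluation, we find our method not only speeds up generation by 10-20x, but simultaneously improves control adherence and generation quality. Furthermore, we apply our approach to a new application of maximizing text adherence (CLAP score) and show we can convert an unconditional diffusion model without text inputs into a model that yields state-of-the-art text control. Sound examples can be found at https://ditto-music.github.io/ditto2/.

Updated: 2024-05-30 17:40:11

Categories: cs.SD,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20289v1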

Flexible SE(2) graph neural networks with applications to PDE surrogates

This paper presents a novel approach for constructing graph neural networks equivariant to 2D rotations and translations and leveraging them as PDE surrogates on non-gridded domains. We show that aligning the representations with the principal axis allows us to sidestep many constraints while preserving SE(2) equivariance. By applying our model as a surrogate for fluid flow simulations and conducting thorough benchmarks against non-equivariant models, we demonstrate significant gains in terms of both data efficiency and accuracy.

Updated: 2024-05-30 17:39:15

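Title: Flexible SE(2) graph neural networks with applications to PDE surrogates

Abstract: This paper presents a novel approach for constructing graph neural networks equivariant to 2D rotations and translations and leveraging them as PDE surrogates on non-gridded domains. We show that aligning the representations with the principal axis allows us to sidestep many constraints while preserving SE(2) equivariance. By applying our model as a surrogate for fluid flow simulations and conducting thorough benchmarks against non-equivariant models, we demonstrate significant gains in terms of both data efficiency and accuracy.

Updated: 2024-05-30 17:39:15

Categories: cs.LG,cs.AI,physics.flu-dyn

Download: http://arxiv.org/abs/2405.20287v1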

From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.

Updated: 2024-05-30 17:37:11

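Title: From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

Abstract: To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.

Updated: 2024-05-30 17:37:11

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.03893v3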

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.

Updated: 2024-05-30 17:37:06

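Title: Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts the LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.

Updated: 2024-05-30 17:37:06

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.05140v2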

An Efficient All-to-All GCD Algorithm for Low Entropy RSA Key Factorization

RSA is an incredibly successful and useful asymmetric encryption algorithm. One type of implementation flaw in RSA is low entropy in key generation, specifically in the prime number creation stage. This can occur due to flawed usage of random prime number generator libraries, or on computers that lack a source of external entropy. These implementation flaws result in some RSA keys sharing prime factors, which means that the full factorization of the public modulus can be recovered incredibly efficiently by computing the GCD of the two public key moduli that share the prime factor. However, since one does not know a priori which of the composite moduli share a prime factor, an all-to-all GCD attack (also known as a batch GCD attack, or a bulk GCD attack) can be performed on the available public keys to determine whether any shared prime factors exist and to recover them. This study describes a novel all-to-all batch GCD algorithm, which will be referred to as the binary tree batch GCD algorithm, that is more efficient than the current best batch GCD algorithm (the remainder tree batch GCD algorithm). A comparison against the best existing batch GCD method (which is a product tree followed by a remainder tree computation) is given using a dataset of random RSA moduli that are constructed such that some of the moduli share prime factors. The proposed binary tree batch GCD algorithm has better runtime than the existing remainder tree batch GCD algorithm, although asymptotically it has nearly identical scaling and its complexity depends on how many shared prime factors exist in the set of RSA keys. In practice, the implementation of the proposed binary tree batch GCD algorithm has a roughly 6x speedup compared to the standard remainder tree batch GCD approach.
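The baseline that the abstract compares against, a product tree followed by a remainder tree, can be sketched as follows. This illustrates only that existing baseline, not the binary tree algorithm the paper proposes (whose details are not given in the abstract); the function names are illustrative and the toy moduli are far too small to be real RSA keys.

```python
from math import gcd, prod


def product_tree(moduli):
    """Build levels of pairwise products; the last level holds the product of all moduli."""
    tree = [list(moduli)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree


def batch_gcd(moduli):
    """For each modulus N_i, return gcd(N_i, product of all the other moduli).

    A result greater than 1 means N_i shares a factor with at least one other modulus.
    """
    tree = product_tree(moduli)
    remainders = tree.pop()  # root: product P of all moduli
    while tree:
        level = tree.pop()
        # Remainder tree: push P down as P mod n^2 for every node n of the level.
        remainders = [remainders[i // 2] % (n * n) for i, n in enumerate(level)]
    # (P mod N_i^2) / N_i equals (P / N_i) mod N_i, so the gcd exposes shared factors.
    return [gcd(r // n, n) for r, n in zip(remainders, moduli)]


# Toy usage: the first two moduli share the prime 13.
shared = batch_gcd([11 * 13, 13 * 17, 19 * 23, 29 * 31])
```

If two moduli are identical, or a modulus shares both of its primes with others, the returned value is the modulus itself rather than a single prime, and a pairwise follow-up is needed to split it.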

Updated: 2024-05-30 17:36:52

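Title: An Efficient All-to-All GCD Algorithm for Low Entropy RSA Key Factorization

Abstract: RSA is an incredibly successful and useful asymmetric encryption algorithm. One type of implementation flaw in RSA is low entropy in key generation, specifically in the prime number creation stage. This can occur due to flawed usage of random prime number generator libraries, or on computers that lack a source of external entropy. These implementation flaws result in some RSA keys sharing prime factors, which means that the full factorization of the public modulus can be recovered incredibly efficiently by computing the GCD of the two public key moduli that share the prime factor. However, since one does not know a priori which of the composite moduli share a prime factor, an all-to-all GCD attack (also known as a batch GCD attack, or a bulk GCD attack) can be performed on the available public keys to determine whether any shared prime factors exist and to recover them. This study describes a novel all-to-all batch GCD algorithm, which will be referred to as the binary tree batch GCD algorithm, that is more efficient than the current best batch GCD algorithm (the remainder tree batch GCD algorithm). A comparison against the best existing batch GCD method (which is a product tree followed by a remainder tree computation) is given using a dataset of random RSA moduli that are constructed such that some of the moduli share prime factors. The proposed binary tree batch GCD algorithm has better runtime than the existing remainder tree batch GCD algorithm, although asymptotically it has nearly identical scaling and its complexity depends on how many shared prime factors exist in the set of RSA keys. In practice, the implementation of the proposed binary tree batch GCD algorithm has a roughly 6x speedup compared to the standard remainder tree batch GCD approach.

Updated: 2024-05-30 17:36:52

Categories: cs.CR

Download: http://arxiv.org/abs/2405.03166v2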

Tight Characterizations for Preprocessing against Cryptographic Salting

Cryptography often considers the strongest yet plausible attacks in the real world. Preprocessing (a.k.a. non-uniform attack) plays an important role in both theory and practice: an efficient online attacker can take advantage of advice prepared by a time-consuming preprocessing stage. Salting is a heuristic strategy to counter preprocessing attacks by feeding a small amount of randomness to the cryptographic primitive. We present general and tight characterizations of preprocessing against cryptographic salting, with upper bounds matching the advantages of the most intuitive attack. Our result quantitatively strengthens the previous work by Coretti, Dodis, Guo, and Steinberger (EUROCRYPT'18). Our proof exploits a novel connection between the non-uniform security of salted games and direct product theorems for memoryless algorithms. For quantum adversaries, we give similar characterizations for property finding games, resolving an open problem of the quantum non-uniform security of salted collision resistant hash by Chung, Guo, Liu, and Qian (FOCS'20). Our proof extends the compressed oracle framework of Zhandry (CRYPTO'19) to prove quantum strong direct product theorems for property finding games in the average-case hardness.
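As a concrete picture of what salting means in practice, the sketch below prefixes a random public salt to the input of a hash function. It is a generic illustration using Python's standard `hashlib`, `hmac`, and `os` modules, not a construction taken from the paper; the helper names and the 16-byte salt length are assumptions.

```python
import hashlib
import hmac
import os


def new_salt(num_bytes=16):
    """Draw a fresh random salt; it is stored in the clear next to the digest."""
    return os.urandom(num_bytes)


def salted_hash(message: bytes, salt: bytes) -> bytes:
    """Hash the message under a salt by prefixing the salt to the input."""
    return hashlib.sha256(salt + message).digest()


def verify(message: bytes, salt: bytes, expected: bytes) -> bool:
    """Recompute the salted digest and compare in constant time."""
    return hmac.compare_digest(salted_hash(message, salt), expected)
```

Because every salt effectively selects a different function, advice precomputed for one salt (or for the unsalted function) gives little help on another, which is the intuition the tight bounds in the abstract make quantitative.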

Updated: 2024-05-30 17:34:25

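Title: Tight Characterizations for Preprocessing against Cryptographic Salting

Abstract: Cryptography often considers the strongest yet plausible attacks in the real world. Preprocessing (a.k.a. non-uniform attack) plays an important role in both theory and practice: an efficient online attacker can take advantage of advice prepared by a time-consuming preprocessing stage. Salting is a heuristic strategy to counter preprocessing attacks by feeding a small amount of randomness to the cryptographic primitive. We present general and tight characterizations of preprocessing against cryptographic salting, with upper bounds matching the advantages of the most intuitive attack. Our result quantitatively strengthens the previous work by Coretti, Dodis, Guo, and Steinberger (EUROCRYPT'18). Our proof exploits a novel connection between the non-uniform security of salted games and direct product theorems for memoryless algorithms. For quantum adversaries, we give similar characterizations for property finding games, resolving an open problem on the quantum non-uniform security of salted collision resistant hash by Chung, Guo, Liu, and Qian (FOCS'20). Our proof extends the compressed oracle framework of Zhandry (CRYPTO'19) to prove quantum strong direct product theorems for property finding games in the average-case hardness.

Updated: 2024-05-30 17:34:25

Categories: cs.CR,quant-ph

Download: http://arxiv.org/abs/2405.20281v1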

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latents extracted by 2D VAEs without quantization. In the latter case, temporal compression is realized simply by uniform frame sampling, which results in unsmooth motion between consecutive frames. Currently, the research community lacks a commonly used continuous video (3D) VAE for latent diffusion-based video models. Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering compatibility with existing T2I models results in a latent space gap between them, and bridging this gap requires huge computational resources for training even with the T2I models as initialization. To address this issue, we propose a method for training a video VAE for latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). The compatibility is achieved by the proposed novel latent space regularization, which involves formulating a regularization loss using the image VAE. Benefiting from the latent space compatibility, video models can be trained seamlessly from pre-trained T2I or video models in a truly spatio-temporally compressed latent space, rather than simply sampling video frames at equal intervals. With our CV-VAE, existing video models can generate four times more frames with minimal finetuning. Extensive experiments are conducted to demonstrate the effectiveness of the proposed video VAE.

Updated: 2024-05-30 17:33:10

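Title: CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latents extracted by 2D VAEs without quantization. In the latter case, temporal compression is realized simply by uniform frame sampling, which results in unsmooth motion between consecutive frames. Currently, the research community lacks a commonly used continuous video (3D) VAE for latent diffusion-based video models. Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering compatibility with existing T2I models results in a latent space gap between them, and bridging this gap requires huge computational resources for training even with the T2I models as initialization. To address this issue, we propose a method for training a video VAE for latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). The compatibility is achieved by the proposed novel latent space regularization, which involves formulating a regularization loss using the image VAE. Benefiting from the latent space compatibility, video models can be trained seamlessly from pre-trained T2I or video models in a truly spatio-temporally compressed latent space, rather than simply sampling video frames at equal intervals. With our CV-VAE, existing video models can generate four times more frames with minimal finetuning. Extensive experiments are conducted to demonstrate the effectiveness of the proposed video VAE.

Updated: 2024-05-30 17:33:10

Categories: cs.CV,cs.AI,eess.IV

Download: http://arxiv.org/abs/2405.20279v1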

Length independent generalization bounds for deep SSM architectures with stability constraints

Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for this kind of architecture with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is a standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
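A scalar toy model helps show why stability can remove the dependence on sequence length. The sketch below is only an intuition aid, not the paper's PAC bound: it assumes a one-dimensional linear recurrence x_{t+1} = a x_t + b u_t with |a| < 1, zero initial state, and bounded inputs, for which the state obeys the length-independent geometric-series bound |x_t| <= |b| * max|u| / (1 - |a|). The function names are hypothetical.

```python
def run_scalar_ssm(a, b, inputs, x0=0.0):
    """Iterate x_{t+1} = a * x_t + b * u_t and return the visited states."""
    state = x0
    states = []
    for u in inputs:
        state = a * state + b * u
        states.append(state)
    return states


def state_bound(a, b, input_bound):
    """Geometric-series bound on |x_t| for a stable recurrence (|a| < 1, x0 = 0)."""
    if abs(a) >= 1.0:
        raise ValueError("the bound only holds for a stable recurrence, |a| < 1")
    return abs(b) * input_bound / (1.0 - abs(a))
```

The bound shrinks as |a| moves away from 1, mirroring the abstract's statement that the guarantee improves as the degree of stability increases.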

Updated: 2024-05-30 17:32:46

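Title: Length independent generalization bounds for deep SSM architectures with stability constraints

Abstract: Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are made of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for this kind of architecture with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is a standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.

Updated: 2024-05-30 17:32:46

Categories: cs.LG,cs.AI,68,I.2.6

Download: http://arxiv.org/abs/2405.20278v1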

Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning

A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real-time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but same feedback matrix, and convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis and simulation results are provided to demonstrate the effectiveness of the developed technique.
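The notion of equivalent solutions can be made concrete with a scalar linear-quadratic example, which is an assumption made here for illustration; the abstract does not specify the system class. For dx/dt = a x + b u with cost integrand q x^2 + r u^2, the optimal gain depends on the cost weights only through the ratio q / r, so different cost functionals can produce the same feedback. The function name `lqr_gain` is hypothetical.

```python
import math


def lqr_gain(a, b, q, r):
    """Optimal gain k (control u = -k x) for the scalar continuous-time LQR problem.

    Uses the positive root of the scalar Riccati equation
    2 a p - (b^2 / r) p^2 + q = 0, then k = b p / r.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


# Two different cost functionals, (q, r) = (3, 1) and (30, 10), give one feedback gain.
gain_original = lqr_gain(1.0, 1.0, 3.0, 1.0)
gain_rescaled = lqr_gain(1.0, 1.0, 30.0, 10.0)
```

An observer that only sees state and control trajectories cannot distinguish these two cost functionals, which is why convergence can only be expected to an equivalent solution rather than to the true cost.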

Updated: 2024-05-30 17:31:41

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2210.16299v4

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE reduces the training computation significantly for large models, but its deployment can still be memory- or computation-expensive for some downstream tasks. Model pruning is a popular approach to reduce inference computation, but its application in MoE architecture is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in finetuned MoE models. We theoretically prove that prioritizing the pruning of the experts with a smaller change of the router's l2 norm from the pretrained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary classification tasks on a simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as VMoE and E3MoE finetuned on benchmark datasets such as CIFAR10, CIFAR100, and ImageNet.

Updated: 2024-05-30 17:30:42

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16646v3
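
The pruning criterion stated in the abstract above (prune first the experts whose router changed least from the pretrained model) can be sketched as follows. This is a minimal illustration under assumptions: each expert's router is represented as a plain list of weights, the change is measured as the l2 norm of the weight difference, and the helper names are hypothetical rather than taken from the paper's code.

```python
import math


def router_change_norms(pretrained, finetuned):
    """Return, per expert, the l2 norm of (finetuned - pretrained) router weights."""
    return [
        math.sqrt(sum((f - p) ** 2 for p, f in zip(p_row, f_row)))
        for p_row, f_row in zip(pretrained, finetuned)
    ]


def experts_to_prune(pretrained, finetuned, num_prune):
    """Pick the num_prune experts with the smallest router change (sorted by index)."""
    norms = router_change_norms(pretrained, finetuned)
    if not 0 <= num_prune <= len(norms):
        raise ValueError("num_prune must be between 0 and the number of experts")
    # Rank experts from smallest to largest change and keep the first num_prune.
    order = sorted(range(len(norms)), key=lambda i: norms[i])
    return sorted(order[:num_prune])
```

For example, with three experts whose routers moved by 0.1, 2.0 and 0.05 during finetuning, asking for two experts to prune selects the first and the third.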

ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection

Aspect-Based Sentiment Analysis (ABSA) has experienced tremendous expansion and diversity due to various shared tasks spanning several languages and fields and organized via SemEval workshops and Germeval. Nonetheless, a few shortcomings still need to be addressed, such as the lack of low-resource language evaluations and the emphasis on sentence-level analysis. To thoroughly assess ABSA techniques in the context of complete reviews, this research presents a novel task, Review-Level Opinion Aspect Sentiment Target (ROAST). ROAST seeks to close the gap between sentence-level and text-level ABSA by identifying every ABSA constituent at the review level. We extend the available datasets to enable ROAST, addressing the drawbacks noted in previous research by incorporating low-resource languages, numerous languages, and a variety of topics. Through this effort, ABSA research will be able to cover more ground and get a deeper comprehension of the task and its practical application in a variety of languages and domains (https://github.com/RiTUAL-UH/ROAST-ABSA).

Updated: 2024-05-30 17:29:15

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20274v1

Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

Machine unlearning is motivated by a desire for data autonomy: a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that, counter-intuitively, these updates expose individuals to high-accuracy reconstruction attacks which allow the attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect attack on the deleted data point from linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate the effectiveness of our attacks across a wide range of datasets (capturing both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.

Updated: 2024-05-30 17:27:44

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2405.20272v1

ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effective, parameter-efficient, and hyperparameter-robust adaptation, we propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections. By design, ETHER transformations require a minimal number of parameters, are less likely to deteriorate model performance, and exhibit robustness to hyperparameter and learning rate choices. In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters ($\sim$$10$-$100$ times lower than LoRA or OFT) across multiple image synthesis and natural language tasks without exhaustive hyperparameter tuning. Finally, we investigate the recent emphasis on Hyperspherical Energy retention for adaptation and raise questions on its practical utility. The code is available at https://github.com/mwbini/ether.

Updated: 2024-05-30 17:26:02

Categories: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2405.20271v1

Oja's Algorithm for Sparse PCA

Oja's algorithm for streaming Principal Component Analysis (PCA) for $n$ datapoints in a $d$ dimensional space achieves the same sin-squared error $O(r_\mathsf{eff}/n)$ as the offline algorithm in $O(d)$ space and $O(nd)$ time and a single pass through the datapoints. Here $r_\mathsf{eff}$ is the effective rank (ratio of the trace and the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse, and $r_\mathsf{eff}$ can be large. In this setting, to our knowledge, \textit{there are no known single-pass algorithms} that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time as long as $r_\mathsf{eff}=O(n/\log n)$. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which have been done when the $r_\mathsf{eff}$ is bounded.

Updated: 2024-05-30 17:23:03

Categories: math.ST,cs.LG,stat.ML,stat.TH

Download: http://arxiv.org/abs/2402.07240v3
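
The single-pass procedure described in the abstract above (run Oja's algorithm, then threshold the Oja vector) can be sketched in a few lines. The sketch makes several assumptions that are not from the paper: a constant step size, renormalization after every update (which leaves the direction of the unnormalized Oja vector unchanged, since each update is linear in the iterate), and a "keep the `s` largest-magnitude entries" thresholding rule. All function names are hypothetical, and `make_samples` generates a synthetic spiked dataset purely for demonstration.

```python
import math
import random


def oja_vector(samples, dim, step_size, rng):
    """One pass of Oja's update w <- w + step_size * x * (x . w), using O(dim) memory."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for x in samples:
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + step_size * proj * xi for wi, xi in zip(w, x)]
        # Renormalize for numerical stability; the direction is unaffected.
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def threshold_top_s(w, s):
    """Zero all but the s largest-magnitude entries, then renormalize (assumed rule)."""
    ranked = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(ranked[:s])
    sparse = [w[i] if i in keep else 0.0 for i in range(len(w))]
    norm = math.sqrt(sum(v * v for v in sparse))
    return [v / norm for v in sparse]


def make_samples(n, dim, v, signal_std, rng):
    """Synthetic data x = z * v + noise, with z ~ N(0, signal_std^2) and unit-variance noise."""
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, signal_std)
        samples.append([z * v[i] + rng.gauss(0.0, 1.0) for i in range(dim)])
    return samples
```

A typical use is to draw samples around a sparse direction `v`, call `oja_vector` once over the stream, and pass the result to `threshold_top_s`; the sin-squared error against `v` is then `1 - (est . v)^2`.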

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided, with most details (e.g., intermediate checkpoints, pre-training corpus, and training code) being undisclosed. To improve the transparency of LLMs, the research community has come together to release truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models, including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs are still inferior on reasoning, knowledge, and coding tasks to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs. Moreover, we open-source all details needed to reproduce our MAP-Neo, providing the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovation and creativity to facilitate the further improvement of LLMs.

Updated: 2024-05-30 17:17:21

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.19327v2

How to Leverage Diverse Demonstrations in Offline Imitation Learning

Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract positive behaviors from noisy data. In general, current approaches to the problem select data building on state-action similarity to given expert demonstrations, neglecting precious information in (potentially abundant) $\textit{diverse}$ state-actions that deviate from expert ones. In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their resultant states -- a more informative criterion enabling explicit utilization of dynamics information and effective extraction of both expert and beneficial diverse behaviors. Further, we devise a lightweight behavior cloning algorithm capable of leveraging the expert and selected data correctly. In the experiments, we evaluate our method on a suite of complex and high-dimensional offline IL benchmarks, including continuous-control and vision-based tasks. The results demonstrate that our method achieves state-of-the-art performance, outperforming existing methods on $\textbf{20/21}$ benchmarks, typically by $\textbf{2-5x}$, while maintaining a comparable runtime to Behavior Cloning ($\texttt{BC}$).

Updated: 2024-05-30 17:15:09

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17476v3

Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible and the ground truth reward function is not available. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory. We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method. TDIL is an IRL method designed to address reward sparsity by introducing a denser surrogate reward function that considers environmental dynamics. This surrogate reward function encourages the agent to navigate towards states that are proximal to expert states. In practice, TDIL trains a transition discriminator to differentiate between valid and non-valid transitions in a given environment to compute the surrogate rewards. The experiments demonstrate that TDIL outperforms existing IL approaches and achieves expert-level performance in the single-demonstration IL setting across five widely adopted MuJoCo benchmarks as well as the "Adroit Door" robotic environment.

Updated: 2024-05-30 17:14:43

Categories: cs.LG

Download: http://arxiv.org/abs/2402.01057v2

Absolute Policy Optimization

In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control the worst-case performance outcomes. To address this limitation, we introduce a novel objective function, optimizing which leads to guaranteed monotonic improvement in the lower probability bound of performance with high confidence. Building upon this groundbreaking theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO, as well as its efficient variation Proximal Absolute Policy Optimization (PAPO), significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in worst-case performance as well as expected performance.

Updated: 2024-05-30 17:13:04

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2310.13230v5

OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation policy from static demonstration data, followed by fast finetuning with minimal environmental interaction. We find the naïve combination of existing offline IL and online IL methods tends to behave poorly in this context, because the initial discriminator (often used in online IL) operates randomly and discordantly against the policy initialization, leading to misguided policy optimization and $\textit{unlearning}$ of pretraining knowledge. To overcome this challenge, we propose a principled offline-to-online IL method, named $\texttt{OLLIE}$, that simultaneously learns a near-expert policy initialization along with an $\textit{aligned discriminator initialization}$, which can be seamlessly integrated into online IL, achieving smooth and fast finetuning. Empirically, $\texttt{OLLIE}$ consistently and significantly outperforms the baseline methods in $\textbf{20}$ challenging tasks, from continuous control to vision-based domains, in terms of performance, demonstration efficiency, and convergence speed. This work may serve as a foundation for further exploration of pretraining and finetuning in the context of IL.

Updated: 2024-05-30 17:11:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17477v3

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

Updated: 2024-05-30 17:09:56

Categories: cs.CR,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.12815v2

Entropy annealing for policy mirror descent in continuous time and space

Entropy regularization has been extensively used in policy optimization algorithms to regularize the optimization landscape and accelerate convergence; however, it comes at the cost of introducing an additional regularization bias. This work quantifies the impact of entropy regularization on the convergence of policy gradient methods for stochastic exit time control problems. We analyze a continuous-time policy mirror descent dynamics, which updates the policy based on the gradient of an entropy-regularized value function and adjusts the strength of entropy regularization as the algorithm progresses. We prove that with a fixed entropy level, the dynamics converges exponentially to the optimal solution of the regularized problem. We further show that when the entropy level decays at suitable polynomial rates, the annealed flow converges to the solution of the unregularized problem at a rate of $\mathcal O(1/S)$ for discrete action spaces and, under suitable conditions, at a rate of $\mathcal O(1/\sqrt{S})$ for general action spaces, with $S$ being the gradient flow time. This paper explains how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate.

Updated: 2024-05-30 17:02:18

Categories: math.OC,cs.LG,math.PR,Primary 93E20, Secondary 49M29, 68Q25, 60H30, 35J61

Download: http://arxiv.org/abs/2405.20250v1
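
The trade-off described in the abstract above (a fixed entropy level converges quickly but to a biased, regularized solution, while a decaying entropy level removes the bias) can be seen in a much simpler setting than the paper's. The sketch below is a discrete-time analogue on a one-step bandit, not the continuous-time exit-time dynamics the paper analyzes. It uses the standard entropy-regularized mirror descent update, in which the new policy is proportional to pi(a)^(1 - eta*tau) * exp(eta * r(a)). The schedule passed as `temperature`, the reward vector, and all function names are assumptions chosen for illustration.

```python
import math


def normalize_log_probs(logits):
    """Subtract the log-sum-exp so that the exponentiated values sum to one."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_total for l in logits]


def annealed_mirror_descent(rewards, step_size, num_steps, temperature):
    """Entropy-regularized policy mirror descent on a bandit, starting from uniform.

    temperature(k) returns the entropy level tau_k used at step k; the update is
    pi_{k+1}(a) proportional to pi_k(a)^(1 - step_size*tau_k) * exp(step_size * r(a)).
    """
    log_probs = [-math.log(len(rewards))] * len(rewards)
    for k in range(num_steps):
        tau = temperature(k)
        logits = [
            (1.0 - step_size * tau) * lp + step_size * r
            for lp, r in zip(log_probs, rewards)
        ]
        log_probs = normalize_log_probs(logits)
    return [math.exp(lp) for lp in log_probs]
```

With a constant schedule such as `lambda k: 1.0`, the iterates settle at the softmax of the rewards, which is the regularized optimum rather than the greedy policy. With a decaying schedule such as `lambda k: 1.0 / (k + 1)`, the policy concentrates on the highest-reward action.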

WW-FL: Secure and Private Large-Scale Federated Learning

Federated learning (FL) is an efficient approach for large-scale distributed machine learning that promises data privacy by keeping training data on client devices. However, recent research has uncovered vulnerabilities in FL, impacting both security and privacy through poisoning attacks and the potential disclosure of sensitive information in individual model updates as well as the aggregated global model. This paper explores the inadequacies of existing FL protection measures when applied independently, and the challenges of creating effective compositions. Addressing these issues, we propose WW-FL, an innovative framework that combines secure multi-party computation (MPC) with hierarchical FL to guarantee data and global model privacy. One notable feature of WW-FL is its capability to prevent malicious clients from directly poisoning model parameters, confining them to less destructive data poisoning attacks. We furthermore provide a PyTorch-based FL implementation integrated with Meta's CrypTen MPC framework to systematically measure the performance and robustness of WW-FL. Our extensive evaluation demonstrates that WW-FL is a promising solution for secure and private large-scale federated learning.

Updated: 2024-05-30 17:00:35

Categories: cs.LG,cs.CR,cs.DC,cs.IT,math.IT

Download: http://arxiv.org/abs/2302.09904v3

On the Last-Iterate Convergence of Shuffling Gradient Methods

Shuffling gradient methods are widely implemented in practice, particularly including three popular algorithms: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG). Compared to the empirical success, the theoretical guarantee of shuffling gradient methods was not well-understood for a long time. Until recently, the convergence rates had just been established for the average iterate for convex functions and the last iterate for strongly convex problems (using squared distance as the metric). However, when using the function value gap as the convergence criterion, existing theories cannot interpret the good performance of the last iterate in different settings (e.g., constrained optimization). To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.

Updated: 2024-05-30 16:58:52

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2403.07723v2
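
The three algorithms named in the abstract above differ only in how the component order is chosen in each epoch. The following is a minimal sketch for a one-dimensional finite-sum problem, returning the last iterate, which is the quantity the paper studies. The function name `shuffling_gradient_method`, the constant step size, and the scalar setting are assumptions for illustration.

```python
import random


def shuffling_gradient_method(grads, x0, step_size, epochs, scheme, seed=0):
    """Run a shuffling gradient method and return the last iterate.

    grads  : list of per-component gradient functions of a finite-sum objective.
    scheme : "RR" reshuffles every epoch, "SO" shuffles once before the first
             epoch, "IG" always uses the fixed order 0, 1, ..., n-1.
    """
    if scheme not in ("RR", "SO", "IG"):
        raise ValueError("scheme must be 'RR', 'SO' or 'IG'")
    rng = random.Random(seed)
    order = list(range(len(grads)))
    if scheme == "SO":
        rng.shuffle(order)
    x = x0
    for _ in range(epochs):
        if scheme == "RR":
            rng.shuffle(order)
        # One epoch: visit every component exactly once in the chosen order.
        for i in order:
            x = x - step_size * grads[i](x)
    return x
```

As a usage example, take components f_i(x) = 0.5 * (x - a_i)^2 with a = [1, 2, 3, 4], whose average is minimized at x = 2.5; with a small constant step size all three schemes end close to that minimizer, up to a bias that scales with the step size.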

KerasCV and KerasNLP: Vision and Language Power-Ups

We present the Keras domain packages KerasCV and KerasNLP, extensions of the Keras API for Computer Vision and Natural Language Processing workflows, capable of running on either JAX, TensorFlow, or PyTorch. These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance. We adopt a modular, layered design: at the library's lowest level of abstraction, we provide building blocks for creating models and data preprocessing pipelines, and at the library's highest level of abstraction, we provide pretrained "task" models for popular architectures such as Stable Diffusion, YOLOv8, GPT2, BERT, Mistral, CLIP, Gemma, T5, etc. Task models have built-in preprocessing, pretrained weights, and can be fine-tuned on raw inputs. To enable efficient training, we support XLA compilation for all models, and run all preprocessing via a compiled graph of TensorFlow operations using the tf.data API. The libraries are fully open-source (Apache 2.0 license) and available on GitHub.

Updated: 2024-05-30 16:58:34

Categories: cs.AI,cs.CV,cs.LG,cs.SE,I.2.5; I.2.7; I.2.10

Download: http://arxiv.org/abs/2405.20247v1

Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use

Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). In this paper, we argue that BDIE is best modeled as a Tool Use problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks. The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multimodal Models (LMMs) without RASG on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, General Line Items Recognition Metric (GLIRM), that is more aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for backcalculating bounding boxes of predicted line items and tables without the need for vision encoders. Finally, we claim that, while LMMs might sometimes offer marginal performance benefits, LLMs + RASG is often superior given the real-world applications and constraints of BDIE.

Updated: 2024-05-30 16:54:42

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20245v1

Training-efficient density quantum machine learning

Quantum machine learning requires powerful, flexible and efficiently trainable models to be successful in solving challenging problems. In this work, we present density quantum neural networks, a learning model incorporating randomisation over a set of trainable unitaries. These models generalise quantum neural networks using parameterised quantum circuits, and allow a trade-off between expressibility and efficient trainability, particularly on quantum hardware. We demonstrate the flexibility of the formalism by applying it to two recently proposed model families. The first are commuting-block quantum neural networks (QNNs) which are efficiently trainable but may be limited in expressibility. The second are orthogonal (Hamming-weight preserving) quantum neural networks which provide well-defined and interpretable transformations on data but are challenging to train at scale on quantum devices. Density commuting QNNs improve capacity with minimal gradient complexity overhead, and density orthogonal neural networks admit a quadratic-to-constant gradient query advantage with minimal to no performance loss. We conduct numerical experiments on synthetic translationally invariant data and MNIST image data with hyperparameter optimisation to support our findings. Finally, we discuss the connection to post-variational quantum neural networks, measurement-based quantum machine learning and the dropout mechanism.

Updated: 2024-05-30 16:40:28

Categories: quant-ph,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20237v1

Disentangling and Mitigating the Impact of Task Similarity for Continual Learning

Continual learning of partially similar tasks poses a challenge for artificial neural networks, as task similarity presents both an opportunity for knowledge transfer and a risk of interference and catastrophic forgetting. However, it remains unclear how task similarity in input features and readout patterns influences knowledge transfer and forgetting, as well as how they interact with common algorithms for continual learning. Here, we develop a linear teacher-student model with latent structure and show analytically that high input feature similarity coupled with low readout similarity is catastrophic for both knowledge transfer and retention. Conversely, the opposite scenario is relatively benign. Our analysis further reveals that task-dependent activity gating improves knowledge retention at the expense of transfer, while task-dependent plasticity gating does not affect either retention or transfer performance at the over-parameterized limit. In contrast, weight regularization based on the Fisher information metric significantly improves retention, regardless of task similarity, without compromising transfer performance. Nevertheless, its diagonal approximation and regularization in the Euclidean space are much less robust against task similarity. We demonstrate consistent results in a permuted MNIST task with latent variables. Overall, this work provides insights into when continual learning is difficult and how to mitigate it.

Updated: 2024-05-30 16:40:07

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.20236v1

Context Injection Attacks on Large Language Models

Large Language Models (LLMs) such as ChatGPT and Llama-2 have become prevalent in real-world applications, exhibiting impressive text generation performance. LLMs are fundamentally developed from a scenario where the input data remains static and lacks a clear structure. To behave interactively over time, LLM-based chat systems must integrate additional contextual information (i.e., chat history) into their inputs, following a pre-defined structure. This paper identifies how such integration can expose LLMs to misleading context from untrusted sources and fail to differentiate between system and user inputs, allowing users to inject context. We present a systematic methodology for conducting context injection attacks aimed at eliciting disallowed responses by introducing fabricated context. This could lead to illegal actions, inappropriate content, or technology misuse. Our context fabrication strategies, acceptance elicitation and word anonymization, effectively create misleading contexts that can be structured with attacker-customized prompt templates, achieving injection through malicious user messages. Comprehensive evaluations on real-world LLMs such as ChatGPT and Llama-2 confirm the efficacy of the proposed attack with success rates reaching 97%. We also discuss potential countermeasures that can be adopted for attack detection and developing more secure models. Our findings provide insights into the challenges associated with the real-world deployment of LLMs for interactive and structured data scenarios.

Updated: 2024-05-30 16:36:47

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20234v1

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Grokking is a puzzling phenomenon in machine learning in which generalization is delayed, arriving tens of times more iterations after near-perfect overfitting to the training data. Focusing on the long delay itself, on behalf of machine learning practitioners, our goal is to accelerate the generalization of a model under the grokking phenomenon. By regarding the series of gradients of a parameter over training iterations as a random signal over time, we can spectrally decompose the parameter trajectories under gradient descent into two components: a fast-varying, overfitting-yielding component and a slow-varying, generalization-inducing component. This analysis allows us to accelerate the grokking phenomenon by more than 50 times with only a few lines of code that amplify the slow-varying components of the gradients. The experiments show that our algorithm applies to diverse tasks involving images, languages, and graphs, making this peculiar artifact of sudden generalization practically available. Our code is available at https://github.com/ironjr/grokfast.
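The idea of amplifying the slow-varying part of the gradient can be sketched with a simple low-pass filter. The example below assumes an exponential moving average (EMA) as the filter; the function name and the hyperparameters `alpha` and `lamb` are hypothetical choices for this illustration, and the authors' repository should be consulted for their actual implementation.

```python
def amplify_slow_gradients(grads, ema, alpha=0.98, lamb=2.0):
    """Return (filtered_grads, new_ema) for one training step.

    grads: list of floats, the raw gradient of each parameter.
    ema:   list of floats or None, the running low-pass estimate.
    alpha: EMA momentum (assumed value), closer to 1 means a slower filter.
    lamb:  amplification factor for the slow component (assumed value).
    """
    if ema is None:
        # Initialise the slow component with the first gradient.
        ema = list(grads)
    else:
        ema = [alpha * m + (1.0 - alpha) * g for m, g in zip(ema, grads)]
    # Add the amplified slow component back onto the raw gradient.
    filtered = [g + lamb * m for g, m in zip(grads, ema)]
    return filtered, ema
```

In a training loop, the filtered gradients would replace the raw ones just before the optimizer step, while the EMA state is carried from one iteration to the next.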

Updated: 2024-05-30 16:35:30

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.20233v1

The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof

Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries -- transformations of neural network parameters that do not change the underlying neural network function. These include linear mode connectivity, model merging, Bayesian neural network inference, metanetworks, and several other characteristics of optimization or loss-landscapes. However, theoretical analysis of the relationship between parameter space symmetries and these phenomena is difficult. In this work, we empirically investigate the impact of neural parameter symmetries by introducing new neural network architectures that have reduced parameter space symmetries. We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries. With these new methods, we conduct a comprehensive experimental study consisting of multiple tasks aimed at assessing the effect of removing parameter symmetries. Our experiments reveal several interesting observations on the empirical impact of parameter symmetries; for instance, we observe linear mode connectivity between our networks without alignment of weight spaces, and we find that our networks allow for faster and more effective Bayesian neural network training.

Updated: 2024-05-30 16:32:31

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2405.20231v1

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

A pivotal advancement in the progress of large language models (LLMs) is the emergence of Mixture-of-Experts (MoE) LLMs. Compared to traditional LLMs, MoE LLMs can achieve higher performance with fewer parameters, but they remain hard to deploy because of their immense parameter sizes. Unlike previous weight pruning methods that rely on specially designed hardware, this paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques. Specifically, we propose, to the best of our knowledge for the first time, post-training approaches for task-agnostic and task-specific expert pruning and skipping of MoE LLMs, tailored to improve deployment efficiency while maintaining model performance across a wide range of tasks. Extensive experiments show that our proposed methods can simultaneously reduce model sizes and increase inference speed while maintaining satisfactory performance. Data and code will be available at https://github.com/Lucky-Lance/Expert_Sparsity.
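Expert skipping at inference time can be pictured as a small change to top-k routing. The sketch below is an illustration under stated assumptions, not the paper's method: the rule "drop a selected expert whose routing weight is below a fraction `beta` of the largest weight" and the names `route_with_skipping`, `top_k`, and `beta` are hypothetical.

```python
import math


def route_with_skipping(router_logits, top_k=2, beta=0.5):
    """Return {expert_index: weight} for one token.

    router_logits: list of floats, one score per expert.
    top_k:         number of experts selected before skipping.
    beta:          hypothetical threshold relative to the top weight.
    """
    # Numerically stable softmax over all experts.
    m = max(router_logits)
    exps = [math.exp(z - m) for z in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Standard top-k selection, highest weight first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumed skipping rule: keep only experts close enough to the best one.
    kept = [i for i in top if probs[i] >= beta * probs[top[0]]]
    # Renormalise so the kept weights sum to one.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}
```

With a strict threshold a token may be served by a single expert, which saves the cost of the skipped expert's forward pass; with a loose threshold the function reduces to ordinary top-k routing.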

Updated: 2024-05-30 16:24:16

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.14800v2

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

We present MOFA-Video, an advanced controllable image animation method that generates video from a given image using various additional control signals (such as human landmark references, manual trajectories, or even another provided video) or their combinations. This differs from previous methods, which can only work in a specific motion domain or show weak control abilities with a diffusion prior. To achieve our goal, we design several domain-aware motion field adapters (i.e., MOFA-Adapters) to control the generated motions in the video generation pipeline. For MOFA-Adapters, we consider the temporal motion consistency of the video: we first generate a dense motion flow from the given sparse control conditions, and then the multi-scale features of the given image are wrapped as a guided feature for stable video diffusion generation. We naively train two motion adapters, one for manual trajectories and one for human landmarks, since both contain sparse information about the control. After training, the MOFA-Adapters from different domains can also work together for more controllable video generation.

Updated: 2024-05-30 16:22:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.20222v1

Effects of Dataset Sampling Rate for Noise Cancellation through Deep Learning

Background: Active noise cancellation has been a subject of research for decades. Traditional techniques, like the Fast Fourier Transform, have limitations in certain scenarios. This research explores the use of deep neural networks (DNNs) as a superior alternative. Objective: The study aims to determine the effect that the sampling rate of the training data has on lightweight, efficient DNNs that operate within the processing constraints of mobile devices. Methods: We chose the Conv-TasNet network for its proven efficiency in speech separation and enhancement. Conv-TasNet was trained on datasets such as WHAM!, LibriMix, and the MS-2023 DNS Challenge. The datasets were sampled at rates of 8kHz, 16kHz, and 48kHz to analyze the effect of sampling rate on noise cancellation efficiency and effectiveness. The model was tested on an Intel Core i7 processor from 2023, assessing the network's ability to produce clear audio while filtering out background noise. Results: Models trained at the higher sampling rate (48kHz) achieved much better Total Harmonic Distortion (THD) and Quality Prediction For Generative Neural Speech Codecs (WARP-Q) values, indicating improved audio quality. However, a trade-off was noted: processing time was longer at higher sampling rates. Conclusions: The Conv-TasNet network, trained on datasets sampled at higher rates such as 48kHz, offers a robust solution for mobile devices in achieving noise cancellation through speech separation and enhancement. Future work involves further optimizing the model's efficiency and testing on mobile devices.

Updated: 2024-05-30 16:20:44

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2405.20884v1

Croissant: A Metadata Format for ML-Ready Datasets

Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.

Updated: 2024-05-30 16:20:04

Categories: cs.LG,cs.AI,cs.DB,cs.IR

Download: http://arxiv.org/abs/2403.19546v2

ESG-FTSE: A corpus of news articles with ESG relevance labels and use cases

We present ESG-FTSE, the first corpus comprising news articles with Environmental, Social and Governance (ESG) relevance annotations. In recent years, investors and regulators have pushed ESG investing into the mainstream due to the urgency of climate change. This has led to the rise of ESG scores to evaluate an investment's credentials as socially responsible. While demand for ESG scores is high, their quality varies wildly. Quantitative techniques can be applied to improve ESG scores and, in turn, responsible investing. To contribute to resource building for ESG and financial text mining, we pioneer the ESG-FTSE corpus. We further present the first ESG annotation schema of its kind. It has three levels: a binary classification (relevant versus irrelevant news articles), ESG classification (ESG-related news articles), and target company. Both supervised and unsupervised learning experiments for ESG relevance detection were conducted to demonstrate that the corpus can be used in different settings to derive accurate ESG predictions. Keywords: corpus annotation, ESG labels, annotation schema, news article, natural language processing

Updated: 2024-05-30 16:19:02

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20218v1

Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback

The generation of high-quality human images through text-to-image (T2I) methods is a significant yet challenging task. Distinct from general image generation, human image synthesis must satisfy stringent criteria related to human pose, anatomy, and alignment with textual prompts, making it particularly difficult to achieve realistic results. Recent advancements in T2I generation based on diffusion models have shown promise, yet challenges remain in meeting human-specific preferences. In this paper, we introduce a novel approach tailored specifically for human image generation utilizing Direct Preference Optimization (DPO). Specifically, we introduce an efficient method for constructing a specialized DPO dataset for training human image generation models without the need for costly human feedback. We also propose a modified loss function that enhances the DPO training process by minimizing artifacts and improving image fidelity. Our method demonstrates its versatility and effectiveness in generating human images, including personalized text-to-image generation. Through comprehensive evaluations, we show that our approach significantly advances the state of human image generation, achieving superior results in terms of natural anatomies, poses, and text-image alignment.
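For orientation, the standard DPO objective for a single preference pair can be written in a few lines. This sketch shows the commonly used DPO loss only; it is not the modified loss the abstract proposes, and the function and argument names are assumptions made for illustration.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w, logp_l:         log-probabilities of the preferred and rejected
                            samples under the model being trained.
    ref_logp_w, ref_logp_l: the same quantities under the frozen reference model.
    beta:                   strength of the implicit KL constraint.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss falls below log 2 once the trained model favors the preferred sample more strongly than the reference model does, and rises above it in the opposite case.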

Updated: 2024-05-30 16:18:05

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.20216v1

Abstract Weighted Based Gradual Semantics in Argumentation Theory

Weighted gradual semantics provide an acceptability degree to each argument representing the strength of the argument, computed based on factors including background evidence for the argument, and taking into account interactions between this argument and others. We introduce four important problems linking gradual semantics and acceptability degrees. First, we reexamine the inverse problem, seeking to identify the argument weights of the argumentation framework which lead to a specific final acceptability degree. Second, we ask whether the function mapping between argument weights and acceptability degrees is injective or a homeomorphism onto its image. Third, we ask whether argument weights can be found when preferences, rather than acceptability degrees for arguments are considered. Fourth, we consider the topology of the space of valid acceptability degrees, asking whether "gaps" exist in this space. While different gradual semantics have been proposed in the literature, in this paper, we identify a large family of weighted gradual semantics, called abstract weighted based gradual semantics. These generalise many of the existing semantics while maintaining desirable properties such as convergence to a unique fixed point. We also show that a sub-family of the weighted gradual semantics, called abstract weighted (L^p,\lambda,\mu)-based gradual semantics and which include well-known semantics, solve all four of the aforementioned problems.
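A concrete weighted gradual semantics helps make the forward and inverse problems tangible. The sketch below uses the weighted h-categoriser from the literature, in which the degree of an argument is its weight divided by one plus the summed degrees of its attackers. Whether this particular semantics belongs to the family studied in the paper is an assumption here, and the function names are hypothetical.

```python
def weighted_h_categoriser(weights, attackers, tol=1e-12, max_iter=10_000):
    """Iterate deg(a) = w(a) / (1 + sum of deg(b) over attackers b of a).

    weights:   dict mapping each argument to its initial weight in [0, 1].
    attackers: dict mapping an argument to the list of arguments attacking it.
    """
    deg = dict(weights)
    for _ in range(max_iter):
        new = {
            a: weights[a] / (1.0 + sum(deg[b] for b in attackers.get(a, ())))
            for a in weights
        }
        if max(abs(new[a] - deg[a]) for a in weights) < tol:
            return new
        deg = new
    return deg


def weights_for_degrees(degrees, attackers):
    """Inverse problem for this semantics: recover the weights that yield
    the given acceptability degrees (valid only if every result lies in [0, 1])."""
    return {
        a: degrees[a] * (1.0 + sum(degrees[b] for b in attackers.get(a, ())))
        for a in degrees
    }


# Toy framework: argument "b" attacks argument "a".
attackers = {"a": ["b"]}
degrees = weighted_h_categoriser({"a": 1.0, "b": 1.0}, attackers)
recovered = weights_for_degrees(degrees, attackers)
```

For this semantics the inverse map has a closed form, which is what makes the round trip from weights to degrees and back straightforward to check on small frameworks.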

Updated: 2024-05-30 16:16:50

Categories: cs.AI

Download: http://arxiv.org/abs/2401.11472v2

PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization

A poster made from a long input document can be considered a one-page, easy-to-read multimodal (text and images) summary presented on a nice template with good design elements. Automatic transformation of a long document into a poster is a little-studied but challenging task. It involves content summarization of the input document followed by template generation and harmonization. In this work, we propose a novel deep submodular function which can be trained on ground truth summaries to extract multimodal content from the document and which explicitly ensures good coverage, diversity and alignment of text and images. Then, we use an LLM-based paraphraser and propose to generate a template with various design aspects conditioned on the input content. We show the merits of our approach through extensive automated and human evaluations.
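Submodular content selection is often approximated with a greedy algorithm. The sketch below uses a plain set-coverage objective as a stand-in for the paper's learned deep submodular function, so it illustrates the selection principle only; the function name, the candidate identifiers, and the topic sets are all hypothetical.

```python
def greedy_coverage(candidates, budget):
    """Greedily pick up to `budget` items that maximise the number of covered topics.

    candidates: dict mapping an item id to the set of topics it covers.
    Returns (chosen item ids in pick order, set of covered topics).
    """
    chosen, covered = [], set()
    for _ in range(budget):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        # Marginal gain of an item is the number of topics it adds.
        best = max(remaining, key=lambda c: len(candidates[c] - covered))
        if not candidates[best] - covered:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Toy document: two paragraphs and one figure, each tagged with topics.
items = {"p1": {"a", "b", "c"}, "p2": {"a", "b"}, "fig1": {"d"}}
picked, topics = greedy_coverage(items, budget=2)
```

Because coverage has diminishing returns, the second pick favors the item that adds something new rather than the one that overlaps most with what was already chosen, which is the diversity effect the abstract refers to.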

Updated: 2024-05-30 16:16:25

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.20213v1

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval systems that keep separate embeddings and models for text-only and multimodal tasks. We propose a novel, multi-task contrastive training method to address this issue, which we use to train the jina-clip-v1 model to achieve the state-of-the-art performance on both text-image and text-text retrieval tasks.
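The contrastive objective behind CLIP-style training can be sketched as a symmetric cross-entropy over a batch similarity matrix. This is a generic illustration rather than the jina-clip-v1 training code: the function names and the temperature value are assumptions, and in a multi-task setup the same loss could plausibly be applied to text-image batches and text-text batches alike.

```python
import math


def _cross_entropy_rows(sim, temperature):
    """Mean cross-entropy where row i should match column i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)


def symmetric_contrastive_loss(sim, temperature=0.07):
    """Average of the row-wise and column-wise contrastive losses.

    sim: square list of lists, sim[i][j] is the similarity between
         item i of the first modality and item j of the second.
    """
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (
        _cross_entropy_rows(sim, temperature)
        + _cross_entropy_rows(sim_t, temperature)
    )


aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]])
misaligned = symmetric_contrastive_loss([[0.0, 1.0], [1.0, 0.0]])
uninformative = symmetric_contrastive_loss([[0.0, 0.0], [0.0, 0.0]])
```

Matching pairs on the diagonal give a loss near zero, mismatched pairs give a large loss, and a flat similarity matrix gives log 2 for a batch of two.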

Updated: 2024-05-30 16:07:54

Categories: cs.CL,cs.AI,cs.CV,cs.IR,68T50,I.2.7

Download: http://arxiv.org/abs/2405.20204v1

One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments

Large Language Models (LLMs) have advanced rapidly but face significant memory demands. While quantization has shown promise for LLMs, current methods typically require lengthy training to alleviate the performance degradation from quantization loss. However, deploying LLMs across diverse scenarios with different resource constraints, e.g., servers and personal computers, requires repeated training per application, which amplifies the lengthy training problem. Given that, it is advantageous to train a once-for-all (OFA) supernet capable of yielding diverse optimal subnets for downstream applications through one-shot training. Nonetheless, the scale of current language models impedes efficiency and amplifies interference from weight sharing between subnets. We make an initial attempt to extend the once-for-all framework to large language models. Specifically, we decouple shared weights to eliminate the interference and incorporate Low-Rank adapters for training efficiency. Furthermore, we observe that traditional uniform sampling allocates training resources unevenly. A non-parametric scheduler is introduced to adjust the sampling rate for each quantization configuration, achieving a more balanced allocation among subnets with varying demands. We validate the approach on the LLaMA2 families, and downstream evaluation confirms our ability to maintain high performance while significantly reducing deployment time when faced with multiple scenarios.

Updated: 2024-05-30 16:05:15

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20202v1

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality. Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCoder, the first training-free LLM agent for designing analog circuits through Python code generation. Firstly, AnalogCoder incorporates a feedback-enhanced flow with tailored domain-specific prompts, enabling the automated and self-correcting design of analog circuits with a high success rate. Secondly, it proposes a circuit tool library to archive successful designs as reusable modular sub-circuits, simplifying composite circuit creation. Thirdly, extensive experiments on a benchmark designed to cover a wide range of analog circuit tasks show that AnalogCoder outperforms other LLM-based methods. It has successfully designed 20 circuits, 5 more than standard GPT-4o. We believe AnalogCoder can significantly improve the labor-intensive chip design process, enabling non-experts to design analog circuits efficiently.

Updated: 2024-05-30 16:04:44

Categories: cs.LG,cs.ET

Download: http://arxiv.org/abs/2405.14918v2

Unified Explanations in Machine Learning Models: A Perturbation Approach

A high-velocity paradigm shift towards Explainable Artificial Intelligence (XAI) has emerged in recent years. Highly complex Machine Learning (ML) models have flourished in many tasks of intelligence, and the questions have started to shift away from traditional metrics of validity towards something deeper: What is this model telling me about my data, and how is it arriving at these conclusions? Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches. To address these problems, we propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap). We devise algorithms to generate relative feature importance in settings of dynamic inference amongst a suite of popular machine learning and deep learning methods, and metrics that allow us to quantify how well explanations generated under the static case hold. We propose a taxonomy for feature importance methodology, measure alignment, and observe quantifiable similarity amongst explanation models across several datasets.
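Shapley-based feature importance can be illustrated with the textbook definition: average each feature's marginal contribution over all orderings. The sketch below computes exact Shapley values for a tiny value function; it is neither the `shap` library API nor the authors' perturbation algorithm, and the function names and the toy additive model are assumptions.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every feature ordering.

    value_fn:   callable taking a frozenset of feature indices and
                returning the model value for that coalition.
    n_features: number of features (keep small, the cost grows factorially).
    """
    phi = [0.0] * n_features
    orders = list(permutations(range(n_features)))
    for order in orders:
        present = set()
        prev = value_fn(frozenset())
        for f in order:
            present.add(f)
            cur = value_fn(frozenset(present))
            phi[f] += cur - prev  # marginal contribution of f in this ordering
            prev = cur
    return [p / len(orders) for p in phi]


# Toy additive model: each present feature adds a fixed amount.
contrib = [1.0, 2.0, 3.0]


def additive_value(coalition):
    return sum(contrib[i] for i in coalition)


phi = shapley_values(additive_value, 3)
```

For an additive model the Shapley value of each feature equals its own contribution, and in general the values sum to the difference between the full-coalition value and the empty-coalition value.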

Updated: 2024-05-30 16:04:35

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20200v1

Occam Gradient Descent

Deep learning neural network models must be large enough to adapt to their problem domain, while small enough to avoid overfitting training data during gradient descent. To balance these competing demands, overprovisioned deep learning models such as transformers are trained for a single epoch on large data sets, and are hence inefficient in their use of both computing resources and training data. In response to these inefficiencies, we exploit learning theory to derive Occam Gradient Descent, an algorithm that interleaves adaptive reduction of model size to minimize generalization error with gradient descent on model weights to minimize fitting error. In contrast, traditional gradient descent greedily minimizes fitting error without regard to generalization error. Our algorithm simultaneously descends the space of weights and topological size of any neural network without modification, and in our experiments it outperforms traditional gradient descent, with or without post-train pruning, in accuracy, compute and model compression.

Updated: 2024-05-30 15:58:22

Categories: cs.LG

Download: http://arxiv.org/abs/2405.20194v1

Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory

In this work, we describe our approach to developing an intelligent and robust social robotic system for the Nadine social robot platform. We achieve this by integrating Large Language Models (LLMs) and skilfully leveraging the powerful reasoning and instruction-following capabilities of these types of models to achieve advanced human-like affective and cognitive capabilities. This approach is novel compared to the current state-of-the-art LLM-based agents which do not implement human-like long-term memory or sophisticated emotional appraisal. The naturalness of social robots, consisting of multiple modules, highly depends on the performance and capabilities of each component of the system and the seamless integration of the components. We built a social robot system that enables generating appropriate behaviours through multimodal input processing, bringing episodic memories accordingly to the recognised user, and simulating the emotional states of the robot induced by the interaction with the human partner. In particular, we introduce an LLM-agent frame for social robots, SoR-ReAct, serving as a core component for the interaction module in our system. This design has brought forth the advancement of social robots and aims to increase the quality of human-robot interaction.

Updated: 2024-05-30 15:55:41

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.20189v1

A Survey Study on the State of the Art of Programming Exercise Generation using Large Language Models

This paper analyzes Large Language Models (LLMs) with regard to their programming exercise generation capabilities. Through a survey study, we defined the state of the art, extracted their strengths and weaknesses and finally proposed an evaluation matrix, helping researchers and educators to decide which LLM is the best fitting for the programming exercise generation use case. We also found that multiple LLMs are capable of producing useful programming exercises. Nevertheless, there exist challenges like the ease with which LLMs might solve exercises generated by LLMs. This paper contributes to the ongoing discourse on the integration of LLMs in education.

Updated: 2024-05-30 15:49:34

Categories: cs.AI,cs.SE

Download: http://arxiv.org/abs/2405.20183v1

Transformers and Slot Encoding for Sample Efficient Physical World Modelling

World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Recent applications of the Transformer architecture to the problem of world modelling from video input show notable improvements in sample efficiency. However, existing approaches tend to work only at the image level thus disregarding that the environment is composed of objects interacting with each other. In this paper, we propose an architecture combining Transformers for world modelling with the slot-attention paradigm, an approach for learning representations of objects appearing in a scene. We describe the resulting neural architecture and report experimental results showing an improvement over the existing solutions in terms of sample efficiency and a reduction of the variation of the performance over the training examples. The code for our architecture and experiments is available at https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm

Updated: 2024-05-30 15:48:04

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.20180v1

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the performance gap with proprietary LLMs? While Self-Instruct is a promising solution by generating a diverse set of training data, it cannot verify the correctness of these programs. In contrast, a robot simulator with a well-defined world can identify execution errors but limits the diversity of programs that it can verify. In this work, we introduce Robo-Instruct, which brings the best of both worlds -- it promotes the diversity of Self-Instruct while providing the correctness of simulator-based checking. Robo-Instruct introduces RoboSim to synthesize a consistent world state on the fly by inferring properties relevant to the program being checked, and simulating actions accordingly. Furthermore, the instructions and programs generated by Self-Instruct may be subtly inconsistent -- such as the program missing a step implied by the instruction. Robo-Instruct further addresses this with InstAlign, an instruction-program alignment procedure that revises the task instruction to reflect the actual results of the generated program. Given a few seed task descriptions and the robot APIs, Robo-Instruct is capable of generating a training dataset using only a small open-weight model. This dataset can then be used to fine-tune small open-weight language models, enabling them to match or even exceed the performance of several proprietary LLMs, such as GPT-3.5-Turbo and Gemini-Pro.

Updated: 2024-05-30 15:47:54

Categories: cs.CL,cs.AI,cs.RO

Download: http://arxiv.org/abs/2405.20179v1

Non-intrusive data-driven model order reduction for circuits based on Hammerstein architectures

We demonstrate that data-driven system identification techniques can provide a basis for effective, non-intrusive model order reduction (MOR) for common circuits that are key building blocks in microelectronics. Our approach is motivated by the practical operation of these circuits and utilizes a canonical Hammerstein architecture. To demonstrate the approach we develop a parsimonious Hammerstein model for a non-linear CMOS differential amplifier. We train this model on a combination of direct current (DC) and transient Spice (Xyce) circuit simulation data using a novel sequential strategy to identify the static nonlinear and linear dynamical parts of the model. Simulation results show that the Hammerstein model is an effective surrogate for the differential amplifier circuit that accurately and efficiently reproduces its behavior over a wide range of operating points and input frequencies.
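
As a rough illustration of the Hammerstein structure named in the abstract (a static nonlinearity feeding a linear dynamical block), the Python sketch below simulates such a cascade. The `tanh` nonlinearity, the first-order filter coefficients, and all function names are assumptions chosen for illustration; they are not the authors' identified amplifier model.

```python
import math
from typing import Callable, List, Sequence


def hammerstein_response(
    u: Sequence[float],
    static_nl: Callable[[float], float],
    b: Sequence[float],
    a: Sequence[float],
) -> List[float]:
    """Simulate a static nonlinearity followed by a linear difference equation.

    With w[t] = static_nl(u[t]), the linear block computes
        y[t] = sum_k b[k] * w[t-k] - sum_k a[k] * y[t-1-k].
    """
    w = [static_nl(x) for x in u]
    y: List[float] = []
    for t in range(len(w)):
        # Feed-forward part acting on the nonlinearly transformed input.
        acc = sum(b[k] * w[t - k] for k in range(len(b)) if t - k >= 0)
        # Feedback part acting on past outputs.
        acc -= sum(a[k] * y[t - 1 - k] for k in range(len(a)) if t - 1 - k >= 0)
        y.append(acc)
    return y


# Hypothetical surrogate: tanh saturation followed by a unit-DC-gain low-pass,
# i.e. y[t] = 0.1 * w[t] + 0.9 * y[t-1].
step_input = [0.0] * 2 + [5.0] * 200
output = hammerstein_response(step_input, math.tanh, b=[0.1], a=[-0.9])
print(round(output[-1], 4))
```

Because the two blocks are separate, the static curve can in principle be fitted from DC sweeps and the linear part from transient data, which matches the sequential identification idea the abstract describes.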

Updated: 2024-05-30 15:47:48

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2405.20178v1

Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the "bucket effect" in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. Our experiments, which involve thousands of clients with heterogeneous NLP tasks and resources, validate the efficacy of FlexLoRA: the federated global model achieves consistently larger improvements than SOTA FL methods in downstream NLP task performance across various heterogeneous distributions. FlexLoRA's practicality is further underscored by our theoretical analysis and its seamless integration with existing LoRA-based FL methods, offering a path toward cross-device, privacy-preserving federated tuning for LLMs.

Updated: 2024-05-30 15:46:10

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.11505v2

InstructionCP: A fast approach to transfer Large Language Models into target language

The rapid development of large language models (LLMs) in recent years has largely focused on English, resulting in models that respond exclusively in English. To adapt these models to other languages, continual pre-training (CP) is often employed, followed by supervised fine-tuning (SFT) to maintain conversational abilities. However, CP and SFT can reduce a model's ability to filter harmful content. We propose Instruction Continual Pre-training (InsCP), which integrates instruction tags into the CP process to prevent loss of conversational proficiency while acquiring new languages. Our experiments demonstrate that InsCP retains conversational and Reinforcement Learning from Human Feedback (RLHF) abilities. Empirical evaluations on language alignment, reliability, and knowledge benchmarks confirm the efficacy of InsCP. Notably, this approach requires only 0.1 billion tokens of high-quality instruction-following data, thereby reducing resource consumption.

Updated: 2024-05-30 15:45:13

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.20175v1

Tropical Expressivity of Neural Networks

We propose an algebraic geometric framework to study the expressivity of linear activation neural networks. A particular quantity that has been actively studied in the field of deep learning is the number of linear regions, which gives an estimate of the information capacity of the architecture. To study and evaluate information capacity and expressivity, we work in the setting of tropical geometry -- a combinatorial and polyhedral variant of algebraic geometry -- where there are known connections between tropical rational maps and feedforward neural networks. Our work builds on and expands this connection to capitalize on the rich theory of tropical geometry to characterize and study various architectural aspects of neural networks. Our contributions are threefold: we provide a novel tropical geometric approach to selecting sampling domains among linear regions; an algebraic result allowing for a guided restriction of the sampling domain for network architectures with symmetries; and an open source library to analyze neural networks as tropical Puiseux rational maps. We provide a comprehensive set of proof-of-concept numerical experiments demonstrating the breadth of neural network architectures to which tropical geometric theory can be applied to reveal insights on expressivity characteristics of a network. Our work provides the foundations for the adaptation of both theory and existing software from computational tropical geometry and symbolic computation to deep learning.

Updated: 2024-05-30 15:45:03

Categories: cs.LG,math.AG

Download: http://arxiv.org/abs/2405.20174v1

Machine Unlearning of Pre-trained Large Language Models

This study investigates the concept of the "right to be forgotten" within the context of large language models (LLMs). We explore machine unlearning as a pivotal solution, with a focus on pre-trained models, a notably under-researched area. Our research delineates a comprehensive framework for machine unlearning in pre-trained LLMs, encompassing a critical analysis of seven diverse unlearning methods. Through rigorous evaluation using curated datasets from arXiv, books, and GitHub, we establish a robust benchmark for unlearning performance, demonstrating that these methods are over $10^5$ times more computationally efficient than retraining. Our results show that integrating gradient ascent with gradient descent on in-distribution data improves hyperparameter robustness. We also provide detailed guidelines for efficient hyperparameter tuning in the unlearning process. Our findings advance the discourse on ethical AI practices, offering substantive insights into the mechanics of machine unlearning for pre-trained LLMs and underscoring the potential for responsible AI development.

Updated: 2024-05-30 15:44:51

Categories: cs.CL,cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2402.15159v3

Cheap Talking Algorithms

We simulate the behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game where a large population interacts anonymously. We show that sender and receiver converge to Nash equilibrium play. The level of informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate levels of the bias, it matches the level predicted by the Pareto optimal equilibrium or by the second best one. Conclusions are robust to alternative specifications of the learning hyperparameters and of the game.

Updated: 2024-05-30 15:44:37

Categories: econ.TH,cs.AI

Download: http://arxiv.org/abs/2310.07867v4

Iterative Feature Boosting for Explainable Speech Emotion Recognition

In speech emotion recognition (SER), using predefined features without considering their practical importance may lead to high-dimensional datasets that include redundant and irrelevant information. Consequently, high-dimensional learning often results in decreasing model accuracy while increasing computational complexity. Our work underlines the importance of carefully considering and analyzing features in order to build efficient SER systems. We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. This is performed iteratively through a feature evaluation loop, using Shapley values to boost feature selection and improve overall framework performance. Our approach thus balances model performance against transparency. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset.

Updated: 2024-05-30 15:44:27

Categories: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS,I.2.7; I.2.6; I.2.1; I.2.8

Download: http://arxiv.org/abs/2405.20172v1

Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability kernel of the Markov decision processes (MDPs) is parametrized by an unknown transition core with features of state and action. For the finite horizon episodic setting with inhomogeneous state transitions, we propose provably efficient algorithms with randomized exploration having frequentist regret guarantees. For our first algorithm, $\texttt{RRL-MNL}$, we adapt optimistic sampling to ensure the optimism of the estimated value function with sufficient frequency and establish that $\texttt{RRL-MNL}$ is both statistically and computationally efficient, achieving a $\tilde{O}(\kappa^{-1} d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T})$ frequentist regret bound with constant-time computational cost per episode. Here, $d$ is the dimension of the transition core, $H$ is the horizon length, $T$ is the total number of steps, and $\kappa$ is a problem-dependent constant. Despite the simplicity and practicality of $\texttt{RRL-MNL}$, its regret bound scales with $\kappa^{-1}$, which is potentially large in the worst case. To improve the dependence on $\kappa^{-1}$, we propose $\texttt{ORRL-MNL}$, which estimates the value function using local gradient information of the MNL transition model. We show that its frequentist regret bound is $\tilde{O}(d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T} + \kappa^{-1} d^2 H^2)$. To the best of our knowledge, these are the first randomized RL algorithms for the MNL transition model that achieve both computational and statistical efficiency. Numerical experiments demonstrate the superior performance of the proposed algorithms.
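
For orientation, an MNL transition model is commonly written in the form below. This is a sketch of the usual parametrization, assuming a feature map $\varphi$ and a set of reachable next states $\mathcal{S}_{s,a}$ (both are notational assumptions made here); the paper's exact definitions may differ.

```latex
% theta^* in R^d is the unknown transition core;
% varphi(s, a, s') in R^d is an assumed feature vector of state, action, and next state.
\[
  P_{\theta^*}(s' \mid s, a)
  = \frac{\exp\!\big(\varphi(s, a, s')^{\top} \theta^*\big)}
         {\sum_{\tilde{s} \in \mathcal{S}_{s,a}} \exp\!\big(\varphi(s, a, \tilde{s})^{\top} \theta^*\big)}
\]
```

Under this form the transition probabilities are a softmax over next-state features, which is why the regret bounds depend on a curvature-type constant such as $\kappa$ in addition to $d$, $H$, and $T$.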

Updated: 2024-05-30 15:39:19

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.20165v1

Aligning Crowd Feedback via Distributional Preference Reward Modeling

Deep Reinforcement Learning is widely used for aligning Large Language Models (LLM) with human preference. However, the conventional reward modelling is predominantly dependent on human annotations provided by a select cohort of individuals. Such dependence may unintentionally result in skewed models that reflect the inclinations of these annotators, thereby failing to adequately represent the wider population's expectations. We propose the Distributional Preference Reward Model (DPRM), a simple yet effective framework to align large language models with diverse human preferences. To this end, we characterize multiple preferences by a categorical distribution and introduce a Bayesian updater to accommodate shifted or new preferences. On top of that, we design an optimal-transportation-based loss to calibrate DPRM to align with the preference distribution. Finally, the expected reward is utilized to fine-tune an LLM policy to generate responses favoured by the population. Our experiments show that DPRM significantly enhances the alignment of LLMs with population preference, yielding more accurate, unbiased, and contextually appropriate responses.
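
The idea of describing population preference with a categorical distribution and updating it as new feedback arrives can be sketched as follows. The Dirichlet-style pseudo-count update, the three preference levels, and the per-level scores are all assumptions made for illustration; the abstract does not specify the form of the Bayesian updater, and the optimal-transport loss is not shown.

```python
from typing import List, Sequence


def update_counts(prior_counts: Sequence[float], new_votes: Sequence[int]) -> List[float]:
    """Conjugate-style update: add observed votes per preference level to pseudo-counts."""
    return [p + v for p, v in zip(prior_counts, new_votes)]


def posterior_mean(counts: Sequence[float]) -> List[float]:
    """Categorical probabilities implied by the (pseudo-)counts."""
    total = sum(counts)
    return [c / total for c in counts]


def expected_reward(probs: Sequence[float], level_scores: Sequence[float]) -> float:
    """Scalar reward: expectation of the per-level score under the preference distribution."""
    return sum(p * s for p, s in zip(probs, level_scores))


# Hypothetical example: three levels (dislike, neutral, like) for one response.
prior = [1.0, 1.0, 1.0]   # uniform pseudo-counts (assumed prior)
votes = [0, 2, 6]         # newly collected crowd feedback
counts = update_counts(prior, votes)
probs = posterior_mean(counts)
reward = expected_reward(probs, level_scores=[-1.0, 0.0, 1.0])
print(probs, reward)
```

A scalar of this kind is what a policy-optimization step could consume, while the full distribution keeps minority preferences visible rather than collapsing them into a single label.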

Updated: 2024-05-30 15:39:17

Categories: cs.AI

Download: http://arxiv.org/abs/2402.09764v3

Reasoning about concepts with LLMs: Inconsistencies abound

The ability to summarize and organize knowledge into abstract concepts is key to learning and reasoning. Many industrial applications rely on the consistent and systematic use of concepts, especially when dealing with decision-critical knowledge. However, we demonstrate that, when methodically questioned, large language models (LLMs) often display significant inconsistencies in their knowledge. Computationally, the basic aspects of the conceptualization of a given domain can be represented as Is-A hierarchies in a knowledge graph (KG) or ontology, together with a few properties or axioms that enable straightforward reasoning. We show that even simple ontologies can be used to reveal conceptual inconsistencies across several LLMs. We also propose strategies that domain experts can use to evaluate and improve the coverage of key domain concepts in LLMs of various sizes. In particular, we have been able to significantly enhance the performance of LLMs of various sizes with openly available weights using simple knowledge-graph (KG) based prompting strategies.

Updated: 2024-05-30 15:38:54

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.20163v1

Zero-Shot Hierarchical Classification on the Common Procurement Vocabulary Taxonomy

Classifying public tenders is a useful task both for companies that are invited to participate and for inspecting fraudulent activities. To facilitate the task for both participants and public administrations, the European Union presented a common taxonomy (Common Procurement Vocabulary, CPV) which is mandatory for tenders of certain importance; however, the contracts in which a CPV label is mandatory are a minority compared to all Public Administration activities. Classifying over a real-world taxonomy introduces some difficulties that cannot be ignored. In particular, some fine-grained classes have an insufficient (if any) number of observations in the training set, while other classes are far more frequent (even thousands of times) than the average. To overcome those difficulties, we present a zero-shot approach, based on a pre-trained language model that relies only on label description and respects the label taxonomy. To train our proposed model, we used industrial data, which comes from contrattipubblici.org, a service by SpazioDati s.r.l. that collects public contracts stipulated in Italy in the last 25 years. Results show that the proposed model achieves better performance in classifying low-frequency classes compared to three different baselines, and is also able to predict never-seen classes.

Updated: 2024-05-30 15:34:10

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2405.09983v2

Characterizing Data Point Vulnerability via Average-Case Robustness

Studying the robustness of machine learning models is important to ensure consistent model behaviour across real-world settings. To this end, adversarial robustness is a standard framework, which views robustness of predictions through a binary lens: either a worst-case adversarial misclassification exists in the local region around an input, or it does not. However, this binary perspective does not account for the degrees of vulnerability, as data points with a larger number of misclassified examples in their neighborhoods are more vulnerable. In this work, we consider a complementary framework for robustness, called average-case robustness, which measures the fraction of points in a local region that provides consistent predictions. However, computing this quantity is hard, as standard Monte Carlo approaches are inefficient especially for high-dimensional inputs. In this work, we propose the first analytical estimators for average-case robustness for multi-class classifiers. We show empirically that our estimators are accurate and efficient for standard deep learning models and demonstrate their usefulness for identifying vulnerable data points, as well as quantifying robustness bias of models. Overall, our tools provide a complementary view to robustness, improving our ability to characterize model behaviour.
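
To make the quantity concrete, the sketch below implements the naive Monte Carlo baseline that the abstract calls inefficient, not the analytical estimators the paper proposes. It assumes the local region is an isotropic Gaussian around the input; the toy linear classifier and all names are hypothetical.

```python
import random
from typing import Callable, Sequence


def mc_average_case_robustness(
    predict: Callable[[Sequence[float]], int],
    x: Sequence[float],
    sigma: float,
    n_samples: int,
    rng: random.Random,
) -> float:
    """Fraction of Gaussian-perturbed copies of x that keep the prediction made at x."""
    base_label = predict(x)
    agree = 0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        if predict(noisy) == base_label:
            agree += 1
    return agree / n_samples


def toy_classifier(x: Sequence[float]) -> int:
    """Hypothetical two-class linear classifier with boundary x0 + x1 = 0."""
    return 1 if x[0] + x[1] > 0 else 0


rng = random.Random(0)
far_point = mc_average_case_robustness(toy_classifier, [10.0, 10.0], 0.1, 2000, rng)
on_boundary = mc_average_case_robustness(toy_classifier, [0.0, 0.0], 0.1, 2000, rng)
print(far_point, on_boundary)
```

A worst-case (adversarial) criterion would label any point with a single nearby misclassification as non-robust, whereas this fraction distinguishes points deep inside a class region from points sitting on a decision boundary.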

Updated: 2024-05-30 15:33:55

Categories: cs.LG

Download: http://arxiv.org/abs/2307.13885v5

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot. Given a bank of test cases, together with a candidate program, an LLM can improve that program by being prompted with failed test cases. But it remains an open question how to best iteratively refine code, with prior work employing simple greedy or breadth-first strategies. We show here that refinement exposes an explore-exploit tradeoff: exploit by refining the program that passes the most test cases, or explore by refining a lesser considered program. We frame this as an arm-acquiring bandit problem, which we solve with Thompson Sampling. The resulting LLM-based program synthesis algorithm is broadly applicable: Across loop invariant synthesis, visual reasoning puzzles, and competition programming problems, we find that our new method can solve more problems using fewer language model calls.
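
The explore-exploit framing can be sketched as a small Thompson Sampling loop in which every refined program becomes a new arm. The Beta prior built from the test pass fraction, the penalty for repeated refinement, the `strength` constant, and the stand-in `toy_refine` function (which replaces an actual LLM call) are all assumptions for illustration and are not taken from the abstract.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    program: str
    pass_fraction: float  # fraction of test cases passed, in [0, 1]
    times_refined: int = 0


def pick_candidate(pool: List[Candidate], rng: random.Random, strength: float = 5.0) -> Candidate:
    """Thompson Sampling: draw one Beta sample per arm and pick the largest draw."""
    def draw(c: Candidate) -> float:
        alpha = 1.0 + strength * c.pass_fraction
        # Each earlier refinement of this arm counts as a pseudo-failure.
        beta = 1.0 + strength * (1.0 - c.pass_fraction) + c.times_refined
        return rng.betavariate(alpha, beta)
    return max(pool, key=draw)


def refine_loop(
    seed: str,
    refine: Callable[[str], str],
    score: Callable[[str], float],
    budget: int,
    rng: random.Random,
) -> str:
    """Refine until a program passes every test or the call budget runs out."""
    pool = [Candidate(seed, score(seed))]
    for _ in range(budget):
        parent = pick_candidate(pool, rng)
        parent.times_refined += 1
        child = refine(parent.program)
        child_score = score(child)
        pool.append(Candidate(child, child_score))  # arm-acquiring: a new arm appears
        if child_score == 1.0:
            return child
    return max(pool, key=lambda c: c.pass_fraction).program


# Toy stand-ins: a "program" improves by one unit per refinement call.
def toy_refine(program: str) -> str:
    return program + "x"


def toy_score(program: str) -> float:
    return min(len(program), 4) / 4


best = refine_loop("", toy_refine, toy_score, budget=30, rng=random.Random(1))
print(best, toy_score(best))
```

Greedy refinement corresponds to always choosing the arm with the highest pass fraction; sampling from a posterior instead leaves some probability of revisiting weaker candidates, which is the exploration the abstract refers to.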

Updated: 2024-05-30 15:20:19

Categories: cs.SE,cs.AI,cs.CL,cs.PL

Download: http://arxiv.org/abs/2405.17503v2

Learning Optimal Deterministic Policies with Stochastic Policy Gradients

Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters. Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability. In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version. In this paper, we make a step towards the theoretical understanding of this practice. After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions. Then, we illustrate how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy. Finally, we quantitatively compare action-based and parameter-based exploration, giving a formal guise to intuitive results.

Updated: 2024-05-30 15:18:24

Categories: cs.LG

Download: http://arxiv.org/abs/2405.02235v2

Text clustering with LLM embeddings

Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data. However, the effectiveness of text clustering heavily relies on the choice of textual embeddings and clustering algorithms. We argue that recent advances in large language models (LLMs) can potentially improve this task. In this research, we investigated how different textual embeddings -- particularly those used in LLMs -- and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and model size adjustment. Findings reveal that LLM embeddings excel at capturing subtleties in structured language, while BERT leads the lightweight options in performance. In addition, we observe that increasing model dimensionality and employing summarization techniques do not consistently lead to improvements in clustering efficiency, suggesting that these strategies require careful analysis to use in real-life models. These results highlight a complex balance between the need for refined text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by incorporating embeddings from LLMs, providing a path for improved methodologies, while informing new avenues for future research in various types of textual analysis.

Updated: 2024-05-30 15:17:55

Categories: cs.CL,cs.AI,cs.LG,I.2.6; I.2.7; I.7.m

Download: http://arxiv.org/abs/2403.15112v3

MSSC-BiMamba: Multimodal Sleep Stage Classification and Early Diagnosis of Sleep Disorders with Bidirectional Mamba

Background and Objectives: Monitoring sleep states is crucial for assessing sleep quality and diagnosing sleep disorders. Traditional manual staging methods are not only time-consuming but also subject to subjective judgment, leading to inconsistent results. This study developed an automated sleep staging and sleep disorder classification model through deep learning technology, aimed at improving diagnostic accuracy and efficiency. Methods: Considering the characteristics of polysomnography (PSG) multi-lead sleep monitoring, we designed a sleep state classification model, MSSC-BiMamba, that combines an Efficient Channel Attention (ECA) mechanism with a Bidirectional State Space Model (BSSM). The ECA module allows for weighting data from different sensor channels, thereby amplifying the influence of diverse sensor inputs. Additionally, the use of Mamba enables the model to effectively capture the multidimensional features and long-range dependencies of PSG data. Results: The developed model demonstrated impressive performance on sleep stage classification tasks. Furthermore, the model exhibited an accuracy of 0.952 for sleep health prediction when evaluated on a combined dataset consisting of ISRUC and Sleep-EDF. Conclusion: Our model is the first to apply the bidirectional Mamba to sleep staging with complex PSG data, showing substantial gains in computational and memory efficiency over traditional Transformer-style models. This method not only makes health monitoring more accessible but also broadens the reach of advanced healthcare, thereby enhancing sleep health management with innovative technology.

Updated: 2024-05-30 15:16:53

Categories: cs.AI

Download: http://arxiv.org/abs/2405.20142v1

GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning

Knowledge Graphs (KGs) represent human-crafted factual knowledge in the form of triplets (head, relation, tail), which collectively form a graph. Question Answering over KGs (KGQA) is the task of answering natural questions by grounding the reasoning in the information provided by the KG. Large Language Models (LLMs) are the state-of-the-art models for QA tasks due to their remarkable ability to understand natural language. On the other hand, Graph Neural Networks (GNNs) have been widely used for KGQA as they can handle the complex graph information stored in the KG. In this work, we introduce GNN-RAG, a novel method for combining language understanding abilities of LLMs with the reasoning abilities of GNNs in a retrieval-augmented generation (RAG) style. First, a GNN reasons over a dense KG subgraph to retrieve answer candidates for a given question. Second, the shortest paths in the KG that connect question entities and answer candidates are extracted to represent KG reasoning paths. The extracted paths are verbalized and given as input for LLM reasoning with RAG. In our GNN-RAG framework, the GNN acts as a dense subgraph reasoner to extract useful graph information, while the LLM leverages its natural language processing ability for ultimate KGQA. Furthermore, we develop a retrieval augmentation (RA) technique to further boost KGQA performance with GNN-RAG. Experimental results show that GNN-RAG achieves state-of-the-art performance in two widely used KGQA benchmarks (WebQSP and CWQ), outperforming or matching GPT-4 performance with a 7B tuned LLM. In addition, GNN-RAG excels on multi-hop and multi-entity questions, outperforming competing approaches by 8.9-15.5 percentage points in answer F1.

Updated: 2024-05-30 15:14:24

标题: GNN-RAG：大型语言模型推理的图神经检索

摘要: 知识图谱（KGs）以三元组（头，关系，尾）的形式表示人工制定的事实知识，这些三元组共同构成一个图。在KGs上的问答（KGQA）是回答自然问题的任务，将推理基于KG提供的信息。大型语言模型（LLMs）是QA任务的最先进模型，因为它们具有出色的理解自然语言的能力。另一方面，图神经网络（GNNs）被广泛用于KGQA，因为它们可以处理存储在KG中的复杂图信息。在这项工作中，我们引入了GNN-RAG，一种将LLMs的语言理解能力与GNNs的推理能力以检索增强生成（RAG）风格相结合的新方法。首先，GNN在稠密的KG子图上推理，以检索给定问题的答案候选项。其次，提取连接问题实体和答案候选项的KG中的最短路径，以表示KG推理路径。提取的路径被用作LLM推理的输入，并与RAG一起进行。在我们的GNN-RAG框架中，GNN充当稠密子图推理器，提取有用的图信息，而LLM利用其自然语言处理能力进行最终的KGQA。此外，我们开发了检索增强（RA）技术，进一步提升了GNN-RAG的KGQA性能。实验结果表明，GNN-RAG在两个广泛使用的KGQA基准（WebQSP和CWQ）中实现了最先进的性能，在7B调整的LLM上优于或与GPT-4性能相匹配。此外，GNN-RAG在多跳和多实体问题上表现出色，比竞争方法在答案F1上提高了8.9-15.5个百分点。

更新时间: 2024-05-30 15:14:24

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.20139v1
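
The path-extraction step described in the GNN-RAG abstract (shortest KG paths from a question entity to an answer candidate, then verbalized for the LLM) can be sketched as follows. This is an assumption-laden illustration: the toy triples, the arrow-based verbalization format, and the function names are hypothetical, and the GNN that would supply the answer candidates is not modelled.

```python
from collections import deque

def shortest_path(triples, source, target):
    """Breadth-first search over (head, relation, tail) triples.

    Returns the list of triples along one shortest path, or None if the
    target is unreachable. Edges are followed head -> tail only.
    """
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    parent = {source: None}  # node -> (previous node, relation), None for the source
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for relation, nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, relation)
                queue.append(nxt)
    if target not in parent:
        return None
    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while parent[node] is not None:
        prev, relation = parent[node]
        path.append((prev, relation, node))
        node = prev
    return path[::-1]

def verbalize(path):
    """Render a path as 'A -> r1 -> B -> r2 -> C' (hypothetical format)."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _, relation, tail in path:
        parts += [relation, tail]
    return " -> ".join(parts)

# Hypothetical toy knowledge graph.
kg = [
    ("Jamaica", "official_language", "English"),
    ("Jamaica", "located_in", "Caribbean"),
    ("English", "language_family", "Germanic"),
]
print(verbalize(shortest_path(kg, "Jamaica", "Germanic")))
```

The resulting string is the kind of text that could be placed in an LLM prompt as retrieved context; how the actual system formats its prompts is not stated in the abstract.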

Separation and Collapse of Equilibria Inequalities on AND-OR Trees without Shape Constraints

Herein, we investigate the randomized complexity, which is the least cost against the worst input, of AND-OR tree computation by imposing various restrictions on the algorithm to find the Boolean value of the root of that tree and no restrictions on the tree shape. When a tree satisfies a certain condition regarding its symmetry, directional algorithms proposed by Saks and Wigderson (1986), special randomized algorithms, are known to achieve the randomized complexity. Furthermore, there is a known example of a tree that is so unbalanced that no directional algorithm achieves the randomized complexity (Vereshchagin 1998). In this study, we aim to identify where deviations arise between the general randomized Boolean decision tree and its special case, directional algorithms. In this paper, we show that for any AND-OR tree, randomized depth-first algorithms, which form a broader class compared with directional algorithms, have the same equilibrium as that of the directional algorithms. Thus, we get the collapse result on equilibria inequalities that holds for an arbitrary AND-OR tree. This implies that there exists a case where even depth-first algorithms cannot be the fastest, leading to the separation result on equilibria inequality. Additionally, a new algorithm is introduced as a key concept for proof of the separation result.

Updated: 2024-05-30 15:13:46

标题: AND-OR树上没有形状约束的平衡不等式的分离和崩溃

摘要: 在这里，我们研究了AND-OR树计算的随机复杂性，即通过对算法施加各种限制来找到该树根的布尔值的最小成本与最坏输入。当树满足关于对称性的某种条件时，由Saks和Wigderson（1986年）提出的定向算法，特殊随机算法，已知可以实现随机复杂度。此外，已知存在一种树的例子，它非常不平衡，以至于没有定向算法可以实现随机复杂度（Vereshchagin，1998年）。在这项研究中，我们的目标是确定一般随机化布尔决策树与其特殊情况定向算法之间的偏差出现在哪里。在本文中，我们表明对于任何AND-OR树，随机深度优先算法（与定向算法相比形成更广泛的类别）具有与定向算法相同的平衡性。因此，我们得到了对一般AND-OR树成立的平衡不等式的坍缩结果。这意味着存在一种情况，即使深度优先算法也不能是最快的，导致平衡不等式分离结果。此外，引入了一个新算法作为证明分离结果的关键概念。

更新时间: 2024-05-30 15:13:46

领域: cs.AI,68T20, 68Q17, 03D15, 91A60,I.2.8; F.2.2

下载: http://arxiv.org/abs/2405.20138v1
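
To make the cost model in the AND-OR tree abstract concrete, here is a minimal sketch of a randomized evaluator that finishes one subtree before probing another and short-circuits as soon as the value of a node is determined. The tree encoding and names are hypothetical, and the sketch is depth-first only in this informal sense; it is not claimed to match the paper's formal definitions of directional or depth-first algorithms.

```python
import random

def evaluate(node, rng, counter):
    """Randomized depth-first evaluation with short-circuiting.

    node:    a bool leaf, or a tuple (op, children) with op in {"AND", "OR"}.
    counter: one-element list; counter[0] counts the leaves probed (the cost).
    """
    if isinstance(node, bool):
        counter[0] += 1
        return node
    op, children = node
    order = list(children)
    rng.shuffle(order)  # random child order; each subtree is finished before the next
    for child in order:
        value = evaluate(child, rng, counter)
        if op == "AND" and not value:
            return False  # one false child decides an AND node
        if op == "OR" and value:
            return True   # one true child decides an OR node
    # No child short-circuited: AND is true, OR is false.
    return op == "AND"

# Hypothetical depth-2 tree with four leaves.
tree = ("AND", [("OR", [False, True]), ("OR", [True, False])])
cost = [0]
result = evaluate(tree, random.Random(0), cost)
```

On this tree the root is always true, while the number of probed leaves varies between 2 and 4 depending on the random choices, which is the quantity whose expectation the randomized complexity measures against the worst input.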

Neural Optimal Transport with General Cost Functionals

We introduce a novel neural network-based algorithm to compute optimal transport (OT) plans for general cost functionals. In contrast to common Euclidean costs, i.e., $\ell^1$ or $\ell^2$, such functionals provide more flexibility and allow using auxiliary information, such as class labels, to construct the required transport map. Existing methods for general costs are discrete and have limitations in practice, i.e. they do not provide an out-of-sample estimation. We address the challenge of designing a continuous OT approach for general costs that generalizes to new data points in high-dimensional spaces, such as images. Additionally, we provide the theoretical error analysis for our recovered transport plans. As an application, we construct a cost functional to map data distributions while preserving the class-wise structure.

Updated: 2024-05-30 15:11:45

标题: 具有一般成本函数的神经最优输运

摘要: 我们介绍了一种基于神经网络的新型算法，用于计算一般成本函数的最优传输（OT）方案。与常见的欧几里得成本，即$\ell^1$或$\ell^2$不同，这种函数提供了更多的灵活性，并允许使用辅助信息，如类标签，来构建所需的传输映射。现有的一般成本方法是离散的，在实践中存在局限性，即它们无法提供样本外估计。我们解决了为一般成本设计连续OT方法的挑战，该方法可以推广到高维空间中的新数据点，如图像。此外，我们还为我们恢复的传输方案提供了理论误差分析。作为应用，我们构建了一个成本函数，以映射数据分布同时保留类别结构。

更新时间: 2024-05-30 15:11:45

领域: cs.LG

下载: http://arxiv.org/abs/2205.15403v4

LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics

Large Language Models (LLMs) such as GPT-4 have demonstrated their ability to understand natural language and generate complex code snippets. This paper introduces a novel Large Language Model Evolutionary Algorithm (LLaMEA) framework, leveraging GPT models for the automated generation and refinement of algorithms. Given a set of criteria and a task definition (the search space), LLaMEA iteratively generates, mutates and selects algorithms based on performance metrics and feedback from runtime evaluations. This framework offers a unique approach to generating optimized algorithms without requiring extensive prior expertise. We show how this framework can be used to generate novel black-box metaheuristic optimization algorithms automatically. LLaMEA generates multiple algorithms that outperform state-of-the-art optimization algorithms (Covariance Matrix Adaptation Evolution Strategy and Differential Evolution) on the five dimensional black box optimization benchmark (BBOB). The results demonstrate the feasibility of the framework and identify future directions for automated generation and optimization of algorithms via LLMs.

Updated: 2024-05-30 15:10:59

标题: LLaMEA：自动生成元启发式算法的大型语言模型进化算法

摘要: 大型语言模型（LLM）如GPT-4已经展示了它们理解自然语言并生成复杂代码片段的能力。本文介绍了一种新颖的大型语言模型进化算法（LLaMEA）框架，利用GPT模型自动生成和优化算法。给定一组标准和任务定义（搜索空间），LLaMEA根据性能指标和运行时评估的反馈，迭代生成、突变和选择算法。该框架提供了一种独特的方法来生成优化算法，无需广泛的先前专业知识。我们展示了如何利用该框架自动生成新的黑盒元启发式优化算法。LLaMEA生成多个算法，在五维黑盒优化基准（BBOB）上胜过最先进的优化算法（协方差矩阵适应进化策略和差分进化）。结果表明该框架的可行性，并确定了通过LLM自动生成和优化算法的未来方向。

更新时间: 2024-05-30 15:10:59

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2405.20132v1

Language Models Need Inductive Biases to Count Inductively

Counting is a fundamental example of generalization, whether viewed through the mathematical lens of Peano's axioms defining the natural numbers or the cognitive science literature for children learning to count. The argument holds for both cases that learning to count means learning to count infinitely. While few papers have tried to distill transformer "reasoning" to the simplest case of counting, investigating length generalization does occur throughout the literature. In the "train short, test long" paradigm of NLP, length refers to the training sentence length. In formal language recognition, length refers to the input sequence length, or the maximum stack size induced by a pushdown automaton. In general problem solving, length refers to the number of hops in a deductive reasoning chain or the recursion depth. For all cases, counting is central to task success. And crucially, generalizing counting inductively is central to success on OOD instances. This work provides extensive empirical results on training language models to count. We experiment with architectures including RNNs, Transformers, State-Space Models, and RWKV. We present carefully-designed task formats, auxiliary tasks and positional embeddings to avoid limitations in generalization with OOD-position and OOD-vocabulary. We find that while traditional RNNs trivially achieve inductive counting, Transformers have to rely on positional embeddings to count out-of-domain. As counting is the basis for many arguments concerning the expressivity of Transformers, our finding calls for the community to reexamine the application scope of primitive functions defined in formal characterizations. Finally, modern RNNs also largely underperform traditional RNNs in generalizing counting inductively. We discuss how design choices that enable parallelized training of modern RNNs cause them to lose merits of a recurrent nature.

Updated: 2024-05-30 15:10:37

标题: 语言模型需要归纳偏差来归纳计数

摘要: 计数是一种泛化的基本示例，无论是通过定义自然数的皮亚诺公理的数学角度来看，还是通过认知科学文献来看待儿童学习计数。在这两种情况下，学习计数意味着学习无限计数。虽然很少有论文尝试将变压器的“推理”简化为最简单的计数情况，但在文献中确实存在对长度泛化的研究。在NLP中的“训练短，测试长”范式中，长度指的是训练句子的长度。在形式语言识别中，长度指的是输入序列的长度，或者由下推自动机引起的最大堆栈大小。在一般问题解决中，长度指的是演绎推理链中的跳数或递归深度。对于所有情况，计数对任务成功至关重要。而且，归纳地泛化计数对OOD实例的成功至关重要。这项工作提供了训练语言模型进行计数的广泛经验结果。我们尝试了从RNN、变压器、状态空间模型到RWKV的各种架构。我们提出了精心设计的任务格式、辅助任务和位置嵌入，以避免在OOD位置和OOD词汇泛化方面的限制。我们发现，虽然传统的RNN在归纳计数方面轻而易举地取得了成功，但变压器必须依赖位置嵌入来计数域外。由于计数是许多关于变压器表现能力的论点的基础，我们的发现呼吁社区重新审视在正式特性定义中定义的原始函数的应用范围。最后，现代RNN在归纳计数方面也大多表现不佳。我们讨论了使现代RNN能够并行训练的设计选择如何导致它们失去了递归性质的优点。

更新时间: 2024-05-30 15:10:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.20131v1
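
The claim that a traditional RNN can count inductively is easy to picture with a hand-built example. The sketch below is an assumption for illustration only: the weights are set by hand rather than trained, the two-symbol alphabet is hypothetical, and it does not reproduce the paper's tasks or training setup. It shows a single ReLU unit whose state tracks a running count for sequences of any length.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def count_with_rnn(sequence):
    """One-unit ReLU RNN: h_t = relu(w_h * h_{t-1} + w_x * x_t).

    Hand-set weights w_h = w_x = 1 and inputs x_t = +1 for 'a', -1 for any
    other symbol make the hidden state track (#a - #other), clamped at zero.
    """
    w_h, w_x = 1.0, 1.0
    h = 0.0
    for symbol in sequence:
        x = 1.0 if symbol == "a" else -1.0
        h = relu(w_h * h + w_x * x)
    return h

short = count_with_rnn("a" * 5)
long = count_with_rnn("a" * 100000 + "b" * 99990)
```

Because the same update is applied at every step, nothing in the computation depends on the sequence length, which is the sense in which the recurrence generalizes from short to long inputs.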

Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study

Despite the impressive capabilities of Multimodal Large Language Models (MLLMs) in integrating text and image modalities, challenges remain in accurately interpreting detailed visual elements. This paper presents an empirical study on enhancing MLLMs with state-of-the-art (SOTA) object detection and Optical Character Recognition (OCR) models to improve fine-grained understanding and reduce hallucination in responses. We investigate the embedding-based infusion of textual detection information, the impact of such infusion on MLLMs' original abilities, and the interchangeability of detection models. We conduct systematic and extensive experiments with representative models such as LLaVA-1.5, DINO, PaddleOCRv2, and Grounding DINO, revealing that our simple yet general approach not only refines MLLMs' performance in fine-grained visual tasks but also maintains their original strengths. Notably, the enhanced LLaVA-1.5 outperforms its original 7B/13B models on all 10 benchmarks, achieving an improvement of up to 12.5% on the normalized average score. We release our codes to facilitate further exploration into the fine-grained multimodal capabilities of MLLMs.

Updated: 2024-05-30 15:09:49

标题: 利用视觉检测模型增强多模态大型语言模型：一项实证研究

摘要: 尽管多模态大型语言模型（MLLMs）在整合文本和图像模态方面具有令人印象深刻的能力，但在准确解释详细的视觉元素方面仍存在挑战。本文提出了一项实证研究，通过使用最先进的目标检测和光学字符识别（OCR）模型来增强MLLMs，以改善细粒度理解并减少响应中的幻觉。我们研究了基于嵌入的文本检测信息注入、这种注入对MLLMs原始能力的影响以及检测模型的互换性。我们使用代表性模型进行系统和广泛的实验，如LLaVA-1.5、DINO、PaddleOCRv2和Grounding DINO，结果显示我们简单而通用的方法不仅提高了MLLMs在细粒度视觉任务中的性能，而且保持了它们的原始优势。值得注意的是，增强的LLaVA-1.5在所有10个基准测试中表现优于其原始的7B/13B模型，平均得分提高了高达12.5%。我们发布了我们的代码，以促进进一步探索MLLMs的细粒度多模态能力。

更新时间: 2024-05-30 15:09:49

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2401.17981v2

SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning

Cross-device training is a crucial subfield of federated learning, where the number of clients can reach into the billions. Standard approaches and local methods are prone to issues such as client drift and insensitivity to data similarities. We propose a novel algorithm (SPAM) for cross-device federated learning with non-convex losses, which solves both issues. We provide sharp analysis under second-order (Hessian) similarity, a condition satisfied by a variety of machine learning problems in practice. Additionally, we extend our results to the partial participation setting, where a cohort of selected clients communicate with the server at each communication round. Our method is the first of its kind that does not require the smoothness of the objective and provably benefits from clients having similar data.

Updated: 2024-05-30 15:07:30

标题: SPAM：带动量方差减少的随机近端点方法用于非凸交叉设备联邦学习

摘要: 跨设备训练是联邦学习的一个关键子领域，其中客户端数量可达数十亿。标准方法和本地方法容易出现诸如客户端漂移和对数据相似性不敏感等问题。我们提出了一种新颖的算法（SPAM）用于解决非凸损失的跨设备联邦学习问题，同时解决了这两个问题。我们在满足实际机器学习问题的二阶（Hessian）相似性条件下进行了深入分析。此外，我们将结果扩展到部分参与设置，其中一组选定的客户端在每一轮通信中与服务器通信。我们的方法是其类别中的第一个，不需要目标函数的平滑性，并且可以证明受益于客户端具有相似数据。

更新时间: 2024-05-30 15:07:30

领域: math.OC,cs.LG,90C26

下载: http://arxiv.org/abs/2405.20127v1

A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective

Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to tackle the out-of-distribution problem. However, existing works often suffer from the constraint conflict issue when offline datasets are collected from multiple behavior policies, i.e., different behavior policies may exhibit inconsistent actions with distinct returns across the state space. To remedy this issue, recent advantage-weighted methods prioritize samples with high advantage values for agent training while inevitably ignoring the diversity of behavior policy. In this paper, we introduce a novel Advantage-Aware Policy Optimization (A2PO) method to explicitly construct advantage-aware policy constraints for offline learning under mixed-quality datasets. Specifically, A2PO employs a conditional variational auto-encoder to disentangle the action distributions of intertwined behavior policies by modeling the advantage values of all training data as conditional variables. Then the agent can follow such disentangled action distribution constraints to optimize the advantage-aware policy towards high advantage values. Extensive experiments conducted on both the single-quality and mixed-quality datasets of the D4RL benchmark demonstrate that A2PO yields results superior to the counterparts. Our code will be made publicly available.

Updated: 2024-05-30 15:04:42

标题: A2PO: 朝向有效的离线强化学习，从一个认识优势的角度

摘要: 离线强化学习旨在利用离线数据集来构建有效的智能体策略，而无需在线交互，通过适当的保守约束以支持行为策略来解决超出分布问题。然而，现有研究往往在离线数据集来自多个行为策略时遇到约束冲突问题，即不同行为策略可能在状态空间中展示出不一致的行动和不同的回报。为了解决这个问题，最近的基于优势权重的方法优先考虑具有高优势值的样本，以便训练智能体，同时不可避免地忽略了行为策略的多样性。本文介绍了一种新颖的基于优势的策略优化（A2PO）方法，用于明确构建离线学习下混合质量数据集的优势感知策略约束。具体来说，A2PO利用条件变分自动编码器来解开相互交织的行为策略的动作分布，通过将所有训练数据的优势值建模为条件变量。然后，智能体可以遵循这种解开的动作分布约束，以优化朝向高优势值的优势感知策略。对D4RL基准测试单一质量和混合质量数据集进行的大量实验表明，A2PO的结果优于对照组。我们的代码将公开提供。

更新时间: 2024-05-30 15:04:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.07262v2

Trust-based Consensus in Multi-Agent Reinforcement Learning Systems

An often neglected issue in multi-agent reinforcement learning (MARL) is the potential presence of unreliable agents in the environment whose deviations from expected behavior can prevent a system from accomplishing its intended tasks. In particular, consensus is a fundamental underpinning problem of cooperative distributed multi-agent systems. Consensus requires different agents, situated in a decentralized communication network, to reach an agreement out of a set of initial proposals that they put forward. Learning-based agents should adopt a protocol that allows them to reach consensus despite having one or more unreliable agents in the system. This paper investigates the problem of unreliable agents in MARL, considering consensus as a case study. Echoing established results in the distributed systems literature, our experiments show that even a moderate fraction of such agents can greatly impact the ability of reaching consensus in a networked environment. We propose Reinforcement Learning-based Trusted Consensus (RLTC), a decentralized trust mechanism, in which agents can independently decide which neighbors to communicate with. We empirically demonstrate that our trust mechanism is able to handle unreliable agents effectively, as evidenced by higher consensus success rates.

Updated: 2024-05-30 15:04:27

标题: 多智能体强化学习系统中基于信任的共识

摘要: 在多智能体强化学习（MARL）中经常被忽视的问题是环境中可能存在不可靠智能体，其偏离预期行为可能阻止系统完成其预期任务。特别是，共识是合作分布式多智能体系统的一个基本支撑问题。共识要求处于分散通信网络中的不同智能体就它们提出的一组初始提议达成一致意见。基于学习的智能体应该采用一种协议，使它们能够在系统中存在一个或多个不可靠智能体的情况下达成共识。本文研究了MARL中不可靠智能体的问题，以共识作为一个案例研究。与分布式系统文献中的已建立结果相呼应，我们的实验表明，即使有适度比例的这种智能体，也会极大地影响在网络环境中达成共识的能力。我们提出了基于强化学习的可信共识（RLTC），这是一种分散的信任机制，智能体可以独立决定与哪些邻居进行通信。我们从实证方面证明，我们的信任机制能够有效处理不可靠智能体，表现为更高的共识成功率。

更新时间: 2024-05-30 15:04:27

领域: cs.MA,cs.AI,cs.LG

下载: http://arxiv.org/abs/2205.12880v2

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices. Existing work focused on improved one-shot quantization techniques and weight representations; yet, purely post-training approaches are reaching diminishing returns in terms of the accuracy-vs-bit-width trade-off. State-of-the-art quantization methods such as QuIP# and AQLM include fine-tuning (part of) the compressed parameters over a limited amount of calibration data; however, such fine-tuning techniques over compressed weights often make exclusive use of straight-through estimators (STE), whose performance is not well-understood in this setting. In this work, we question the use of STE for extreme LLM compression, showing that it can be sub-optimal, and perform a systematic study of quantization-aware fine-tuning strategies for LLMs. We propose PV-Tuning - a representation-agnostic framework that generalizes and improves upon existing fine-tuning strategies, and provides convergence guarantees in restricted cases. On the practical side, when used for 1-2 bit vector quantization, PV-Tuning outperforms prior techniques for highly-performant models such as Llama and Mistral. Using PV-Tuning, we achieve the first Pareto-optimal quantization for Llama 2 family models at 2 bits per parameter.

Updated: 2024-05-30 15:01:49

标题: PV-Tuning：超越直通估计的极端LLM压缩

摘要: 对于大型语言模型（LLMs）的“极端”压缩，即每个参数1-2位，引起了相当大的兴趣，这使得这些模型可以在资源受限设备上高效执行。现有工作集中在改进一次性量化技术和权重表示上；然而，在准确性与位宽之间的权衡方面，纯后训练方法正在递减。最先进的量化方法，比如QuIP#和AQLM，包括在有限数量的校准数据上对部分压缩参数进行微调；然而，对压缩权重进行微调的技术通常仅使用直通估计器（STE），在这种环境中其性能并不被很好理解。在这项工作中，我们对于极端LLM压缩中使用STE的做法提出疑问，表明这可能不是最佳选择，并对LLMs的量化感知微调策略进行了系统研究。我们提出了PV-Tuning - 一个与表示无关的框架，可以泛化和改进现有的微调策略，并在受限情况下提供收敛保证。在实践方面，当用于1-2位矢量量化时，PV-Tuning在高性能模型（如Llama和Mistral）上表现优于先前的技术。利用PV-Tuning，我们实现了Llama 2家族模型在每个参数2位的第一个帕累托最优量化。

更新时间: 2024-05-30 15:01:49

领域: cs.LG

下载: http://arxiv.org/abs/2405.14852v2

A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically - without compelling theoretical justification - or optimally in view of restrictive distributional assumptions. In this paper, we propose a principled approach to construct covariance estimators without imposing restrictive assumptions. That is, we study distributionally robust covariance estimation problems that minimize the worst-case Frobenius error with respect to all data distributions close to a nominal distribution, where the proximity of distributions is measured via a divergence on the space of covariance matrices. We identify mild conditions on this divergence under which the resulting minimizers represent shrinkage estimators. We show that the corresponding shrinkage transformations are intimately related to the geometrical properties of the underlying divergence. We also prove that our robust estimators are efficiently computable and asymptotically consistent and that they enjoy finite-sample performance guarantees. We exemplify our general methodology by synthesizing explicit estimators induced by the Kullback-Leibler, Fisher-Rao, and Wasserstein divergences. Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.

Updated: 2024-05-30 15:01:18

标题: 一个几何统一的分布鲁棒协方差估计器：通过扩大模糊集合来缩小频谱

摘要: 目前用于估计高维协方差矩阵的先进方法都将样本协方差矩阵的特征值收缩到一个与数据无关的收缩目标。潜在的收缩转换要么是根据启发式方法选择的，没有强有力的理论依据，要么是在严格的分布假设下最优选择的。在本文中，我们提出了一种基于原则的方法来构建协方差估计器，而不会施加严格的假设。也就是说，我们研究了分布鲁棒的协方差估计问题，该问题在接近名义分布的所有数据分布中最小化最坏情况的Frobenius误差，其中分布之间的接近程度通过协方差矩阵空间上的散度来衡量。我们确定了关于这种散度的温和条件，使得得到的最小化函数代表了收缩估计器。我们证明了相应的收缩转换与基础散度的几何特性密切相关。我们还证明了我们的鲁棒估计器是高效可计算的，并且渐近一致，并且具有有限样本性能保证。我们通过合成由Kullback-Leibler、Fisher-Rao和Wasserstein散度引起的显式估计器来举例说明我们的一般方法。基于合成和真实数据的数值实验表明，我们的鲁棒估计器与最先进的估计器具有竞争力。

更新时间: 2024-05-30 15:01:18

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2405.20124v1
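
The covariance abstract above starts from the observation that existing estimators shrink sample eigenvalues toward a fixed target. A minimal sketch of the classical linear version of that idea is given below; it is not one of the paper's distributionally robust estimators, and the shrinkage weight `rho`, the identity target, and the 2x2 example matrix are assumptions chosen for illustration.

```python
import math

def linear_shrinkage(sample_cov, rho, target_scale=1.0):
    """Return (1 - rho) * S + rho * target_scale * I for a square matrix S."""
    p = len(sample_cov)
    return [
        [(1.0 - rho) * sample_cov[i][j] + (rho * target_scale if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]

def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

# Hypothetical sample covariance matrix.
S = [[4.0, 1.5], [1.5, 1.0]]
shrunk = linear_shrinkage(S, rho=0.5)
before = eigenvalues_2x2(S)
after = eigenvalues_2x2(shrunk)
```

Each eigenvalue of the shrunk matrix equals `(1 - rho) * lambda + rho * target_scale`, so the spectrum contracts toward the target while the eigenvectors are unchanged.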

A Structure-Aware Lane Graph Transformer Model for Vehicle Trajectory Prediction

Accurate prediction of future trajectories for surrounding vehicles is vital for the safe operation of autonomous vehicles. This study proposes a Lane Graph Transformer (LGT) model with structure-aware capabilities. Its key contribution lies in encoding the map topology structure into the attention mechanism. To address variations in lane information from different directions, four Relative Positional Encoding (RPE) matrices are introduced to capture the local details of the map topology structure. Additionally, two Shortest Path Distance (SPD) matrices are employed to capture distance information between two accessible lanes. Numerical results indicate that the proposed LGT model achieves a significantly higher prediction performance on the Argoverse 2 dataset. Specifically, the minFDE$_6$ metric was decreased by 60.73% compared to the Argoverse 2 baseline model (Nearest Neighbor) and the b-minFDE$_6$ metric was reduced by 2.65% compared to the baseline LaneGCN model. Furthermore, ablation experiments demonstrated that the consideration of map topology structure led to a 4.24% drop in the b-minFDE$_6$ metric, validating the effectiveness of this model.

Updated: 2024-05-30 14:57:16

标题: 一个结构感知的车道图变换器模型用于车辆轨迹预测

摘要: 对周围车辆未来轨迹的准确预测对自动驾驶车辆的安全运行至关重要。本研究提出了一种具有结构感知能力的Lane Graph Transformer (LGT)模型。其关键贡献在于将地图拓扑结构编码到注意力机制中。为了解决不同方向的车道信息变化，引入了四个相对位置编码矩阵以捕捉地图拓扑结构的局部细节。此外，还采用了两个最短路径距离矩阵来捕捉两个可访问车道之间的距离信息。数值结果表明，所提出的LGT模型在Argoverse 2数据集上取得了显著更高的预测性能。具体而言，与Argoverse 2基线模型（最近邻）相比，minFDE$_6$指标减少了60.73%，b-minFDE$_6$指标与基线LaneGCN模型相比减少了2.65%。此外，消融实验表明，考虑地图拓扑结构导致b-minFDE$_6$指标下降了4.24%，验证了该模型的有效性。

更新时间: 2024-05-30 14:57:16

领域: cs.AI

下载: http://arxiv.org/abs/2405.20121v1
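
The minFDE$_6$ metric reported for the LGT model is, in its standard form, the smallest distance between the final point of any of six predicted trajectories and the final ground-truth point. A minimal sketch under that standard definition follows; the coordinates are made-up values, and the Brier-weighted variant (b-minFDE$_6$) is not covered.

```python
import math

def min_fde(predictions, ground_truth):
    """Minimum final displacement error over a set of predicted trajectories.

    predictions:  list of trajectories, each a list of (x, y) points.
    ground_truth: list of (x, y) points; only the final point is used.
    """
    gx, gy = ground_truth[-1]
    return min(
        math.hypot(traj[-1][0] - gx, traj[-1][1] - gy) for traj in predictions
    )

# Hypothetical example with K = 6 predicted trajectories of two points each.
ground_truth = [(0.0, 0.0), (1.0, 1.0)]
predictions = [
    [(0.0, 0.0), (4.0, 5.0)],
    [(0.0, 0.0), (1.0, 2.0)],
    [(0.0, 0.0), (1.0, 4.0)],
    [(0.0, 0.0), (3.0, 1.0)],
    [(0.0, 0.0), (-2.0, 1.0)],
    [(0.0, 0.0), (1.0, -5.0)],
]
score = min_fde(predictions, ground_truth)
```

Lower values are better, which is why the abstract reports percentage decreases in the metric as improvements.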

Near Optimal Decentralized Optimization with Compression and Momentum Tracking

Communication efficiency has garnered significant attention as it is considered the main bottleneck for large-scale decentralized Machine Learning applications in distributed and federated settings. In this regime, clients are restricted to transmitting small amounts of quantized information to their neighbors over a communication graph. Numerous endeavors have been made to address this challenging problem by developing algorithms with compressed communication for decentralized non-convex optimization problems. Despite considerable efforts, the current results suffer from various issues such as non-scalability with the number of clients, requirements for large batches, or bounded gradient assumption. In this paper, we introduce MoTEF, a novel approach that integrates communication compression with Momentum Tracking and Error Feedback. Our analysis demonstrates that MoTEF achieves most of the desired properties, and significantly outperforms existing methods under arbitrary data heterogeneity. We provide numerical experiments to validate our theoretical findings and confirm the practical superiority of MoTEF.

Updated: 2024-05-30 14:51:57

标题: 使用压缩和动量跟踪实现近乎最佳的分散优化

摘要: 通信效率已经引起了重要的关注，因为它被认为是大规模分布式和联邦设置中去中心化机器学习应用的主要瓶颈。在这种情况下，客户端被限制在通信图上向其邻居传输少量量化信息。已经进行了许多努力来解决这一具有挑战性的问题，通过开发具有压缩通信的算法来处理去中心化非凸优化问题。尽管付出了大量努力，但当前的结果存在各种问题，例如与客户端数量的不可扩展性、对大批量的要求或有界梯度假设。在本文中，我们介绍了MoTEF，这是一种集成了动量跟踪和误差反馈的通信压缩的新方法。我们的分析表明，MoTEF实现了大部分期望的性质，并在任意数据异质性下明显优于现有方法。我们提供了数值实验来验证我们的理论发现，并确认MoTEF的实际优越性。

更新时间: 2024-05-30 14:51:57

领域: cs.LG,cs.AI,math.OC,stat.ML

下载: http://arxiv.org/abs/2405.20114v1
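
The error-feedback ingredient named in the MoTEF abstract can be pictured with the classic single-node scheme: compress the gradient plus the accumulated residual, send the compressed message, and carry forward whatever was dropped. The sketch below assumes a top-k compressor and hypothetical class and function names; it shows only generic error feedback and omits the momentum tracking and the decentralized communication that MoTEF combines it with.

```python
def top_k(vector, k):
    """Keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]

class ErrorFeedbackCompressor:
    """Classic error feedback: transmit C(e + g), carry the residual forward."""

    def __init__(self, dim, k):
        self.error = [0.0] * dim  # residual not yet transmitted
        self.k = k

    def compress(self, gradient):
        corrected = [e + g for e, g in zip(self.error, gradient)]
        message = top_k(corrected, self.k)
        # Whatever the compressor dropped is remembered for the next round.
        self.error = [c - m for c, m in zip(corrected, message)]
        return message

comp = ErrorFeedbackCompressor(dim=3, k=1)
m1 = comp.compress([1.0, 0.5, 0.25])
m2 = comp.compress([0.0, 0.5, 0.25])
```

After any number of rounds, the transmitted messages plus the stored residual sum to the total of the gradients seen so far, so information dropped by the compressor is delayed rather than lost.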

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show the convergence rate that can be simply achieved by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework called online learning of updates/increments, where we choose the updates/increments of an optimizer based on an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.

Updated: 2024-05-30 14:49:45

标题: 通过在线学习更新理解Adam优化器：Adam其实是伪装成FTRL算法

摘要: 尽管Adam优化器在实践中取得了成功，但对其算法组件的理论理解仍然有限。特别是，大多数现有对Adam的分析显示，其收敛速度可以简单地通过类似SGD的非自适应算法实现。在这项工作中，我们提供了基于在线学习的不同视角，强调了Adam算法组件的重要性。受Cutkosky等人（2023年）的启发，我们考虑了一个名为更新/增量的在线学习框架，在这个框架中，我们基于在线学习者选择优化器的更新/增量。通过这个框架，设计一个好的优化器被简化为设计一个好的在线学习者。我们的主要观察是，Adam对应于一个被称为Follow-the-Regularized-Leader（FTRL）的原则性在线学习框架。基于这一观察，我们从在线学习的角度研究了其算法组件的好处。

更新时间: 2024-05-30 14:49:45

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2402.01567v2
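
A useful way to read the Adam abstract is to unroll Adam's moment recursions into discounted sums over the whole gradient history, so that each update is a function of all past gradients. The sketch below checks that identity numerically for scalar gradients. It assumes Adam without bias correction and uses made-up gradient values; it illustrates only the unrolled form and does not reproduce the paper's FTRL derivation.

```python
import math

def adam_moments(grads, beta1=0.9, beta2=0.999):
    """Recursive first and second moment estimates (no bias correction)."""
    m, v = 0.0, 0.0
    for g in grads:
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
    return m, v

def adam_moments_unrolled(grads, beta1=0.9, beta2=0.999):
    """The same quantities written as discounted sums over the gradient history."""
    t = len(grads)
    m = (1.0 - beta1) * sum(beta1 ** (t - 1 - s) * g for s, g in enumerate(grads))
    v = (1.0 - beta2) * sum(beta2 ** (t - 1 - s) * g * g for s, g in enumerate(grads))
    return m, v

def adam_update(m, v, lr=1e-3, eps=1e-8):
    """Parameter increment produced from the two moments."""
    return -lr * m / (math.sqrt(v) + eps)

grads = [0.5, -1.0, 2.0, 0.25]  # hypothetical scalar gradients
m_rec, v_rec = adam_moments(grads)
m_sum, v_sum = adam_moments_unrolled(grads)
step = adam_update(m_rec, v_rec)
```

Seen this way, the increment at each step is a ratio of two discounted statistics of the gradient sequence, which is the kind of object an online learner over updates would be asked to produce.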

On the Representational Capacity of Recurrent Neural Language Models

This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership and the analysis of the computational power of RNN LMs (RLMs) should reflect this. We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions. Since, in practice, RLMs work in real-time, processing a symbol at every time step, we treat the above result as an upper bound on the expressivity of RLMs. We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.

Updated: 2024-05-30 14:49:25

标题: 关于循环神经语言模型的表征能力

摘要: 这项工作研究了基于循环神经网络（RNNs）的语言模型（LMs）的计算表达能力。Siegelmann和Sontag（1992）曾经著名地展示了具有有理权重和隐藏状态以及无限计算时间的RNNs是图灵完备的。然而，LMs除了仅仅（无权重的）语言成员关系外，还定义了对字符串的加权，对RNN LMs（RLMs）的计算能力分析应该反映这一点。我们将图灵完备性结果扩展到概率情况，展示了具有有理权重和无限计算时间的Rational RLM如何模拟任何具有有理权重转换的确定性概率图灵机（PTM）。由于在实践中，RLMs实时工作，每个时间步处理一个符号，我们将上述结果视为RLMs表达能力的上限。我们还通过展示在实时计算限制下，这些模型可以模拟确定性实时有理PTMs，提供了一个下限。

更新时间: 2024-05-30 14:49:25

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2310.12942v5

Identifying Drivers of Predictive Aleatoric Uncertainty

Explainability and uncertainty quantification are two pillars of trustable artificial intelligence. However, the reasoning behind uncertainty estimates is generally left unexplained. Identifying the drivers of uncertainty complements explanations of point predictions in recognizing model limitations and enhances trust in decisions and their communication. So far, explanations of uncertainties have been rarely studied. The few exceptions rely on Bayesian neural networks or technically intricate approaches, such as auxiliary generative models, thereby hindering their broad adoption. We present a simple approach to explain predictive aleatoric uncertainties. We estimate uncertainty as predictive variance by adapting a neural network with a Gaussian output distribution. Subsequently, we apply out-of-the-box explainers to the model's variance output. This approach can explain uncertainty influences more reliably than literature baselines, which we evaluate in a synthetic setting with a known data-generating process. We further adapt multiple metrics from conventional XAI research to uncertainty explanations. We quantify our findings with a nuanced benchmark analysis that includes real-world datasets. Finally, we apply our approach to an age regression model and discover reasonable sources of uncertainty. Overall, we explain uncertainty estimates with few modifications to the model architecture and demonstrate that our approach competes effectively with more intricate methods.

Updated: 2024-05-30 14:48:06

标题: 确定预测性随机不确定性的驱动因素

摘要: 解释性和不确定性量化是可信人工智能的两大支柱。然而，通常未解释不确定性估计背后的推理。识别不确定性的驱动因素可以补充对点预测的解释，认识模型的局限性，并增强对决策及其沟通的信任。到目前为止，不确定性的解释很少被研究。少数例外依赖于贝叶斯神经网络或技术复杂的方法，如辅助生成模型，从而阻碍其广泛采用。我们提出了一种简单的方法来解释预测的随机不确定性。我们通过调整具有高斯输出分布的神经网络来估计不确定性作为预测方差。随后，我们将现成的解释器应用于模型的方差输出。这种方法可以更可靠地解释不确定性影响，优于文献基线，我们在已知数据生成过程的合成环境中进行评估。我们进一步将传统XAI研究中的多个指标调整为不确定性解释。我们通过包括真实世界数据集在内的细致基准分析来量化我们的发现。最后，我们将我们的方法应用于一个年龄回归模型，并发现了合理的不确定性来源。总的来说，我们通过对模型架构进行少量修改来解释不确定性估计，并且证明我们的方法与更复杂的方法有效竞争。

更新时间: 2024-05-30 14:48:06

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2312.07252v2
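
Estimating aleatoric uncertainty as predictive variance with a Gaussian output, as in the abstract above, usually means training against the Gaussian negative log-likelihood. The sketch below assumes that standard loss and uses scalar stand-ins for the network's mean and variance outputs; the data values and function names are hypothetical, and the explainer step applied to the variance output is not shown.

```python
import math

def gaussian_nll(y, mean, variance):
    """Negative log-likelihood of y under a Gaussian N(mean, variance)."""
    return 0.5 * (math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / variance)

def average_nll(targets, mean, variance):
    """Mean NLL over targets for one shared mean and variance prediction."""
    return sum(gaussian_nll(y, mean, variance) for y in targets) / len(targets)

# Hypothetical targets observed for one input; the predicted mean is fixed at 0.
targets = [1.0, -1.0, 2.0, 0.0]
mse = sum(y * y for y in targets) / len(targets)  # mean squared residual = 1.5

loss_at_mse = average_nll(targets, 0.0, mse)
loss_too_small = average_nll(targets, 0.0, 1.0)
loss_too_large = average_nll(targets, 0.0, 2.0)
```

For a fixed mean, the loss is smallest when the predicted variance equals the mean squared residual, which is why the variance head learns to report how noisy the targets are around the prediction.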

Goals as Reward-Producing Programs

People are remarkably capable of generating their own goals, beginning with child's play and continuing into adulthood. Despite considerable empirical and computational work on goals and goal-oriented behavior, models are still far from capturing the richness of everyday human goals. Here, we bridge this gap by collecting a dataset of human-generated playful goals, modeling them as reward-producing programs, and generating novel human-like goals through program synthesis. Reward-producing programs capture the rich semantics of goals through symbolic operations that compose, add temporal constraints, and allow for program execution on behavioral traces to evaluate progress. To build a generative model of goals, we learn a fitness function over the infinite set of possible goal programs and sample novel goals with a quality-diversity algorithm. Human evaluators found that model-generated goals, when sampled from partitions of program space occupied by human examples, were indistinguishable from human-created games. We also discovered that our model's internal fitness scores predict games that are evaluated as more fun to play and more human-like.

Updated: 2024-05-30 14:46:04

标题: 目标作为产生奖励的程序

摘要: 人们非常擅长制定自己的目标，从儿童游戏开始，一直延续到成年。尽管在目标和目标导向行为方面进行了大量实证研究和计算工作，但模型仍远未能捕捉到日常人类目标的丰富性。在这里，我们通过收集人类生成的有趣目标数据集，将其建模为产生奖励的程序，并通过程序合成生成类人类的新目标来弥合这一差距。产生奖励的程序通过符号操作捕捉目标的丰富语义，这些操作包括组合、添加时间约束，并允许在行为轨迹上执行程序来评估进展。为了构建目标的生成模型，我们学习了一个适应于无限可能目标程序集的适应度函数，并使用质量-多样性算法对新颖目标进行抽样。人类评估者发现，从被人类示例占据的程序空间划分中抽样的模型生成的目标与人类创建的游戏无法区分。我们还发现，我们模型内部的适应度分数预测了玩起来更有趣、更类人类的游戏。

更新时间: 2024-05-30 14:46:04

领域: cs.AI

下载: http://arxiv.org/abs/2405.13242v2

Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks

Safety, security, and compliance are essential requirements when aligning large language models (LLMs). However, many seemingly aligned LLMs are soon shown to be susceptible to jailbreak attacks. These attacks aim to circumvent the models' safety guardrails and security mechanisms by introducing jailbreak prompts into malicious queries. In response to these challenges, this paper introduces Defensive Prompt Patch (DPP), a novel prompt-based defense mechanism specifically designed to protect LLMs against such sophisticated jailbreak strategies. Unlike previous approaches, which have often compromised the utility of the model for the sake of safety, DPP is designed to achieve a minimal Attack Success Rate (ASR) while preserving the high utility of LLMs. Our method uses strategically designed interpretable suffix prompts that effectively thwart a wide range of standard and adaptive jailbreak techniques. Experiments conducted on LLAMA-2-7B-Chat and Mistral-7B-Instruct-v0.2 models demonstrate the robustness and adaptability of DPP, showing significant reductions in ASR with negligible impact on utility. Our approach not only outperforms existing defense strategies in balancing safety and functionality, but also provides a scalable and interpretable solution applicable to various LLM platforms.

Updated: 2024-05-30 14:40:35

Categories: cs.CR

Download: http://arxiv.org/abs/2405.20099v1

ReMatch: Retrieval Enhanced Schema Matching with LLMs

Schema matching is a crucial task in data integration, involving the alignment of a source schema with a target schema to establish correspondence between their elements. This task is challenging due to textual and semantic heterogeneity, as well as differences in schema sizes. Although machine-learning-based solutions have been explored in numerous studies, they often suffer from low accuracy, require manual mapping of the schemas for model training, or need access to source schema data which might be unavailable due to privacy concerns. In this paper we present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs). Our method avoids the need for predefined mapping, any model training, or access to data in the source database. Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher. By eliminating the requirement for training data, ReMatch becomes a viable solution for real-world scenarios.

Updated: 2024-05-30 14:33:46

Categories: cs.DB,cs.AI

Download: http://arxiv.org/abs/2403.01567v2

Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach

Predicting the conditional evolution of Volterra processes with stochastic volatility is a crucial challenge in mathematical finance. While deep neural network models offer promise in approximating the conditional law of such processes, their effectiveness is hindered by the curse of dimensionality caused by the infinite dimensionality and non-smooth nature of these problems. To address this, we propose a two-step solution. First, we develop a stable dimension reduction technique, projecting the law of a reasonably broad class of Volterra processes onto a low-dimensional statistical manifold of non-positive sectional curvature. Next, we introduce a sequentially deep learning model tailored to the manifold's geometry, which we show can approximate the projected conditional law of the Volterra process. Our model leverages an auxiliary hypernetwork to dynamically update its internal parameters, allowing it to encode non-stationary dynamics of the Volterra process, and it can be interpreted as a gating mechanism in a mixture of expert models where each expert is specialized at a specific point in time. Our hypernetwork further allows us to achieve approximation rates that would seemingly only be possible with very large networks.

Updated: 2024-05-30 14:32:06

Categories: math.NA,cs.LG,cs.NA,cs.NE,math.DG,q-fin.CP

Download: http://arxiv.org/abs/2405.20094v1

Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning

Parameter-efficient (PE) methods (like Prompts or Adapters) for adapting pre-trained language models (PLM) to downstream tasks have been popular recently. However, hindrances still prevent these methods from reaching their full potential. For example, two significant challenges are few-shot adaptation and cross-task generalization. To tackle these issues, we propose a general PE priming framework to enhance and explore the few-shot adaptation and generalization ability of PE methods. In this framework, PLMs are primed with PE methods for rapidly adapting to various target tasks. To evaluate the generalization ability of these PE methods, we conduct experiments on a few-shot cross-domain benchmark containing 160 diverse NLP tasks. Our experiment not only reveals the best priming strategy but also verifies that priming facilitates the adaptation to target tasks.

Updated: 2024-05-30 14:27:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2212.01032v2

Analysis of a multi-target linear shrinkage covariance estimator

Multi-target linear shrinkage is an extension of the standard single-target linear shrinkage for covariance estimation. We combine several constant matrices - the targets - with the sample covariance matrix. We derive the oracle and a bona fide multi-target linear shrinkage estimator with exact and empirical mean. In both settings, we prove its convergence towards the oracle under Kolmogorov asymptotics. Finally, we show empirically that it outperforms other standard estimators in various situations.
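
The combination step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the weights are supplied by hand and are assumed to leave the remaining mass `1 - sum(weights)` on the sample covariance, whereas the paper derives oracle and data-driven coefficients that are not reproduced here. The function name and the example matrices are hypothetical.

```python
def multi_target_shrinkage(sample_cov, targets, weights):
    """Linear combination of a sample covariance matrix and fixed target matrices.

    Assumed form (an illustration, not the paper's estimator):
        (1 - sum(weights)) * sample_cov + sum_k weights[k] * targets[k]
    """
    if len(targets) != len(weights):
        raise ValueError("one weight per target is required")
    p = len(sample_cov)
    residual = 1.0 - sum(weights)  # weight left on the sample covariance
    shrunk = [[residual * sample_cov[i][j] for j in range(p)] for i in range(p)]
    for w, target in zip(weights, targets):
        for i in range(p):
            for j in range(p):
                shrunk[i][j] += w * target[i][j]
    return shrunk


# Hypothetical 2x2 example with two targets: the identity and a diagonal matrix.
sample_cov = [[2.0, 0.8], [0.8, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
diag_target = [[2.0, 0.0], [0.0, 1.0]]
shrunk = multi_target_shrinkage(sample_cov, [identity, diag_target], [0.25, 0.25])
```

With both targets diagonal, the off-diagonal entries are pulled toward zero while symmetry is preserved; setting all weights to zero returns the sample covariance unchanged.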

Updated: 2024-05-30 14:16:32

Categories: math.ST,cs.LG,math.PR,stat.ML,stat.TH

Download: http://arxiv.org/abs/2405.20086v1

Soft Partitioning of Latent Space for Semantic Channel Equalization

Semantic channel equalization has emerged as a solution to address language mismatch in multi-user semantic communications. This approach aims to align the latent spaces of an encoder and a decoder which were not jointly trained, and it relies on a partition of the semantic (latent) space into atoms based on the semantic meaning. In this work we explore the role of the semantic space partition in scenarios where the task structure involves a one-to-many mapping between the semantic space and the action space. In such scenarios, partitioning based on hard inference results leads to a loss of information, which degrades the equalization performance. We propose a soft criterion to derive the atoms of the partition which leverages the soft decoder's output and offers a more comprehensive understanding of the semantic space's structure. Through empirical validation, we demonstrate that soft partitioning yields a more descriptive and regular partition of the space, consequently enhancing the performance of the equalization algorithm.

Updated: 2024-05-30 14:16:19

Categories: cs.LG,cs.IT,cs.MA,math.IT

Download: http://arxiv.org/abs/2405.20085v1

Simultaneous identification of models and parameters of scientific simulators

Many scientific models are composed of multiple discrete components, and scientists often make heuristic decisions about which components to include. Bayesian inference provides a mathematical framework for systematically selecting model components, but defining prior distributions over model components and developing associated inference schemes has been challenging. We approach this problem in a simulation-based inference framework: We define model priors over candidate components and, from model simulations, train neural networks to infer joint probability distributions over both model components and associated parameters. Our method, simulation-based model inference (SBMI), represents distributions over model components as a conditional mixture of multivariate binary distributions in the Grassmann formalism. SBMI can be applied to any compositional stochastic simulator without requiring likelihood evaluations. We evaluate SBMI on a simple time series model and on two scientific models from neuroscience, and show that it can discover multiple data-consistent model configurations, and that it reveals non-identifiable model components and parameters. SBMI provides a powerful tool for data-driven scientific inquiry which will allow scientists to identify essential model components and make uncertainty-informed modelling decisions.

Updated: 2024-05-30 14:15:22

Categories: cs.LG

Download: http://arxiv.org/abs/2305.15174v3

On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization

Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks we find that a) the improvements do not reproduce across benchmarks, and b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.

Updated: 2024-05-30 14:13:05

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.18751v2

Interpretable classifiers for tabular data via discretization and feature selection

We introduce a method for computing immediately human interpretable yet accurate classifiers from tabular data. The classifiers obtained are short Boolean formulas, computed via first discretizing the original data and then using feature selection coupled with a very fast algorithm for producing the best possible Boolean classifier for the setting. We demonstrate the approach via 13 experiments, obtaining results with accuracies comparable to ones obtained via random forests, XGBoost, and existing results for the same datasets in the literature. In most cases, the accuracy of our method is in fact similar to that of the reference methods, even though the main objective of our study is the immediate interpretability of our classifiers. We also prove a new result on the probability that the classifier we obtain from real-life data corresponds to the ideally best classifier with respect to the background distribution the data comes from.

Updated: 2024-05-30 14:12:54

Categories: cs.LG,cs.AI,cs.LO,I.2.6; F.4.1; I.2.4; E.2

Download: http://arxiv.org/abs/2402.05680v2

Segment, Shuffle, and Stitch: A Simple Mechanism for Improving Time-Series Representations

Existing approaches for learning representations of time-series keep the temporal arrangement of the time-steps intact with the presumption that the original order is optimal for learning. However, non-adjacent sections of real-world time-series may have strong dependencies. Accordingly, we raise the question: Is there an alternative arrangement for time-series which could enable more effective representation learning? To address this, we propose a simple plug-and-play mechanism called Segment, Shuffle, and Stitch (S3) designed to improve time-series representation learning of existing models. S3 works by creating non-overlapping segments from the original sequence and shuffling them in a learned manner that is optimal for the task at hand. It then re-attaches the shuffled segments and performs a learned weighted sum with the original input to capture both the newly shuffled sequence and the original sequence. S3 is modular and can be stacked to create various degrees of granularity, and can be added to many forms of neural architectures including CNNs or Transformers with negligible computation overhead. Through extensive experiments on several datasets and state-of-the-art baselines, we show that incorporating S3 results in significant improvements for the tasks of time-series classification and forecasting, improving performance on certain datasets by up to 68%. We also show that S3 makes the learning more stable with a smoother training loss curve and loss landscape compared to the original baseline. The code is available at https://github.com/shivam-grover/S3-TimeSeries.
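
The three steps named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation (see their repository for that): the learned shuffling is replaced by a hand-picked permutation and the learned weighted sum by two fixed scalars, and the function and argument names are hypothetical.

```python
def segment_shuffle_stitch(series, num_segments, order, w_shuffled, w_original):
    """Toy version of the segment / shuffle / stitch idea on a 1-D list.

    `order` is a fixed permutation of segment indices and the two weights are
    fixed scalars; in the described method both would be learned.
    """
    if len(series) % num_segments != 0:
        raise ValueError("series length must be divisible by num_segments")
    if sorted(order) != list(range(num_segments)):
        raise ValueError("order must be a permutation of the segment indices")
    seg_len = len(series) // num_segments
    # Segment: cut the series into non-overlapping, equal-length pieces.
    segments = [series[k * seg_len:(k + 1) * seg_len] for k in range(num_segments)]
    # Shuffle and stitch: concatenate the segments in the new order.
    stitched = [x for k in order for x in segments[k]]
    # Weighted sum with the original input keeps the original ordering in view.
    return [w_shuffled * s + w_original * o for s, o in zip(stitched, series)]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mixed = segment_shuffle_stitch(series, 3, [2, 0, 1], 0.5, 0.5)
unchanged = segment_shuffle_stitch(series, 3, [0, 1, 2], 0.5, 0.5)
```

Here the stitched sequence is `[5, 6, 1, 2, 3, 4]`, and averaging it with the input mixes information from non-adjacent sections; with the identity permutation and weights summing to one, the input is returned unchanged.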

Updated: 2024-05-30 14:11:29

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.20082v1

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

Multimodal large language models (MLLMs) build on large language models to provide a powerful mechanism for understanding visual information. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost is the first to enable semi-supervised learning for MLLMs, unleashing the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% under human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. Code and models are available at https://kaiwu5.github.io/noiseboost.

Updated: 2024-05-30 14:11:27

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.20081v1

Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning

Intelligent Tutoring Systems (ITS) enhance personalized learning by predicting student answers to provide immediate and customized instruction. However, recent research has primarily focused on the correctness of the answer rather than the student's performance on specific answer choices, limiting insights into students' thought processes and potential misconceptions. To address this gap, we present MCQStudentBert, an answer forecasting model that leverages the capabilities of Large Language Models (LLMs) to integrate contextual understanding of students' answering history along with the text of the questions and answers. By predicting the specific answer choices students are likely to make, practitioners can easily extend the model to new answer choices or remove answer choices for the same multiple-choice question (MCQ) without retraining the model. In particular, we compare MLP, LSTM, BERT, and Mistral 7B architectures to generate embeddings from students' past interactions, which are then incorporated into a finetuned BERT's answer-forecasting mechanism. We apply our pipeline to a dataset of language learning MCQ, gathered from an ITS with over 10,000 students to explore the predictive accuracy of MCQStudentBert, which incorporates student interaction patterns, in comparison to correct answer prediction and traditional mastery-learning feature-based approaches. This work opens the door to more personalized content, modularization, and granular support.

Updated: 2024-05-30 14:09:43

Categories: cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2405.20079v1

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

In the face of uncertainty, the ability to *seek information* is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an *uncertainty-aware simulation approach* which enables the model to simulate possible future scenarios and how likely they are to occur, 2) *uncertainty-based rewards* motivated by information gain which incentivizes the model to seek information, and 3) a *reward propagation scheme* to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting, and the `20 Questions` game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion across multiple LLMs compared with direct prompting and also improves efficiency (i.e., the number of questions needed to complete the task). Our code has been released [here](https://github.com/zhiyuanhubj/UoT)
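
To make the idea of an information-gain-motivated reward concrete, here is a textbook calculation rather than the reward used in UoT itself. It assumes a uniform prior over a finite set of candidate answers and a yes/no question that is answered "yes" for a known number of them; the function name is hypothetical.

```python
import math


def expected_information_gain(num_candidates, num_yes):
    """Expected entropy reduction (in bits) from one yes/no question.

    Assumes a uniform prior over `num_candidates` possibilities, of which
    `num_yes` would produce the answer "yes".
    """
    if num_candidates <= 0 or not 0 <= num_yes <= num_candidates:
        raise ValueError("need 0 <= num_yes <= num_candidates and num_candidates > 0")
    prior_entropy = math.log2(num_candidates)
    p_yes = num_yes / num_candidates
    expected_posterior_entropy = 0.0
    if num_yes > 0:
        expected_posterior_entropy += p_yes * math.log2(num_yes)
    if num_yes < num_candidates:
        expected_posterior_entropy += (1 - p_yes) * math.log2(num_candidates - num_yes)
    return prior_entropy - expected_posterior_entropy


even_split = expected_information_gain(8, 4)      # question halves the candidates
lopsided_split = expected_information_gain(8, 1)  # question isolates one candidate
useless = expected_information_gain(8, 0)         # answer is known in advance
```

Under these assumptions, a question that splits the remaining possibilities evenly is worth one full bit, a lopsided split is worth less, and a question whose answer is already determined is worth nothing, which is the intuition behind rewarding questions by how much uncertainty they remove.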

Updated: 2024-05-30 14:03:35

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.03271v2

A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture

Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using CNNs to extract features from hip DXA images, along with clinical variables, shape measurements, and texture features, our method provides a comprehensive framework for assessing fracture risk. A staged machine learning-based model was developed using two ensemble models: Ensemble 1 (clinical variables only) and Ensemble 2 (clinical variables and DXA imaging features). This staged approach used uncertainty quantification from Ensemble 1 to decide if DXA features are necessary for further prediction. Ensemble 2 exhibited the highest performance, achieving an AUC of 0.9541, an accuracy of 0.9195, a sensitivity of 0.8078, and a specificity of 0.9427. The staged model also performed well, with an AUC of 0.8486, an accuracy of 0.8611, a sensitivity of 0.5578, and a specificity of 0.9249, outperforming Ensemble 1, which had an AUC of 0.5549, an accuracy of 0.7239, a sensitivity of 0.1956, and a specificity of 0.8343. Furthermore, the staged model suggested that 54.49% of patients did not require DXA scanning. It effectively balanced accuracy and specificity, offering a robust solution when DXA data acquisition is not always feasible. Statistical tests confirmed significant differences between the models, highlighting the advantages of the advanced modeling strategies. Our staged approach could identify individuals at risk with high accuracy while reducing unnecessary DXA scanning. It shows great promise for guiding interventions to prevent hip fractures at reduced cost and radiation exposure.
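
The staged decision rule can be sketched as follows. The abstract does not say how uncertainty is quantified or what cut-off is used, so the spread of the ensemble members' predictions and the threshold value below are assumptions, and all names are hypothetical.

```python
from statistics import mean, pstdev


def staged_predict(stage1_member_probs, stage2_predictor, uncertainty_threshold):
    """Return (risk, used_dxa) for one patient.

    stage1_member_probs: risk probabilities from the members of the
        clinical-only ensemble (Ensemble 1).
    stage2_predictor: zero-argument callable returning the risk from the
        ensemble that also uses DXA features (Ensemble 2); it is called only
        when the first stage is too uncertain.
    uncertainty_threshold: assumed cut-off on the spread of stage-1 predictions.
    """
    uncertainty = pstdev(stage1_member_probs)  # assumed uncertainty measure
    if uncertainty <= uncertainty_threshold:
        return mean(stage1_member_probs), False  # confident: no DXA scan needed
    return stage2_predictor(), True              # uncertain: fall back to DXA


# Hypothetical patients: one where the clinical ensemble agrees, one where it does not.
confident_risk, confident_used_dxa = staged_predict([0.10, 0.12, 0.11], lambda: 0.9, 0.05)
uncertain_risk, uncertain_used_dxa = staged_predict([0.1, 0.6, 0.9], lambda: 0.8, 0.05)
```

Passing the second stage as a callable mirrors the practical point of the staged design: the imaging-based model is only evaluated, and the scan only needed, for patients the clinical-only stage cannot settle.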

Updated: 2024-05-30 14:01:02

Categories: physics.med-ph,cs.LG

Download: http://arxiv.org/abs/2405.20071v1

Debating with More Persuasive LLMs Leads to More Truthful Answers

Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer. We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively (naive baselines obtain 48% and 60%). Furthermore, optimising expert debaters for persuasiveness in an unsupervised manner improves non-expert ability to identify the truth in debates. Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.

Updated: 2024-05-30 13:59:34

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.06782v3

Cross-Lingual Knowledge Editing in Large Language Models

Knowledge editing aims to change language models' performance on several special cases (i.e., editing scope) by infusing the corresponding expected knowledge into them. With the recent advancements in large language models (LLMs), knowledge editing has been shown to be a promising technique for adapting LLMs to new knowledge without retraining from scratch. However, most previous studies neglect the multi-lingual nature of some mainstream LLMs (e.g., LLaMA, ChatGPT and GPT-4), and typically focus on monolingual scenarios, where LLMs are edited and evaluated in the same language. As a result, the effect of editing in a source language on a different target language remains unknown. In this paper, we aim to characterize this cross-lingual effect in knowledge editing. Specifically, we first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese. Then, we conduct English editing with various knowledge editing methods covering different paradigms and evaluate their performance in Chinese, and vice versa. To give a deeper analysis of the cross-lingual effect, the evaluation includes four aspects, i.e., reliability, generality, locality and portability. Furthermore, we analyze the inconsistent behaviors of the edited models and discuss their specific challenges. Data and code are available at https://github.com/krystalan/Bi_ZsRE

Updated: 2024-05-30 13:49:47

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2309.08952v2

Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation

Separating vocal elements from musical tracks is a longstanding challenge in audio signal processing. This study tackles the separation of vocal components from musical spectrograms. We employ the Short-Time Fourier Transform (STFT) to convert audio waveforms into detailed frequency-time spectrograms, using the benchmark MUSDB18 dataset for music separation. Subsequently, we implement a U-Net neural network to segment the spectrogram image, aiming to delineate and extract singing voice components accurately. We achieved noteworthy results in audio source separation using our U-Net-based models. The combination of frequency-axis normalization with Min/Max scaling and the Mean Absolute Error (MAE) loss function achieved the highest Source-to-Distortion Ratio (SDR) of 7.1 dB, indicating a high level of accuracy in preserving the quality of the original signal during separation. This setup also recorded impressive Source-to-Interference Ratio (SIR) and Source-to-Artifact Ratio (SAR) scores of 25.2 dB and 7.2 dB, respectively. These values significantly outperformed other configurations, particularly those using Quantile-based normalization or a Mean Squared Error (MSE) loss function. Our source code, model weights, and demo material can be found at the project's GitHub repository: https://github.com/mbrotos/SoundSeg
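
The best-performing configuration pairs Min/Max scaling along the frequency axis with an MAE loss. The sketch below shows one possible reading of that setup, in which each frequency bin is scaled by its own minimum and maximum over time; that interpretation, the function names, and the toy values are assumptions, and the project repository remains the reference for the actual preprocessing.

```python
def min_max_scale_per_frequency(spectrogram):
    """Scale each frequency bin (row) of a magnitude spectrogram to [0, 1].

    `spectrogram` is a list of rows, one per frequency bin, each holding
    magnitudes over time. Constant rows are mapped to zeros.
    """
    scaled = []
    for row in spectrogram:
        lo, hi = min(row), max(row)
        span = hi - lo
        scaled.append([(v - lo) / span if span > 0 else 0.0 for v in row])
    return scaled


def mean_absolute_error(predicted, target):
    """Average absolute difference over all time-frequency cells."""
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


# Toy spectrogram: 2 frequency bins by 3 time frames.
mixture = [[0.0, 5.0, 10.0], [2.0, 2.0, 2.0]]
scaled = min_max_scale_per_frequency(mixture)
loss = mean_absolute_error(scaled, [[0.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
```

Compared with a squared-error loss, MAE penalizes large per-cell errors linearly, which is one plausible reason it behaves differently on spectrograms with a few dominant peaks.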

Updated: 2024-05-30 13:47:53

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2405.20059v1

An Efficient and Multi-private Key Secure Aggregation for Federated Learning

With the emergence of privacy leaks in federated learning, secure aggregation protocols that mainly adopt either homomorphic encryption or threshold secret sharing have been widely developed for federated learning to protect the privacy of the local training data of each client. However, these existing protocols suffer from many shortcomings, such as the dependence on a trusted third party, the vulnerability to clients being corrupted, low efficiency, the trade-off between security and fault tolerance, etc. To address these shortcomings, we propose an efficient and multi-private key secure aggregation scheme for federated learning. Specifically, we skillfully modify the variant ElGamal encryption technique to achieve homomorphic addition operation, which has two important advantages: 1) The server and each client can freely select public and private keys without introducing a trusted third party and 2) Compared to the variant ElGamal encryption, the plaintext space is relatively large, which is more suitable for the deep model. Besides, for the high-dimensional deep model parameters, we introduce a super-increasing sequence to compress multi-dimensional data into 1-D, which can greatly reduce encryption and decryption times as well as communication for ciphertext transmission. Detailed security analyses show that our proposed scheme achieves the semantic security of both individual local gradients and the aggregated result while achieving optimal robustness in tolerating both client collusion and dropped clients. Extensive simulations demonstrate that the accuracy of our scheme is almost the same as the non-private approach, while the efficiency of our scheme is much better than the state-of-the-art homomorphic encryption-based secure aggregation schemes. More importantly, the efficiency advantages of our scheme will become increasingly prominent as the number of model parameters increases.
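
The compression step can be illustrated without any cryptography. The sketch below packs a vector of non-negative integers into a single integer using the weights `1, base, base**2, ...`, a simple super-increasing sequence, and shows that adding packed values adds the underlying vectors coordinate-wise. The specific sequence, the quantization to integers, and all names are assumptions; the abstract does not give the paper's construction, and encryption is omitted entirely.

```python
def pack(values, base):
    """Encode non-negative integers as sum(values[i] * base**i)."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative integers")
    packed = 0
    for i, v in enumerate(values):
        packed += v * base ** i
    return packed


def unpack(packed, base, length):
    """Recover `length` coordinates from a packed integer."""
    values = []
    for _ in range(length):
        packed, digit = divmod(packed, base)
        values.append(digit)
    return values


# Hypothetical setting: 3 clients, each coordinate quantized to 0..100.
num_clients = 3
max_value = 100
# The base must exceed the largest possible coordinate sum so that sums never carry.
base = num_clients * max_value + 1
updates = [[5, 100, 0], [7, 100, 42], [1, 100, 9]]
aggregate = sum(pack(u, base) for u in updates)
recovered = unpack(aggregate, base, 3)
```

Because each weight is larger than anything the lower coordinates can contribute, one additively homomorphic ciphertext could in principle carry a whole block of parameters, which is the source of the savings in encryption operations and communication that the abstract describes.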

Updated: 2024-05-30 13:46:34

标题: 一种高效且多私钥安全的联邦学习聚合方法

摘要: 随着联邦学习中隐私泄漏的出现，主要采用同态加密或阈值秘密共享的安全聚合协议已广泛发展用于保护每个客户端的本地训练数据的隐私。然而，这些现有协议存在许多缺点，如依赖于可信第三方、容易受到客户端腐败、效率低下、安全与容错之间的权衡等。为了解决这些缺点，我们提出了一种高效的、多私钥安全聚合方案，适用于联邦学习。具体来说，我们巧妙地修改变种ElGamal加密技术，实现同态加法操作，具有两个重要优点：1）服务器和每个客户端可以自由选择公钥和私钥，而无需引入可信第三方；2）与变种ElGamal加密相比，明文空间相对较大，更适合深度模型。此外，针对高维深度模型参数，我们引入一个超递增序列将多维数据压缩为1-D，可以大大减少加密和解密次数以及密文传输的通信量。详细的安全分析显示，我们提出的方案在实现个体本地梯度和聚合结果的语义安全的同时，实现了在容忍客户端勾结和离线客户端方面的最佳鲁棒性。大量模拟表明，我们方案的准确性几乎与非私密方法相同，而我们方案的效率远远优于基于同态加密的安全聚合方案。更重要的是，随着模型参数数量的增加，我们方案的效率优势将变得越来越突出。

更新时间: 2024-05-30 13:46:34

领域: cs.CR,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2306.08970v2
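
The packing step mentioned in the secure aggregation abstract above (compressing multi-dimensional data into 1-D with a super-increasing sequence) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-coordinate bound, and the way the weights are built are assumptions made for illustration. Coordinates are assumed to be non-negative integers (for example, quantized gradients), and encryption is left out so that only the pack, add, unpack arithmetic is shown.

```python
# Illustrative sketch only; names and the weight construction are assumptions.

def make_weights(dim, n_clients, max_value):
    """Build a super-increasing sequence sized for sums over n_clients."""
    bound = n_clients * max_value  # largest value an aggregated coordinate can take
    weights = [1]
    for _ in range(dim - 1):
        # Each new weight exceeds the largest total the earlier slots can hold.
        weights.append(bound * sum(weights) + 1)
    return weights


def pack(vector, weights):
    """Compress a vector of non-negative integers into a single integer."""
    return sum(w * v for w, v in zip(weights, vector))


def unpack(total, weights):
    """Recover the coordinates greedily, starting from the largest weight."""
    out = []
    for w in reversed(weights):
        out.append(total // w)
        total %= w
    return out[::-1]


weights = make_weights(dim=3, n_clients=2, max_value=6)
# Adding the packed integers mimics what an additively homomorphic scheme
# would do on ciphertexts; here it is done in the clear.
packed_sum = pack([1, 2, 3], weights) + pack([4, 5, 6], weights)
aggregated = unpack(packed_sum, weights)
```

Because a single packed integer stands in for the whole vector, one encryption and one decryption would cover all coordinates, which is the saving the abstract refers to.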

Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator

Deep learning research has uncovered the phenomenon of benign overfitting for overparameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential to gain foundational insights into this phenomenon. While properties of OLS are well established in classical, underparameterized settings, its behavior in high-dimensional, overparameterized regimes is less explored (unlike for ridge or lasso regression) though significant progress has been made of late. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem in the overparameterized regime. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Under the Gauss-Markov model, we present statistical results such as an extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors for the overparameterized regime. To substantiate our theoretical contributions, we conduct simulations that further explore the stochastic properties of the OLS interpolator.

Updated: 2024-05-30 13:43:44

标题: 最小二乘插值器的代数和统计性质

摘要: 深度学习研究揭示了对于过度参数化的统计模型，出现了良性过拟合现象，这在近年来引起了重要的理论兴趣。鉴于其简单性和实用性，普通最小二乘（OLS）插值器已经成为获得对这一现象基础洞察力所必不可少的工具。虽然OLS的性质在经典的、欠参数化的设置中已经被充分建立，但其在高维、过度参数化的情况下的行为却较少被探讨（不像岭回归或套索回归那样），尽管最近已经取得了重要进展。我们通过为最小l2范数OLS插值器提供基础代数和统计结果，为这一不断增长的文献贡献了一份力量。具体地，我们提供了在过度参数化领域中等价的代数形式的（i）留下k个观测值的残差公式，（ii）Cochran公式，以及（iii）Frisch-Waugh-Lovell定理。这些结果有助于理解OLS插值器的泛化能力，并对因果推断具有实质性影响。在高斯-马可夫模型下，我们提出了统计结果，如高斯-马可夫定理的延伸以及在过度参数化领域下同方差误差的方差估计。为了证实我们的理论贡献，我们进行了模拟实验，进一步探讨OLS插值器的随机性质。

更新时间: 2024-05-30 13:43:44

领域: math.ST,cs.LG,stat.ME,stat.TH

下载: http://arxiv.org/abs/2309.15769v2
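
For reference, the minimum $\ell_2$-norm OLS interpolator named in the abstract above has a standard closed form. The notation is assumed here and may differ from the paper's: $X \in \mathbb{R}^{n \times p}$ is the design matrix with $p > n$ and full row rank, $y \in \mathbb{R}^n$ is the response, and $X^{+}$ is the Moore-Penrose pseudoinverse.

```latex
\hat{\beta}
  = \operatorname*{arg\,min}_{\beta \,:\, X\beta = y} \lVert \beta \rVert_2
  = X^\top (X X^\top)^{-1} y
  = X^{+} y ,
\qquad
X \hat{\beta} = y .
```

The last identity is what makes $\hat{\beta}$ an interpolator: the fitted values reproduce the training responses exactly.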

Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads

Pre-trained Language Models (LMs) exhibit strong zero-shot and in-context learning capabilities; however, their behaviors are often difficult to control. By utilizing Reinforcement Learning from Human Feedback (RLHF), it is possible to fine-tune unsupervised LMs to follow instructions and produce outputs that reflect human preferences. Despite its benefits, RLHF has been shown to potentially harm a language model's reasoning capabilities and introduce artifacts such as hallucinations, where the model may fabricate facts. To address this issue, we introduce Direct Preference Heads (DPH), a fine-tuning framework that enables LMs to learn human preference signals through an auxiliary reward head without directly affecting the output distribution of the language modeling head. We perform a theoretical analysis of our objective function and find strong ties to Conservative Direct Preference Optimization (cDPO). Finally, we evaluate our models on GLUE, RACE, and the GPT4All evaluation suite and demonstrate that our method produces models which achieve higher scores than those fine-tuned with Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO) alone.

Updated: 2024-05-30 13:38:52

标题: 我会欺骗你吗？使用直接偏好头的语言模型推理时间对齐

摘要: 预训练语言模型（LMs）表现出强大的零-shot和上下文学习能力；然而，它们的行为通常难以控制。通过利用来自人类反馈的强化学习（RLHF），可以对无监督的LMs进行微调，使其遵循指令并产生反映人类偏好的输出。尽管具有诸多好处，RLHF被证明可能损害语言模型的推理能力，并引入幻觉等人工现象，其中模型可能捏造事实。为了解决这个问题，我们引入了Direct Preference Heads（DPH），这是一个微调框架，使LMs能够通过辅助奖励头学习人类偏好信号，而不直接影响语言建模头的输出分布。我们对我们的目标函数进行理论分析，并发现与保守的直接偏好优化（cDPO）有着密切联系。最后，我们在GLUE、RACE和GPT4All评估套件上评估我们的模型，并证明我们的方法产生的模型比仅进行监督微调（SFT）或仅进行直接偏好优化（DPO）的模型取得更高的分数。

更新时间: 2024-05-30 13:38:52

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.20053v1

Is Complexity an Illusion?

Simplicity is held by many to be the key to general intelligence. Simpler models tend to "generalise", identifying the cause or generator of data with greater sample efficiency. The implications of the correlation between simplicity and generalisation extend far beyond computer science, addressing questions of physics and even biology. Yet simplicity is a property of form, while generalisation is of function. In interactive settings, any correlation between the two depends on interpretation. In theory there could be no correlation, and yet in practice there is. Previous theoretical work showed generalisation to be a consequence of "weak" constraints implied by function, not form. Experiments demonstrated that choosing weak constraints over simple forms yielded a 110-500% improvement in generalisation rate. Here we show that all constraints can take equally simple forms, regardless of weakness. However, if forms are spatially extended, then function is represented using a finite subset of forms. If function is represented using a finite subset of forms, then we can force a correlation between simplicity and generalisation by making weak constraints take simple forms. If function is determined by a goal-directed process that favours versatility (e.g. natural selection), then efficiency demands that weak constraints take simple forms. Complexity has no causal influence on generalisation, but appears to because of confounding.

Updated: 2024-05-30 13:38:42

标题: 复杂性是一种幻觉吗？

摘要: 简单被许多人认为是智能的关键。简单的模型往往会“泛化”，以更高的样本效率识别数据的原因或生成器。简单和泛化之间的相关性的含义远远超出了计算机科学的范畴，涉及物理甚至生物学的问题。然而，简单是一种形式的特性，而泛化是一种功能的特性。在交互设置中，两者之间的任何相关性取决于解释。理论上可能不存在相关性，但在实践中存在。先前的理论工作表明，泛化是由功能暗示的“弱”约束的结果，而不是形式。实验证明，选择弱约束而非简单形式可以使泛化率提高110-500%。在这里，我们展示了所有约束都可以采用同样简单的形式，无论强度如何。然而，如果形式在空间上延伸，那么功能就可以用有限形式的子集来表示。如果将功能表示为有限形式的子集，则可以通过使弱约束取简单形式来强制简单和泛化之间存在相关性。如果功能由倾向于多功能性的目标导向过程确定（例如自然选择），则效率要求弱约束采用简单形式。复杂性对泛化没有因果影响，但似乎是由于混杂引起的。

更新时间: 2024-05-30 13:38:42

领域: cs.AI

下载: http://arxiv.org/abs/2404.07227v4

Threshold-Independent Fair Matching through Score Calibration

Entity Matching (EM) is a critical task in numerous fields, such as healthcare, finance, and public administration, as it identifies records that refer to the same entity within or across different databases. EM faces considerable challenges, particularly with false positives and negatives. These are typically addressed by generating matching scores and applying thresholds to balance false positives and negatives in various contexts. However, adjusting these thresholds can affect the fairness of the outcomes, a critical factor that remains largely overlooked in current fair EM research. The existing body of research on fair EM tends to concentrate on static thresholds, neglecting their critical impact on fairness. To address this, we introduce a new approach in EM using recent metrics for evaluating biases in score-based binary classification, particularly through the lens of distributional parity. This approach enables the application of various bias metrics like equalized odds, equal opportunity, and demographic parity without depending on threshold settings. Our experiments with leading matching methods reveal potential biases, and by applying a calibration technique for EM scores using Wasserstein barycenters, we not only mitigate these biases but also preserve accuracy across real-world datasets. This paper contributes to the field of fairness in data cleaning, especially within EM, which is a central task in data cleaning, by promoting a method for generating matching scores that reduces biases across different thresholds.

Updated: 2024-05-30 13:37:53

标题: 不受阈值限制的公平匹配通过分数校准

摘要: 实体匹配（EM）是许多领域中的关键任务，例如医疗保健、金融和公共行政，因为它可以识别在不同数据库内或跨数据库之间引用同一实体的记录。EM面临着相当大的挑战，尤其是在处理假阳性和假阴性时。通常通过生成匹配分数并应用阈值来平衡各种上下文中的假阳性和假阴性来解决这些问题。然而，调整这些阈值可能会影响结果的公平性，这是当前公平EM研究中被大多数人忽视的一个关键因素。现有的公平EM研究主要集中在静态阈值上，忽视了它们对公平性的重要影响。为了解决这个问题，我们介绍了一种新的EM方法，使用最近的评估基于分数的二元分类中偏见的度量标准，特别是通过分布平等的视角。这种方法使得可以应用各种偏见度量标准，如平等几率、平等机会和人口平等，而不依赖于阈值设定。我们对主要匹配方法进行的实验显示潜在的偏见，并通过使用Wasserstein重心对EM分数进行校准，不仅可以减少这些偏见，还可以在现实世界数据集中保持准确性。本文对数据清洗领域，特别是EM中的公平性做出了贡献，通过推广一种生成匹配分数的方法，可以减少在不同阈值下的偏见。

更新时间: 2024-05-30 13:37:53

领域: cs.LG,cs.DB

下载: http://arxiv.org/abs/2405.20051v1
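
The calibration idea in the fair matching abstract above can be sketched for one-dimensional scores, where the Wasserstein-2 barycenter of several score distributions has a simple form: its quantile function is the weighted average of the per-group quantile functions. The sketch below is an illustration under that standard fact, not the paper's procedure; the function names, the use of empirical quantiles, and the choice of group sizes as barycenter weights are all assumptions.

```python
import math
from bisect import bisect_right

# Illustrative sketch only; names and weighting choices are assumptions.

def empirical_cdf(sorted_scores, s):
    """Fraction of scores in the group that are <= s."""
    return bisect_right(sorted_scores, s) / len(sorted_scores)


def quantile(sorted_scores, q):
    """Inverse empirical CDF: smallest score whose CDF is at least q."""
    n = len(sorted_scores)
    idx = math.ceil(q * n - 1e-12) - 1  # small slack guards against float noise
    return sorted_scores[min(n - 1, max(0, idx))]


def calibrate(score, group, scores_by_group):
    """Map a score to the barycenter value at the same within-group rank."""
    sorted_groups = {g: sorted(v) for g, v in scores_by_group.items()}
    total = sum(len(v) for v in sorted_groups.values())
    q = empirical_cdf(sorted_groups[group], score)
    return sum(len(v) / total * quantile(v, q) for v in sorted_groups.values())


scores = {"a": [0.2, 0.4], "b": [0.6, 0.8]}
low_a = calibrate(0.2, "a", scores)
low_b = calibrate(0.6, "b", scores)
```

In this toy example the lowest-ranked score of each group maps to the same calibrated value, so any threshold applied afterwards treats the two groups' score distributions alike.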

Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning

Federated learning benefits from cross-training strategies, which enable models to train on data from distinct sources to improve the generalization capability. However, the data heterogeneity between sources may lead models to gradually forget previously acquired knowledge when undergoing cross-training to adapt to new tasks or data sources. We argue that integrating personalized and global knowledge to gather information from multiple perspectives could potentially improve performance. To achieve this goal, this paper presents a novel approach that enhances federated learning through a cross-training scheme incorporating multi-view information. Specifically, the proposed method, termed FedCT, includes three main modules, where the consistency-aware knowledge broadcasting module aims to optimize model assignment strategies, which enhances collaborative advantages between clients and achieves an efficient federated learning process. The multi-view knowledge-guided representation learning module leverages fused prototypical knowledge from both global and local views to enhance the preservation of local knowledge before and after model exchange, as well as to ensure consistency between local and global knowledge. The mixup-based feature augmentation module aggregates rich information to further increase the diversity of feature spaces, which enables the model to better discriminate complex samples. Extensive experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis and case study. The results demonstrated that FedCT alleviates knowledge forgetting from both local and global views, which enables it to outperform state-of-the-art methods.

Updated: 2024-05-30 13:27:30

标题: 多视角知识融合的异构联邦学习交叉训练

摘要: 联邦学习受益于跨训练策略，这使得模型能够在来自不同来源的数据上进行训练，以提高泛化能力。然而，不同来源之间的数据异质性可能导致模型在进行跨训练以适应新任务或数据来源时逐渐遗忘先前获得的知识。我们认为，整合个性化和全局知识以从多个角度收集信息可能会提高性能。为实现这一目标，本文提出了一种通过融合多视图信息增强联邦学习的新方法。具体而言，所提出的方法，称为FedCT，包括三个主要模块，其中一致性感知知识广播模块旨在优化模型分配策略，增强客户之间的协作优势，并实现高效的联邦学习过程。多视图知识引导表示学习模块利用来自全局和局部视图的融合原型知识来增强模型交换前后局部知识的保留，并确保局部和全局知识之间的一致性。基于混合的特征增强模块聚合丰富的信息，进一步增加特征空间的多样性，使模型能够更好地区分复杂样本。在性能比较、消融研究、深入分析和案例研究方面对四个数据集进行了广泛实验。结果表明，FedCT缓解了从局部和全局视图遗忘知识，使得其优于最先进的方法。

更新时间: 2024-05-30 13:27:30

领域: cs.AI

下载: http://arxiv.org/abs/2405.20046v1
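
The FedCT abstract above names a mixup-based feature augmentation module without giving details. As a point of reference, generic mixup forms a convex combination of two feature vectors. The sketch below shows only that generic operation; the function names, the Beta(0.2, 0.2) mixing distribution, and the use of plain Python lists are assumptions, and FedCT's actual module may differ.

```python
import random

# Illustrative sketch of generic mixup; not FedCT's actual module.

def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha) (assumed choice)."""
    return rng.betavariate(alpha, alpha)


def mixup(x1, x2, lam):
    """Convex combination of two equally sized feature vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


mixed = mixup([1.0, 0.0], [0.0, 1.0], lam=0.25)
```

Labels or prototypes would be mixed with the same coefficient in a full pipeline.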

Iterative Learning Control of Fast, Nonlinear, Oscillatory Dynamics (Preprint)

The sudden onset of deleterious and oscillatory dynamics (often called instabilities) is a known challenge in many fluid, plasma, and aerospace systems. These dynamics are difficult to address because they are nonlinear, chaotic, and often too fast for active control schemes. In this work, we develop an alternative active control system using an iterative, trajectory-optimization and parameter-tuning approach based on Iterative Learning Control (ILC), Time-Lagged Phase Portraits (TLPP) and Gaussian Process Regression (GPR). The novelty of this approach is that it can control a system's dynamics despite the controller being much slower than the dynamics. We demonstrate this controller on the Lorenz system of equations where it iteratively adjusts (tunes) the system's input parameters to successfully reproduce a desired oscillatory trajectory or state. Additionally, we investigate the system's dynamical sensitivity to its control parameters, identify continuous and bounded regions of desired dynamical trajectories, and demonstrate that the controller is robust to missing information and uncontrollable parameters as long as certain requirements are met. The controller presented in this work provides a framework for low-speed control for a variety of fast, nonlinear systems that may aid in instability suppression and mitigation.

Updated: 2024-05-30 13:27:17

标题: 快速、非线性、振荡动态的迭代学习控制（预印本）

摘要: 突然发生的有害和振荡动态（通常称为不稳定性）是许多流体、等离子体和航空航天系统中的一个已知挑战。这些动态很难解决，因为它们是非线性的、混沌的，并且通常对于主动控制方案来说太快了。在这项工作中，我们开发了一种基于迭代学习控制（ILC）、滞后相位图（TLPP）和高斯过程回归（GPR）的替代主动控制系统，采用迭代、轨迹优化和参数调整的方法。这种方法的新颖之处在于它可以控制系统的动态，尽管控制器比动态要慢得多。我们在洛伦兹方程组上展示了这种控制器，它通过迭代调整（调谐）系统的输入参数，成功地重现了期望的振荡轨迹或状态。此外，我们研究了系统对其控制参数的动态敏感性，识别了连续且有界的期望动态轨迹区域，并证明了只要满足特定要求，控制器对缺失信息和不可控参数是鲁棒的。本工作中提出的控制器为各种快速、非线性系统提供了低速控制的框架，可能有助于抑制和减轻不稳定性。

更新时间: 2024-05-30 13:27:17

领域: cs.LG,cs.SY,eess.SY,math.DS

下载: http://arxiv.org/abs/2405.20045v1
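
To make the test bed in the iterative learning control abstract above concrete, the following sketch integrates the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme. It does not reproduce the paper's controller; it only shows how a trajectory depends on the parameters. The parameter names `sigma`, `rho`, `beta`, the step size, and the initial condition are the conventional textbook choices and are assumptions here, as is which of them the paper treats as tunable inputs.

```python
# Illustrative sketch only; this is the plant, not the paper's controller.

def lorenz(state, sigma, rho, beta):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt, sigma, rho, beta):
    """Advance the state by one classical Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = lorenz(state, sigma, rho, beta)
    k2 = lorenz(shifted(k1, dt / 2), sigma, rho, beta)
    k3 = lorenz(shifted(k2, dt / 2), sigma, rho, beta)
    k4 = lorenz(shifted(k3, dt), sigma, rho, beta)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Return the list of states visited, including the initial one."""
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, sigma, rho, beta))
    return trajectory


chaotic = simulate((1.0, 1.0, 1.0), steps=2000)            # classic chaotic regime
damped = simulate((1.0, 1.0, 1.0), steps=2000, rho=0.5)    # decays to the origin
```

Changing a single parameter moves the system between qualitatively different regimes, which is the kind of sensitivity a parameter-tuning controller has to work with.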

CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval

Composed Image Retrieval (CIR) involves searching for target images based on an image-text pair query. While current methods treat this as a query-target matching problem, we argue that CIR triplets contain additional associations beyond this primary relation. In our paper, we identify two new relations within triplets, treating each triplet as a graph node. Firstly, we introduce the concept of text-bridged image alignment, where the query text serves as a bridge between the query image and the target image. We propose a hinge-based cross-attention mechanism to incorporate this relation into network learning. Secondly, we explore complementary text reasoning, considering CIR as a form of cross-modal retrieval where two images compose to reason about complementary text. To integrate these perspectives effectively, we design a twin attention-based compositor. By combining these complementary associations with the explicit query pair-target image relation, we establish a comprehensive set of constraints for CIR. Our framework, CaLa (Complementary Association Learning for Augmenting Composed Image Retrieval), leverages these insights. We evaluate CaLa on CIRR and FashionIQ benchmarks with multiple backbones, demonstrating its superiority in composed image retrieval.

Updated: 2024-05-30 13:26:43

标题: CaLa：用于增强组合图像检索的互补关联学习

摘要: 组合图像检索（CIR）涉及基于图像-文本对查询搜索目标图像。尽管当前方法将其视为一个查询-目标匹配问题，但我们认为CIR三元组包含超出这种主要关系的额外关联。在我们的论文中，我们确定了三元组中的两种新关系，将每个三元组视为一个图节点。首先，我们引入了文本桥接图像对齐的概念，其中查询文本充当查询图像和目标图像之间的桥梁。我们提出了基于铰链的交叉注意力机制，将这种关系纳入网络学习中。其次，我们探讨了互补文本推理，将CIR视为一种交叉模态检索形式，其中两幅图像组合以推理互补文本。为了有效整合这些观点，我们设计了一个基于双注意力的组合器。通过将这些互补关联与显式查询对-目标图像关系相结合，我们为CIR建立了一套全面的约束。我们的框架CaLa（用于增强组合图像检索的互补关联学习）利用了这些见解。我们在CIRR和FashionIQ基准测试中使用多个主干网络评估CaLa，展示了其在组合图像检索中的优越性。

更新时间: 2024-05-30 13:26:43

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.19149v2

Proof of Quality: A Costless Paradigm for Trustless Generative AI Model Inference on Blockchains

Generative AI models, such as GPT-4 and Stable Diffusion, have demonstrated powerful and disruptive capabilities in natural language and image tasks. However, deploying these models in decentralized environments remains challenging. Unlike traditional centralized deployment, systematically guaranteeing the integrity of AI model services in fully decentralized environments, particularly on trustless blockchains, is both crucial and difficult. In this paper, we present a new inference paradigm called proof of quality (PoQ) to enable the deployment of arbitrarily large generative models on blockchain architecture. Unlike traditional approaches based on validating inference procedures, such as ZKML or OPML, our PoQ paradigm focuses on the outcome quality of model inference. Using lightweight BERT-based cross-encoders as our underlying quality evaluation model, we design and implement PQML, the first practical protocol for real-world NLP generative model inference on blockchains, tailored for popular open-source models such as Llama 3 and Mixtral. Our analysis demonstrates that our protocol is robust against adversarial but rational participants in ecosystems, where lazy or dishonest behavior results in fewer benefits compared to well-behaving participants. The computational overhead of validating the quality evaluation is minimal, allowing quality validators to complete the quality check within a second, even using only a CPU. Preliminary simulation results show that PoQ consensus is generated in milliseconds, 1,000 times faster than any existing scheme.

Updated: 2024-05-30 13:26:35

标题: 质量证明：区块链上无需信任的生成式AI模型推理的无成本范式

摘要: 生成式AI模型，如GPT-4和Stable Diffusion，在自然语言和图像任务中展示了强大和颠覆性的能力。然而，在分散环境中部署这些模型仍然具有挑战性。与传统的集中式部署不同，在完全分散的环境中系统地保证AI模型服务的完整性，特别是在不信任的区块链上，既至关重要又困难。在本文中，我们提出了一种名为“质量证明”（PoQ）的新推理范式，以实现在区块链架构上部署任意大的生成式模型。与基于验证推理过程的传统方法，如ZKML或OPML不同，我们的PoQ范式侧重于模型推理的结果质量。利用基于轻量级BERT的交叉编码器作为我们的底层质量评估模型，我们设计并实现了PQML，这是第一个针对流行的开源模型，如Llama 3和Mixtral，定制的用于区块链上真实世界自然语言生成模型推理的实用协议。我们的分析表明，我们的协议对生态系统中的敌对但理性的参与者具有鲁棒性，在那里懒惰或不诚实的行为导致与表现良好的参与者相比获益更少。验证质量评估的计算开销很小，使质量验证者能够在一秒内完成质量检查，即使仅使用CPU。初步的模拟结果显示，PoQ共识在毫秒内生成，比任何现有方案快1000倍。

更新时间: 2024-05-30 13:26:35

领域: cs.AI

下载: http://arxiv.org/abs/2405.17934v2

Task-Agnostic Machine Learning-Assisted Inference

Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened up a whole new field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science challenges. One type of study that has quickly gained popularity employs ML to predict unobserved outcomes in massive samples and then uses the predicted outcomes in downstream statistical inference. However, existing methods designed to ensure the validity of this type of post-prediction inference are limited to very basic tasks such as linear regression analysis. This is because any extension of these approaches to new, more sophisticated statistical tasks requires task-specific algebraic derivations and software implementations, which ignores the massive library of existing software tools already developed for complex inference tasks and severely constrains the scope of post-prediction inference in real applications. To address this challenge, we propose a novel statistical framework for task-agnostic ML-assisted inference. It provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routine. It delivers valid and efficient inference that is robust to arbitrary choices of ML models, while allowing nearly all existing analytical frameworks to be incorporated into the analysis of ML-predicted outcomes. Through extensive experiments, we showcase the validity, versatility, and superiority of our method compared to existing approaches.

Updated: 2024-05-30 13:19:49

标题: 任务无关的机器学习辅助推断

摘要: 机器学习（ML）在科学研究中发挥着越来越重要的作用。与传统统计方法相结合，ML辅助的分析策略显示出加速研究发现的巨大潜力。这也开辟了一个全新的方法研究领域，专注于整合ML和统计学，以应对数据科学挑战。一种迅速受到欢迎的研究类型利用ML来预测大样本中未观察到的结果，然后在下游统计推断中使用预测结果。然而，现有的方法旨在确保这种后预测推断的有效性，仅限于非常基本的任务，如线性回归分析。这是因为将这些方法扩展到新的、更复杂的统计任务需要特定任务的代数推导和软件实现，这忽略了已经为复杂推断任务开发的庞大软件库，并严重限制了实际应用中后预测推断的范围。为了解决这一挑战，我们提出了一种新颖的统计框架，用于任务无关的ML辅助推断。它提供了一个后预测推断解决方案，可以轻松地插入几乎任何已建立的数据分析程序中。它提供了对任意ML模型选择鲁棒的有效推断，同时允许几乎所有现有的分析框架被纳入ML预测结果的分析中。通过广泛的实验，我们展示了我们的方法相对于现有方法的有效性、多功能性和优越性。

更新时间: 2024-05-30 13:19:49

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2405.20039v1

Deep Reinforcement Learning for Intrusion Detection in IoT: A Survey

The rise of new, complex attack scenarios in Internet of Things (IoT) environments necessitates more advanced and intelligent cyber defense techniques, such as various Intrusion Detection Systems (IDSs), which are responsible for detecting and mitigating malicious activities in IoT networks without human intervention. To address this issue, deep reinforcement learning (DRL) has been proposed in recent years to automatically tackle intrusions/attacks. In this paper, a comprehensive survey of DRL-based IDS on IoT is presented. Furthermore, in this survey, the state-of-the-art DRL-based IDS methods have been classified into five categories including wireless sensor network (WSN), deep Q-network (DQN), healthcare, hybrid, and other techniques. In addition, the most crucial performance metrics, namely accuracy, recall, precision, false negative rate (FNR), false positive rate (FPR), and F-measure, are detailed in order to evaluate the performance of each proposed method. The paper provides a summary of datasets utilized in the studies as well.

Updated: 2024-05-30 13:19:23

标题: 物联网中入侵检测的深度强化学习：一项调查

摘要: 物联网(IoT)环境中新型复杂攻击场景的出现需要更先进和智能的网络防御技术，如各种入侵检测系统(IDS)，其负责在物联网网络中检测和缓解恶意活动，无需人为干预。为解决这一问题，近年来提出了深度强化学习(DRL)方法，以自动处理入侵/攻击。本文介绍了基于DRL的IDS在物联网上的全面调查。此外，在这项调查中，最先进的基于DRL的IDS方法被分类为五类，包括无线传感器网络(WSN)、深度Q网络(DQN)、医疗保健、混合和其他技术。此外，还详细介绍了最关键的性能指标，即准确性、召回率、精确度、漏报率(FNR)、误报率(FPR)和F-度量，以评估每种提出方法的性能。本文还总结了研究中使用的数据集。

更新时间: 2024-05-30 13:19:23

领域: cs.CR

下载: http://arxiv.org/abs/2405.20038v1
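
The performance metrics listed in the intrusion detection survey abstract above all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, taking the F-measure to be the balanced F1 score; the function name, the dictionary keys, and the convention of returning 0.0 for an empty denominator are assumptions made for illustration.

```python
# Illustrative sketch using the standard confusion-matrix definitions.

def ids_metrics(tp, fp, tn, fn):
    """Compute common detection metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for empty cases

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,                       # true positive rate
        "fnr": ratio(fn, fn + tp),              # missed attacks
        "fpr": ratio(fp, fp + tn),              # false alarms
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


metrics = ids_metrics(tp=8, fp=2, tn=88, fn=2)
```

Note that FNR equals one minus recall, so the two carry the same information from opposite sides.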

Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion

With the exponential growth of video traffic, traditional video streaming systems are approaching their limits in compression efficiency and communication capacity. To further reduce bitrate while maintaining quality, we propose Promptus, a disruptive novel system that streams prompts instead of video content with Stable Diffusion, which converts video frames into a series of "prompts" for delivery. To ensure pixel alignment, a gradient descent-based prompt fitting framework is proposed. To achieve adaptive bitrate for prompts, a low-rank decomposition-based bitrate control algorithm is introduced. For inter-frame compression of prompts, a temporal smoothing-based prompt interpolation algorithm is proposed. Evaluations across various video domains and real network traces demonstrate that Promptus can enhance the perceptual quality by 0.111 and 0.092 (in LPIPS) compared to VAE and H.265, respectively, and decrease the ratio of severely distorted frames by 89.3% and 91.7%. Moreover, Promptus achieves real-time video generation from prompts at over 150 FPS. To the best of our knowledge, Promptus is the first attempt to replace video codecs with prompt inversion and the first to use prompt streaming instead of video streaming. Our work opens up a new paradigm for efficient video communication beyond the Shannon limit.

Updated: 2024-05-30 13:16:48

标题: Promptus：借助Stable Diffusion，提示流能否取代视频流

摘要: 随着视频流量的指数增长，传统视频流系统在压缩效率和通信容量方面正接近其极限。为了进一步降低比特率同时保持质量，我们提出了Promptus，这是一个颠覆性的新系统，它使用稳定扩散来流传提示而不是视频内容，将视频帧转换为一系列“提示”进行传输。为了确保像素对齐，提出了一种基于梯度下降的提示拟合框架。为了实现提示的自适应比特率，引入了基于低秩分解的比特率控制算法。针对提示的帧间压缩，提出了一种基于时间平滑的提示插值算法。在各种视频领域和真实网络跟踪的评估中，Promptus相对于VAE和H.265分别可以提高感知质量0.111和0.092（在LPIPS中），并将严重失真帧的比例降低了89.3%和91.7%。此外，Promptus实现了从提示实时生成视频，帧率超过150FPS。据我们所知，Promptus是第一个尝试用提示逆向替代视频编解码器的系统，也是第一个使用提示流而不是视频流的系统。我们的工作为超越香农极限的高效视频通信开辟了新的范式。

更新时间: 2024-05-30 13:16:48

领域: cs.NI,cs.AI,cs.MM

下载: http://arxiv.org/abs/2405.20032v1

A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of $\Theta(T^{2/3})$ and its Application to Best-of-Both-Worlds

Follow-the-Regularized-Leader (FTRL) is a powerful framework for various online learning problems. By designing its regularizer and learning rate to be adaptive to past observations, FTRL is known to work adaptively to various properties of an underlying environment. However, most existing adaptive learning rates are for online learning problems with a minimax regret of $\Theta(\sqrt{T})$ for the number of rounds $T$, and there are only a few studies on adaptive learning rates for problems with a minimax regret of $\Theta(T^{2/3})$, which include several important problems dealing with indirect feedback. To address this limitation, we establish a new adaptive learning rate framework for problems with a minimax regret of $\Theta(T^{2/3})$. Our learning rate is designed by matching the stability, penalty, and bias terms that naturally appear in regret upper bounds for problems with a minimax regret of $\Theta(T^{2/3})$. As applications of this framework, we consider two major problems dealing with indirect feedback: partial monitoring and graph bandits. We show that FTRL with our learning rate and the Tsallis entropy regularizer improves existing Best-of-Both-Worlds (BOBW) regret upper bounds, which achieve simultaneous optimality in the stochastic and adversarial regimes. The resulting learning rate is surprisingly simple compared to the existing learning rates for BOBW algorithms for problems with a minimax regret of $\Theta(T^{2/3})$.

Updated: 2024-05-30 13:13:12

标题: 一个简单且自适应的FTRL在线学习学习率，具有$Θ(T^{2/3})$最小化遗憾以及其在Best-of-Both-Worlds中的应用

摘要: Follow-the-Regularized-Leader (FTRL) 是各种在线学习问题的强大框架。通过设计其正则化器和学习率以适应过去的观察结果，已知FTRL能够自适应地适应底层环境的各种特性。然而，大多数现有的自适应学习率适用于具有最小最大后悔度为 $\Theta(\sqrt{T})$ 的在线学习问题，而仅有少数研究涉及具有最小最大后悔度为 $\Theta(T^{2/3})$ 的问题的自适应学习率，其中包括处理间接反馈的若干重要问题。为了解决这一限制，我们建立了一个针对具有最小最大后悔度为 $\Theta(T^{2/3})$ 问题的新的自适应学习率框架。我们的学习率是通过匹配自然出现在具有最小最大后悔度为 $\Theta(T^{2/3})$ 问题的后悔上界中的稳定性、惩罚和偏差项来设计的。作为此框架的应用，我们考虑了两个处理间接反馈的重要问题：部分监控和图形赌博机。我们展示了使用我们的学习率和Tsallis熵正则化器的FTRL可以改进现有的同时在随机和敌对环境中实现最优性的Best-of-Both-Worlds（BOBW）后悔上界，结果学习率相对于具有最小最大后悔度为 $\Theta(T^{2/3})$ 问题的BOBW算法的现有学习率来说出奇的简单。

更新时间: 2024-05-30 13:13:12

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.20028v1
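
As background for the FTRL abstract above, the simplest instance of Follow-the-Regularized-Leader on the probability simplex uses the negative Shannon entropy as regularizer, in which case the update has a closed form: each action's probability is proportional to the exponential of its negative cumulative loss scaled by the learning rate. The sketch below shows that textbook case with a fixed learning rate `eta`. It swaps in Shannon entropy for the Tsallis entropy used in the paper and does not implement the paper's adaptive learning rate; the function name is an assumption.

```python
import math

# Textbook FTRL with negative Shannon entropy (exponential weights).
# Not the paper's Tsallis-entropy regularizer or adaptive learning rate.

def ftrl_entropy_weights(cumulative_losses, eta):
    """Solve argmin_p <L, p> + (1/eta) * sum_i p_i log p_i over the simplex."""
    smallest = min(cumulative_losses)  # shift losses for numerical stability
    scores = [math.exp(-eta * (loss - smallest)) for loss in cumulative_losses]
    total = sum(scores)
    return [s / total for s in scores]


probs = ftrl_entropy_weights([1.0, 2.0, 3.0], eta=0.5)
```

A larger `eta` concentrates the distribution on the action with the smallest cumulative loss, while a smaller one keeps it closer to uniform; adaptive schemes such as the one in the abstract choose this rate from past observations.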

SEA Cache: A Performance-Efficient Countermeasure for Contention-based Attacks

Many cache designs have been proposed to guard against contention-based side-channel attacks. One well-known type of cache is the randomized remapping cache. Many randomized remapping caches provide fixed or excessive protection, which leads to permanent performance degradation, or they provide flexible protection but sacrifice performance against strong contention-based attacks. To improve the secure cache design, we extend an existing secure cache design, the CEASER-SH cache, and propose the SEA cache. The novel cache configuration in both caches is logical associativity, which allows the cache line to be placed not only in its mapped cache set but also in the subsequent cache sets. SEA cache allows each user or each process to have a different local logical associativity. Hence, only those users or processes that request extra protection against contention-based attacks are protected with high logical associativity. Other users or processes can access the cache with lower latency and higher performance. Compared to a CEASER-SH cache with logical associativity of 8, an SEA cache with logical associativity of 1 for normal protection users and 16 for high protection users has a Cycles Per Instruction penalty that is about 0.6% less for users under normal protection and provides better security against contention-based attacks. Based on a 45nm technology library, and compared to a conventional cache, we estimate the power overhead is about 20% and the area overhead is 3.4%.

Updated: 2024-05-30 13:12:53

标题: SEA缓存：一种针对基于争用攻击的性能有效对策

摘要: 许多缓存设计方案都被提出来防范基于争用的侧信道攻击。一种著名的缓存类型是随机重映射缓存。许多随机重映射缓存提供固定或超过的保护，导致了永久性性能下降，或者它们提供灵活的保护，但是在面对强烈的基于争用的攻击时会牺牲性能。为了改进安全的缓存设计，我们扩展了一个现有的安全缓存设计，CEASER-SH缓存，并提出了SEA缓存。两种缓存中的新颖配置是逻辑关联性，它允许缓存行不仅放置在其映射的缓存集中，还可以放置在后续的缓存集中。SEA缓存允许每个用户或每个进程具有不同的本地逻辑关联性。因此，只有那些请求额外保护来防范基于争用的攻击的用户或进程会以高逻辑关联性得到保护。其他用户或进程可以以更低的延迟和更高的性能访问缓存。与逻辑关联性为8的CEASER-SH缓存相比，逻辑关联性为1的普通保护用户和逻辑关联性为16的高保护用户的SEA缓存在普通保护用户下的每指令周期惩罚约少了0.6%，并提供更好的防范基于争用的攻击的安全性。基于45纳米技术库，与传统缓存相比，我们估计功耗开销约为20%，面积开销为3.4%。

更新时间: 2024-05-30 13:12:53

领域: cs.CR,cs.AR

下载: http://arxiv.org/abs/2405.20027v1
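
The notion of logical associativity in the SEA cache abstract above (a line may be placed in its mapped set or in the subsequent sets) can be sketched as a function that lists the candidate sets for a line. This is an illustration only: the function name, the wrap-around at the last set, and the use of a plain set index in place of the randomized (encrypted) index mapping are assumptions not confirmed by the abstract.

```python
# Illustrative sketch only; wrap-around and plain indexing are assumptions.

def candidate_sets(mapped_set, logical_associativity, num_sets):
    """List the cache sets a line may occupy for a given logical associativity."""
    if not 1 <= logical_associativity <= num_sets:
        raise ValueError("logical associativity must be between 1 and num_sets")
    return [(mapped_set + k) % num_sets for k in range(logical_associativity)]


normal_user = candidate_sets(mapped_set=6, logical_associativity=1, num_sets=8)
protected_user = candidate_sets(mapped_set=6, logical_associativity=3, num_sets=8)
```

A user with logical associativity 1 probes a single set, which keeps lookups cheap, while a higher value spreads a line over more sets and makes eviction sets harder to construct.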

CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process

Identifying the underlying time-delayed latent causal processes in sequential data is vital for grasping temporal dynamics and performing downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on strict assumptions about the invertible generation process from latent variables to observed data. However, these assumptions are often hard to satisfy in real-world applications containing information loss. For instance, the visual perception process translates a 3D space into 2D images, or the phenomenon of persistence of vision incorporates historical data into current perceptions. To address this challenge, we establish an identifiability theory that allows for the recovery of independent latent components even when they come from a nonlinear and non-invertible mix. Using this theory as a foundation, we propose a principled approach, CaRiNG, to learn the CAusal RepresentatIon of Non-invertible Generative temporal data with identifiability guarantees. Specifically, we utilize temporal context to recover lost latent information and apply the conditions in our theory to guide the training process. Through experiments conducted on synthetic datasets, we validate that our CaRiNG method reliably identifies the causal process, even when the generation process is non-invertible. Moreover, we demonstrate that our approach considerably improves temporal understanding and reasoning in practical applications.

Updated: 2024-05-30 13:09:47

标题: CaRiNG：在非可逆生成过程下学习时间因果表示

摘要: 在序列数据中识别潜在的带有时间延迟的因果过程对于把握时间动态和进行下游推理至关重要。虽然一些最近的方法可以稳健地识别这些潜在的因果变量，但它们依赖于关于从潜在变量到观察数据的可逆生成过程的严格假设。然而，这些假设通常很难在包含信息丢失的实际应用中得到满足。例如，视觉感知过程将一个3D空间转化为2D图像，或者视觉暂留现象将历史数据融入到当前感知中。为了解决这一挑战，我们建立了一个可识别性理论，允许即使来自非线性和非可逆混合的情况下也可以恢复独立的潜在成分。借助这一理论作为基础，我们提出了一种有原则的方法CaRiNG，用于学习具有可识别性保证的非可逆生成时态数据的因果表示。具体来说，我们利用时间上下文来恢复丢失的潜在信息，并将我们理论中的条件应用于指导训练过程。通过在合成数据集上进行的实验，我们验证了我们的CaRiNG方法可靠地识别因果过程，即使生成过程是非可逆的。此外，我们证明了我们的方法在实际应用中显著改善了对时间的理解和推理能力。

更新时间: 2024-05-30 13:09:47

领域: cs.LG,cs.CV,stat.ME

下载: http://arxiv.org/abs/2401.14535v2

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3. Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored, making it challenging to understand what factors account for model performance, a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization, and challenge sets that probe properties such as hallucination; evaluations that provide fine-grained insight into VLM capabilities. Second, we rigorously investigate VLMs along key design axes, including pretrained visual representations and training from base vs. instruct-tuned language models, amongst others. We couple our analysis with three resource contributions: (1) a unified framework for evaluating VLMs, (2) optimized, flexible training code, and (3) checkpoints for all models, including a family of VLMs at the 7-13B scale that strictly outperform InstructBLIP and LLaVa v1.5, the state-of-the-art in open VLMs.

Updated: 2024-05-30 13:08:48

标题: 棱柱形VLMs：探索视觉条件语言模型的设计空间

摘要: 视觉条件语言模型（VLMs）在视觉对话、场景理解和机器人任务规划等应用中得到了越来越广泛的应用；这种应用促使了诸如LLaVa、InstructBLIP和PaLI-3等一系列新模型的涌现。尽管有大量新模型发布，但在图像预处理、架构和优化等关键设计决策方面仍未得到充分探讨，这使得理解模型性能的因素变得具有挑战性，而缺乏客观、一致的评估更进一步加剧了这一挑战。为了填补这些空白，我们首先编制了一套标准化评估，涵盖了视觉问答、物体定位和挑战集等方面，探究了诸如幻觉等属性；这些评估提供了对VLM能力的细致洞察。其次，我们严谨地研究了VLMs在关键设计轴线上，包括预训练的视觉表示和从基础模型vs.指导调整语言模型进行训练等方面。我们将分析与三种资源贡献结合起来：（1）一个评估VLMs的统一框架，（2）优化的、灵活的训练代码，以及（3）所有模型的检查点，包括严格优于InstructBLIP和LLaVa v1.5的一系列VLMs在7-13B规模上的模型，这些模型是开放VLMs的最新技术水平。

更新时间: 2024-05-30 13:08:48

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.07865v2

Applications of Generative AI (GAI) for Mobile and Wireless Networking: A Survey

The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and mobile gaming). To circumvent this limitation, Generative AI (GAI), a.k.a. AI-generated content (AIGC), has emerged as a powerful AI paradigm, thanks to its ability to efficiently learn complex data distributions and generate synthetic data to represent the original data in various forms. This impressive feature is projected to transform the management of mobile networking and diversify the current services and applications provided. On this basis, this work presents a concise tutorial on the role of GAIs in mobile and wireless networking. In particular, this survey first provides the fundamentals of GAI and representative GAI models, serving as an essential preliminary to the understanding of the applications of GAI in mobile and wireless networking. Then, this work provides a comprehensive review of state-of-the-art studies and GAI applications in network management, wireless security, semantic communication, and lessons learned from the open literature. Finally, this work summarizes the current research on GAI for mobile and wireless networking by outlining important challenges that need to be resolved to facilitate the development and applicability of GAI in this cutting-edge area.

Updated: 2024-05-30 13:06:40

标题: Generative AI（GAI）在移动和无线网络中的应用：一项调查

摘要: 近年来，人工智能在多个学科和垂直领域取得了成功，推动了移动网络和未来互联网向人工智能整合的物联网时代发展。然而，大多数人工智能技术依赖于由物理设备（如移动设备和网络节点）或特定应用程序（如健身跟踪器和移动游戏）生成的数据。为了规避这一限制，生成式人工智能（GAI），即人工智能生成内容（AIGC），已经成为一种强大的人工智能范式；这要归功于它有效学习复杂数据分布并生成合成数据来代表原始数据的能力。这一引人注目的特点预计将改变移动网络的管理方式，并丰富当前提供的服务和应用。基于此，本文提供了关于GAIs在移动和无线网络中的作用的简明教程。具体而言，本调查首先介绍了GAI的基础知识和代表性GAI模型，作为理解GAI在移动和无线网络应用的重要先决条件。然后，本文全面回顾了网络管理、无线安全、语义通信的最新研究和GAI应用，并总结了来自开放文献的经验教训。最后，本文通过概述需要解决的重要挑战，总结了移动和无线网络中关于GAI的当前研究，以促进GAI在这一前沿领域的发展和适用性。

更新时间: 2024-05-30 13:06:40

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2405.20024v1

A random-key GRASP for combinatorial optimization

This paper proposes a problem-independent GRASP metaheuristic using the random-key optimizer (RKO) paradigm. GRASP (greedy randomized adaptive search procedure) is a metaheuristic for combinatorial optimization that repeatedly applies a semi-greedy construction procedure followed by a local search procedure. The best solution found over all iterations is returned as the solution of the GRASP. Continuous GRASP (C-GRASP) is an extension of GRASP for continuous optimization in the unit hypercube. A random-key optimizer (RKO) uses a vector of random keys to encode a solution to a combinatorial optimization problem. It uses a decoder to evaluate a solution encoded by the vector of random keys. A random-key GRASP is a C-GRASP in which points in the unit hypercube are evaluated by a decoder. We describe a random-key GRASP consisting of a problem-independent component and a problem-dependent decoder. As a proof of concept, the random-key GRASP is tested on five NP-hard combinatorial optimization problems: the traveling salesman problem, the tree of hubs location problem, the Steiner triple covering problem, the node capacitated graph partitioning problem, and the job sequencing and tool switching problem.
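
The split between a problem-independent search and a problem-dependent decoder can be illustrated with a small sketch. Everything below is an assumption made for illustration: the sort-based TSP decoder is a common random-key decoder, and `random_key_search` is a toy multi-start local search in the unit hypercube, not the C-GRASP procedure of the paper. The function names are hypothetical.

```python
import math
import random

def decode_tour(keys):
    """Problem-dependent decoder: sort city indices by their random key."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

def tour_length(tour, coords):
    """Length of the closed tour that visits coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def random_key_search(coords, iterations=200, step=0.1, seed=0):
    """Toy multi-start search over key vectors (illustrative, not C-GRASP)."""
    rng = random.Random(seed)
    n = len(coords)
    best_keys, best_cost = None, float("inf")
    for _ in range(iterations):
        # Construction phase (here purely random): sample a point in [0, 1)^n.
        keys = [rng.random() for _ in range(n)]
        cost = tour_length(decode_tour(keys), coords)
        # Local search phase: perturb one key at a time, keep strict improvements.
        improved = True
        while improved:
            improved = False
            for i in range(n):
                trial = keys[:]
                moved = trial[i] + rng.uniform(-step, step)
                trial[i] = min(max(moved, 0.0), 1.0 - 1e-12)
                trial_cost = tour_length(decode_tour(trial), coords)
                if trial_cost < cost - 1e-12:
                    keys, cost, improved = trial, trial_cost, True
        if cost < best_cost:
            best_keys, best_cost = keys, cost
    return decode_tour(best_keys), best_cost
```

Only `decode_tour` and `tour_length` know about the TSP; swapping in a different decoder would reuse the search loop unchanged, which is the design point the abstract makes.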

Updated: 2024-05-30 13:06:27

标题: 一种用于组合优化的随机密钥GRASP算法

摘要: 本文提出了一种使用随机密钥优化器（RKO）范例的问题无关的GRASP元启发式方法。GRASP（贪婪随机自适应搜索程序）是一种用于组合优化的元启发式方法，它反复应用半贪婪构造过程，然后是局部搜索过程。在所有迭代中找到的最佳解被返回为GRASP的解。连续GRASP（C-GRASP）是GRASP的一个扩展，用于单位超立方体中的连续优化。随机密钥优化器（RKO）使用一组随机密钥来对组合优化问题的解进行编码。它使用解码器来评估由随机密钥向量编码的解。随机密钥GRASP是一个C-GRASP，其中通过解码器评估单位超立方体中的点。我们描述了由问题无关组件和问题相关解码器组成的随机密钥GRASP。作为概念验证，随机密钥GRASP在五个NP难的组合优化问题上进行了测试：旅行商问题、中心树定位问题、斯坦纳三重覆盖问题、节点容量图分区问题和作业排序和工具切换问题。

更新时间: 2024-05-30 13:06:27

领域: cs.NE,cs.AI,math.OC,90-02, 90B40, 90C27,G.1.6; G.2.1; I.2.8

下载: http://arxiv.org/abs/2405.18681v2

Drug Discovery with Dynamic Goal-aware Fragments

Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, responsible respectively for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments contributing to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks. Our code is available at https://github.com/SeulLee05/GEAM.

Updated: 2024-05-30 13:03:32

标题: 使用动态目标意识片段进行药物发现

摘要: 基于片段的药物发现是在广阔的化学空间中发现药物候选物的有效策略，并已广泛应用于分子生成模型中。然而，在这些模型中许多现有的片段提取方法并没有考虑目标化学特性，或依赖于启发式规则。此外，现有的基于片段的生成模型无法使用目标感知片段在生成过程中新发现。为此，我们提出了一种名为目标感知片段提取、组装和修改（GEAM）的药物发现分子生成框架。GEAM包括三个模块，分别负责目标感知片段提取、片段组装和片段修改。片段提取模块利用信息瓶颈原则识别对期望的目标特性有贡献的重要片段，从而构建一个有效的目标感知片段词汇表。此外，GEAM可以通过片段修改模块探索超出初始词汇表的内容，并通过动态目标感知词汇表更新进一步增强探索能力。我们通过实验证明，GEAM通过三个模块的生成循环有效地在各种药物发现任务中发现药物候选物。我们的代码可在https://github.com/SeulLee05/GEAM找到。

更新时间: 2024-05-30 13:03:32

领域: cs.LG

下载: http://arxiv.org/abs/2310.00841v3

Safe Multi-agent Reinforcement Learning with Natural Language Constraints

The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked. While Safe MARL has vast potential, especially in fields like robotics and autonomous vehicles, its full potential is limited by the need to define constraints in pre-designed mathematical terms, which requires extensive domain expertise and reinforcement learning knowledge, hindering its broader adoption. To address this limitation and make Safe MARL more accessible and adaptable, we propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL). Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings that capture the essence of prohibited states and behaviours. These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards. To evaluate the effectiveness of SMALL, we introduce LaMaSafe, a multi-task benchmark designed to assess the performance of multiple agents in adhering to natural language constraints. Empirical evaluations across various environments demonstrate that SMALL achieves comparable rewards and significantly fewer constraint violations, highlighting its effectiveness in understanding and enforcing natural language constraints.

Updated: 2024-05-30 12:57:35

标题: 带有自然语言约束的安全多智能体强化学习

摘要: 自然语言约束在安全多Agent强化学习（MARL）中的作用至关重要，但常常被忽视。虽然安全MARL在机器人和自动驾驶等领域具有巨大潜力，但其完整潜力受到需要以预先设计的数学术语定义约束的限制，这需要广泛的领域专业知识和强化学习知识，从而阻碍了其更广泛的采用。为了解决这一限制，并使安全MARL更具可访问性和适应性，我们提出了一种新颖的方法，名为带有自然语言约束的安全多Agent强化学习（SMALL）。我们的方法利用经过微调的语言模型来解释和处理自由形式的文本约束，并将其转化为捕捉禁止状态和行为本质的语义嵌入。然后将这些嵌入集成到多Agent策略学习过程中，使代理能够学习最小化违反约束的策略同时优化奖励。为了评估SMALL的有效性，我们引入了LaMaSafe，一个多任务基准测试，旨在评估多个Agent在遵守自然语言约束方面的表现。在各种环境中的实证评估表明，SMALL实现了可比的奖励，并明显减少了约束违规，突显了其在理解和强制自然语言约束方面的有效性。

更新时间: 2024-05-30 12:57:35

领域: cs.MA,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.20018v1

Efficient LLM-Jailbreaking by Introducing Visual Modality

This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreaks that directly target LLMs, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an efficient MLLM-jailbreak to generate jailbreaking embeddings embJS. Finally, we convert the embJS into text space to facilitate the jailbreaking of the target LLM. Compared to direct LLM-jailbreaking, our approach is more efficient, as MLLMs are more vulnerable to jailbreaking than pure LLMs. Additionally, to improve the attack success rate (ASR) of jailbreaking, we propose an image-text semantic matching scheme to identify a suitable initial input. Extensive experiments demonstrate that our approach surpasses current state-of-the-art methods in terms of both efficiency and effectiveness. Moreover, our approach exhibits superior cross-class jailbreaking capabilities.

Updated: 2024-05-30 12:50:32

标题: 通过引入视觉模态实现高效的LLM越狱

摘要: 本文关注针对大型语言模型（LLMs）的越狱攻击，诱使它们在响应有害用户查询时生成令人反感的内容。与直接面向LLMs的先前LLM越狱不同，我们的方法首先通过将视觉模块整合到目标LLM中构建多模态大型语言模型（MLLM）。随后，我们进行有效的MLLM越狱以生成越狱嵌入embJS。最后，我们将embJS转换为文本空间以促进对目标LLM的越狱。与直接LLM越狱相比，我们的方法更有效，因为MLLM比纯LLM更容易受到越狱攻击。此外，为提高越狱攻击成功率（ASR），我们提出了一种图像文本语义匹配方案以识别合适的初始输入。大量实验证明，我们的方法在效率和效果方面均超越了当前的最先进方法。此外，我们的方法展现了出色的跨类别越狱能力。

更新时间: 2024-05-30 12:50:32

领域: cs.AI

下载: http://arxiv.org/abs/2405.20015v1

subMFL: Compatiple subModel Generation for Federated Learning in Device Heterogenous Environment

Federated Learning (FL) is commonly used in systems with distributed and heterogeneous devices with access to varying amounts of data and diverse computing and storage capacities. The FL training process enables such devices to update the weights of a shared model locally using their local data, and then a trusted central server combines all of those models to generate a global model. In this way, a global model is generated while the data remains local to devices to preserve privacy. However, training large models such as Deep Neural Networks (DNNs) on resource-constrained devices can take a prohibitively long time and consume a large amount of energy. In the current process, the low-capacity devices are excluded from the training process, although they might have access to unseen data. To overcome this challenge, we propose a model compression approach that enables heterogeneous devices with varying computing capacities to participate in the FL process. In our approach, the server shares a dense model with all devices to train it. Afterwards, the trained model is gradually compressed to obtain submodels with varying levels of sparsity, to be used as suitable initial global models for resource-constrained devices that were not capable of training the first dense model. This results in an increased participation rate of resource-constrained devices while the transferred weights from the previous round of training are preserved. Our validation experiments show that, despite reaching about 50 per cent global sparsity, the generated submodels maintain their accuracy and can be shared to increase participation by around 50 per cent.
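
To make the idea of deriving submodels at several sparsity levels from one trained dense model concrete, here is a minimal sketch. The abstract does not state the compression criterion, so global magnitude pruning is used here purely as an assumption; `magnitude_prune` and `submodel_ladder` are hypothetical names, and a single weight matrix stands in for a full model.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    Assumption: magnitude pruning stands in for the paper's compression step.
    With tied magnitudes at the threshold, slightly more weights may be zeroed.
    """
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))  # number of weights to remove
    if k == 0:
        return w.copy()
    # k-th smallest absolute value acts as the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def submodel_ladder(weights, sparsities=(0.1, 0.3, 0.5)):
    """Build one pruned copy of the trained weights per sparsity level."""
    return {s: magnitude_prune(weights, s) for s in sparsities}
```

In this sketch every surviving weight keeps its trained value, which mirrors the abstract's point that weights from the previous training round are preserved when a sparser submodel is handed to a smaller device.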

Updated: 2024-05-30 12:49:34

标题: subMFL：设备异构环境中联邦学习的兼容子模型生成

摘要: 联邦学习（FL）通常用于具有分布式和异构设备的系统，这些设备可以访问不同数量的数据和不同的计算和存储容量。FL训练过程使这些设备能够使用其本地数据在本地更新共享模型的权重，然后信任的中央服务器将所有这些模型组合起来生成一个全局模型。通过这种方式，生成了一个全局模型，同时数据仍然保留在设备本地以保护隐私。然而，在资源受限的设备上训练大型模型，如深度神经网络（DNNs），可能需要很长时间并消耗大量能量。在当前的过程中，低容量设备被排除在训练过程之外，尽管它们可能可以访问未见过的数据。为了克服这一挑战，我们提出了一种模型压缩方法，使具有不同计算能力的异构设备能够参与FL过程。在我们的方法中，服务器与所有设备共享一个密集模型进行训练：随后，训练好的模型逐渐压缩以获得具有不同稀疏级别的子模型，用作适合资源受限设备的初始全局模型。这导致了资源受限设备的参与率增加，同时保留了来自前一轮训练的传输权重。我们的验证实验证明，尽管达到了约50%的全局稀疏度，生成的子模型仍然保持其准确性，同时可以共享以增加约50%的参与度。

更新时间: 2024-05-30 12:49:34

领域: cs.LG

下载: http://arxiv.org/abs/2405.20014v1

FlexiDrop: Theoretical Insights and Practical Advances in Random Dropout Method on GNNs

Graph Neural Networks (GNNs) are powerful tools for handling graph-type data. Recently, GNNs have been widely applied in various domains, but they also face some issues, such as overfitting, over-smoothing and non-robustness. Existing research indicates that random dropout methods are an effective way to address these issues. However, random dropout methods in GNNs still face unresolved problems. Currently, the choice of dropout rate, often determined by heuristic or grid search methods, can increase the generalization error, contradicting the principal aims of dropout. In this paper, we propose a novel random dropout method for GNNs called FlexiDrop. First, we conduct a theoretical analysis of dropout in GNNs using Rademacher complexity and demonstrate that the generalization error of traditional random dropout methods is constrained by a function related to the dropout rate. Subsequently, we use this function as a regularizer to unify the dropout rate and empirical loss within a single loss function, optimizing them simultaneously. Therefore, our method enables adaptive adjustment of the dropout rate and theoretically balances the trade-off between model complexity and generalization ability. Furthermore, extensive experimental results on benchmark datasets show that FlexiDrop outperforms traditional random dropout methods in GNNs.

Updated: 2024-05-30 12:48:44

标题: FlexiDrop：关于GNNs中随机Dropout方法的理论洞见和实际进展

摘要: 图神经网络（GNNs）是处理图形数据的强大工具。最近，GNNs已被广泛应用于各个领域，但也面临一些问题，如过拟合、过度平滑和非稳健性。现有研究表明，随机丢弃方法是解决这些问题的有效途径。然而，在GNNs中，随机丢弃方法仍然面临未解决的问题。当前，丢弃率的选择通常由启发式或网格搜索方法确定，可能会增加泛化误差，与丢弃的主要目标相矛盾。本文提出了一种新颖的GNNs随机丢弃方法FlexiDrop。首先，我们利用拉德马赫复杂度对GNNs中的丢弃进行理论分析，并证明传统随机丢弃方法的泛化误差受到与丢弃率相关的函数的约束。随后，我们将这个函数作为正则化器，将丢弃率和经验损失统一到一个损失函数中，同时优化它们。因此，我们的方法能够自适应调整丢弃率，并在理论上平衡模型复杂度和泛化能力之间的权衡。此外，对基准数据集进行的广泛实验结果表明，FlexiDrop在GNNs中表现优于传统的随机丢弃方法。

更新时间: 2024-05-30 12:48:44

领域: cs.LG

下载: http://arxiv.org/abs/2405.20012v1

Language Models Represent Beliefs of Self and Others

Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.

Updated: 2024-05-30 12:43:01

标题: 语言模型代表自我和他人的信念

摘要: 理解和归因心理状态，即心灵理论（ToM），被视为人类社会推理的基本能力。虽然大型语言模型（LLMs）似乎具有某些ToM能力，但这些能力的机制仍然难以捉摸。在这项研究中，我们发现通过语言模型的神经激活，可以线性解码来自各种代理人视角的信念状态，表明存在自身和他人信念的内部表征。通过操纵这些表征，我们观察到模型ToM性能的显著变化，强调了它们在社会推理过程中的关键作用。此外，我们的发现扩展到涉及不同因果推理模式的多样化社会推理任务，表明这些表征的潜在泛化能力。

更新时间: 2024-05-30 12:43:01

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.18496v3

Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities

Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. In particular, uncertainty can be used to improve the trustworthiness of LLMs by detecting factually incorrect model responses, commonly called hallucinations. Critically, one should seek to capture the model's semantic uncertainty, i.e., the uncertainty over the meanings of LLM outputs, rather than uncertainty over lexical or syntactic variations that do not affect answer correctness. To address this problem, we propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs. KLE defines positive semidefinite unit trace kernels to encode the semantic similarities of LLM outputs and quantifies uncertainty using the von Neumann entropy. It considers pairwise semantic dependencies between answers (or semantic clusters), providing more fine-grained uncertainty estimates than previous methods based on hard clustering of answers. We theoretically prove that KLE generalizes the previous state-of-the-art method called semantic entropy and empirically demonstrate that it improves uncertainty quantification performance across multiple natural language generation datasets and LLM architectures.
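
The core quantity named above, the von Neumann entropy of a positive semidefinite unit-trace kernel, is easy to compute from eigenvalues. The sketch below is an illustration under stated assumptions: the kernel is given as a symmetric PSD matrix of pairwise semantic similarities between sampled answers, and how that matrix is built (the part specific to KLE) is not shown. `von_neumann_entropy` is a hypothetical helper name.

```python
import numpy as np

def von_neumann_entropy(kernel):
    """Return -sum(l * log l) over eigenvalues l of the unit-trace kernel.

    Assumes `kernel` is a symmetric positive semidefinite matrix with
    positive trace; it is rescaled to unit trace before the entropy is taken.
    """
    k = np.asarray(kernel, dtype=float)
    k = k / np.trace(k)                   # enforce unit trace
    eigvals = np.linalg.eigvalsh(k)       # real eigenvalues of a symmetric matrix
    eigvals = eigvals[eigvals > 1e-12]    # treat 0 * log 0 as 0
    return float(-np.sum(eigvals * np.log(eigvals)))
```

Two limiting cases show the intended behaviour: if all sampled answers are semantically identical (an all-ones kernel), the entropy is 0; if they are mutually unrelated (an identity kernel over n answers), the entropy is log n.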

Updated: 2024-05-30 12:42:05

标题: 核语言熵：基于语义相似性对LLMs进行细粒度不确定性量化

摘要: 大型语言模型（LLMs）中的不确定性量化对于安全性和可靠性至关重要。特别是，不确定性可以用来通过检测事实上不正确的模型响应（通常称为幻觉）来提高LLMs的可信度。关键是，人们应该寻求捕捉模型的语义不确定性，即LLMs输出的含义上的不确定性，而不是不影响答案正确性的词汇或句法变化上的不确定性。为了解决这个问题，我们提出了一种新颖的方法，称为核语言熵（KLE），用于在白盒和黑盒LLMs中进行不确定性估计。KLE定义了正半定单位迹核，用于编码LLMs输出的语义相似性，并使用冯·诺依曼熵来量化不确定性。它考虑答案（或语义聚类）之间的成对语义依赖关系，比基于答案的硬聚类的先前方法提供更精细的不确定性估计。我们在理论上证明了KLE推广了先前的最先进方法，称为语义熵，并在多个自然语言生成数据集和LLMs架构上经验性地证明了它提高了不确定性量化性能。

更新时间: 2024-05-30 12:42:05

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.20003v1

Meta-Task Planning for Language Agents

The rapid advancement of neural language models has sparked a new surge of intelligent agent research. Unlike traditional agents, large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI) due to their superior reasoning and generalization capabilities. Effective planning is crucial for the success of LLM agents in real-world tasks, making it a highly pursued topic in the community. Current planning methods typically translate tasks into executable action sequences. However, determining a feasible or optimal sequence for complex tasks at fine granularity, which often requires composing long chains of heterogeneous actions, remains challenging. This paper introduces Meta-Task Planning (MTP), a zero-shot methodology for collaborative LLM-based multi-agent systems that simplifies complex task planning by decomposing it into a hierarchy of subordinate tasks, or meta-tasks. Each meta-task is then mapped into executable actions. MTP was assessed on two rigorous benchmarks, TravelPlanner and API-Bank. Notably, MTP achieved an average $\sim40\%$ success rate on TravelPlanner, significantly higher than the state-of-the-art (SOTA) baseline ($2.92\%$), and outperformed $LLM_{api}$-4 with ReAct on API-Bank by $\sim14\%$, showing the immense potential of integrating LLMs with multi-agent systems.

Updated: 2024-05-30 12:40:06

标题: 语言代理的元任务规划

摘要: 神经语言模型的快速发展引发了智能代理研究的新浪潮。与传统代理不同，基于大型语言模型的代理（LLM代理）由于其优越的推理和泛化能力，已经成为实现人工通用智能（AGI）的一种有前途的范式。有效的规划对于LLM代理在现实任务中取得成功至关重要，因此成为社区中备受追捧的主题。当前的规划方法通常将任务转化为可执行的动作序列。然而，在细粒度上确定复杂任务的可行或最佳序列，通常需要组合长链的异构动作，这仍然具有挑战性。本文介绍了元任务规划（MTP），这是一种零热点方法，用于协作LLM基础的多代理系统，通过将复杂任务分解成一系列下属任务或元任务，简化了任务规划。然后将每个元任务映射为可执行的动作。MTP在两个严格的基准测试TravelPlanner和API-Bank上进行了评估。值得注意的是，MTP在TravelPlanner上实现了平均约40%的成功率，远高于现有技术（SOTA）基线（2.92%），并且在API-Bank上的LLM_api-4与ReAct相比表现出约14%的优势，显示了将LLM与多代理系统集成的巨大潜力。

更新时间: 2024-05-30 12:40:06

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16510v3

ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models

Knowledge Base Question Answering (KBQA) aims to answer natural language questions over large-scale knowledge bases (KBs), which can be summarized into two crucial steps: knowledge retrieval and semantic parsing. However, three core challenges remain: inefficient knowledge retrieval, mistakes of retrieval adversely impacting semantic parsing, and the complexity of previous KBQA methods. To tackle these challenges, we introduce ChatKBQA, a novel and simple generate-then-retrieve KBQA framework, which proposes first generating the logical form with fine-tuned LLMs, then retrieving and replacing entities and relations with an unsupervised retrieval method, to improve both generation and retrieval more directly. Experimental results show that ChatKBQA achieves new state-of-the-art performance on standard KBQA datasets, WebQSP, and CWQ. This work can also be regarded as a new paradigm for combining LLMs with knowledge graphs (KGs) for interpretable and knowledge-required question answering. Our code is publicly available.

Updated: 2024-05-30 12:39:51

标题: ChatKBQA：一种用于知识库问答的生成-检索框架，采用经过微调的大型语言模型

摘要: 知识库问答（KBQA）旨在回答大规模知识库（KB）上的自然语言问题，可以总结为两个关键步骤：知识检索和语义解析。然而，仍然存在三个核心挑战：知识检索效率低、检索错误对语义解析产生不利影响以及先前KBQA方法的复杂性。为了解决这些挑战，我们引入了ChatKBQA，这是一个新颖简单的生成-然后检索的KBQA框架，首先提出了使用经过微调的LLMs生成逻辑形式，然后使用无监督检索方法检索并替换实体和关系，以更直接地改进生成和检索。实验结果表明，ChatKBQA在标准KBQA数据集WebQSP和CWQ上实现了新的最先进性能。这项工作也可以看作是将LLMs与知识图谱（KGs）相结合用于可解释和需要知识的问答的新范式。我们的代码已公开提供。

更新时间: 2024-05-30 12:39:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.08975v2

Inconsistency Handling in Prioritized Databases with Universal Constraints: Complexity Analysis and Links with Active Integrity Constraints

This paper revisits the problem of repairing and querying inconsistent databases equipped with universal constraints. We adopt symmetric difference repairs, in which both deletions and additions of facts can be used to restore consistency, and suppose that preferred repair actions are specified via a binary priority relation over (negated) facts. Our first contribution is to show how existing notions of optimal repairs, defined for simpler denial constraints and repairs solely based on fact deletion, can be suitably extended to our richer setting. We next study the computational properties of the resulting repair notions, in particular, the data complexity of repair checking and inconsistency-tolerant query answering. Finally, we clarify the relationship between optimal repairs of prioritized databases and repair notions introduced in the framework of active integrity constraints. In particular, we show that Pareto-optimal repairs in our setting correspond to founded, grounded and justified repairs w.r.t. the active integrity constraints obtained by translating the prioritized database. Our study also yields useful insights into the behavior of active integrity constraints.

Updated: 2024-05-30 12:36:16

标题: 优先数据库中具有通用约束的不一致性处理：复杂性分析及与主动完整性约束的关联

摘要: 本文重新审视了具有普遍约束的不一致数据库修复和查询问题。我们采用对称差修复，即可以使用事实的删除和添加来恢复一致性，并假定首选修复操作通过对（否定）事实的二进制优先关系来指定。我们的第一个贡献是展示如何将现有的最佳修复概念，针对更简单的否定约束和仅基于事实删除的修复，适当扩展到我们更丰富的环境。接下来，我们研究了所得修复概念的计算特性，特别是修复检查和容忍不一致查询回答的数据复杂性。最后，我们澄清了优先数据库的最佳修复与主动完整性约束框架中引入的修复概念之间的关系。具体而言，我们展示在我们的设置中的帕累托最优修复相当于相对于通过翻译优先数据库获得的主动完整性约束的基础、有理和合理修复。我们的研究还为主动完整性约束的行为提供了有用的见解。

更新时间: 2024-05-30 12:36:16

领域: cs.DB,cs.AI,cs.LO

下载: http://arxiv.org/abs/2306.03523v2

DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild

Image quality assessment (IQA) plays a critical role in selecting high-quality images and guiding compression and enhancement methods in a series of applications. Blind IQA, which assesses the quality of in-the-wild images containing complex authentic distortions without reference images, poses greater challenges. Existing methods are limited to modeling a uniform distribution with local patches and are hampered by the gap between low- and high-level vision (caused by widely adopted pre-trained classification networks). In this paper, we propose a novel IQA method called diffusion priors-based IQA (DP-IQA), which leverages the prior knowledge of a pre-trained diffusion model to bridge semantic gaps in the perception of the visual quality of images. Specifically, we use pre-trained stable diffusion as the backbone, extract multi-level features from the denoising U-Net during the upsampling process at a specified timestep, and decode them to estimate the image quality score. Text and image adapters are adopted to mitigate the domain gap for downstream tasks and correct the information loss caused by the variational autoencoder bottleneck. Finally, we distill the knowledge in the above model into a CNN-based student model, significantly reducing the number of parameters to enhance applicability; surprisingly, the student model performs similarly to or even better than the teacher model. Experimental results demonstrate that our DP-IQA achieves state-of-the-art results on various in-the-wild datasets with better generalization capability, which shows the superiority of our method in global modeling and in utilizing the hierarchical feature clues of diffusion for evaluating image quality.

Updated: 2024-05-30 12:32:35

标题: DP-IQA：利用扩散先验进行野外盲图像质量评估

摘要: 图像质量评估（IQA）在选择高质量图像以及在一系列应用中引导压缩和增强方法方面发挥着关键作用。盲目IQA评估野外图像的质量，这些图像包含复杂的真实失真，没有参考图像，面临着更大的挑战。现有方法局限于对局部补丁进行统一分布建模，并受到低级和高级视觉之间差距（由广泛采用的预训练分类网络引起）的困扰。本文提出了一种新颖的IQA方法，称为基于扩散先验的IQA（DP-IQA），利用预训练扩散模型的先验知识，具有出色的能力来弥合图像视觉质量感知中的语义差距。具体来说，我们使用预训练的稳定扩散作为骨干，从去噪U-Net中提取多级特征，在指定的时间步骤进行上采样过程，并解码它们以估计图像质量得分。文本和图像适配器被采用以减轻下游任务的域差距，并纠正由变分自动编码器瓶颈引起的信息损失。最后，我们将上述模型中的知识蒸馏到基于CNN的学生模型中，显著减少参数以增强适用性，令学生模型的性能出乎意料地与教师模型表现相似甚至更好。实验结果表明，我们的DP-IQA在各种野外数据集上取得了最先进的结果，具有更好的泛化能力，显示出我们的方法在全局建模和利用扩散的层次特征线索评估图像质量方面的优势。

更新时间: 2024-05-30 12:32:35

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19996v1

Symmetries in Overparametrized Neural Networks: A Mean-Field View

We develop a Mean-Field (MF) view of the learning dynamics of overparametrized Artificial Neural Networks (NN) under data symmetric in law with respect to the action of a general compact group $G$. To this end, we consider a class of generalized shallow NNs given by an ensemble of $N$ multi-layer units, jointly trained using stochastic gradient descent (SGD) and possibly symmetry-leveraging (SL) techniques, such as Data Augmentation (DA), Feature Averaging (FA) or Equivariant Architectures (EA). We introduce the notions of weakly and strongly invariant laws (WI and SI) on the parameter space of each single unit, corresponding, respectively, to $G$-invariant distributions, and to distributions supported on parameters fixed by the group action (which encode EA). This allows us to define symmetric models compatible with taking $N\to\infty$ and give an interpretation of the asymptotic dynamics of DA, FA and EA in terms of Wasserstein Gradient Flows describing their MF limits. When activations respect the group action, we show that, for symmetric data, DA, FA and freely-trained models obey the exact same MF dynamic, which stays in the space of WI laws and minimizes therein the population risk. We also give a counterexample to the general attainability of an optimum over SI laws. Despite this, quite remarkably, we show that the set of SI laws is also preserved by the MF dynamics even when freely trained. This contrasts sharply with the finite-$N$ setting, in which EAs are generally not preserved by unconstrained SGD. We illustrate the validity of our findings as $N$ gets larger in a teacher-student experimental setting, training a student NN to learn from a WI, SI or arbitrary teacher model through various SL schemes. Finally, we deduce a data-driven heuristic to discover the largest subspace of parameters supporting SI distributions for a problem, which could be used for designing EA with minimal generalization error.

Updated: 2024-05-30 12:32:18

标题: 过度参数化神经网络中的对称性：一个平均场视角

摘要: 我们发展了一个对过参数化的人工神经网络（NN）在数据对称性律下学习动力学的平均场（MF）视角，关于一般紧致群$G$的作用。我们考虑了一个由$N$个多层单元组成的广义浅层NN集合，联合使用随机梯度下降（SGD）和可能的对称性利用（SL）技术进行训练，如数据增强（DA）、特征平均（FA）或等变体系结构（EA）。我们引入了单个单元参数空间上弱不变和强不变律（WI和SI）的概念，分别对应于$G$-不变分布和由群作用确定的参数支持的分布（编码EA）。这使我们能够定义与取$N\to\infty$兼容的对称模型，并解释DA、FA和EA的渐近动力学，描述其MF极限的Wasserstein梯度流。当激活遵守群作用时，我们表明，对于对称数据，DA、FA和自由训练模型遵循完全相同的MF动力学，其保持在WI律的空间中，并在其中最小化总体风险。我们还通过一个反例表明了在SI律上达到最优的一般可能性。尽管如此，非常显著的是，我们表明了即使在自由训练时，SI律的集合也会被MF动力学保持。这与有限$N$设置形成鲜明对比，在该设置中，通常不受约束的SGD不会保持EA。我们通过在教师-学生实验设置中将学生NN训练为通过各种SL方案从WI、SI或任意教师模型学习来说明我们发现的有效性。最后，我们推导了一种数据驱动的启发式方法，用于发现支持SI分布的参数子空间，这可以用于设计具有最小泛化误差的EA。

更新时间: 2024-05-30 12:32:18

领域: stat.ML,cs.LG,math.PR

下载: http://arxiv.org/abs/2405.19995v1

Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate reinforcement learning actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data.

Updated: 2024-05-30 12:18:06

标题: 视频语言评论家：面向语言条件机器人的可转移奖励函数

摘要: 自然语言通常是人类为机器人指定任务最容易和方便的方式。然而，学习将语言与行为联系起来通常需要在每台目标机器人上收集大量不同的、带有语言注释的演示，这通常是不切实际的。在这项工作中，我们的目标是将要完成的任务与如何完成任务分开，因为前者可以从大量的外部观察数据中受益，而后者只依赖于特定的机器人实体。为此，我们提出了视频-语言评论家，这是一个奖励模型，可以使用对比学习和时间排序目标在现有的跨实体数据上进行训练，并将其用于评分来自单独的强化学习actor的行为跟踪。当在Open X-Embodiment数据上进行训练时，我们的奖励模型使Meta-World任务的策略训练比仅使用稀疏奖励要高效2倍，尽管存在显著的领域差距。在Meta-World上使用领域内数据，但在具有挑战性的任务泛化设置中，我们进一步展示了比先前的语言条件奖励模型更高效的训练，这些模型要么是用二元分类进行训练，要么使用静态图像，要么不利用视频数据中存在的时间信息。

更新时间: 2024-05-30 12:18:06

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19988v1

A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain

This work is an attempt to introduce a comprehensive benchmark for Arabic speech recognition, specifically tailored to address the challenges of telephone conversations in the Arabic language. Arabic, characterized by its rich dialectal diversity and phonetic complexity, presents a number of unique challenges for automatic speech recognition (ASR) systems. These challenges are further amplified in the domain of telephone calls, where audio quality, background noise, and conversational speech styles negatively affect recognition accuracy. Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications. By incorporating diverse dialectal expressions and accounting for the variable quality of call recordings, this benchmark seeks to provide a rigorous testing ground for the development and evaluation of ASR systems capable of navigating the complexities of Arabic speech in telephonic contexts. This work also attempts to establish a baseline performance evaluation using state-of-the-art ASR technologies.

Updated: 2024-05-30 12:17:51

标题: 一个用于评估阿拉伯呼叫领域自动语音识别的新基准

摘要: 这项工作旨在引入一个全面的基准，用于阿拉伯语语音识别，特别是针对阿拉伯语电话对话的挑战。阿拉伯语以其丰富的方言多样性和语音复杂性而闻名，对自动语音识别（ASR）系统提出了许多独特的挑战。这些挑战在电话通话领域进一步加剧，其中音频质量、背景噪音和对话语音风格都会对识别准确性产生负面影响。我们的工作旨在建立一个稳健的基准，不仅包含广泛的阿拉伯语方言范围，还模拟基于电话通信的实际条件。通过融合多样的方言表达，并考虑通话录音的可变质量，这个基准旨在为开发和评估能够应对阿拉伯语电话语音复杂性的ASR系统提供严格的测试基础。这项工作还尝试使用最先进的ASR技术建立基准性能评估。

更新时间: 2024-05-30 12:17:51

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.04280v2

Targeted Sequential Indirect Experiment Design

Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings, experiments cannot be conducted directly on the target variables of interest and are instead indirect. Therefore, they perturb the target variable, but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are multi-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground truth mechanism in terms of sequentially narrowing the gap between an upper and lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable analytical kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.

Updated: 2024-05-30 12:14:25

标题: 目标定向顺序间接实验设计

摘要: 科学假设通常涉及复杂、不完全理解或完全未知机制的特定方面，例如基因表达水平对表型的影响或微生物群落如何影响环境健康。这些查询本质上是因果关系（而不仅仅是关联性），但在许多情况下，实验无法直接在感兴趣的目标变量上进行，而是间接进行。因此，它们扰动目标变量，但不消除潜在的混杂因素。此外，如果所得到的实验测量是多维的且研究的机制是非线性的，感兴趣的查询通常无法确定。我们开发了一种自适应策略，设计间接实验，最佳地提供关于地面真相机制的目标查询的信息，逐步缩小查询上限和下限之间的差距。虽然一般公式包括一个双层优化过程，但我们推导了一种高效可估计的基于内核的估计器，用于因果效应的边界，这是一个关键查询，并展示了我们的方法在混杂、多变量、非线性的合成环境中的有效性。

更新时间: 2024-05-30 12:14:25

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2405.19985v1

Towards Real World Debiasing: A Fine-grained Analysis On Spurious Correlation

Spurious correlations in training data significantly hinder the generalization capability of machine learning models when faced with distribution shifts in real-world scenarios. To tackle the problem, numerous debiasing approaches have been proposed and benchmarked on datasets intentionally designed with severe biases. However, two questions remain: (1) Do existing benchmarks really capture biases in the real world? (2) Can existing debiasing methods handle biases in the real world? To answer these questions, we revisit biased distributions in existing benchmarks and real-world datasets, and propose a fine-grained framework for analyzing dataset bias by disentangling it into the magnitude and prevalence of bias. We observe and theoretically demonstrate that existing benchmarks poorly represent real-world biases. We further introduce two novel biased distributions to bridge this gap, forming a nuanced evaluation framework for real-world debiasing. Building upon these results, we evaluate existing debiasing methods with our evaluation framework. Results show that existing methods are incapable of handling real-world biases. Through in-depth analysis, we propose a simple yet effective approach that can be easily applied to existing debiasing methods, named Debias in Destruction (DiD). Empirical results demonstrate the superiority of DiD, improving the performance of existing methods on all types of biases within the proposed evaluation framework.

Updated: 2024-05-30 12:14:05

标题: 走向真实世界去偏见化：对虚假相关性进行细致分析

摘要: 培训数据中的虚假相关性显著地阻碍了机器学习模型在面对真实场景中的分布偏移时的泛化能力。为了解决这个问题，已经提出了许多去偏见的方法，并在故意设计有严重偏见的数据集上进行了基准测试。然而，人们仍然要问：1. 现有的基准真的能够捕捉到现实世界中的偏见吗？2. 现有的去偏见方法能够处理现实世界中的偏见吗？为了回答这些问题，我们重新审视了现有基准和真实世界数据集中的偏见分布，并提出了一个细致的框架，通过将其分解为偏见的幅度和普遍性来分析数据集的偏见。我们观察并理论上证明了现有基准很差地代表了真实世界的偏见。我们进一步引入了两个新的偏见分布来弥合这一差距，形成了一个细致的评估框架，用于真实世界的去偏见。基于这些结果，我们使用我们的评估框架评估现有的去偏见方法。结果表明，现有方法无法处理真实世界的偏见。通过深入分析，我们提出了一种简单而有效的方法，可以轻松应用于现有的去偏见方法，名为破坏中的去偏见（DiD）。实证结果表明了DiD的优越性，在提出的评估框架内改善了现有方法在所有类型的偏见上的表现。

更新时间: 2024-05-30 12:14:05

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.15240v2

Eclipse Attack Detection on a Blockchain Network as a Non-Parametric Change Detection Problem

This paper introduces a novel non-parametric change detection algorithm to identify eclipse attacks on a blockchain network; the non-parametric algorithm relies only on the empirical mean and variance of the dataset, making it highly adaptable. An eclipse attack occurs when malicious actors isolate blockchain users, disrupting their ability to reach consensus with the broader network, thereby distorting their local copy of the ledger. To detect an eclipse attack, we monitor changes in the Fréchet mean and variance of the evolving blockchain communication network connecting blockchain users. First, we leverage the Johnson-Lindenstrauss lemma to project large-dimensional networks into a lower-dimensional space, preserving essential statistical properties. Subsequently, we employ a non-parametric change detection procedure, leading to a test statistic that converges weakly to a Brownian bridge process in the absence of an eclipse attack. This enables us to quantify the false alarm rate of the detector. Our detector can be implemented as a smart contract on the blockchain, offering a tamper-proof and reliable solution. Finally, we use numerical examples to compare the proposed eclipse attack detector with a detector based on the random forest model.
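
To make the change detection idea above concrete, here is a minimal sketch of a CUSUM-type statistic whose no-change limit is the supremum of a Brownian bridge. It is a deliberately simplified scalar stand-in, not the paper's detector: the function name `cusum_bridge_statistic` is hypothetical, the input is a plain sequence of numbers rather than Fréchet means of projected networks, and the variance is estimated from the whole sample. The 1.358 threshold is the standard 95% quantile of the Kolmogorov distribution.

```python
import math

# Approximate 95% quantile of sup |Brownian bridge| (Kolmogorov distribution).
THRESHOLD_5_PERCENT = 1.358


def cusum_bridge_statistic(xs):
    """Return (max scaled CUSUM deviation, index k where it is attained).

    Assumption for this sketch: xs is a scalar summary per time step
    (for example, one number per network snapshot).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0.0:
        return 0.0, 0  # constant sequence: no evidence of change
    scale = math.sqrt(var * n)
    partial = 0.0
    best, best_k = 0.0, 0
    for k, x in enumerate(xs, start=1):
        partial += x - mean          # centred partial sum
        stat = abs(partial) / scale  # bridge-type scaling
        if stat > best:
            best, best_k = stat, k
    return best, best_k


if __name__ == "__main__":
    shifted = [0.0, 0.1] * 20 + [5.0, 5.1] * 20  # mean shift after step 40
    stat, k = cusum_bridge_statistic(shifted)
    print("change flagged:", stat > THRESHOLD_5_PERCENT, "at step", k)
```

Because the limiting distribution is known when no change occurs, comparing the statistic against a fixed quantile is what lets a detector of this kind control its false alarm rate.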

Updated: 2024-05-30 12:09:38

标题: 在一个区块链网络上的日食攻击检测作为一个非参数变化检测问题

摘要: 这篇论文介绍了一种新颖的非参数变化检测算法，用于识别区块链网络上的日食攻击；非参数算法仅依赖于数据集的经验均值和方差，使其高度适应性。当恶意行为者隔离区块链用户，破坏他们与更广泛网络达成共识的能力，从而扭曲其账本的本地副本时，就会发生日食攻击。为了检测日食攻击，我们监测连接区块链用户的不断演变的区块链通信网络的Fr\'echet均值和方差的变化。首先，我们利用Johnson-Lindenstrauss引理将大维网络投影到较低维空间，保留了基本的统计属性。随后，我们采用非参数变化检测程序，导致一个在没有日食攻击的情况下弱收敛于布朗桥过程的检验统计量。这使我们能够量化检测器的误报率。我们的检测器可以作为一个智能合约实施在区块链上，提供一个防篡改和可靠的解决方案。最后，我们使用数值示例来比较提出的日食攻击检测器与基于随机森林模型的检测器。

更新时间: 2024-05-30 12:09:38

领域: cs.CR,stat.AP

下载: http://arxiv.org/abs/2404.00538v2

A Deep Reinforcement Learning Approach for Trading Optimization in the Forex Market with Multi-Agent Asynchronous Distribution

In today's forex market, traders increasingly turn to algorithmic trading, leveraging computers to seek more profits. Deep learning (DL) techniques, as cutting-edge advancements in machine learning, are capable of identifying patterns in financial data. Traders utilize these patterns to execute more effective trades, adhering to algorithmic trading rules. Deep reinforcement learning (DRL) methods, by directly executing trades based on identified patterns and assessing their profitability, offer advantages over traditional DL approaches. This research pioneers the application of a multi-agent (MA) RL framework with the state-of-the-art Asynchronous Advantage Actor-Critic (A3C) algorithm. The proposed method employs parallel learning across multiple asynchronous workers, each specialized in trading across multiple currency pairs, to explore the potential for nuanced strategies tailored to different market conditions and currency pairs. Two MA models, A3C with lock and A3C without lock, are proposed and trained in single-currency and multi-currency settings. The results indicate that both models outperform the Proximal Policy Optimization model. A3C with lock outperforms the others in the single-currency training scenario, and A3C without lock outperforms the others in the multi-currency scenario. The findings demonstrate that this approach facilitates broader and faster exploration of different currency pairs, significantly enhancing trading returns. Additionally, the agent can learn a more profitable trading strategy in a shorter time.

Updated: 2024-05-30 12:07:08

标题: 一种深度强化学习方法在外汇市场中的交易优化与多智能体异步分布

摘要: 在今天的外汇市场，交易者越来越倾向于算法交易，利用计算机寻求更多利润。深度学习技术作为机器学习领域的尖端进步，能够识别金融数据中的模式。交易者利用这些模式执行更有效的交易，遵循算法交易规则。深度强化学习方法（DRL）通过直接基于识别的模式执行交易并评估其盈利能力，相对于传统的DL方法具有优势。本研究开拓了应用多智能体（MA）RL框架与最先进的异步优势演员评论（A3C）算法。所提出的方法利用多个异步工作者之间的并行学习，每个工作者专注于跨多个货币对的交易，以探索适应不同市场条件和货币对的微妙策略的潜力。提出了两种不同的带锁和不带锁的A3C MA模型，并在单个货币和多个货币上进行训练。结果表明，两种模型在Proximal Policy Optimization模型上表现出色。带锁的A3C在单一货币训练场景中胜过其他模型，而不带锁的A3C在多货币场景中胜过其他模型。研究结果表明，这种方法促进了对不同货币对的更广泛和更快速的探索，显著增强了交易回报。此外，代理可以在较短的时间内学习到更有利可图的交易策略。

更新时间: 2024-05-30 12:07:08

领域: cs.CE,cs.AI,cs.CC

下载: http://arxiv.org/abs/2405.19982v1

DataSP: A Differential All-to-All Shortest Path Algorithm for Learning Costs and Predicting Paths with Context

Learning latent costs of transitions on graphs from trajectory demonstrations under various contextual features is challenging but useful for path planning. Yet, existing methods either oversimplify cost assumptions or scale poorly with the number of observed trajectories. This paper introduces DataSP, a differentiable all-to-all shortest path algorithm to facilitate learning latent costs from trajectories. It allows learning from a large number of trajectories in each learning step without additional computation. Complex latent cost functions from contextual features can be represented in the algorithm through a neural network approximation. We further propose a method to sample paths from DataSP in order to reconstruct/mimic observed paths' distributions. We prove that the inferred distribution follows the maximum entropy principle. We show that DataSP outperforms state-of-the-art differentiable combinatorial solvers and classical machine learning approaches in predicting paths on graphs.
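
One common way to make an all-to-all shortest path computation smooth in the edge costs is to replace the hard `min` in Floyd-Warshall with a soft minimum. The sketch below illustrates that general idea only; it is not taken from the paper, and the names `softmin` and `soft_floyd_warshall` and the temperature parameter `beta` are assumptions made for this example. It is written in plain Python, so gradients would have to come from running the same recurrence in an autodiff framework.

```python
import math


def softmin(a, b, beta):
    """Smooth minimum: -(1/beta) * log(exp(-beta*a) + exp(-beta*b)).

    Always <= min(a, b) and approaches it as beta grows.
    """
    m = min(a, b)
    if math.isinf(m):
        return m  # both inputs are +inf: no path either way
    # Shift by m for numerical stability before exponentiating.
    return m - math.log(math.exp(-beta * (a - m)) + math.exp(-beta * (b - m))) / beta


def soft_floyd_warshall(cost, beta=10.0):
    """All-pairs soft shortest-path distances for a dense cost matrix.

    cost[i][j] is the edge cost from i to j, or float('inf') if absent.
    """
    n = len(cost)
    dist = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j or k == i or k == j:
                    continue  # keep the diagonal fixed and skip trivial detours
                dist[i][j] = softmin(dist[i][j], dist[i][k] + dist[k][j], beta)
    return dist


if __name__ == "__main__":
    INF = float("inf")
    costs = [[0.0, 1.0, 5.0], [INF, 0.0, 1.0], [INF, INF, 0.0]]
    print(soft_floyd_warshall(costs, beta=50.0)[0][2])  # close to 2.0
```

A single pass fills in distances for every source-target pair at once, which is why the cost of one learning step in this style of method does not grow with the number of trajectories being compared against.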

Updated: 2024-05-30 12:04:17

标题: DataSP：一种用于学习成本和预测路径的差分全对全最短路径算法

摘要: 从轨迹演示中学习图上转换的潜在代价在各种情境特征下是具有挑战性但有用的路径规划。然而，现有方法要么过于简化成本假设，要么随着观察到的轨迹数量的增加而扩展性不佳。本文介绍了DataSP，一种可微的全对全最短路径算法，以便从轨迹中学习潜在代价。它允许每个学习步骤从大量轨迹中学习，而无需额外计算。通过神经网络逼近，算法可以表示来自情境特征的复杂潜在代价函数。我们进一步提出了一种从DataSP中采样路径的方法，以重建/模仿观察路径的分布。我们证明了推断的分布遵循最大熵原理。我们展示了DataSP在预测图上路径方面优于最先进的可微组合求解器和经典机器学习方法。

更新时间: 2024-05-30 12:04:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.04923v2

Domain Adaptation with Cauchy-Schwarz Divergence

Domain adaptation aims to use training data from one or multiple source domains to learn a hypothesis that can be generalized to a different, but related, target domain. As such, having a reliable measure for evaluating the discrepancy of both marginal and conditional distributions is crucial. We introduce Cauchy-Schwarz (CS) divergence to the problem of unsupervised domain adaptation (UDA). The CS divergence offers a theoretically tighter generalization error bound than the popular Kullback-Leibler divergence. This holds for the general case of supervised learning, including multi-class classification and regression. Furthermore, we illustrate that the CS divergence enables a simple estimator on the discrepancy of both marginal and conditional distributions between source and target domains in the representation space, without requiring any distributional assumptions. We provide multiple examples to illustrate how the CS divergence can be conveniently used in both distance metric- or adversarial training-based UDA frameworks, resulting in compelling performance.
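
For orientation, the Cauchy-Schwarz divergence between densities p and q is commonly written as D_CS(p; q) = -log( (∫ p q)^2 / (∫ p^2 ∫ q^2) ), and it admits a simple sample-based estimate when each integral is replaced by an average of kernel evaluations. The sketch below shows that standard plug-in estimator for one-dimensional samples; the function names and the fixed Gaussian bandwidth `sigma` are assumptions for this example, and the paper's estimator for conditional distributions is not reproduced here.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def cs_divergence(xs, ys, sigma=1.0):
    """Plug-in estimate: log E[k(x,x')] + log E[k(y,y')] - 2 log E[k(x,y)]."""
    return (
        math.log(mean_kernel(xs, xs, sigma))
        + math.log(mean_kernel(ys, ys, sigma))
        - 2.0 * math.log(mean_kernel(xs, ys, sigma))
    )


if __name__ == "__main__":
    source = [0.0, 0.1, -0.1]
    target = [3.0, 3.1, 2.9]
    print(cs_divergence(source, source))  # zero for identical samples
    print(cs_divergence(source, target))  # positive for separated samples
```

The estimate needs only pairwise kernel values, which is consistent with the abstract's point that no distributional assumptions are required.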

Updated: 2024-05-30 12:01:12

标题: Cauchy-Schwarz散度的域自适应

摘要: 领域自适应旨在利用来自一个或多个源域的训练数据学习一个可以推广到不同但相关的目标域的假设。因此，对于评估边际和条件分布的差异性具有可靠度量是至关重要的。我们将柯西-施瓦茨（CS）散度引入到无监督领域自适应（UDA）的问题中。CS散度比流行的Kullback-Leibler散度提供了一个理论上更紧的泛化误差界限。这适用于监督学习的一般情况，包括多类分类和回归。此外，我们说明CS散度使得在表示空间中简单估计源域和目标域之间的边际和条件分布的差异，而无需任何分布假设。我们提供多个示例来说明CS散度如何方便地在基于距离度量或对抗训练的UDA框架中使用，从而产生引人注目的性能。

更新时间: 2024-05-30 12:01:12

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19978v1

Consistent Submodular Maximization

Maximizing monotone submodular functions under cardinality constraints is a classic optimization task with several applications in data mining and machine learning. In this paper we study this problem in a dynamic environment with consistency constraints: elements arrive in a streaming fashion and the goal is to maintain a constant approximation to the optimal solution while keeping the solution stable (i.e., the number of changes between two consecutive solutions is bounded). We provide algorithms in this setting with different trade-offs between consistency and approximation quality. We also complement our theoretical results with an experimental analysis showing the effectiveness of our algorithms in real-world instances.
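
As background for the problem statement above, the sketch below shows the classic offline greedy algorithm for a monotone submodular objective (set coverage) under a cardinality constraint. It is not one of the paper's streaming algorithms and does not enforce any consistency bound; the helper names `coverage` and `greedy_max_coverage` are hypothetical and exist only to make the objective concrete.

```python
def coverage(selected, sets):
    """Number of distinct items covered by the selected named sets."""
    covered = set()
    for name in selected:
        covered |= sets[name]
    return len(covered)


def greedy_max_coverage(sets, k):
    """Pick up to k set names, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = coverage(chosen, sets)
        best, best_gain = None, 0
        for name in sorted(sets):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = coverage(chosen + [name], sets) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # no remaining set adds anything new
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
    picked = greedy_max_coverage(sets, k=2)
    print(picked, coverage(picked, sets))
```

In the streaming setting the abstract describes, rerunning a procedure like this after every arrival could change the whole solution each time, which is the instability that a consistency constraint is meant to rule out.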

Updated: 2024-05-30 11:59:58

标题: 保持子模最大化

摘要: 在基数约束下最大化单调次模函数是一个经典的优化任务，在数据挖掘和机器学习中有多种应用。本文研究了在动态环境中具有一致性约束的问题：元素以流式方式到达，目标是在保持与最优解的恒定近似的同时拥有一个稳定的解决方案（即，两个连续解决方案之间的变化次数受到限制）。我们在这种设置中提供了不同一致性和近似质量之间的权衡的算法。我们还通过实验分析以实证结果补充了我们的理论结果，展示了我们的算法在真实世界实例中的有效性。

更新时间: 2024-05-30 11:59:58

领域: cs.DS,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19977v1

A Triumvirate of AI Driven Theoretical Discovery

Recent years have seen a dramatic rise in the use of AI algorithms in pure mathematics and fundamental sciences such as theoretical physics. This is perhaps counter-intuitive since mathematical sciences require rigorous definitions, derivations, and proofs, in contrast to the experimental sciences, which rely on the modelling of data with error bars. In this Perspective, we categorize the approaches to mathematical discovery as "top-down", "bottom-up" and "meta-mathematics", as inspired by historical examples. We review some of the progress over the last few years, comparing and contrasting both the advances and the shortcomings in each approach. We argue that while the theorist is in no way in danger of being replaced by AI in the near future, the hybrid of human expertise and AI algorithms will become an integral part of theoretical discovery.

Updated: 2024-05-30 11:57:00

标题: 一个由人工智能驱动的理论性发现三人团队

摘要: 近年来，AI算法在纯数学和基础科学（如理论物理）中的使用显著增加。这也许有些反直觉，因为数学科学需要严格的定义、推导和证明，与依赖于带有误差边界的数据建模的实验科学形成对比。在这个透视文章中，我们将数学发现的方法归类为“自上而下”、“自下而上”和“元数学”，受历史例子的启发。我们回顾了过去几年的一些进展，比较和对比了每种方法中的进步和不足。我们认为，虽然理论家在短期内不会被AI取代，但人类专业知识和AI算法的混合将成为理论发现的一个重要部分。

更新时间: 2024-05-30 11:57:00

领域: math.HO,cs.AI,hep-th,physics.hist-ph

下载: http://arxiv.org/abs/2405.19973v1

Unity by Diversity: Improved Representation Learning in Multimodal VAEs

Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or both across modalities to learn a shared representation. Such architectures impose hard constraints on the model. In this work, we show that a better latent representation can be obtained by replacing these hard constraints with a soft constraint. We propose a new mixture-of-experts prior, softly guiding each modality's latent representation towards a shared aggregate posterior. This approach results in a superior latent representation and allows each encoding to preserve information better from its uncompressed original features. In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.

Updated: 2024-05-30 11:55:49

标题: 多元统一：多模态VAE中的改进表示学习

摘要: 多模态数据的变分自编码器在数据分析中有许多潜力，例如表示学习、条件生成和插补。目前的架构要么跨模态共享编码器输出，解码器输入，要么两者都共享，以学习共享表示。这种架构对模型施加了硬约束。在这项工作中，我们展示了通过用软约束替换这些硬约束可以获得更好的潜在表示。我们提出了一种新的专家混合先验，软性地引导每个模态的潜在表示朝向一个共享的汇总后验。这种方法导致了更优秀的潜在表示，并且允许每个编码更好地保留来自其未压缩原始特征的信息。通过在多个基准数据集和两个具有挑战性的真实世界数据集上进行大量实验，我们展示了与现有方法相比，学习到的潜在表示和缺失数据模态的插补得到了改进。

更新时间: 2024-05-30 11:55:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.05300v2

GasTrace: Detecting Sandwich Attack Malicious Accounts in Ethereum

The openness and transparency of Ethereum transaction data make it easy for any entity to exploit them and execute malicious attacks. The sandwich attack manipulates the Automated Market Maker (AMM) mechanism, profiting from manipulating the market price through front-running or back-running transactions. To identify and prevent sandwich attacks, we propose a cascade classification framework, GasTrace. GasTrace analyzes various transaction features to detect malicious accounts, notably through the analysis and modeling of Gas features. In the initial classification, we utilize a Support Vector Machine (SVM) with the Radial Basis Function (RBF) kernel to generate the predicted probabilities of accounts, further constructing a detailed transaction network. Subsequently, the behavior features are captured by the Graph Attention Network (GAT) technique in the second classification. Through cascade classification, GasTrace can analyze and classify the sandwich attacks. Our experimental results demonstrate that GasTrace achieves remarkable detection and generation capability, with an accuracy of 96.73% and an F1 score of 95.71% for identifying sandwich attack accounts.

Updated: 2024-05-30 11:55:21

标题: GasTrace：在以太坊中检测Sandwich攻击恶意账户

摘要: 以太坊交易数据的开放性和透明性使得它容易被任何实体利用，进行恶意攻击。三明治攻击利用自动做市商（AMM）机制，通过前后顺序的交易操纵市场价格从中获利。为了识别和防止三明治攻击，我们提出了一个级联分类框架GasTrace。GasTrace分析各种交易特征来检测恶意账户，尤其是通过对Gas特征进行分析和建模。在初始分类中，我们利用支持向量机（SVM）与径向基函数（RBF）核来生成账户的预测概率，进一步构建详细的交易网络。随后，行为特征通过图注意力网络（GAT）技术在第二次分类中被捕捉。通过级联分类，GasTrace可以分析和分类三明治攻击。我们的实验结果表明，GasTrace实现了卓越的检测和生成能力，对于识别三明治攻击账户的准确率为96.73\%，F1分数为95.71\%。

更新时间: 2024-05-30 11:55:21

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.19971v1

Strategies to Counter Artificial Intelligence in Law Enforcement: Cross-Country Comparison of Citizens in Greece, Italy and Spain

This paper investigates citizens' counter-strategies to the use of Artificial Intelligence (AI) by law enforcement agencies (LEAs). Based on information from three countries (Greece, Italy and Spain), we demonstrate disparities in the likelihood of ten specific counter-strategies. We further identify factors that increase the propensity for counter-strategies. Our study provides an important new perspective on the societal impacts of security-focused AI applications by illustrating the conscious, strategic choices made by citizens when confronted with AI capabilities for LEAs.

Updated: 2024-05-30 11:55:10

标题: 对抗执法人工智能的策略：希腊、意大利和西班牙公民的跨国比较

摘要: 本文调查了公民对执法机构使用人工智能（AI）的反制策略。基于希腊、意大利和西班牙的信息，我们展示了十种特定反制策略的可能性差异。我们进一步确定了增加反制策略倾向的因素。我们的研究通过展示公民在面对LEAs的AI能力时所做的有意识、战略性选择，为安全焦点AI应用的社会影响提供了重要的新视角。

更新时间: 2024-05-30 11:55:10

领域: cs.AI,I.2.0; K.4.1

下载: http://arxiv.org/abs/2405.19970v1

Achievable Fairness on Your Data With Utility Guarantees

In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.

Updated: 2024-05-30 11:44:40

标题: 您的数据的公平性可实现，并具有效用保证

摘要: 在机器学习公平性方面，训练模型以最小化不同敏感群体之间的差异通常会导致准确性降低，这一现象被称为公平性-准确性折衷。这种折衷的严重程度本质上取决于数据集特征，如数据集不平衡或偏见，因此，在不同数据集上使用统一的公平性要求仍然具有疑问性。为了解决这个问题，我们提出了一种计算效率高的方法，用于逼近适用于个体数据集的公平性-准确性折衷曲线，支持严格的统计保证。通过利用You-Only-Train-Once (YOTO)框架，我们的方法减轻了在逼近折衷曲线时需要训练多个模型的计算负担。重要的是，我们引入了一种新的方法来量化我们估计中的不确定性，从而为实践者提供了一个强大的框架，用于审计模型的公平性，同时避免因估计错误而得出错误的结论。我们的实验涵盖了表格（例如Adult）、图像（CelebA）和语言（Jigsaw）数据集，强调了我们的方法不仅可靠地量化各种数据形态之间的最佳可实现的折衷，还有助于发现SOTA公平性方法中的次优性。

更新时间: 2024-05-30 11:44:40

领域: stat.ML,cs.CY,cs.LG

下载: http://arxiv.org/abs/2402.17106v3

Training-Free Consistent Text-to-Image Generation

Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language. However, using these models to consistently portray the same subject across diverse prompts remains challenging. Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects or add image conditioning to the model. These methods require lengthy per-subject optimization or large-scale pre-training. Moreover, they struggle to align generated images with text prompts and face difficulties in portraying multiple subjects. Here, we present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model. We introduce a subject-driven shared attention block and correspondence-based feature injection to promote subject consistency between images. Additionally, we develop strategies to encourage layout diversity while maintaining subject consistency. We compare ConsiStory to a range of baselines, and demonstrate state-of-the-art performance on subject consistency and text alignment, without requiring a single optimization step. Finally, ConsiStory can naturally extend to multi-subject scenarios, and even enable training-free personalization for common objects.

Updated: 2024-05-30 11:42:15

标题: 无需训练的一致性文本到图像生成

摘要: 文本到图像模型通过允许用户通过自然语言引导图像生成过程，提供了新的创意灵活性。然而，使用这些模型始终准确描绘相同主题在不同提示下仍然具有挑战性。现有方法通过微调模型来教授新词汇以描述特定用户提供的主题，或者向模型添加图像条件。这些方法需要针对每个主题进行漫长的优化或大规模的预训练。此外，它们在将生成的图像与文本提示对齐以及描绘多个主题时存在困难。在这里，我们提出了ConsiStory，一种无需训练的方法，通过共享预训练模型的内部激活，实现一致的主题生成。我们引入了一个主题驱动的共享注意力模块和基于对应的特征注入，以促进图像之间的主题一致性。此外，我们开发了策略来鼓励布局的多样性，同时保持主题的一致性。我们将ConsiStory与一系列基线进行比较，并展示其在主题一致性和文本对齐方面的最新表现，而无需进行一次优化步骤。最后，ConsiStory可以自然地扩展到多主题场景，甚至可以实现无需训练的常见对象个性化。

更新时间: 2024-05-30 11:42:15

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2402.03286v3

Multi-Aspect Controllable Text Generation with Disentangled Counterfactual Augmentation

Multi-aspect controllable text generation aims to control the generated texts in attributes from multiple aspects (e.g., "positive" from sentiment and "sport" from topic). For ease of obtaining training samples, existing works neglect attribute correlations formed by the intertwining of different attributes. Particularly, the stereotype formed by imbalanced attribute correlations significantly affects multi-aspect control. In this paper, we propose MAGIC, a new multi-aspect controllable text generation method with disentangled counterfactual augmentation. We alleviate the issue of imbalanced attribute correlations during training using counterfactual feature vectors in the attribute latent space by disentanglement. During inference, we enhance attribute correlations by target-guided counterfactual augmentation to further improve multi-aspect control. Experiments show that MAGIC outperforms state-of-the-art baselines in both imbalanced and balanced attribute correlation scenarios. Our source code and data are available at https://github.com/nju-websoft/MAGIC.

Updated: 2024-05-30 11:25:42

标题: 多方面可控文本生成与解耦反事实增强

摘要: 多方面可控文本生成旨在控制从多个方面生成的文本的属性（例如，从情感中的“积极”和从主题中的“运动”）。为了方便获取训练样本，现有工作忽略了由不同属性交织形成的属性相关性。特别是，由不平衡属性相关性形成的刻板印象显着影响多方面控制。在本文中，我们提出了一种新的多方面可控文本生成方法MAGIC，采用解缠对立增强来缓解训练过程中不平衡属性相关性的问题。通过在属性潜在空间中使用对立特征向量来解缠，我们增强了属性相关性。在推断过程中，我们通过目标引导的对立增强进一步提高多方面控制。实验表明，MAGIC在不平衡和平衡属性相关性场景中均优于最先进的基线。我们的源代码和数据可在https://github.com/nju-websoft/MAGIC上获取。

更新时间: 2024-05-30 11:25:42

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19958v1

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy

As Large Language Models (LLMs) have made significant advancements across various tasks, such as question answering, translation, text summarization, and dialogue systems, the need for accuracy in information becomes crucial, especially for serious financial products serving billions of users like Alipay. However, for a real-world product serving millions of users, the inference speed of LLMs becomes a critical factor compared to a mere experimental model. Hence, this paper presents a generic framework for accelerating the inference process, resulting in a substantial increase in speed and cost reduction for our LLM-based scenarios, with lossless generation accuracy. In the traditional inference process, each token is generated sequentially by the LLM, leading to a time consumption proportional to the number of generated tokens. To enhance this process, our framework, named lookahead, introduces a multi-branch strategy. Instead of generating a single token at a time, we propose a Trie-based retrieval and verification mechanism that can accept several tokens in a single forward step. Our strategy offers two distinct advantages: (1) it guarantees absolute correctness of the output, avoiding any approximation algorithms, and (2) the worst-case performance of our approach is equivalent to the conventional process. We conduct extensive experiments to demonstrate the significant improvements achieved by applying our inference acceleration framework. Our framework has been widely deployed in Alipay since April 2023 and obtains a remarkable 2.66x to 6.26x speedup. Our code is available at https://github.com/alipay/PainlessInferenceAcceleration.
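
The retrieve-then-verify loop described above can be sketched with a toy example. Everything in this block is an illustrative assumption rather than the project's actual code: the `Trie` class, the `lookahead_decode` function, and the deterministic `next_token` callable that stands in for the LLM. The sketch also drafts a single branch, whereas the abstract describes a multi-branch strategy, and it calls the toy model once per drafted token, whereas a real system would check all drafted tokens in one batched forward pass.

```python
class Trie:
    """Stores token sequences so that likely continuations can be retrieved."""

    def __init__(self):
        self.children = {}

    def insert(self, tokens):
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, Trie())

    def draft(self, start, max_len):
        """Follow one branch below `start`, returning up to max_len tokens."""
        node = self.children.get(start)
        out = []
        while node is not None and node.children and len(out) < max_len:
            tok = sorted(node.children)[0]  # deterministic single-branch pick
            out.append(tok)
            node = node.children[tok]
        return out


def greedy_decode(next_token, prompt, n_new):
    """Conventional decoding: one token per step."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq


def lookahead_decode(next_token, prompt, n_new, trie, max_draft=4):
    """Accept drafted tokens only while they match the model's own choice."""
    seq = list(prompt)
    target = len(prompt) + n_new
    steps = 0  # simulated forward steps
    while len(seq) < target:
        steps += 1
        for tok in trie.draft(seq[-1], max_draft):
            if len(seq) >= target or next_token(seq) != tok:
                break  # verification failed: discard the rest of the draft
            seq.append(tok)
        if len(seq) < target:
            seq.append(next_token(seq))  # every step yields at least one token
    return seq, steps


if __name__ == "__main__":
    toy_model = lambda seq: (seq[-1] + 1) % 5  # hypothetical stand-in for an LLM
    trie = Trie()
    history = [0, 1, 2, 3, 4]
    for i in range(len(history)):
        trie.insert(history[i:])  # index every suffix for lookup by last token
    print(lookahead_decode(toy_model, [0], 8, trie))
    print(greedy_decode(toy_model, [0], 8))
```

Because a drafted token is kept only when it equals what the model would have produced anyway, the output matches plain greedy decoding, and with an empty trie the loop falls back to one token per step, mirroring the two advantages listed in the abstract.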

Updated: 2024-05-30 11:25:08

标题: 预测：一种用于大型语言模型的推理加速框架，具有无损生成准确性

摘要: 随着大型语言模型（LLMs）在各种任务中取得重大进展，如问答、翻译、文本摘要和对话系统，信息准确性变得至关重要，特别是对于像支付宝这样为数十亿用户提供服务的严肃金融产品。然而，对于为数百万用户提供服务的真实世界产品，LLMs的推理速度成为一个关键因素，而不仅仅是一个实验模型。因此，本文提出了一个通用框架，用于加速推理过程，从而大幅提高我们基于LLMs的场景的速度并降低成本，同时保持生成准确性。在传统的推理过程中，每个标记都是由LLM顺序生成的，导致时间消耗与生成标记数量成正比。为了增强这一过程，我们的框架，名为“前瞻”，引入了一种“多分支”策略。我们提出了一种基于Trie的检索和验证机制，能够在前向步骤接受多个标记，而不是一次生成一个标记。我们的策略具有两个明显优势：（1）它保证输出的绝对正确性，避免任何近似算法；（2）我们的方法的最坏情况性能等同于传统过程。我们进行了大量实验，以展示应用我们的推理加速框架所取得的显著改进。我们的框架自2023年4月起广泛部署在支付宝中，获得显著的2.66倍至6.26倍的加速。我们的代码可在https://github.com/alipay/PainlessInferenceAcceleration 上找到。

更新时间: 2024-05-30 11:25:08

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2312.12728v3

PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting

As text-conditioned diffusion models (DMs) achieve breakthroughs in image, video, and 3D generation, the research community's focus has shifted to the more challenging task of text-to-4D synthesis, which introduces a temporal dimension to generate dynamic 3D objects. In this context, we identify Score Distillation Sampling (SDS), a widely used technique for text-to-3D synthesis, as a significant hindrance to text-to-4D performance due to its Janus-faced and texture-unrealistic problems coupled with high computational costs. In this paper, we propose Pixel-Level Alignments for Text-to-4D Gaussian Splatting (PLA4D), a novel method that utilizes text-to-video frames as explicit pixel alignment targets to generate static 3D objects and inject motion into them. Specifically, we introduce Focal Alignment to calibrate camera poses for rendering and GS-Mesh Contrastive Learning to distill geometry priors from rendered image contrasts at the pixel level. Additionally, we develop Motion Alignment using a deformation network to drive changes in Gaussians and implement Reference Refinement for smooth 4D object surfaces. These techniques enable 4D Gaussian Splatting to align geometry, texture, and motion with generated videos at the pixel level. Compared to previous methods, PLA4D produces synthesized outputs with better texture details in less time and effectively mitigates the Janus-faced problem. PLA4D is fully implemented using open-source models, offering an accessible, user-friendly, and promising direction for 4D digital content creation. Our project page: https://github.com/MiaoQiaowei/PLA4D.github.io.

Updated: 2024-05-30 11:23:01

标题: PLA4D：文本到4D高斯喷洒的像素级对齐

摘要: 随着文本条件扩散模型（DMs）在图像、视频和3D生成方面取得突破，研究界的关注重点已转向更具挑战性的文本到4D合成任务，这引入了时间维度以生成动态3D对象。在这种背景下，我们发现得分蒸馏采样（SDS）作为一种广泛使用的文本到3D合成技术，由于其具有双面性和纹理不真实的问题，以及高计算成本，成为文本到4D性能的重要障碍。在本文中，我们提出了\textbf{P}ixel-\textbf{L}evel \textbf{A}lignments for Text-to-\textbf{4D} Gaussian Splatting（\textbf{PLA4D}），这是一种利用文本到视频帧作为显式像素对齐目标生成静态3D对象并将运动注入其中的新方法。具体来说，我们引入了焦点对准以校准渲染的相机姿势，以及GS-Mesh对比学习以从像素级别的渲染图像对比中提炼几何先验。此外，我们利用变形网络开发了运动对准，以驱动高斯变化，并实现了参考精炼以获得平滑的4D对象表面。这些技术使4D高斯喷溅能够在像素级别与生成的视频对齐几何、纹理和运动。与以前的方法相比，PLA4D以更少的时间产生具有更好纹理细节的合成输出，并有效缓解了双面问题。PLA4D完全使用开源模型实现，为4D数字内容创作提供了一种易于访问、用户友好和有前景的方向。我们的项目页面：\href{https://github.com/MiaoQiaowei/PLA4D.github.io}{https://github.com/MiaoQiaowei/PLA4D.github.io}。

更新时间: 2024-05-30 11:23:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19957v1

HOLMES: to Detect Adversarial Examples with Multiple Detectors

Deep neural networks (DNNs) can easily be fooled by imperceptible but purposeful noise added to images, causing them to classify the images erroneously. Previous defensive work mostly focused on retraining the models or detecting the noise, but has either shown limited success rates or been attacked by new adversarial examples. Instead of focusing on adversarial images or the interior of DNN models, we observed that adversarial examples generated by different algorithms can be identified based on the output of DNNs (logits). Logits can serve as an exterior feature to train detectors. We then propose HOLMES (Hierarchically Organized Light-weight Multiple dEtector System) to reinforce DNNs by detecting potential adversarial examples and minimizing the threats they may bring in practice. HOLMES is able to distinguish unseen adversarial examples from multiple attacks with higher accuracy and lower false positive rates than single-detector systems, even in an adaptive model. To ensure the diversity and randomness of detectors in HOLMES, we use two methods: training dedicated detectors for each label and training detectors with top-k logits. Our effective and inexpensive strategies neither modify the original DNN models nor require their internal parameters. HOLMES is not only compatible with all kinds of learning models (even those accessible only through external APIs), but also complementary to other defenses to achieve higher detection rates (and may also fully protect the system against various adversarial examples).

Updated: 2024-05-30 11:22:55

标题: HOLMES: 使用多个检测器检测敌对示例

摘要: 深度神经网络(DNNs)很容易受到一些对图像添加的不可察觉但有目的的噪音的欺骗，从而错误地对其进行分类。先前的防御工作主要集中在重新训练模型或检测噪音上，但要么显示出有限的成功率，要么被新的对抗性示例攻击。我们观察到，与关注对抗性图像或DNN模型内部不同，由不同算法生成的对抗性示例可以基于DNN的输出(logit)进行识别。Logit可以作为外部特征来训练检测器。因此，我们提出了HOLMES（Hierarchically Organized Light-weight Multiple dEtector System）来通过检测潜在的对抗性示例来加强DNN，从而最大程度地减少它们可能带来的威胁。HOLMES能够以较高的准确性和较低的误报率区分多种攻击方式生成的\textit{未知}对抗性示例，甚至在自适应模型中也能胜过单一检测器系统。为了确保HOLMES中检测器的多样性和随机性，我们使用两种方法：为每个标签训练专用检测器和使用前k个logit训练检测器。我们的有效且廉价的策略既不修改原始DNN模型也不需要其内部参数。HOLMES不仅与所有类型的学习模型兼容（甚至仅使用外部API），而且与其他防御措施相辅相成，以实现更高的检测率（也可能完全保护系统免受各种对抗性示例的影响）。

更新时间: 2024-05-30 11:22:55

领域: cs.AI

下载: http://arxiv.org/abs/2405.19956v1

GenKubeSec: LLM-Based Kubernetes Misconfiguration Detection, Localization, Reasoning, and Remediation

A key challenge associated with Kubernetes configuration files (KCFs) is that they are often highly complex and error-prone, leading to security vulnerabilities and operational setbacks. Rule-based (RB) tools for KCF misconfiguration detection rely on static rule sets, making them inherently limited and unable to detect newly-discovered misconfigurations. RB tools also suffer from misdetection, since mistakes are likely when coding the detection rules. Recent methods for detecting and remediating KCF misconfigurations are limited in terms of their scalability and detection coverage, or due to the fact that they have high expertise requirements and do not offer automated remediation along with misconfiguration detection. Novel approaches that employ LLMs in their pipeline rely on API-based, general-purpose, and mainly commercial models. Thus, they pose security challenges, have inconsistent classification performance, and can be costly. In this paper, we propose GenKubeSec, a comprehensive and adaptive, LLM-based method, which, in addition to detecting a wide variety of KCF misconfigurations, also identifies the exact location of the misconfigurations and provides detailed reasoning about them, along with suggested remediation. When empirically compared with three industry-standard RB tools, GenKubeSec achieved equivalent precision (0.990) and superior recall (0.999). When a random sample of KCFs was examined by a Kubernetes security expert, GenKubeSec's explanations as to misconfiguration localization, reasoning and remediation were 100% correct, informative and useful. To facilitate further advancements in this domain, we share the unique dataset we collected, a unified misconfiguration index we developed for label standardization, our experimentation code, and GenKubeSec itself as an open-source tool.

Updated: 2024-05-30 11:18:52

标题: GenKubeSec: 基于LLM的Kubernetes配置错误检测、定位、推理和修复

摘要: Kubernetes配置文件（KCFs）面临的关键挑战是它们通常非常复杂且容易出错，导致安全漏洞和运营挫折。基于规则（RB）的KCF错误配置检测工具依赖于静态规则集，使其固有地受限且无法检测到新发现的错误配置。RB工具还容易出现误检测，因为在编写检测规则时可能会犯错误。近期用于检测和纠正KCF错误配置的方法在可扩展性和检测覆盖范围方面存在限制，或者由于需要高超的专业知识，无法提供自动化的纠正功能。采用LLM在其流程中的新颖方法依赖于基于API的、通用的、主要是商业模型。因此，它们存在安全挑战，分类性能不一致，并且成本较高。在本文中，我们提出了GenKubeSec，这是一种基于LLM的全面且自适应的方法，除了检测各种KCF错误配置外，还能确定错误配置的确切位置，并提供详细的解释以及建议的纠正措施。经过与三种行业标准的RB工具的实证比较，GenKubeSec实现了等价的精度（0.990）和更优秀的召回率（0.999）。当一个Kubernetes安全专家检查了一组随机样本的KCFs时，GenKubeSec关于错误配置定位、推理和纠正的解释是100%正确、详尽和有用的。为了促进该领域的进一步发展，我们分享了我们收集的独特数据集、我们为标签标准化开发的统一错误配置索引、我们的实验代码以及GenKubeSec作为一个开源工具。

更新时间: 2024-05-30 11:18:52

领域: cs.CR,cs.CL,cs.DC,cs.LG

下载: http://arxiv.org/abs/2405.19954v1

MM-Lego: Modular Biomedical Multimodal Models with Minimal Fine-Tuning

Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, graphs, time series, or tabular data. While there are many available multimodal fusion and alignment approaches, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents Multimodal Lego (MM-Lego), a modular and general-purpose fusion and model merging framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We achieve this by introducing a wrapper for unimodal encoders that enforces lightweight dimensionality assumptions between modalities and harmonises their representations by learning features in the frequency domain to enable model merging with little signal interference. We show that MM-Lego 1) can be used as a model merging method which achieves competitive performance with end-to-end fusion models without any fine-tuning, 2) can operate on any unimodal encoder, and 3) is a model fusion method that, with minimal fine-tuning, achieves state-of-the-art results on six benchmarked multimodal biomedical tasks.

Updated: 2024-05-30 11:14:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.19950v1

Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines

In this paper, we propose a new class of positive definite kernels based on the spectral truncation, which has been discussed in the fields of noncommutative geometry and $C^*$-algebra. We focus on kernels whose inputs and outputs are functions and generalize existing kernels, such as polynomial, product, and separable kernels, by introducing a truncation parameter $n$ that describes the noncommutativity of the products appearing in the kernels. When $n$ goes to infinity, the proposed kernels tend to the existing commutative kernels. If $n$ is finite, they exhibit different behavior, and the noncommutativity induces interactions along the data function domain. We show that the truncation parameter $n$ is a governing factor leading to performance enhancement: by setting an appropriate $n$, we can balance the representation power and the complexity of the representation space. The flexibility of the proposed class of kernels allows us to go beyond previous commutative kernels.
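The effect of a finite truncation parameter can be seen in a small numerical sketch. It assumes one standard form of spectral truncation, in which a function on the circle is replaced by the $n \times n$ Toeplitz matrix of its Fourier coefficients; the abstract does not state that this is the exact construction used in the kernels, and the helper names `fourier_coeffs` and `truncate` are hypothetical.

```python
import numpy as np

def fourier_coeffs(values):
    """Approximate Fourier coefficients of a function sampled uniformly on [0, 2*pi)."""
    return np.fft.fft(values) / len(values)

def truncate(values, n):
    """n x n Toeplitz matrix with entries fhat(j - k) (assumed form of the truncation)."""
    fhat = fourier_coeffs(values)
    idx = np.arange(n)
    return fhat[(idx[:, None] - idx[None, :]) % len(values)]

N = 64                                   # number of grid points (illustrative choice)
x = 2 * np.pi * np.arange(N) / N
f = np.exp(1j * x)                       # f(x) = e^{ix}
g = np.exp(-1j * x)                      # g(x) = e^{-ix}, so f * g = 1 pointwise

n = 4                                    # truncation parameter
Tf, Tg = truncate(f, n), truncate(g, n)

# The functions commute pointwise, but their truncations do not.
commutator = Tf @ Tg - Tg @ Tf
# The truncation of the product also differs from the product of the truncations.
defect = truncate(f * g, n) - Tf @ Tg
```

In this toy case both matrices are nonzero only in corner entries, so their relative weight shrinks as `n` grows, which is in line with the abstract's statement that the commutative kernels are recovered in the limit.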

Updated: 2024-05-30 11:12:25

Categories: stat.ML,cs.LG,math.OA

Download: http://arxiv.org/abs/2405.17823v2

Scalable Test Generation to Trigger Rare Targets in High-Level Synthesizable IPs for Cloud FPGAs

High-Level Synthesis (HLS) has transformed the development of complex Hardware IPs (HWIP) by offering abstraction and configurability through languages like SystemC/C++, particularly for Field Programmable Gate Array (FPGA) accelerators in high-performance and cloud computing contexts. These IPs can be synthesized for different FPGA boards in the cloud, offering compact area requirements and enhanced flexibility. HLS enables designs to execute directly on ARM processors within modern FPGAs without the need for Register Transfer Level (RTL) synthesis, thereby conserving FPGA resources. While HLS offers flexibility and efficiency, it also introduces potential vulnerabilities such as hidden circuitry, including the possibility of hosting hardware trojans within designs. In cloud environments, these vulnerabilities pose significant security concerns such as leakage of sensitive data, disruption of IP functionality, and hardware damage, necessitating the development of robust testing frameworks. This research presents an advanced testing approach for HLS-developed cloud IPs, specifically targeting hidden malicious functionalities that may exist under rare conditions within the design. The proposed method leverages selective instrumentation, combining greybox fuzzing and concolic execution techniques to enhance test generation capabilities. Evaluation on various HLS benchmarks that have the characteristics of FPGA-based cloud IPs with embedded cloud-related threats demonstrates the effectiveness of our framework in detecting trojans and rare scenarios, with improvements in coverage, time efficiency, memory usage, and testing costs compared to existing methods.

Updated: 2024-05-30 11:10:11

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2405.19948v1

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies due to the potential role changes that increase the uncertainty and complexity of the game. In this work, we first present the existence of the Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results showcase that the discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from the analyses, we propose an RL-instructed language agent framework, where a discussion policy trained by reinforcement learning (RL) is employed to determine appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.

Updated: 2024-05-30 11:07:06

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19946v1

Tool Learning with Large Language Models: A Survey

Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing works on tool learning with LLMs. In this survey, we focus on reviewing existing literature from two primary aspects: (1) why tool learning is beneficial and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the "why" by reviewing both the benefits of tool integration and the inherent benefits of the tool learning paradigm from six specific aspects. In terms of "how", we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area. We also maintain a GitHub repository to continually keep track of the relevant papers and resources in this rising area at https://github.com/quchangle1/LLM-Tool-Survey.

Updated: 2024-05-30 11:01:10

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.17935v2

Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation

This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts which are scarcely represented or absent in their training datasets. We identify how these limitations not only confine the creative potential of these models but also pose risks of reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation, particularly for novel or inaccurately rendered objects. Through experimental validation, we demonstrate how this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities and mitigating the risk of perpetuating biases. Our study contributes to the advancement of text-to-image models as unbiased, versatile tools for creative expression.

Updated: 2024-05-30 10:58:56

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.18762v2

DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction

Accurately predicting future fluid states is vital to extensive areas such as meteorology, oceanology, and aerodynamics. However, since the fluid is usually observed from an Eulerian perspective, its moving and intricate dynamics are seriously obscured and confounded in static grids, bringing thorny challenges to the prediction. This paper introduces a new Lagrangian-Eulerian combined paradigm to tackle the tangled fluid dynamics. Instead of solely predicting the future based on Eulerian observations, we propose DeepLag to discover hidden Lagrangian dynamics within the fluid by tracking the movements of adaptively sampled key particles. In DeepLag, the Lagrangian movement of the tracked particles is inferred from Eulerian observations, and their accumulated Lagrangian dynamics information is incorporated into global Eulerian evolving features to guide future prediction. Tracking key particles not only provides a transparent and interpretable clue for fluid dynamics but also frees our model from modeling complex correlations among massive grids, improving efficiency. Experimentally, DeepLag excels in three challenging fluid prediction tasks covering 2D and 3D, simulated and real-world fluids.

Updated: 2024-05-30 10:53:51

Categories: cs.LG,physics.flu-dyn

Download: http://arxiv.org/abs/2402.02425v2

Learning Latent Graph Structures and their Uncertainty

Within a prediction task, Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy. As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task. In this paper, we demonstrate that minimization of a point-prediction loss function, e.g., the mean absolute error, does not guarantee proper learning of the latent relational information and its associated uncertainty. Conversely, we prove that a suitable loss function on the stochastic model outputs simultaneously grants (i) the unknown adjacency matrix latent distribution and (ii) optimal performance on the prediction task. Finally, we propose a sampling-based method that solves this joint learning task. Empirical results validate our theoretical claims and demonstrate the effectiveness of the proposed approach.

Updated: 2024-05-30 10:49:22

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2405.19933v1

Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting. We term the stage with generated noisy patterns the corruption stage. To understand this corruption stage, we begin by theoretically modeling the one-shot fine-tuning scenario, and then extend this modeling to more general cases. Through this modeling, we identify the primary cause of this corruption stage: a narrowed learning distribution inherent in the nature of few-shot fine-tuning. To tackle this, we apply Bayesian Neural Networks (BNNs) on DMs with variational inference to implicitly broaden the learned distribution, and show that the learning target of the BNNs can be naturally regarded as an expectation of the diffusion loss plus a further regularization with the pretrained DMs. This approach is highly compatible with current few-shot fine-tuning methods in DMs and does not introduce any extra inference costs. Experimental results demonstrate that our method significantly mitigates corruption, and improves the fidelity, quality and diversity of the generated images in both object-driven and subject-driven generation tasks.

Updated: 2024-05-30 10:47:48

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.19931v1

BAN: Detecting Backdoors Activated by Adversarial Neuron Noise

Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it overly relies on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information. In particular, we adversarially increase the loss of backdoored models with respect to weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate that our defense, BAN, is 1.37$\times$ (on CIFAR-10) and 5.11$\times$ (on ImageNet200) more efficient, with a 9.99% higher detection success rate, than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available at https://anonymous.4open.science/r/ban-4B32.

Updated: 2024-05-30 10:44:45

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2405.19928v1

P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quantization algorithms but retained floating-point scaling factors, which yield non-negligible re-quantization overhead, limiting ViTs' hardware efficiency and motivating more hardware-friendly solutions. To this end, we propose P$^2$-ViT, the first Power-of-Two (PoT) post-training quantization and acceleration framework to accelerate fully quantized ViTs. Specifically, as for quantization, we explore a dedicated quantization scheme to effectively quantize ViTs with PoT scaling factors, thus minimizing the re-quantization overhead. Furthermore, we propose coarse-to-fine automatic mixed-precision quantization to enable better accuracy-efficiency trade-offs. In terms of hardware, we develop a dedicated chunk-based accelerator featuring multiple tailored sub-processors to individually handle ViTs' different types of operations, alleviating reconfigurable overhead. Additionally, we design a tailored row-stationary dataflow to seize the pipeline processing opportunity introduced by our PoT scaling factors, thereby enhancing throughput. Extensive experiments consistently validate P$^2$-ViT's effectiveness. In particular, we offer comparable or even superior quantization performance with PoT scaling factors when compared to the counterpart with floating-point scaling factors. Besides, we achieve up to $10.1\times$ speedup and $36.8\times$ energy saving over GPU's Turing Tensor Cores, and up to $1.84\times$ higher computation utilization efficiency against SOTA quantization-based ViT accelerators. Code is available at https://github.com/shihuihong214/P2-ViT.
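To illustrate why power-of-two scaling factors reduce re-quantization cost, here is a minimal NumPy sketch. It is not the quantization scheme of P$^2$-ViT, whose details the abstract does not give; the helpers `pot_exponent`, `quantize`, and `requantize` are hypothetical, and the sketch assumes symmetric 8-bit quantization of a tensor with at least one nonzero entry.

```python
import numpy as np

def pot_exponent(x, bits=8):
    """Smallest exponent e such that x / 2**e fits in signed `bits`-bit integers.
    Assumes x contains at least one nonzero value."""
    qmax = 2 ** (bits - 1) - 1
    return int(np.ceil(np.log2(float(np.max(np.abs(x))) / qmax)))

def quantize(x, e, bits=8):
    """Round x / 2**e to the nearest integer and clip to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / 2.0 ** e), -qmax - 1, qmax).astype(np.int32)

def requantize(acc, e_in, e_out):
    """Move integers from scale 2**e_in to scale 2**e_out with a bit shift only."""
    shift = e_out - e_in
    return acc >> shift if shift >= 0 else acc << -shift

a = np.array([0.5, -1.0, 0.25])      # toy activations
w = np.array([0.75, 0.5, -0.5])      # toy weights
e_a, e_w = pot_exponent(a), pot_exponent(w)
qa, qw = quantize(a, e_a), quantize(w, e_w)

acc = qa * qw                         # integer products carry scale 2**(e_a + e_w)
out = requantize(acc, e_a + e_w, e_out=-6)
```

With a floating-point scale, the last step would need a multiplication by a real-valued ratio of scales; with power-of-two scales it reduces to an arithmetic shift, which is the overhead reduction the abstract refers to.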

Updated: 2024-05-30 10:26:36

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19915v1

Robust Kernel Hypothesis Testing under Data Corruption

We propose two general methods for constructing robust permutation tests under data corruption. The proposed tests effectively control the non-asymptotic type I error under data corruption, and we prove their consistency in power under minimal conditions. This contributes to the practical deployment of hypothesis tests for real-world applications with potential adversarial attacks. One of our methods inherently ensures differential privacy, further broadening its applicability to private data analysis. For the two-sample and independence settings, we show that our kernel robust tests are minimax optimal, in the sense that they are guaranteed to be non-asymptotically powerful against alternatives uniformly separated from the null in the kernel MMD and HSIC metrics at some optimal rate (tight with matching lower bound). Finally, we provide publicly available implementations and empirically illustrate the practicality of our proposed tests.
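For context, the standard (non-robust) kernel permutation test that such methods build on can be sketched as follows. This is the textbook MMD two-sample permutation test with a Gaussian kernel, not the robust or differentially private construction proposed in the paper; the bandwidth, number of permutations, and function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2(k, n):
    """Biased squared-MMD estimate from a joint kernel matrix; first n rows are sample X."""
    return k[:n, :n].mean() + k[n:, n:].mean() - 2 * k[:n, n:].mean()

def permutation_mmd_test(x, y, num_perms=200, alpha=0.05, seed=0):
    """Return (p_value, reject) for H0: x and y come from the same distribution."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    k = gaussian_kernel(z, z)
    n = len(x)
    observed = mmd2(k, n)
    exceed = 0
    for _ in range(num_perms):
        p = rng.permutation(len(z))
        # Permuting rows and columns together relabels which points belong to X and Y.
        exceed += mmd2(k[np.ix_(p, p)], n) >= observed
    # The +1 terms count the observed statistic itself and keep the level valid.
    p_value = (1 + exceed) / (1 + num_perms)
    return p_value, p_value <= alpha

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 2))
y = rng.normal(3.0, 1.0, size=(100, 2))   # clearly shifted sample
p_value, reject = permutation_mmd_test(x, y)
```

The permutation step is what gives non-asymptotic type I error control in the uncorrupted setting; the paper's contribution concerns keeping that control, and power, when some of the data has been corrupted.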

Updated: 2024-05-30 10:23:16

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.19912v1

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

In offline reinforcement learning, the out-of-distribution (OOD) problem is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from unnecessary conservativeness, hampering policy improvement. This occurs because all actions from the behavior policy that generates the offline dataset are used indiscriminately as constraints. The problem becomes particularly noticeable when the quality of the dataset is suboptimal. Thus, we propose Adaptive Advantage-guided Policy Regularization (A2PR), which obtains high-advantage actions from an augmented behavior policy combined with a VAE to guide the learned policy. A2PR can select high-advantage actions that differ from those present in the dataset, while still remaining conservative about OOD actions. This is achieved by harnessing the VAE's capacity to generate samples matching the distribution of the data points. We theoretically prove that the improvement of the behavior policy is guaranteed. Besides, it effectively mitigates value overestimation with a bounded performance gap. Empirically, we conduct a series of experiments on the D4RL benchmark, where A2PR demonstrates state-of-the-art performance. Furthermore, experimental results on additional suboptimal mixed datasets reveal that A2PR exhibits superior performance. Code is available at https://github.com/ltlhuuu/A2PR.

Updated: 2024-05-30 10:20:55

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2405.19909v1

CL-MRI: Self-Supervised Contrastive Learning to Improve the Accuracy of Undersampled MRI Reconstruction

In Magnetic Resonance Imaging (MRI), image acquisitions are often undersampled in the measurement domain to accelerate the scanning process, at the expense of image quality. However, image quality is a crucial factor that influences the accuracy of clinical diagnosis; hence, high-quality image reconstruction from undersampled measurements has been a key area of research. Recently, deep learning (DL) methods have emerged as the state-of-the-art for MRI reconstruction, typically involving deep neural networks to transform undersampled MRI images into high-quality MRI images through data-driven processes. Nevertheless, there is clear and significant room for improvement in undersampled DL MRI reconstruction to meet the high standards required for clinical diagnosis, in terms of eliminating aliasing artifacts and reducing image noise. In this paper, we introduce a self-supervised pretraining procedure using contrastive learning to improve the accuracy of undersampled DL MRI reconstruction. We use contrastive learning to transform the MRI image representations into a latent space that maximizes mutual information among different undersampled representations and optimizes the information content at the input of the downstream DL reconstruction models. Our experiments demonstrate improved reconstruction accuracy across a range of acceleration factors and datasets, both quantitatively and qualitatively. Furthermore, our extended experiments validate the proposed framework's robustness under adversarial conditions, such as measurement noise, different k-space sampling patterns, and pathological abnormalities, and also prove the transfer learning capabilities on MRI datasets with completely different anatomy. Additionally, we conducted experiments to visualize and analyze the properties of the proposed MRI contrastive learning latent space.

Updated: 2024-05-30 10:18:44

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2306.00530v3

A Multi-Branched Radial Basis Network Approach to Predicting Complex Chaotic Behaviours

In this study, we propose a multi-branched network approach to predict the dynamics of a physics attractor characterized by intricate and chaotic behavior. We introduce a unique neural network architecture comprised of Radial Basis Function (RBF) layers combined with an attention mechanism designed to effectively capture nonlinear inter-dependencies inherent in the attractor's temporal evolution. Our results demonstrate successful prediction of the attractor's trajectory across 100 predictions made using a real-world dataset of 36,700 time-series observations encompassing approximately 28 minutes of activity. To further illustrate the performance of our proposed technique, we provide comprehensive visualizations depicting the attractor's original and predicted behaviors alongside quantitative measures comparing observed versus estimated outcomes. Overall, this work showcases the potential of advanced machine learning algorithms in elucidating hidden structures in complex physical systems while offering practical applications in various domains requiring accurate short-term forecasting capabilities.
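As a reminder of what an RBF layer computes, the sketch below evaluates Gaussian radial basis activations and concatenates several branches with different widths. The branch layout, the `gammas` values, and the function names are assumptions made for illustration; the abstract does not specify the architecture or its attention mechanism, which is omitted here.

```python
import numpy as np

def rbf_layer(x, centers, gamma):
    """Gaussian RBF activations exp(-gamma * ||x_i - c_j||^2) for inputs x_i and centers c_j."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def multi_branch_features(x, centers, gammas):
    """Concatenate RBF branches that share centers but use different widths (hypothetical layout)."""
    return np.concatenate([rbf_layer(x, centers, g) for g in gammas], axis=1)

x = np.array([[0.0, 0.0], [1.0, 0.0]])
centers = x.copy()                       # place one center on each input for this toy example
single = rbf_layer(x, centers, gamma=1.0)
features = multi_branch_features(x, centers, gammas=(0.5, 2.0))
```

Each activation equals 1 when an input coincides with a center and decays with squared distance, so narrow and wide branches respond to structure at different scales.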

Updated: 2024-05-30 10:16:04

Categories: cs.LG,cs.CV,cs.NE

Download: http://arxiv.org/abs/2404.00618v2

Defining Neural Network Architecture through Polytope Structures of Dataset

Current theoretical and empirical research in neural networks suggests that complex datasets require large network architectures for thorough classification, yet the precise nature of this relationship remains unclear. This paper tackles this issue by defining upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question. We also delve into the application of these principles to simplicial complexes and specific manifold shapes, explaining how the requirement for network width varies in accordance with the geometric complexity of the dataset. Moreover, we develop an algorithm to investigate a converse situation where the polytope structure of a dataset can be inferred from its corresponding trained neural networks. Through our algorithm, it is established that popular datasets such as MNIST, Fashion-MNIST, and CIFAR10 can be efficiently encapsulated using no more than two polytopes with a small number of faces.

Updated: 2024-05-30 10:13:13

Categories: cs.LG,cs.CV,cs.NE

Download: http://arxiv.org/abs/2402.02407v2

Enabling Uncertainty Estimation in Iterative Neural Networks

Turning pass-through network architectures into iterative ones, which use their own output as input, is a well-known approach for boosting performance. In this paper, we argue that such architectures offer an additional benefit: The convergence rate of their successive outputs is highly correlated with the accuracy of the value to which they converge. Thus, we can use the convergence rate as a useful proxy for uncertainty. This results in an approach to uncertainty estimation that provides state-of-the-art estimates at a much lower computational cost than techniques like Ensembles, and without requiring any modifications to the original iterative model. We demonstrate its practical value by embedding it in two application domains: road detection in aerial images and the estimation of aerodynamic properties of 2D and 3D shapes.
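The idea of reading uncertainty off the convergence behavior can be sketched with a toy iterative model. Everything here is an illustrative assumption: `toy_step` stands in for a trained network, and the change in the final iteration is just one simple proxy; the abstract does not say how the convergence rate is measured.

```python
import numpy as np

def iterate(step, x, y0, num_iters=5):
    """Feed the model its own output num_iters times and return all successive outputs."""
    outputs = [y0]
    for _ in range(num_iters):
        outputs.append(step(x, outputs[-1]))
    return outputs

def last_step_change(outputs):
    """Size of the final update; a larger value means the iterates are still moving."""
    return float(np.linalg.norm(outputs[-1] - outputs[-2]))

def toy_step(rate, y):
    """Stand-in for a network: moves y toward 1.0 at a speed set by the input `rate`."""
    return y + rate * (1.0 - y)

fast = iterate(toy_step, 0.9, np.array([0.0]))   # input on which the model converges quickly
slow = iterate(toy_step, 0.3, np.array([0.0]))   # input on which it converges slowly

score_fast = last_step_change(fast)
score_slow = last_step_change(slow)
```

Because the score is computed from outputs the iterative model already produces, no extra forward passes through additional models are needed, which is where the cost advantage over ensembles comes from.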

Updated: 2024-05-30 10:10:19

Categories: cs.AI

Download: http://arxiv.org/abs/2403.16732v2

Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection

Label noise, commonly found in real-world datasets, has a detrimental impact on a model's generalization. To effectively detect incorrectly labeled instances, previous works have mostly relied on distinguishable training signals, such as training loss, as indicators to differentiate between clean and noisy labels. However, they are limited in that the training signals only partially reveal the model's behavior and do not generalize well to various noise types, resulting in limited detection accuracy. In this paper, we propose the DynaCor framework, which distinguishes incorrectly labeled instances from correctly labeled ones based on the dynamics of the training signals. To cope with the absence of supervision for clean and noisy labels, DynaCor first introduces a label corruption strategy that augments the original dataset with intentionally corrupted labels, enabling indirect simulation of the model's behavior on noisy labels. Then, DynaCor learns to identify clean and noisy instances by inducing two clearly distinguishable clusters from the latent representations of training dynamics. Our comprehensive experiments show that DynaCor outperforms the state-of-the-art competitors and shows strong robustness to various noise types and noise rates.
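A label corruption step of the kind described can be sketched as below. This is a generic symmetric-flip corruption under assumed parameters (`rate`, `num_classes`), not DynaCor's exact strategy; the returned mask is what would give a detector known-noisy examples to learn from.

```python
import numpy as np

def corrupt_labels(labels, num_classes, rate, seed=0):
    """Flip a fraction `rate` of labels to a different class; return new labels and a mask."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = len(labels)
    picked = rng.choice(n, size=int(round(rate * n)), replace=False)
    # An offset in 1..num_classes-1, taken modulo num_classes, always lands on another class.
    offsets = rng.integers(1, num_classes, size=len(picked))
    corrupted = labels.copy()
    corrupted[picked] = (labels[picked] + offsets) % num_classes
    is_corrupted = np.zeros(n, dtype=bool)
    is_corrupted[picked] = True
    return corrupted, is_corrupted

labels = np.arange(100) % 10
corrupted, is_corrupted = corrupt_labels(labels, num_classes=10, rate=0.2)
```

The corrupted copies would then be added to the original dataset, so that training dynamics on known-wrong labels can be compared with dynamics on the original, possibly noisy, labels.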

Updated: 2024-05-30 10:06:06

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.19902v1

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework, considering factors such as model size, token consumption, and latency, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at https://github.com/ielab/llm-rankers.
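To make the efficiency argument concrete, here is a minimal sketch of ranking by comparing small sets of documents at once. It is not the algorithm from the paper or the llm-rankers repository: `setwise_top_k` and `pick_best` are hypothetical names, and `pick_best` stands in for a single LLM call that returns the most relevant document of a set.

```python
from typing import Callable, List, Sequence


def setwise_top_k(docs: Sequence[str],
                  pick_best: Callable[[List[str]], str],
                  k: int,
                  set_size: int = 4) -> List[str]:
    """Return the top-k documents using set-level comparisons.

    Each call to `pick_best` compares up to `set_size` documents, so one
    pass over n candidates needs about n / (set_size - 1) calls instead of
    the n - 1 calls a pairwise pass would need.
    """
    if set_size < 2:
        raise ValueError("set_size must be at least 2")
    remaining = list(docs)
    ranked: List[str] = []
    while remaining and len(ranked) < k:
        champion = remaining[0]
        # Challenge the current champion with set_size - 1 new documents.
        for start in range(1, len(remaining), set_size - 1):
            group = [champion] + remaining[start:start + set_size - 1]
            champion = pick_best(group)
        ranked.append(champion)
        remaining.remove(champion)
    return ranked


# Toy stand-in for the LLM: pick the document with the highest known score.
scores = {"d1": 0.1, "d2": 0.4, "d3": 0.7, "d4": 0.9}
calls: List[List[str]] = []


def toy_pick_best(group: List[str]) -> str:
    calls.append(list(group))
    return max(group, key=scores.get)


top2 = setwise_top_k(["d3", "d1", "d4", "d2"], toy_pick_best, k=2, set_size=3)
```

In this toy run the top two documents are found with three set-level calls, where a pairwise pass over the same candidates would need five comparisons.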

Updated: 2024-05-30 10:03:27

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2310.09497v2

Urban Air Pollution Forecasting: a Machine Learning Approach leveraging Satellite Observations and Meteorological Forecasts

Air pollution poses a significant threat to public health and well-being, particularly in urban areas. This study introduces a series of machine-learning models that integrate data from the Sentinel-5P satellite, meteorological conditions, and topological characteristics to forecast future levels of five major pollutants. The investigation delineates the process of data collection, detailing the combination of diverse data sources utilized in the study. Through experiments conducted in the Milan metropolitan area, the models demonstrate their efficacy in predicting pollutant levels for the forthcoming day, achieving a percentage error of around 30%. The proposed models are advantageous as they are independent of monitoring stations, facilitating their use in areas without existing infrastructure. Additionally, we have released the collected dataset to the public, aiming to stimulate further research in this field. This research contributes to advancing our understanding of urban air quality dynamics and emphasizes the importance of amalgamating satellite, meteorological, and topographical data to develop robust pollution forecasting models.

Updated: 2024-05-30 10:02:53

Categories: cs.LG,physics.ao-ph,I.2.m; G.3

Download: http://arxiv.org/abs/2405.19901v1

Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation

Knowledge distillation optimises a smaller student model to behave similarly to a larger teacher model, retaining some of the performance benefits. While this method can improve results on in-distribution examples, it does not necessarily generalise to out-of-distribution (OOD) settings. We investigate two complementary methods for improving the robustness of the resulting student models on OOD domains. The first approach augments the distillation with generated unlabelled examples that match the target distribution. The second method upsamples data points among the training set that are similar to the target distribution. When applied on the task of natural language inference (NLI), our experiments on MNLI show that distillation with these modifications outperforms previous robustness solutions. We also find that these methods improve performance on OOD domains even beyond the target domain.
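As a reminder of what "behaving similarly to the teacher" usually means in practice, the sketch below computes a temperature-softened KL divergence between teacher and student predictions. This specific loss is an assumption for illustration; the abstract does not state which distillation objective the authors use, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy three-class NLI logits (entailment, neutral, contradiction).
same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])
```

Because this loss needs only teacher predictions and no gold labels, it can be evaluated on generated unlabelled examples, which is what makes the first augmentation approach in the abstract possible.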

Updated: 2024-05-30 10:00:14

Categories: cs.CL,cs.LG,I.2.7

Download: http://arxiv.org/abs/2305.13067v2

Open-Set Domain Adaptation for Semantic Segmentation

Unsupervised domain adaptation (UDA) for semantic segmentation aims to transfer the pixel-wise knowledge from the labeled source domain to the unlabeled target domain. However, current UDA methods typically assume a shared label space between source and target, limiting their applicability in real-world scenarios where novel categories may emerge in the target domain. In this paper, we introduce Open-Set Domain Adaptation for Semantic Segmentation (OSDA-SS) for the first time, where the target domain includes unknown classes. We identify two major problems in the OSDA-SS scenario: 1) the existing UDA methods struggle to predict the exact boundary of the unknown classes, and 2) they fail to accurately predict the shape of the unknown classes. To address these issues, we propose Boundary and Unknown Shape-Aware open-set domain adaptation, coined BUS. BUS accurately discerns the boundaries between known and unknown classes in a contrastive manner using a novel dilation-erosion-based contrastive loss. In addition, we propose OpenReMix, a new domain mixing augmentation method that guides our model to learn domain- and size-invariant features, improving the shape detection of the known and unknown classes. Through extensive experiments, we demonstrate that BUS detects unknown classes in the challenging OSDA-SS scenario more effectively than previous methods, by a large margin. The code is available at https://github.com/KHU-AGI/BUS.

Updated: 2024-05-30 09:55:19

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.19899v1

Position: Tensor Networks are a Valuable Asset for Green AI

For the first time, this position paper introduces a fundamental link between tensor networks (TNs) and Green AI, highlighting their synergistic potential to enhance both the inclusivity and sustainability of AI research. We argue that TNs are valuable for Green AI due to their strong mathematical backbone and inherent logarithmic compression potential. We undertake a comprehensive review of the ongoing discussions on Green AI, emphasizing the importance of sustainability and inclusivity in AI research to demonstrate the significance of establishing the link between Green AI and TNs. To support our position, we first provide a comprehensive overview of efficiency metrics proposed in Green AI literature and then evaluate examples of TNs in the fields of kernel machines and deep learning using the proposed efficiency metrics. This position paper aims to incentivize meaningful, constructive discussions by bridging fundamental principles of Green AI and TNs. We advocate for researchers to seriously evaluate the integration of TNs into their research projects, and in alignment with the link established in this paper, we support prior calls encouraging researchers to treat Green AI principles as a research priority.

Updated: 2024-05-30 09:53:16

Categories: cs.LG,cs.AI,I.2.0; K.4.0

Download: http://arxiv.org/abs/2205.12961v2

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

In recent years, large language models (LLMs) have made remarkable achievements in various domains. However, the untimeliness and cost of knowledge updates coupled with hallucination issues of LLMs have curtailed their applications in knowledge intensive tasks, where retrieval augmented generation (RAG) can be of help. Nevertheless, existing retrieval augmented models typically use similarity as a bridge between queries and documents and follow a retrieve then read procedure. In this work, we argue that similarity is not always the panacea and totally relying on similarity would sometimes degrade the performance of retrieval augmented generation. To this end, we propose MetRag, a Multi layEred Thoughts enhanced Retrieval Augmented Generation framework. To begin with, beyond existing similarity oriented thought, we embrace a small scale utility model that draws supervision from an LLM for utility oriented thought and further come up with a smarter model by comprehensively combining the similarity and utility oriented thoughts. Furthermore, given the fact that the retrieved document set tends to be huge and using them in isolation makes it difficult to capture the commonalities and characteristics among them, we propose to make an LLM as a task adaptive summarizer to endow retrieval augmented generation with compactness-oriented thought. Finally, with multi layered thoughts from the precedent stages, an LLM is called for knowledge augmented generation. Extensive experiments on knowledge-intensive tasks have demonstrated the superiority of MetRag.

Updated: 2024-05-30 09:50:38

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.19893v1

LQER: Low-Rank Quantization Error Reconstruction for LLMs

Post-training quantization of Large Language Models (LLMs) is challenging. In this work, we introduce Low-rank Quantization Error Reduction (LQER), which combines quantization and low-rank approximation to recover the model capability. LQER leverages an activation-induced scale matrix to drive the singular value distribution of quantization error towards a desirable distribution, which enables nearly-lossless W4A8 quantization on various LLMs and downstream tasks without the need for knowledge distillation, grid search, or gradient-based iterative optimization. Unlike existing methods, the computation pattern of LQER eliminates the need for specialized Scatter and Gather processes to collect high-precision weights from irregular memory locations. Our W4A8 LLMs achieve near-lossless performance on six popular downstream tasks, while using 1.36$\times$ fewer hardware resources than the leading state-of-the-art method. We open-source our framework at https://github.com/ChengZhang-98/lqer

Updated: 2024-05-30 09:49:47

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2402.02446v3

Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

The near-space airship-borne communication network is recognized as an indispensable component of the future integrated ground-air-space network thanks to airships' advantage of long-term residency at stratospheric altitudes, but it urgently needs a reliable and efficient Airship-to-X link. To improve the transmission efficiency and capacity, this paper proposes to integrate semantic communication with massive multiple-input multiple-output (MIMO) technology. Specifically, we propose a deep joint semantic coding and beamforming (JSCBF) scheme for an airship-based massive MIMO image transmission network in space, in which semantics from both source and channel are fused to jointly design the semantic coding and physical layer beamforming. First, we design two semantic extraction networks to extract semantics from image source and channel state information, respectively. Then, we propose a semantic fusion network that can fuse these semantics into complex-valued semantic features for subsequent physical-layer transmission. To efficiently transmit the fused semantic features at the physical layer, we then propose the hybrid data and model-driven semantic-aware beamforming networks. At the receiver, a semantic decoding network is designed to reconstruct the transmitted images. Finally, we perform end-to-end deep learning to jointly train all the modules, using the image reconstruction quality at the receivers as a metric. The proposed deep JSCBF scheme fully combines the efficient source compressibility and robust error correction capability of semantic communication with the high spectral efficiency of massive MIMO, achieving a significant performance improvement over existing approaches.

Updated: 2024-05-30 09:46:59

Categories: eess.SP,cs.IT,cs.LG,cs.MM,math.IT

Download: http://arxiv.org/abs/2405.19889v1

Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strengths of LLMs and conventional software. Diverse LLM applications from different tenants may design complex workflows that use multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by today's public LLM services, losing essential application-level information. As a result, public LLM services have to blindly optimize individual LLM requests, leading to sub-optimal end-to-end performance of LLM applications. This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction to expose application-level knowledge to public LLM services. A Semantic Variable annotates an input/output variable in the prompt of a request, and creates the data pipeline when connecting multiple LLM requests, providing a natural way to program LLM applications. Exposing Semantic Variables to the public LLM service allows it to perform conventional data flow analysis to uncover the correlation across multiple LLM requests. This correlation opens a brand-new optimization space for the end-to-end performance of LLM-based applications. Extensive evaluations demonstrate that Parrot can achieve up to an order-of-magnitude improvement for popular and practical use cases of LLM applications.

Updated: 2024-05-30 09:46:36

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.19888v1

Federated Learning with Multi-resolution Model Broadcast

In federated learning, a server must periodically broadcast a model to the agents. We propose to use multi-resolution coding and modulation (also known as non-uniform modulation) for this purpose. In the simplest instance, broadcast transmission is used, whereby all agents are targeted with one and the same transmission (typically without any particular favored beam direction), which is coded using multi-resolution coding/modulation. This enables high-SNR agents, with high path gains to the server, to receive a more accurate model than the low-SNR agents do, without consuming more downlink resources. As one implementation, we use transmission with a non-uniform 8-PSK constellation, where a high-SNR receiver (agent) can separate all 8 constellation points (hence receive 3 bits) whereas a low-SNR receiver can only separate 4 points (hence receive 2 bits). By encoding the least significant information in the third bit, the high-SNR receivers can obtain the model with higher accuracy, while the low-SNR receiver can still obtain the model although with reduced accuracy, thereby facilitating at least some basic participation of the low-SNR receiver. We show the effectiveness of our proposed scheme via experimentation using federated learning with the MNIST data-set.
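The 8-PSK example can be sketched as follows. The constellation layout is an assumption made for illustration: four clusters centred on the QPSK angles, each holding two points offset by ±`delta` radians. The abstract does not give the actual geometry, and the function names are hypothetical.

```python
import cmath
import math
from typing import Tuple

Bits3 = Tuple[int, int, int]


def modulate(bits: Bits3, delta: float = 0.2) -> complex:
    """Map 3 bits to a unit-energy point of a non-uniform 8-PSK constellation.

    The first two bits pick the quadrant (coarse information); the third,
    least significant bit picks one of two nearby points in that quadrant.
    """
    b0, b1, b2 = bits
    quadrant = 2 * b0 + b1
    center = math.pi / 4 + quadrant * math.pi / 2
    offset = delta if b2 else -delta
    return cmath.exp(1j * (center + offset))


def demod_coarse(symbol: complex) -> Tuple[int, int]:
    """Low-SNR receiver: recover only the quadrant, i.e. the first 2 bits."""
    angle = cmath.phase(symbol) % (2 * math.pi)
    quadrant = int(angle // (math.pi / 2)) % 4
    return (quadrant >> 1, quadrant & 1)


def demod_fine(symbol: complex, delta: float = 0.2) -> Bits3:
    """High-SNR receiver: nearest of all 8 points, recovering all 3 bits."""
    candidates = [(b0, b1, b2) for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)]
    return min(candidates, key=lambda b: abs(symbol - modulate(b, delta)))


sent = (1, 0, 1)
received = modulate(sent) + complex(0.03, -0.02)  # small channel noise
```

A smaller `delta` moves the two points of each cluster closer together, which protects the two coarse bits for low-SNR agents at the cost of making the third bit harder to separate.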

Updated: 2024-05-30 09:45:18

Categories: cs.NI,cs.LG

Download: http://arxiv.org/abs/2405.19886v1

Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

Reinforcement learning is able to obtain generalized low-level robot policies on diverse robotics datasets in embodied learning scenarios, and Transformer has been widely used to model time-varying features. However, it still suffers from the issues of low data efficiency and high inference latency. In this paper, we propose to investigate the task from a new perspective of the frequency domain. We first observe that the energy density in the frequency domain of a robot's trajectory is mainly concentrated in the low-frequency part. Then, we present the Fourier Controller Network (FCNet), a new network that utilizes the Short-Time Fourier Transform (STFT) to extract and encode time-varying features through frequency domain interpolation. We further achieve parallel training and efficient recurrent inference by using FFT and Sliding DFT methods in the model architecture for real-time decision-making. Comprehensive analyses in both simulated (e.g., D4RL) and real-world environments (e.g., robot locomotion) demonstrate FCNet's substantial efficiency and effectiveness over existing methods such as Transformer, e.g., FCNet outperforms Transformer on multi-environmental robotics datasets of all sizes (from 1.9M to 120M). The project page and code can be found at https://thkkk.github.io/fcnet.
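The efficiency of recurrent inference with a Sliding DFT comes from updating a window's spectrum in O(N) per new sample instead of recomputing it. The sketch below shows the textbook sliding-DFT recurrence and checks it against a direct DFT; how FCNet embeds this in its architecture is not described in the abstract, so the code is only an illustration of the underlying identity.

```python
import cmath
import math
from typing import List


def dft(window: List[float]) -> List[complex]:
    """Direct O(N^2) DFT of one window, used here as the reference."""
    n = len(window)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                for m, x in enumerate(window))
            for k in range(n)]


def sliding_dft_step(spectrum: List[complex], oldest: float,
                     newest: float) -> List[complex]:
    """Update the spectrum when the window slides forward by one sample.

    Recurrence: X_k(t) = (X_k(t-1) - x[t-N] + x[t]) * exp(2j*pi*k/N).
    """
    n = len(spectrum)
    return [(X - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)
            for k, X in enumerate(spectrum)]


N = 8
signal = [math.sin(0.3 * t) + 0.1 * t for t in range(20)]  # toy trajectory
spectrum = dft(signal[:N])
max_error = 0.0
for t in range(N, len(signal)):
    spectrum = sliding_dft_step(spectrum, signal[t - N], signal[t])
    reference = dft(signal[t - N + 1:t + 1])
    max_error = max(max_error,
                    max(abs(a - b) for a, b in zip(spectrum, reference)))
```

Each step touches every frequency bin once, so the per-sample cost grows linearly with the window length, which is what makes the recurrence attractive for real-time control loops.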

Updated: 2024-05-30 09:43:59

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.19885v1

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. Under proper assumptions on the pretraining data, we prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning. Additionally, we highlight the necessity for exploration beyond the subgoals derived from BAIL by proving that naively executing the subgoals returned by LLM leads to a linear regret. As a remedy, we introduce an $\epsilon$-greedy exploration strategy to BAIL, which is proven to incur sublinear regret when the pretraining error is small. Finally, we extend our theoretical framework to include scenarios where the LLM Planner serves as a world model for inferring the transition model of the environment and to multi-agent settings, enabling coordination among multiple Actors.
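The exploration remedy can be pictured with a minimal $\epsilon$-greedy sketch: with probability $\epsilon$ the agent ignores the planner's suggestion and tries a random subgoal. The function `choose_subgoal`, its arguments, and the uniform exploration distribution are assumptions for illustration and are not taken from the paper.

```python
import random
from typing import List, Optional


def choose_subgoal(planner_subgoal: str,
                   candidate_subgoals: List[str],
                   epsilon: float,
                   rng: Optional[random.Random] = None) -> str:
    """Follow the planner's subgoal, exploring with probability epsilon."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: pick any candidate uniformly at random (assumed here).
        return rng.choice(candidate_subgoals)
    # Exploit: execute the subgoal returned by the LLM planner.
    return planner_subgoal


candidates = ["open drawer", "pick cup", "go to sink"]
rng = random.Random(0)
greedy = choose_subgoal("pick cup", candidates, epsilon=0.0, rng=rng)
explored = choose_subgoal("pick cup", candidates, epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the agent always executes the planner's subgoal, which is the naive strategy that the abstract shows to incur linear regret.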

Updated: 2024-05-30 09:42:54

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.19883v1

Learning Latent Dynamic Robust Representations for World Models

Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent's knowledge about the underlying dynamics of the environment, enabling it to learn a world model that serves as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, because they fail to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill [gu2023maniskill2] with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.

Updated: 2024-05-30 09:40:02

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.06263v2

Provably Robust Cost-Sensitive Learning via Randomized Smoothing

We study the problem of robust learning against adversarial perturbations under cost-sensitive scenarios, where the potential harm of different types of misclassifications is encoded in a cost matrix. Existing approaches are either empirical and cannot certify robustness or suffer from inherent scalability issues. In this work, we investigate whether randomized smoothing, a scalable framework for robustness certification, can be leveraged to certify and train for cost-sensitive robustness. Built upon the notion of cost-sensitive certified radius, we first illustrate how to adapt the standard certification algorithm of randomized smoothing to produce tight robustness certificates for any binary cost matrix, and then develop a robust training method to promote certified cost-sensitive robustness while maintaining the model's overall accuracy. Through extensive experiments on image benchmarks, we demonstrate the superiority of our proposed certification algorithm and training method under various cost-sensitive scenarios. Our implementation is available as open source code at: https://github.com/TrustMLRG/CS-RS.
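For orientation, the sketch below computes the widely used L2 certified radius of standard randomized smoothing, R = sigma * Phi^{-1}(p_A), where p_A is a lower bound on the probability that the smoothed classifier returns the top class. It does not reproduce the cost-sensitive certified radius defined in the paper; the function name and the abstain-at-0.5 convention are assumptions for illustration.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """Standard L2 radius sigma * Phi^{-1}(p_A) for Gaussian smoothing.

    `p_a_lower` is a lower confidence bound on the top-class probability
    under noise with standard deviation `sigma`. Returns 0.0 when the bound
    does not exceed 1/2, i.e. when nothing can be certified.
    """
    if p_a_lower <= 0.5:
        return 0.0
    # inv_cdf is undefined at exactly 1.0, so clip slightly below it.
    p = min(p_a_lower, 1.0 - 1e-12)
    return sigma * NormalDist().inv_cdf(p)


radius = certified_radius(p_a_lower=0.9, sigma=0.5)
```

A cost-sensitive variant would, roughly speaking, only need to rule out the misclassifications marked as costly in the cost matrix, which is why it can certify a radius at least as large as this standard one.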

Updated: 2024-05-30 09:37:30

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2310.08732v2

Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models

Generative models such as diffusion have been employed as world models in offline reinforcement learning to generate synthetic data for more effective learning. Existing work either generates diffusion models one-time prior to training or requires additional interaction data to update it. In this paper, we propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation. It iteratively leverages a guided diffusion world model to directly evaluate the offline target policy with actions drawn from it, and then performs an importance-sampled world model update to adaptively align the world model with the updated policy. We analyzed the performance of the proposed method and provided an upper bound on the return gap between our method and the real environment under an optimal policy. The result sheds light on various factors affecting learning performance. Evaluations in the D4RL environment show significant improvement over state-of-the-art baselines, especially when only random or medium-expertise demonstrations are available -- thus requiring improved alignment between the world model and offline policy evaluation.

Updated: 2024-05-30 09:34:31

Categories: cs.LG,cs.GT

Download: http://arxiv.org/abs/2405.19878v1

KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models

We present KNOW--the Knowledge Navigator Ontology for the World--the first ontology designed to capture everyday knowledge to augment large language models (LLMs) in real-world generative AI use cases such as personal AI assistants. Our domain is human life, both its everyday concerns and its major milestones. We have limited the initial scope of the modeled concepts to only established human universals: spacetime (places, events) plus social (people, groups, organizations). The inclusion criteria for modeled concepts are pragmatic, beginning with universality and utility. We compare and contrast previous work such as Schema.org and Cyc--as well as attempts at a synthesis of knowledge graphs and language models--noting how LLMs already encode internally much of the commonsense tacit knowledge that took decades to capture in the Cyc project. We also make available code-generated software libraries for the 12 most popular programming languages, enabling the direct use of ontology concepts in software engineering. We emphasize simplicity and developer experience in promoting AI interoperability.

Updated: 2024-05-30 09:32:14

Categories: cs.AI,cs.CL,I.2.4; I.2.7

Download: http://arxiv.org/abs/2405.19877v1

Is In-Context Learning Sufficient for Instruction Following in LLMs?

In-context learning (ICL) allows LLMs to learn from examples without changing their weights, which is a particularly promising capability for long-context LLMs that can potentially learn from many examples. Recently, Lin et al. (2024) proposed URIAL, a method using only three in-context examples to align base LLMs, achieving non-trivial instruction following performance. In this work, we show that, while effective, ICL alignment with URIAL still underperforms compared to instruction fine-tuning on established benchmarks such as MT-Bench and AlpacaEval 2.0 (LC), especially with more capable base LMs. Unlike for tasks such as classification, translation, or summarization, adding more ICL demonstrations for long-context LLMs does not systematically improve instruction following performance. To address this limitation, we derive a greedy selection approach for ICL examples that noticeably improves performance, yet without bridging the gap to instruction fine-tuning. Finally, we provide a series of ablation studies to better understand the reasons behind the remaining gap, and we show how some aspects of ICL depart from the existing knowledge and are specific to the instruction tuning setting. Overall, our work advances the understanding of ICL as an alignment technique. We provide our code at https://github.com/tml-epfl/icl-alignment.

Updated: 2024-05-30 09:28:56

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.19874v1

Don't Get Hijacked: Prevalence, Mitigation, and Impact of Non-Secure DNS Dynamic Updates

DNS dynamic updates represent an inherently vulnerable mechanism deliberately granting the potential for any host to dynamically modify DNS zone files. Consequently, this feature exposes domains to various security risks such as domain hijacking, compromise of domain control validation, and man-in-the-middle attacks. Originally devised without the implementation of authentication mechanisms, non-secure DNS updates were widely adopted in DNS software, subsequently leaving domains susceptible to a novel form of attack termed zone poisoning. In order to gauge the extent of this issue, our analysis encompassed over 353 million domain names, revealing the presence of 381,965 domains that openly accepted unsolicited DNS updates. We then undertook a comprehensive three-phase campaign involving the notification of Computer Security Incident Response Teams (CSIRTs). Following extensive discussions spanning six months, we observed substantial remediation, with nearly 54% of nameservers and 98% of vulnerable domains addressing the issue. This outcome serves as evidence that engaging with CSIRTs can prove to be an effective approach for reporting security vulnerabilities. Moreover, our notifications had a lasting impact, as evidenced by the sustained low prevalence of vulnerable domains.

Updated: 2024-05-30 09:23:53

标题: 不要被劫持：非安全DNS动态更新的普及率、缓解和影响

摘要: DNS动态更新代表了一个固有脆弱的机制，故意授予任何主机动态修改DNS区域文件的潜力。因此，这一特性使域面临各种安全风险，例如域劫持、域控制验证的妥协以及中间人攻击。最初设计时未实施认证机制，非安全的DNS更新被广泛采用在DNS软件中，导致域易受一种称为区域污染的新型攻击形式的威胁。为了评估这个问题的程度，我们的分析涵盖了超过3.53亿个域名，发现有381,965个域名公开接受不请自来的DNS更新。然后，我们进行了一项全面的三阶段行动，涉及通知计算机安全事件响应团队（CSIRTs）。经过长达六个月的广泛讨论，我们观察到大量的修正措施，近54％的名称服务器和98％的脆弱域名解决了这个问题。这一结果证明与CSIRTs合作可以成为报告安全漏洞的有效方法。此外，我们的通知产生了持久的影响，如脆弱域名的持续低普及率所证明的那样。

更新时间: 2024-05-30 09:23:53

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2405.19871v1

On Vessel Location Forecasting and the Effect of Federated Learning

The widespread adoption of the Automatic Identification System (AIS) has motivated several maritime analytics operations. Vessel Location Forecasting (VLF) is one of the most critical operations for maritime awareness. However, accurate VLF is a challenging problem due to the complexity and dynamic nature of maritime traffic conditions. Furthermore, as privacy concerns and restrictions have grown, training data has become increasingly fragmented, resulting in dispersed databases of several isolated data silos among different organizations, which in turn decreases the quality of learning models. In this paper, we propose an efficient VLF solution based on LSTM neural networks, in two variants, namely Nautilus and FedNautilus for the centralized and the federated learning approach, respectively. We also demonstrate the superiority of the centralized approach with respect to the current state of the art and discuss the advantages and disadvantages of the federated approach relative to the centralized one.

Updated: 2024-05-30 09:23:48

标题: 关于船只位置预测和联邦学习效果的研究

摘要: Automatic Identification System（AIS）的广泛应用推动了多个海事分析操作的发展。船只位置预测（VLF）是海事意识中最关键的操作之一。然而，由于海事交通条件的复杂性和动态性，准确的VLF是一个具有挑战性的问题。此外，随着隐私问题和限制的增加，训练数据变得越来越分散，导致不同组织之间存在多个孤立数据孤岛的分散数据库，进而降低了学习模型的质量。在本文中，我们提出了一种基于LSTM神经网络的高效VLF解决方案，分为两个变体，即Nautilus和FedNautilus，分别用于中心化和联邦学习方法。我们还展示了中心化方法相对于当前技术水平的优越性，并讨论了联邦与中心化方法之间的优缺点。

更新时间: 2024-05-30 09:23:48

领域: cs.LG

下载: http://arxiv.org/abs/2405.19870v1

Elastic Feature Consolidation for Cold Start Exemplar-Free Incremental Learning

Exemplar-Free Class Incremental Learning (EFCIL) aims to learn from a sequence of tasks without having access to previous task data. In this paper, we consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone. This is especially challenging for EFCIL since it requires high plasticity, which results in feature drift that is difficult to compensate for in the exemplar-free setting. To address this problem, we propose a simple and effective approach that consolidates feature representations by regularizing drift in directions highly relevant to previous tasks and employs prototypes to reduce task-recency bias. Our method, called Elastic Feature Consolidation (EFC), exploits a tractable second-order approximation of feature drift based on an Empirical Feature Matrix (EFM). The EFM induces a pseudo-metric in feature space which we use to regularize feature drift in important directions and to update Gaussian prototypes used in a novel asymmetric cross entropy loss which effectively balances prototype rehearsal with data from new tasks. Experimental results on CIFAR-100, Tiny-ImageNet, ImageNet-Subset and ImageNet-1K demonstrate that Elastic Feature Consolidation is better able to learn new tasks by maintaining model plasticity and significantly outperforms the state of the art.

Updated: 2024-05-30 09:15:06

标题: 弹性特征整合用于冷启动非样本增量学习

摘要: Exemplar-Free Class Incremental Learning (EFCIL)旨在从一系列任务中学习，而无需访问先前任务的数据。在本文中，我们考虑了具有挑战性的冷启动场景，在该场景中，第一个任务中的数据不足以学习高质量的骨干模型。这对于EFCIL来说尤为具有挑战性，因为它需要高度的可塑性，这会导致特征漂移，而在无样本情况下难以弥补。为了解决这个问题，我们提出了一种简单而有效的方法，通过规范在与先前任务高度相关的方向上的漂移来巩固特征表示，并利用原型来减少任务新近度偏差。我们的方法称为Elastic Feature Consolidation（EFC），利用基于经验特征矩阵（EFM）的可处理的二阶近似特征漂移。EFM在特征空间中引入了一个伪度量，我们使用它来规范重要方向上的特征漂移，并更新高斯原型，用于一种新颖的不对称交叉熵损失，有效地平衡原型的复习和来自新任务的数据。在CIFAR-100、Tiny-ImageNet、ImageNet-Subset和ImageNet-1K上的实验结果表明，Elastic Feature Consolidation能够更好地学习新任务，通过保持模型的可塑性并显著优于现有技术水平。

更新时间: 2024-05-30 09:15:06

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.03917v3

Out-of-distribution Reject Option Method for Dataset Shift Problem in Early Disease Onset Prediction

Machine learning is increasingly used to predict lifestyle-related disease onset using health and medical data. However, the prediction effectiveness is hindered by dataset shift, which involves discrepancies in data distribution between the training and testing datasets, misclassifying out-of-distribution (OOD) data. To diminish dataset shift effects, this paper proposes the out-of-distribution reject option for prediction (ODROP), which integrates OOD detection models to preclude OOD data from the prediction phase. We investigated the efficacy of five OOD detection methods (variational autoencoder, neural network ensemble std, neural network ensemble epistemic, neural network energy, and neural network gaussian mixture based energy measurement) across two datasets, the Hirosaki and Wakayama health checkup data, in the context of three disease onset prediction tasks: diabetes, dyslipidemia, and hypertension. To evaluate the ODROP method, we trained disease onset prediction models and OOD detection models on Hirosaki data and used AUROC-rejection curve plots from Wakayama data. The variational autoencoder method showed superior stability and magnitude of improvement in Area Under the Receiver Operating Curve (AUROC) in five cases: AUROC in the Wakayama data was improved from 0.80 to 0.90 at a 31.1% rejection rate for diabetes onset and from 0.70 to 0.76 at a 34% rejection rate for dyslipidemia. We categorized dataset shifts into two types using SHAP clustering - those that considerably affect predictions and those that do not. We expect that this classification will help standardize measuring instruments. This study is the first to apply OOD detection to actual health and medical data, demonstrating its potential to substantially improve the accuracy and reliability of disease prediction models amidst dataset shift.
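The reject-option evaluation described above (drop the samples with the highest OOD scores, then measure AUROC on what remains) can be sketched as follows. This is a minimal illustration on made-up numbers, not the ODROP implementation: the function names, the rank-based AUROC helper, and the toy data are assumptions introduced here.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative
    (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def auroc_after_rejection(
    labels: Sequence[int],
    pred_scores: Sequence[float],
    ood_scores: Sequence[float],
    reject_rate: float,
) -> float:
    """Reject the `reject_rate` fraction of samples with the highest
    OOD score, then compute AUROC on the retained samples."""
    n = len(labels)
    n_keep = n - int(reject_rate * n)
    order: List[int] = sorted(range(n), key=lambda i: ood_scores[i])
    keep = order[:n_keep]
    return auroc([labels[i] for i in keep], [pred_scores[i] for i in keep])


# Toy data (hypothetical): the last two samples are far from the
# training distribution and are also the ones the predictor gets wrong.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
pred = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.2, 0.8]
ood = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.9, 0.95]
baseline = auroc_after_rejection(labels, pred, ood, 0.0)
rejected = auroc_after_rejection(labels, pred, ood, 0.25)
```

Sweeping `reject_rate` from 0 upward and plotting the result gives an AUROC-rejection curve of the kind the abstract refers to.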

Updated: 2024-05-30 09:14:01

标题: 早期疾病发作预测中的数据集转移问题的分布外拒绝选项方法

摘要: 机器学习越来越被用于预测与生活方式相关的疾病发作，利用健康和医疗数据。然而，数据集转移阻碍了预测的有效性，这涉及到训练集和测试集之间数据分布的差异，误将未知分布（OOD）数据误分类。为了减少数据集转移效应，本文提出了用于预测的未知分布拒绝选项（ODROP），该方法整合了OOD检测模型，以在预测阶段排除OOD数据。我们在两个数据集（弘前和和歌山健康检查数据）的背景下，研究了五种OOD检测方法（变分自编码器、神经网络集成标准差、神经网络集成认知、神经网络能量、神经网络高斯混合能量测量）在三种疾病发作预测任务中的有效性：糖尿病、血脂异常和高血压。为了评估ODROP方法，我们在弘前数据上训练了疾病发作预测模型和OOD检测模型，并使用了来自和歌山数据的AUROC拒绝曲线图。变分自编码器方法在五种情况下表现出更高的稳定性和改进幅度，即糖尿病发作时，在和歌山数据中，AUROC从0.80提高到0.90，拒绝率为31.1%，血脂异常发作时，从0.70提高到0.76，拒绝率为34%。我们使用SHAP聚类将数据集转移分为两种类型 - 那些显著影响预测的和那些不影响的。我们期望这种分类将有助于标准化测量仪器。这项研究是第一个将OOD检测应用于实际的健康和医疗数据中，展示了在数据集转移中显著提高疾病预测模型的准确性和可靠性的潜力。

更新时间: 2024-05-30 09:14:01

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19864v1

Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems

Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question for understanding the most used training algorithm in the field of Machine Learning, but it has received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm very closely resemble those of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g., SGD does not satisfy detailed balance). Such equivalence allows us to use results about the performance and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
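For readers unfamiliar with the Metropolis side of this comparison, the standard single-flip acceptance rule is sketched below on a toy spin chain. The energy function, the chain itself, and all names are assumptions chosen for illustration. The paper's SGD-like algorithm and its mapping from mini-batch size to temperature are not reproduced here.

```python
import math
import random
from typing import Callable, List


def metropolis_step(
    state: List[int],
    energy: Callable[[List[int]], float],
    temperature: float,
    rng: random.Random,
) -> List[int]:
    """One Metropolis update: propose flipping a random spin and accept
    with probability min(1, exp(-delta_E / T))."""
    i = rng.randrange(len(state))
    proposal = state.copy()
    proposal[i] = -proposal[i]
    delta = energy(proposal) - energy(state)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return proposal
    return state


def chain_energy(state: List[int]) -> float:
    """Toy ferromagnetic chain: aligned neighbours lower the energy."""
    return -float(sum(a * b for a, b in zip(state, state[1:])))


rng = random.Random(0)
state = [1, -1, 1, -1, 1, -1]
energies = [chain_energy(state)]
for _ in range(200):
    # Near-zero temperature: uphill moves are effectively never accepted.
    state = metropolis_step(state, chain_energy, 1e-9, rng)
    energies.append(chain_energy(state))
```

At higher temperatures the same step accepts some uphill moves, which is what lets the dynamics escape local minima.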

Updated: 2024-05-30 09:11:51

标题: 随机梯度下降式松弛等价于Metropolis动力学在离散优化和推理问题中

摘要: 随机梯度下降（SGD）与Metropolis Monte Carlo动态是否存在明显差异？这是在理解机器学习领域中最常用的训练算法时的一个基本问题，但直到现在都没有得到答案。在这里，我们展示了在离散优化和推理问题中，类似SGD算法的动态与选择适当温度的Metropolis Monte Carlo非常相似，该温度取决于小批量大小。这种定量匹配在平衡和非平衡状态下都成立，尽管两种算法有根本区别（例如SGD不满足详细平衡）。这种等价性使我们能够利用有关Monte Carlo算法性能和限制的结果，来优化SGD-like算法中的小批量大小，并使其在难以推理问题中高效地恢复信号。

更新时间: 2024-05-30 09:11:51

领域: cond-mat.dis-nn,cond-mat.stat-mech,cs.LG

下载: http://arxiv.org/abs/2309.05337v2

Hierarchical Object-Centric Learning with Capsule Networks

Capsule networks (CapsNets) were introduced to address the limitations of convolutional neural networks, learning object-centric representations that are more robust, pose-aware, and interpretable. They organize neurons into groups called capsules, where each capsule encodes the instantiation parameters of an object or one of its parts. Moreover, a routing algorithm connects capsules in different layers, thereby capturing hierarchical part-whole relationships in the data. This thesis investigates the intriguing aspects of CapsNets and focuses on three key questions to unlock their full potential. First, we explore the effectiveness of the routing algorithm, particularly in small-sized networks. We propose a novel method that anneals the number of routing iterations during training, enhancing performance in architectures with fewer parameters. Second, we investigate methods to extract more effective first-layer capsules, also known as primary capsules. By exploiting pruned backbones, we aim to improve computational efficiency by reducing the number of capsules while achieving high generalization. This approach reduces the memory requirements and computational effort of CapsNets. Third, we explore part-relationship learning in CapsNets. Through extensive research, we demonstrate that capsules with low entropy can extract more concise and discriminative part-whole relationships compared to traditional capsule networks, even with reasonable network sizes. Lastly, we showcase how CapsNets can be utilized in real-world applications, including autonomous localization of unmanned aerial vehicles, quaternion-based rotation prediction in synthetic datasets, and lung nodule segmentation in biomedical imaging. The findings presented in this thesis contribute to a deeper understanding of CapsNets and highlight their potential to address complex computer vision challenges.

Updated: 2024-05-30 09:10:33

标题: 基于胶囊网络的分层对象中心学习

摘要: 胶囊网络（CapsNets）被引入以解决卷积神经网络的局限性，学习更加稳健、姿态感知和可解释的对象中心表示。它们将神经元组织成被称为胶囊的组群，其中每个胶囊编码对象或其部分的实例化参数。此外，一种路由算法连接不同层的胶囊，从而捕捉数据中的分层整体关系。本文研究胶囊网络的有趣方面，并关注解锁其全部潜力的三个关键问题。首先，我们探讨路由算法的有效性，特别是在规模较小的网络中。我们提出了一种新颖的方法，在训练过程中退火路由迭代次数，提高了在参数较少的架构中的性能。其次，我们研究提取更有效的第一层胶囊，也称为主要胶囊的方法。通过利用经过修剪的骨干，我们旨在通过减少胶囊数量来提高计算效率，同时实现高泛化。这种方法减少了CapsNets的存储需求和计算工作量。第三，我们探索胶囊网络中的部分关系学习。通过广泛研究，我们证明熵较低的胶囊可以提取比传统胶囊网络更简明和有区别的部分整体关系，即使网络规模合理。最后，我们展示了如何在现实应用中利用CapsNets，包括无人机的自主定位、合成数据集中基于四元数的旋转预测以及生物医学成像中的肺结节分割。本文提出的研究结果有助于更深入地理解CapsNets，并突出它们解决复杂计算机视觉挑战的潜力。

更新时间: 2024-05-30 09:10:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19861v1

Resilience of Deep Learning applications: a systematic literature review of analysis and hardening techniques

Machine Learning (ML) is currently being exploited in numerous applications, being one of the most effective Artificial Intelligence (AI) technologies, used in diverse fields such as vision, autonomous systems, and the like. The trend has motivated a significant number of contributions to the analysis and design of ML applications against faults affecting the underlying hardware. The authors investigate the existing body of knowledge on Deep Learning (among ML techniques) resilience against hardware faults systematically through a thoughtful review in which the strengths and weaknesses of this literature stream are presented clearly and then future avenues of research are set out. The review is based on 220 scientific articles published between January 2019 and March 2024. The authors adopt a classifying framework to interpret and highlight research similarities and peculiarities, based on several parameters, starting from the main scope of the work, the adopted fault and error models, to their reproducibility. This framework allows for a comparison of the different solutions and the identification of possible synergies. Furthermore, suggestions concerning the future direction of research are proposed in the form of open challenges to be addressed.

Updated: 2024-05-30 09:02:53

标题: 深度学习应用的韧性：分析和强化技术的系统文献综述

摘要: 机器学习（ML）目前正在许多应用中被利用，是最有效的人工智能（AI）技术之一，被用于视觉、自主系统等各个领域。这一趋势激发了大量关于分析和设计ML应用程序的贡献，以应对影响底层硬件的故障。作者通过系统性地审查现有关于深度学习（ML技术之一）对抗硬件故障的知识库，清晰地呈现了这一文献流的优势和劣势，并提出了未来的研究方向。审查基于2019年1月至2024年3月间发表的220篇科学文章。作者采用分类框架解释和突出研究的相似性和特殊性，基于多个参数，从工作的主要范围、采用的故障和错误模型到其可重现性。这一框架允许比较不同解决方案，并确定可能的协同作用。此外，提出了关于未来研究方向的建议，以开放性挑战的形式加以解决。

更新时间: 2024-05-30 09:02:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2309.16733v2

Guardians of DNS Integrity: A Remote Method for Identifying DNSSEC Validators Across the Internet

DNS Security Extensions (DNSSEC) provide the most effective way to fight DNS cache poisoning attacks. Yet, very few DNS resolvers perform DNSSEC validation. Identifying such systems is non-trivial and the existing methods are not suitable for Internet-scale measurements. In this paper, we propose a novel remote technique for identifying DNSSEC-validating resolvers. The proposed method consists of two steps. In the first step, we identify open resolvers by scanning 3.1 billion end hosts and request every non-forwarder to resolve one correct and seven deliberately misconfigured domains. We then build a classifier that discriminates validators from non-validators based on query patterns and DNS response codes. We find that while most open resolvers are DNSSEC-enabled, less than 18% in IPv4 (38% in IPv6) validate received responses. In the second step, we remotely identify closed non-forwarders in networks that do not have inbound Source Address Validation (SAV) in place. Using the classifier built in step one, we identify 37.4% IPv4 (42.9% IPv6) closed DNSSEC validators and cross-validate the results using RIPE Atlas probes. Finally, we show that the discovered (non)-validators actively send requests to DNS root servers, suggesting that we deal with operational recursive resolvers rather than misconfigured machines.
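The probing idea in step one (one correctly signed domain plus several deliberately broken ones) suggests a simple decision rule based on response codes. The sketch below is a rule-based stand-in, assuming a validating resolver answers SERVFAIL for domains that fail validation. The paper builds a classifier that also uses query patterns, which is not reproduced here. The function name and the category labels are hypothetical.

```python
from typing import List

# DNS RCODE values as defined in RFC 1035.
NOERROR = 0
SERVFAIL = 2


def classify_resolver(valid_rcode: int, broken_rcodes: List[int]) -> str:
    """Label a resolver from its answers to one correctly signed domain
    and a set of deliberately misconfigured ones (simplified heuristic)."""
    if valid_rcode != NOERROR or not broken_rcodes:
        return "inconclusive"  # cannot even resolve the control domain
    if all(rc == SERVFAIL for rc in broken_rcodes):
        return "validator"  # rejects every broken chain of trust
    if all(rc == NOERROR for rc in broken_rcodes):
        return "non-validator"  # accepts everything it is given
    return "partial"  # mixed behaviour, needs closer inspection


example = classify_resolver(NOERROR, [SERVFAIL] * 7)
```

The "partial" bucket is where a learned classifier would add value, since mixed response patterns cannot be settled by a fixed rule.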

Updated: 2024-05-30 08:58:18

标题: DNS完整性的守护者：一种远程方法，用于在互联网上识别DNSSEC验证器

摘要: DNS安全扩展（DNSSEC）提供了对抗DNS缓存污染攻击的最有效方法。然而，很少有DNS解析器执行DNSSEC验证。识别这类系统并非易事，现有的方法也不适用于互联网规模的测量。在本文中，我们提出了一种新颖的远程技术，用于识别执行DNSSEC验证的解析器。所提出的方法包括两个步骤。在第一步中，我们通过扫描31亿个终端主机，请求每个非转发器解析一个正确和七个故意配置错误的域名来识别开放解析器。然后我们建立了一个分类器，根据查询模式和DNS响应代码来区分验证器和非验证器。我们发现，虽然大多数开放解析器都启用了DNSSEC，但IPv4中接收响应的验证器不到18%（IPv6中为38%）。在第二步中，我们远程识别没有入站源地址验证（SAV）的网络中的关闭非转发器。利用第一步中构建的分类器，我们识别了37.4%的IPv4（42.9%的IPv6）关闭DNSSEC验证器，并使用RIPE Atlas探针交叉验证结果。最后，我们展示了发现的（非）验证器主动向DNS根服务器发送请求，这表明我们正在处理运行中的递归解析器，而不是配置错误的机器。

更新时间: 2024-05-30 08:58:18

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2405.19851v1

Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models

Understanding human mobility patterns is essential for various applications, from urban planning to public safety. Individual trajectory data, such as mobile phone location data, while rich in spatio-temporal information, often lacks semantic detail, limiting its utility for in-depth mobility analysis. Existing methods can infer basic routine activity sequences from this data, but lack depth in understanding complex human behaviors and users' characteristics. Additionally, they struggle with the dependency on hard-to-obtain auxiliary datasets like travel surveys. To address these limitations, this paper defines trajectory semantic inference through three key dimensions: user occupation category, activity sequence, and trajectory description, and proposes the Trajectory Semantic Inference with Large Language Models (TSI-LLM) framework to leverage LLMs to infer trajectory semantics comprehensively and deeply. We adopt spatio-temporal attributes enhanced data formatting (STFormat) and design a context-inclusive prompt, enabling LLMs to more effectively interpret and infer the semantics of trajectory data. Experimental validation on real-world trajectory datasets demonstrates the efficacy of TSI-LLM in deciphering complex human mobility patterns. This study explores the potential of LLMs in enhancing the semantic analysis of trajectory data, paving the way for more sophisticated and accessible human mobility research.

Updated: 2024-05-30 08:55:48

标题: 破译人类移动性：利用大型语言模型推断轨迹的语义

摘要: 理解人类移动模式对于各种应用至关重要，从城市规划到公共安全。个体轨迹，如手机定位数据，虽然富含时空信息，但通常缺乏语义细节，限制了其用于深入移动性分析的效用。现有方法可以从这些数据中推断基本例行活动序列，但缺乏对复杂人类行为和用户特征的深入理解。此外，它们在依赖难以获取的辅助数据集（如旅行调查）方面存在困难。为了解决这些限制，本文通过三个关键维度定义了轨迹语义推断：用户职业类别、活动序列和轨迹描述，并提出了Trajectory Semantic Inference with Large Language Models (TSI-LLM)框架，利用LLM全面深入地推断轨迹语义。我们采用增强时空属性的数据格式化（STFormat）并设计了一个包含上下文的提示，使LLM能够更有效地解释和推断轨迹数据的语义。在真实世界的轨迹数据集上的实验证实了TSI-LLM在解读复杂人类移动模式方面的有效性。本研究探讨了LLM在增强轨迹数据语义分析方面的潜力，为更复杂和可访问的人类移动性研究铺平了道路。

更新时间: 2024-05-30 08:55:48

领域: cs.AI

下载: http://arxiv.org/abs/2405.19850v1

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS. OSWorld can serve as a unified, integrated computer environment for assessing open-ended computer tasks that involve arbitrary applications. Building upon OSWorld, we create a benchmark of 369 computer tasks involving real web and desktop apps in open domains, OS file I/O, and workflows spanning multiple applications. Each task example is derived from real-world computer use cases and includes a detailed initial state setup configuration and a custom execution-based evaluation script for reliable, reproducible evaluation. Extensive evaluation of state-of-the-art LLM/VLM-based agents on OSWorld reveals significant deficiencies in their ability to serve as computer assistants. While humans can accomplish over 72.36% of the tasks, the best model achieves only 12.24% success, primarily struggling with GUI grounding and operational knowledge. Comprehensive analysis using OSWorld provides valuable insights for developing multimodal generalist agents that were not possible with previous benchmarks. Our code, environment, baseline models, and data are publicly available at https://os-world.github.io.

Updated: 2024-05-30 08:55:12

标题: OSWorld：在真实计算机环境中对多模态代理进行开放式任务基准测试

摘要: 自主代理程序可以在最少人为干预的情况下完成复杂的计算机任务，有潜力改变人机交互，显著提高可访问性和生产力。然而，现有的基准要么缺乏交互环境，要么仅限于特定应用程序或领域的环境，未能反映现实世界计算机使用的多样性和复杂性，从而限制了任务范围和代理的可扩展性。为了解决这个问题，我们引入了OSWorld，这是第一个可扩展的真实计算机环境，适用于多模态代理，支持任务设置、基于执行的评估以及跨Ubuntu、Windows和macOS等各种操作系统的交互式学习。OSWorld可以作为一个统一的、集成的计算机环境，用于评估涉及任意应用程序的开放式计算机任务。在OSWorld的基础上，我们创建了一个包含369个计算机任务的基准，涉及开放领域的真实网络和桌面应用程序、操作系统文件I/O以及跨多个应用的工作流。每个任务示例都源自真实世界的计算机使用案例，包括详细的初始状态设置配置和一个定制的基于执行的评估脚本，用于可靠、可重复的评估。在OSWorld上对最先进的LLM/VLM代理进行广泛评估揭示了它们作为计算机助手的显著不足。虽然人类可以完成超过72.36%的任务，但最佳模型只能实现12.24%的成功率，主要困扰于GUI基础和操作知识。使用OSWorld进行全面分析为开发以往基准无法实现的多模态通用代理提供了宝贵的见解。我们的代码、环境、基准模型和数据均可在https://os-world.github.io上公开获取。

更新时间: 2024-05-30 08:55:12

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.07972v2

Position: Topological Deep Learning is the New Frontier for Relational Learning

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.

Updated: 2024-05-30 08:52:56

标题: 职位：拓扑深度学习是关系学习的新前沿

摘要: 拓扑深度学习（TDL）是一个快速发展的领域，利用拓扑特征来理解和设计深度学习模型。本文认为TDL是关系学习的新前沿。TDL可以通过融入拓扑概念来补充图表示学习和几何深度学习，因此可以为各种机器学习场景提供自然选择。为此，本文讨论了TDL中的开放问题，涵盖了从实际收益到理论基础的范围。对于每个问题，它概述了潜在的解决方案和未来的研究机会。同时，本文也是对科学界的邀请，积极参与TDL研究，以释放这一新兴领域的潜力。

更新时间: 2024-05-30 08:52:56

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.08871v2

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated as Quest. Quest is an interpretable method based on the observation that documents retrieved by similar queries are relevant but low-redundant, thus well-suited for synthesizing long-context data. The method is also scalable and capable of constructing large amounts of long-context data. Using Quest, we synthesize a long-context dataset up to 128k context length, significantly outperforming other data synthesis methods on multiple long-context benchmark datasets. In addition, we further verify that the Quest method is predictable through scaling law experiments, making it a reliable solution for advancing long-context models.

Updated: 2024-05-30 08:50:55

标题: 探索：用于大型语言模型长文本规模的查询中心数据综合方法

摘要: 大型语言模型最初在有限上下文长度的情况下进行预训练，通过在具有扩展上下文的语料库上继续训练，可以更好地处理更长的文本。然而，由于不同领域中长文档的稀缺性和不均匀分布，获得有效的长上下文数据具有挑战性。为了解决这个问题，我们提出了一种基于查询的数据合成方法，缩写为Quest。Quest是一种可解释的方法，基于一个观察：由类似查询检索到的文档是相关但低冗余的，因此非常适合合成长上下文数据。该方法还具有可扩展性，并能够构建大量的长上下文数据。使用Quest，我们合成了一个长上下文数据集，长度高达128k，显著优于多个长上下文基准数据集上的其他数据合成方法。此外，我们通过缩放定律实验证实了Quest方法是可预测的，使其成为推进长上下文模型的可靠解决方案。

更新时间: 2024-05-30 08:50:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19846v1

Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation

Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated Chain-of-Thoughts (CoTs) data. Although these methods enhance in-domain (IND) reasoning performance, they struggle to generalize to out-of-domain (OOD) tasks. We believe that the widespread spurious correlations between questions and answers may lead the model to preset a specific answer which restricts the diversity and generalizability of its reasoning process. In this paper, we propose Cascading Decomposed CoTs Distillation (CasCoD) to address these issues by decomposing the traditional single-step learning process into two cascaded learning steps. Specifically, by restructuring the training objectives -- removing the answer from outputs and concatenating the question with the rationale as input -- CasCoD's two-step learning process ensures that students focus on learning rationales without interference from the preset answers, thus improving reasoning generalizability. Extensive experiments demonstrate the effectiveness of CasCoD on both IND and OOD benchmark reasoning datasets. Code can be found at https://github.com/C-W-D/CasCoD.
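The restructuring of training objectives described above can be sketched as a small data-preparation helper: the first step maps the question to the rationale only, and the second step takes the question concatenated with the rationale and targets the answer. The dictionary keys, the newline separator, and the function name are assumptions made for illustration. The actual formats are in the linked repository.

```python
from typing import Dict, Tuple


def build_cascade_examples(
    question: str, rationale: str, answer: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split one (question, rationale, answer) triple into two training
    examples, one per cascaded learning step."""
    # Step 1: the answer is removed from the output, so the student
    # learns to produce the rationale alone.
    step1 = {"input": question, "target": rationale}
    # Step 2: question and rationale are concatenated as input,
    # and the answer becomes the target.
    step2 = {"input": f"{question}\n{rationale}", "target": answer}
    return step1, step2


s1, s2 = build_cascade_examples(
    "What is 2 + 3?", "Adding 2 and 3 gives 5.", "5"
)
```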

Updated: 2024-05-30 08:49:34

标题: 通过级联分解的CoTs（Competencies of Thinking）蒸馏提高学生的推理泛化能力

摘要: 大型语言模型(LLMs)在较大规模上展现出增强的推理能力，推动了通过师生学习将这些能力蒸馏到较小模型的努力。先前的研究简单地在老师生成的思维链(CoTs)数据上微调学生模型。尽管这些方法增强了领域内(IND)的推理性能，但它们在推广到领域外(OOD)任务时遇到困难。我们认为问题和答案之间的广泛虚假相关可能会导致模型预设特定答案，从而限制其推理过程的多样性和一般性。在本文中，我们提出了级联分解CoTs蒸馏(CasCoD)来解决这些问题，通过将传统的单步学习过程分解为两个级联学习步骤。具体来说，通过重组训练目标--将答案从输出中删除，并将问题与理由连接作为输入--CasCoD的两步学习过程确保学生专注于学习理由，而不受预设答案的干扰，从而提高推理的一般化能力。大量实验证明了CasCoD在IND和OOD基准推理数据集上的有效性。代码可在https://github.com/C-W-D/CasCoD 上找到。

更新时间: 2024-05-30 08:49:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19842v1

Lifelong learning challenges in the era of artificial intelligence: a computational thinking perspective

The rapid advancement of artificial intelligence (AI) has brought significant challenges to the education and workforce skills required to take advantage of AI for human-AI collaboration in the workplace. As AI continues to reshape industries and job markets, the need to define how AI literacy can be considered in lifelong learning has become increasingly critical (Cetindamar et al., 2022; Laupichler et al., 2022; Romero et al., 2023). Like any new technology, AI is the subject of both hopes and fears, and what it entails today presents major challenges (Cugurullo & Acheampong, 2023; Villani et al., 2018). It also raises profound questions about our own humanity. Will the machine surpass the intelligence of the humans who designed it? What will be the relationship between so-called AI and our human intelligences? How could human-AI collaboration be regulated in a way that serves the Sustainable Development Goals (SDGs)? This paper provides a review of the challenges of lifelong learning in the era of AI from a computational thinking, critical thinking, and creative competencies perspective, highlighting the implications for management and leadership in organizations.

Updated: 2024-05-30 08:46:11

标题: 人工智能时代的终身学习挑战：计算思维视角

摘要: 人工智能（AI）的快速发展给教育和职场技能带来了重大挑战，这些技能是利用AI进行人机协作所必需的。随着AI持续重塑行业和就业市场，如何将AI素养纳入终身学习已成为日益关键的问题。与任何新技术一样，AI既带来希望又引发恐惧，其今天的含义也带来重大挑战。AI也引发了对我们人类本质深刻的问题。机器是否会超越设计它的人类的智慧？所谓AI与我们的人类智慧之间会有怎样的关系？人机协作如何在符合可持续发展目标（SDGs）的前提下得到规范？本文从计算思维、批判思维和创造性能力的角度对AI时代终身学习的挑战进行了回顾，突出了对组织中管理和领导的影响。

更新时间: 2024-05-30 08:46:11

领域: cs.AI

下载: http://arxiv.org/abs/2405.19837v1

The Merit of River Network Topology for Neural Flood Forecasting

Climate change exacerbates riverine floods, which occur with higher frequency and intensity than ever. The much-needed forecasting systems typically rely on accurate river discharge predictions. To this end, the SOTA data-driven approaches treat forecasting at spatially distributed gauge stations as isolated problems, even within the same river network. However, incorporating the known topology of the river network into the prediction model has the potential to leverage the adjacency relationship between gauges. Thus, we model river discharge for a network of gauging stations with GNNs and compare the forecasting performance achieved by different adjacency definitions. Our results show that the model fails to benefit from the river network topology information, both on the entire network and small subgraphs. The learned edge weights correlate with neither of the static definitions and exhibit no regular pattern. Furthermore, the GNNs struggle to predict sudden, narrow discharge spikes. Our work hints at a more general underlying phenomenon of neural prediction not always benefitting from graphical structure and may inspire a systematic study of the conditions under which this happens.
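To make "different adjacency definitions" concrete, the sketch below builds three adjacency matrices from a list of flow links between gauges. The three modes shown (downstream, upstream, bidirectional) and the function name are illustrative assumptions. The abstract does not list the exact definitions the authors compare.

```python
from typing import List, Sequence, Tuple


def build_adjacency(
    n_gauges: int, flows: Sequence[Tuple[int, int]], mode: str
) -> List[List[int]]:
    """Build a binary adjacency matrix from (upstream, downstream)
    gauge pairs under one of three edge-direction conventions."""
    if mode not in ("downstream", "upstream", "bidirectional"):
        raise ValueError(f"unknown mode: {mode}")
    adj = [[0] * n_gauges for _ in range(n_gauges)]
    for up, down in flows:
        if mode in ("downstream", "bidirectional"):
            adj[up][down] = 1  # edge follows the water flow
        if mode in ("upstream", "bidirectional"):
            adj[down][up] = 1  # edge points against the flow
    return adj


# Hypothetical network: gauges 0 and 1 both drain into gauge 2.
flows = [(0, 2), (1, 2)]
a_down = build_adjacency(3, flows, "downstream")
a_both = build_adjacency(3, flows, "bidirectional")
```

Any of these matrices could be handed to a GNN layer as the graph structure, and an isolated-gauge baseline corresponds to an all-zero matrix.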

Updated: 2024-05-30 08:45:45

标题: 河流网络拓扑结构在神经网络洪水预测中的优势

摘要: 气候变化加剧了河流洪水，其发生频率和强度比以往任何时候都更高。急需的预测系统通常依赖准确的河流流量预测。为此，SOTA数据驱动方法将空间分布的测站的预测视为孤立问题，即使在同一河网内也是如此。然而，将已知的河网拓扑结构纳入预测模型有可能利用测站之间的邻接关系。因此，我们利用GNN模型对一组测站的河流流量进行建模，并比较不同邻接定义所实现的预测性能。我们的结果表明，模型无法从河流网拓扑信息中获益，无论是整个网络还是小子图。学习到的边权重与静态定义均不相关，也没有规律模式。此外，GNN模型难以预测突然的、狭窄的流量峰值。我们的研究暗示了一种更普遍的神经预测不总是从图结构中受益的现象，并可能激发对此发生条件的系统研究。

更新时间: 2024-05-30 08:45:45

领域: cs.LG

下载: http://arxiv.org/abs/2405.19836v1

AI Safety: A Climb To Armageddon?

This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.

Updated: 2024-05-30 08:41:54

标题: 人工智能安全性：通往世界末日的攀登？

摘要: 本文提出了一个论点，即某些人工智能安全措施，与其减轻存续风险，可能反而加剧了存续风险。在某些关键假设下 - 人工智能失败的不可避免性，人工智能系统在失败时的能力与造成的伤害严重程度之间的预期相关性，以及安全措施倾向于使人工智能系统在失败之前变得更强大 - 安全工作具有负的期望效用。本文考察了三种应对策略：乐观主义、缓解和整体主义。每一种策略都面临着源自人工智能安全领域固有特征的挑战，我们称之为瓶颈效应、完美障碍和平衡波动。这一论点的出乎意料的稳健性迫使重新审视围绕人工智能安全的核心假设，并指出了进一步研究的几个途径。

更新时间: 2024-05-30 08:41:54

领域: cs.AI

下载: http://arxiv.org/abs/2405.19832v1

CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification

Contaminated or adulterated food poses a substantial risk to human health. Given sets of labeled web texts for training, Machine Learning and Natural Language Processing can be applied to automatically detect such risks. We publish a dataset of 7,546 short texts describing public food recall announcements. Each text is manually labeled, on two granularity levels (coarse and fine), for food products and hazards that the recall corresponds to. We describe the dataset and benchmark naive, traditional, and Transformer models. Based on our analysis, Logistic Regression based on a tf-idf representation outperforms RoBERTa and XLM-R on classes with low support. Finally, we discuss different prompting strategies and present an LLM-in-the-loop framework, based on Conformal Prediction, which boosts the performance of the base classifier while reducing energy consumption compared to normal prompting.
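As background for the Conformal Prediction component, here is a minimal sketch of split conformal prediction for classification, using one minus the probability of the true class as the nonconformity score. It illustrates the general technique only. The function names and toy probabilities are assumptions, and how CICLe combines the resulting sets with an LLM is not specified in the abstract.

```python
import math
from typing import List, Sequence


def conformal_threshold(
    cal_probs: Sequence[Sequence[float]],
    cal_labels: Sequence[int],
    alpha: float,
) -> float:
    """Calibrate a score threshold so prediction sets cover the true
    label with probability at least 1 - alpha (split conformal)."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: keep all labels
    return scores[rank - 1]


def prediction_set(probs: Sequence[float], threshold: float) -> List[int]:
    """Return every class whose nonconformity score is within the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data (hypothetical base-classifier probabilities).
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
cal_labels = [0, 1, 2, 1]
thr = conformal_threshold(cal_probs, cal_labels, alpha=0.3)
confident = prediction_set([0.9, 0.05, 0.05], thr)
ambiguous = prediction_set([0.45, 0.45, 0.1], thr)
```

One plausible use, stated here as a guess rather than the paper's design, is to accept singleton sets directly and send only multi-label sets to a more expensive model.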

Updated: 2024-05-30 08:37:45

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2403.11904v3
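
The Conformal Prediction step named in the CICLe abstract above can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: the function names, the nonconformity score (one minus the probability of the true class), and the toy probabilities are all assumptions made for illustration. One plausible use, again an assumption, is to accept the base classifier's answer when the set contains a single class and to prompt an LLM only with the remaining candidates otherwise.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a held-out calibration set (split conformal prediction)."""
    # Nonconformity score (an assumed choice): 1 - probability of the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample-corrected rank: ceil((n + 1) * (1 - alpha)), clipped to n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration data: class probabilities from a base classifier.
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
cal_labels = [0, 1, 2, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
candidates = prediction_set([0.5, 0.35, 0.15], threshold)
```

A smaller `alpha` raises the threshold and therefore yields larger candidate sets, which trades fewer missed labels for more classes to disambiguate downstream.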

Joint Selective State Space Model and Detrending for Robust Time Series Anomaly Detection

Deep learning-based sequence models are extensively employed in Time Series Anomaly Detection (TSAD) tasks due to their effective sequential modeling capabilities. However, the performance of TSAD is limited by two key challenges: (i) modeling long-range dependencies and (ii) generalizing in the presence of non-stationary data. To tackle these challenges, an anomaly detector is proposed that leverages the selective state space model, which is known for its proficiency in capturing long-term dependencies across various domains. Additionally, a multi-stage detrending mechanism is introduced to mitigate the prominent trend component in non-stationary data and thereby address the generalization issue. Extensive experiments conducted on real-world public datasets demonstrate that the proposed methods surpass all 12 compared baseline methods.

Updated: 2024-05-30 08:31:18

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.19823v1
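
To make the idea of multi-stage detrending in the abstract above concrete, here is a minimal sketch that repeatedly subtracts a centered moving average. The abstract does not specify the paper's mechanism, so the moving-average estimator, the window sizes, and the function names are assumptions chosen only to show how removing a trend can expose an anomaly.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def detrend(series, windows=(5, 3)):
    """Remove trend in stages: subtract one moving average per window size."""
    residual = list(series)
    for w in windows:
        trend = moving_average(residual, w)
        residual = [x - t for x, t in zip(residual, trend)]
    return residual

# A rising series with one injected spike at index 5 (toy data).
series = [float(i) for i in range(10)]
series[5] += 10.0
residual = detrend(series)
# The largest residual magnitude marks the candidate anomaly.
anomaly_index = max(range(len(residual)), key=lambda i: abs(residual[i]))
```

On the raw series the final values are nearly as large as the spike, whereas after detrending the spike dominates the residual.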

Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology

Collecting and annotating real-world data for the development of object detection models is a time-consuming and expensive process. In the military domain in particular, data collection can also be dangerous or infeasible. Training models on synthetic data may provide a solution for cases where access to real-world training data is restricted. However, bridging the reality gap between synthetic and real data remains a challenge. Existing methods usually build on top of baseline Convolutional Neural Network (CNN) models that have been shown to perform well when trained on real data, but have limited ability to perform well when trained on synthetic data. For example, some architectures allow for fine-tuning with the expectation of large quantities of training data and are prone to overfitting on synthetic data. Related work usually ignores various best practices from object detection on real data, e.g. by training on synthetic data from a single environment with relatively little variation. In this paper we propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data. Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images. Based on the state of the art, we incorporate data augmentation methods and a Transformer backbone. Besides reaching relatively strong performance without any specialized synthetic data transfer methods, we show that our methods improve the state of the art on synthetic data trained object detection for the RarePlanes and DGTA-VisDrone datasets, and reach near-perfect performance on an in-house vehicle detection dataset.

Updated: 2024-05-30 08:31:01

Categories: cs.CV,cs.AI,cs.ET

Download: http://arxiv.org/abs/2405.19822v1

Diffusion Model Patching via Mixture-of-Prompts

We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from its dynamic gating mechanism, which selects and combines a subset of learnable prompts at every step of the generative process (e.g., reverse denoising steps). This strategy, which we term "mixture-of-prompts", enables the model to draw on the distinct expertise of each prompt, essentially "patching" the model's functionality at every step with minimal yet specialized parameters. Uniquely, DMP enhances the model by further training on the same dataset on which it was originally trained, even in a scenario where significant improvements are typically not expected due to model convergence. Experiments show that DMP significantly enhances the converged FID of DiT-L/2 on FFHQ 256x256 by 10.38%, achieved with only a 1.43% parameter increase and 50K additional training iterations.

Updated: 2024-05-30 08:28:32

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.17825v2

WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark

Underwater object tracking (UOT) is a foundational task for identifying and tracing submerged entities in underwater video sequences. However, current UOT datasets suffer from limitations in scale, diversity of target categories and scenarios covered, hindering the training and evaluation of modern tracking algorithms. To bridge this gap, we take the first step and introduce WebUOT-1M, i.e., the largest public UOT benchmark to date, sourced from complex and realistic underwater environments. It comprises 1.1 million frames across 1,500 video clips filtered from 408 target categories, largely surpassing previous UOT datasets, e.g., UVOT400. Through meticulous manual annotation and verification, we provide high-quality bounding boxes for underwater targets. Additionally, WebUOT-1M includes language prompts for video sequences, expanding its application areas, e.g., underwater vision-language tracking. Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps. Retraining and fine-tuning these trackers are challenging due to sample imbalances and limited real-world underwater datasets. To tackle these challenges, we propose a novel omni-knowledge distillation framework based on WebUOT-1M, incorporating various strategies to guide the learning of the student Transformer. To the best of our knowledge, this framework is the first to effectively transfer open-air domain knowledge to the UOT model through knowledge distillation, as demonstrated by results on both existing UOT datasets and the newly proposed WebUOT-1M. Furthermore, we comprehensively evaluate WebUOT-1M using 30 deep trackers, showcasing its value as a benchmark for UOT research by presenting new challenges and opportunities for future studies. The complete dataset, code, and tracking results will be made publicly available.

Updated: 2024-05-30 08:25:21

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.19818v1

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.

Updated: 2024-05-30 08:24:54

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2402.17257v3

Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally

Machine learning tasks are generally formulated as optimization problems, where one searches for an optimal function within a certain functional space. In practice, parameterized functional spaces are considered, in order to be able to perform gradient descent. Typically, a neural network architecture is chosen and fixed, and its parameters (connection weights) are optimized, yielding an architecture-dependent result. This way of proceeding however forces the evolution of the function during training to lie within the realm of what is expressible with the chosen architecture, and prevents any optimization across architectures. Costly architectural hyper-parameter optimization is often performed to compensate for this. Instead, we propose to adapt the architecture on the fly during training. We show that the information about desirable architectural changes, due to expressivity bottlenecks when attempting to follow the functional gradient, can be extracted from the backpropagation. To do this, we propose a mathematical definition of expressivity bottlenecks, which enables us to detect, quantify and solve them while training, by adding suitable neurons when and where needed. Thus, while the standard approach requires large networks, in terms of number of neurons per layer, for expressivity and optimization reasons, we are able to start with very small neural networks and let them grow appropriately. As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy, with competitive training time, while removing the need for standard architectural hyper-parameter search.

Updated: 2024-05-30 08:23:56

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19816v1

Efficient Stimuli Generation using Reinforcement Learning in Design Verification

The increasing design complexity of System-on-Chips (SoCs) has led to significant verification challenges, particularly in meeting coverage targets in a timely manner. At present, coverage closure is heavily dependent on constrained random and coverage driven verification methodologies where the randomized stimuli are bounded to verify certain scenarios and to reach coverage goals. This process is said to be exhaustive and to consume a lot of project time. In this paper, a novel methodology is proposed to generate efficient stimuli with the help of Reinforcement Learning (RL) to reach the maximum code coverage of the Design Under Verification (DUV). Additionally, an automated framework is created using metamodeling to generate a SystemVerilog testbench and an RL environment for any given design. The proposed approach is applied to various designs and the produced results prove that the RL agent provides effective stimuli to achieve code coverage faster in comparison with baseline random simulations. Furthermore, various RL agents and reward schemes are analyzed in our work.

Updated: 2024-05-30 08:23:04

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19815v1

Approximate Global Convergence of Independent Learning in Multi-Agent Systems

Independent learning (IL), despite being a popular approach in practice to achieve scalability in large-scale multi-agent systems, usually lacks global convergence guarantees. In this paper, we study two representative algorithms, independent $Q$-learning and independent natural actor-critic, within value-based and policy-based frameworks, and provide the first finite-sample analysis for approximate global convergence. The results imply a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ up to an error term that captures the dependence among agents and characterizes the fundamental limit of IL in achieving global convergence. To establish the result, we develop a novel approach for analyzing IL by constructing a separable Markov decision process (MDP) for convergence analysis and then bounding the gap due to model difference between the separable MDP and the original one. Moreover, we conduct numerical experiments using a synthetic MDP and an electric vehicle charging example to verify our theoretical findings and to demonstrate the practical applicability of IL.

Updated: 2024-05-30 08:20:34

Categories: cs.LG,cs.MA

Download: http://arxiv.org/abs/2405.19811v1
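
As a rough illustration of the independent Q-learning algorithm analyzed in the abstract above, the sketch below lets two agents learn separately in a one-shot cooperative game. The game, the hyperparameters, and the function name are assumptions made for illustration; the paper studies full Markov decision processes, which this stateless toy does not capture.

```python
import random

def independent_q_learning(steps=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Two agents with separate Q-tables; neither observes the other's action."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(steps):
        actions = []
        for agent in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore uniformly
            else:
                # Exploit: greedy action under this agent's own Q-table.
                actions.append(max((0, 1), key=lambda a: q[agent][a]))
        # Shared reward (assumed game): 1 only if both agents pick action 1.
        reward = 1.0 if actions == [1, 1] else 0.0
        for agent, a in enumerate(actions):
            # Stateless update: a one-shot game has no next-state term.
            q[agent][a] += alpha * (reward - q[agent][a])
    return q

q_tables = independent_q_learning()
greedy_actions = [max((0, 1), key=lambda a: row[a]) for row in q_tables]
```

Each agent treats the other as part of the environment, which is what makes the approach scalable and also why its convergence depends on how strongly the agents are coupled.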

Federated Causal Inference from Observational Data

Decentralized data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints. The presence of dissimilar data distributions and missing values within them can potentially introduce bias to the causal estimands. In this article, we propose a framework to estimate causal effects from decentralized data sources. The proposed framework avoids exchanging raw data among the sources, thus contributing towards privacy-preserving causal learning. Three instances of the proposed framework are introduced to estimate causal effects across a wide range of diverse scenarios within a federated setting. (1) FedCI: a Bayesian framework based on Gaussian processes for estimating causal effects from federated observational data sources. It estimates the posterior distributions of the causal effects to compute the higher-order statistics that capture the uncertainty. (2) CausalRFF: an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. It estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. (3) CausalFI: a new approach for federated causal inference from incomplete data, enabling the estimation of causal effects from multiple decentralized and incomplete data sources. It accounts for the missing data under the missing at random assumption, while also estimating higher-order statistics of the causal estimands. The proposed federated framework and its instances are an important step towards a privacy-preserving causal learning model.

Updated: 2024-05-30 08:19:34

Categories: cs.LG,cs.AI,stat.ME

Download: http://arxiv.org/abs/2308.13047v2

AI with Alien Content and Alien Metasemantics

AlphaGo plays chess and Go in a creative and novel way. It is natural for us to attribute contents to it, such as that it doesn't view being several pawns behind, if it has more board space, as bad. The framework introduced in Cappelen and Dever (2021) provides a way of thinking about the semantics and the metasemantics of AI content: does AlphaGo entertain contents like this, and if so, in virtue of what does a given state of the program mean that particular content? One salient question Cappelen and Dever didn't consider was the possibility of alien content. Alien content is content that is not or cannot be expressed by human beings. It's highly plausible that AlphaGo, or any other sophisticated AI system, expresses alien contents. That this is so, moreover, is plausibly a metasemantic fact: a fact that has to do with how AI comes to entertain content in the first place, one that will heed the vastly different etiology of AI and human content. This chapter explores the question of alien content in AI from a semantic and metasemantic perspective. It lays out the logical space of possible responses to the semantic and metasemantic questions alien content poses, considers whether and how we humans could communicate with entities who express alien content, and points out that getting clear about such questions might be important for more 'applied' issues in the philosophy of AI, such as existential risk and XAI.

Updated: 2024-05-30 08:17:15

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19808v1

MetaCURL: Non-stationary Concave Utility Reinforcement Learning

We explore online learning in episodic loop-free Markov decision processes in non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm running multiple black-box algorithm instances over different intervals, aggregating outputs via a sleeping expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the probability transitions (uncertainty and non-stationarity coming only from external noise, independent of agent state-action pairs), we achieve optimal dynamic regret without prior knowledge of MDP changes. Unlike approaches for RL, MetaCURL handles full adversarial losses, not just stochastic ones. We believe our approach for managing non-stationarity with experts can be of interest to the RL community.

Updated: 2024-05-30 08:17:00

Categories: cs.LG,math.PR,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2405.19807v1

Preference Alignment with Flow Matching

We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs like GPT-4. In contrast, PFM utilizes flow matching techniques to directly learn from preference data, thereby reducing the dependency on extensive fine-tuning of pre-trained models. By leveraging flow-based models, PFM transforms less preferred data into preferred outcomes, and effectively aligns model outputs with human preferences without relying on explicit or implicit reward function estimation, thus avoiding common issues like overfitting in reward models. We provide theoretical insights that support our method's alignment with standard PbRL objectives. Experimental results indicate the practical effectiveness of our method, offering a new direction in aligning a pre-trained model to preference.

Updated: 2024-05-30 08:16:22

Categories: cs.LG

Download: http://arxiv.org/abs/2405.19806v1

Complexity of Deciding Injectivity and Surjectivity of ReLU Neural Networks

Neural networks with ReLU activation play a key role in modern machine learning. In view of safety-critical applications, the verification of trained networks is of great importance and necessitates a thorough understanding of essential properties of the function computed by a ReLU network, including characteristics like injectivity and surjectivity. Recently, Puthawala et al. [JMLR 2022] came up with a characterization for injectivity of a ReLU layer, which implies an exponential time algorithm. However, the exact computational complexity of deciding injectivity remained open. We answer this question by proving coNP-completeness of deciding injectivity of a ReLU layer. On the positive side, as our main result, we present a parameterized algorithm which yields fixed-parameter tractability of the problem with respect to the input dimension. In addition, we also characterize surjectivity for two-layer ReLU networks with one-dimensional output. Remarkably, the decision problem turns out to be the complement of a basic network verification task. We prove NP-hardness for surjectivity, implying a stronger hardness result than previously known for the network verification problem. Finally, we reveal interesting connections to computational convexity by formulating the surjectivity problem as a zonotope containment problem.

Updated: 2024-05-30 08:14:34

Categories: cs.CC,cs.DM,cs.LG

Download: http://arxiv.org/abs/2405.19805v1
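
The injectivity question in the abstract above can be made concrete with a small sketch that checks whether two distinct inputs collide under a single ReLU layer. A colliding pair is one natural witness of non-injectivity; whether the paper's coNP argument uses this kind of certificate is an assumption here, and the function names and toy weights are hypothetical.

```python
def relu_layer(weights, bias, x):
    """Compute ReLU(Wx + b), with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def is_collision(weights, bias, x, y):
    """True if x != y yet both map to the same output (a non-injectivity witness)."""
    return x != y and relu_layer(weights, bias, x) == relu_layer(weights, bias, y)

# One neuron on the real line: every negative input is mapped to 0.
single = is_collision([[1.0]], [0.0], [-1.0], [-2.0])

# Two neurons with weights +1 and -1: the same pair is now separated.
paired = is_collision([[1.0], [-1.0]], [0.0, 0.0], [-1.0], [-2.0])
```

Verifying a given pair is cheap, but failing to find a collision among sampled pairs proves nothing, which is why deciding injectivity itself is the hard part.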

Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm

In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. The discourse initially scrutinizes conventional financial risk preemptive models, such as ARMA, ARCH, and Logistic regression models, critically analyzing their real-world applications. Subsequently, the exposition elaborates on the construction process of the BP neural network model, encompassing network architecture design, activation function selection, parameter initialization, and objective function construction. Through comparative analysis, the superiority of neural network models in preempting credit risk in commercial banks is elucidated. The experimental segment selects specific bank data, validating the model's predictive accuracy and practicality. Research findings evince that this model efficaciously enhances the foresight and precision of credit risk management.

Updated: 2024-05-30 08:13:30

Categories: q-fin.RM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.10762v2

Exploring Key Factors for Long-Term Vessel Incident Risk Prediction

Factor analysis plays a pivotal role in enhancing maritime safety. Most previous studies conduct factor analysis within the framework of incident-related label prediction, where the developed models can be categorized into short-term and long-term prediction models. The long-term models offer a more strategic approach, enabling more proactive risk management, compared to the short-term ones. Nevertheless, few studies have been devoted to rigorously identifying the key factors for long-term prediction and undertaking comprehensive factor analysis. Hence, this study aims to delve into the key factors for predicting the incident risk levels in the subsequent year given a specific datestamp. The majority of candidate factors potentially contributing to the incident risk are collected from vessels' historical safety performance data spanning up to five years. An improved embedded feature selection method, which integrates a Random Forest classifier with a feature filtering process, is proposed to identify key risk-contributing factors from the candidate pool. The results demonstrate superior performance of the proposed method in incident prediction and factor interpretability. Comprehensive analysis is conducted upon the key factors, which could help maritime stakeholders formulate management strategies for incident prevention.

Updated: 2024-05-30 08:12:51

Categories: cs.LG

Download: http://arxiv.org/abs/2405.19804v1

Explainable Attribute-Based Speaker Verification

This paper proposes a fully explainable approach to speaker verification (SV), a task that fundamentally relies on individual speaker characteristics. The opaque use of speaker attributes in current SV systems raises concerns of trust. Addressing this, we propose an attribute-based explainable SV system that identifies speakers by comparing personal attributes such as gender, nationality, and age extracted automatically from voice recordings. We believe this approach better aligns with human reasoning, making it more understandable than traditional methods. Evaluated on the Voxceleb1 test set, the best performance of our system is comparable with the ground truth established when using all correct attributes, proving its efficacy. Whilst our approach sacrifices some performance compared to non-explainable methods, we believe that it moves us closer to the goal of transparent, interpretable AI and lays the groundwork for future enhancements through attribute expansion.

Updated: 2024-05-30 08:04:28

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2405.19796v1

SLM as Guardian: Pioneering AI Safety with Small Language Models

Most prior safety research on large language models (LLMs) has focused on enhancing the alignment of LLMs to better suit the safety requirements of humans. However, internalizing such safeguard features into larger models brought challenges of higher training cost and unintended degradation of helpfulness. To overcome such challenges, a modular approach employing a smaller LLM to detect harmful user queries is regarded as a convenient solution in designing LLM-based systems with safety requirements. In this paper, we leverage a smaller LLM for both harmful query detection and safeguard response generation. We introduce our safety requirements and the taxonomy of harmfulness categories, and then propose a multi-task learning mechanism fusing the two tasks into a single model. We demonstrate the effectiveness of our approach, achieving harmful query detection and safeguard response performance on par with or surpassing that of the publicly available LLMs.

Updated: 2024-05-30 08:03:15

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.19795v1

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs

GPUs have become the de facto hardware devices for accelerating Deep Neural Network (DNN) inference workloads. However, the conventional sequential execution mode of DNN operators in mainstream deep learning frameworks cannot fully utilize GPU resources, even with operator fusion enabled, due to the increasing complexity of model structures and a greater diversity of operators. Moreover, an inadequate operator launch order in parallelized execution scenarios can lead to GPU resource wastage and unexpected performance interference among operators. In this paper, we propose Opara, a resource- and interference-aware DNN Operator parallel scheduling framework to accelerate DNN inference on GPUs. Specifically, Opara first employs CUDA Streams and CUDA Graph to parallelize the execution of multiple operators automatically. To further expedite DNN inference, Opara leverages the resource demands of operators to judiciously adjust the operator launch order on GPUs, overlapping the execution of compute-intensive and memory-intensive operators. We implement and open source a prototype of Opara based on PyTorch in a non-intrusive manner. Extensive prototype experiments with representative DNN and Transformer-based models demonstrate that Opara outperforms the default sequential CUDA Graph in PyTorch and the state-of-the-art operator parallelism systems by up to 1.68x and 1.29x, respectively, with acceptable runtime overhead.
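
As a rough illustration of the launch-order idea in the Opara abstract, the sketch below interleaves compute-intensive and memory-intensive operators so that the two kinds have a chance to overlap. It is not Opara's scheduler: the function name `interleave_launch_order`, the operator names, and the "compute"/"memory" tags are hypothetical, and the operators are assumed to be mutually independent, whereas a real scheduler would also have to respect data dependencies.

```python
from itertools import zip_longest


def interleave_launch_order(ops):
    """Alternate compute-bound and memory-bound operators.

    ops: list of (name, kind) pairs, kind being "compute" or "memory".
    Assumption: the operators have no data dependencies on each other.
    """
    compute = [name for name, kind in ops if kind == "compute"]
    memory = [name for name, kind in ops if kind == "memory"]
    order = []
    # Pair the i-th compute operator with the i-th memory operator;
    # leftovers of the longer list are appended at the end.
    for c, m in zip_longest(compute, memory):
        if c is not None:
            order.append(c)
        if m is not None:
            order.append(m)
    return order


# Hypothetical operator list in its original (sequential) order.
ops = [
    ("conv1", "compute"),
    ("conv2", "compute"),
    ("relu1", "memory"),
    ("concat", "memory"),
    ("matmul", "compute"),
]
order = interleave_launch_order(ops)
print(order)
```

With this input the order becomes conv1, relu1, conv2, concat, matmul, so each memory-bound operator is launched next to a compute-bound one.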

Updated: 2024-05-30 08:01:06

Categories: cs.DC,cs.AI

Download: http://arxiv.org/abs/2312.10351v2

Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, heterogeneous data can introduce prediction bias into the model, causing its predictions to skew towards certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the models themselves are biased, merely constraining their consistency is not sufficient to alleviate prediction bias. In this paper, we explore this bias from a Bayesian perspective and demonstrate that it principally originates from label prior bias within the training data. Building upon this insight, we propose a debiasing method for FSSL named FedDB. FedDB utilizes the Average Prediction Probability of Unlabeled Data (APP-U) to approximate the biased prior. During local training, FedDB employs APP-U to refine pseudo-labeling through Bayes' theorem, thereby significantly reducing the label prior bias. Concurrently, during the model aggregation, FedDB uses APP-U from participating clients to formulate unbiased aggregate weights, thereby effectively diminishing bias in the global model. Experimental results show that FedDB can surpass existing FSSL methods. The code is available at https://github.com/GuogangZhu/FedDB.
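
The prior-correction step described in the FedDB abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' exact formulation: the helper names `average_prediction` and `debias` are hypothetical, the target prior is assumed to be uniform, and the biased prior is estimated as the mean predicted probability over unlabeled samples.

```python
def average_prediction(probs):
    """Mean predicted probability per class over unlabeled samples
    (an APP-U-style estimate of the biased label prior)."""
    n = len(probs)
    k = len(probs[0])
    return [sum(p[c] for p in probs) / n for c in range(k)]


def debias(p, prior, eps=1e-12):
    """Divide out the estimated prior and renormalize.

    By Bayes' rule p(y|x) is proportional to p(x|y) * prior(y), so dividing
    by the prior leaves a quantity proportional to p(x|y), i.e. the
    posterior under a uniform prior (an assumption of this sketch).
    """
    weights = [pc / max(qc, eps) for pc, qc in zip(p, prior)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical model outputs on three unlabeled samples, skewed to class 0.
unlabeled_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
prior = average_prediction(unlabeled_probs)

# A prediction that favors class 0 before correction.
adjusted = debias([0.5, 0.4, 0.1], prior)
print(prior, adjusted)
```

In this toy case the raw prediction favors class 0, but after dividing out the estimated prior the pseudo-label moves to class 1, which is the kind of correction the abstract attributes to refining pseudo-labels through Bayes' theorem.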

Updated: 2024-05-30 07:58:01

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2405.19789v1

Rethinking Transformers in Solving POMDPs

Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers struggle to model, are reducible to POMDPs. This poses a significant challenge for Transformers in learning POMDP-specific inductive biases, due to their lack of inherent recurrence found in other models like RNNs. This paper casts doubt on the prevalent belief in Transformers as sequence models for RL and proposes to introduce a point-wise recurrent structure. The Deep Linear Recurrent Unit (LRU) emerges as a well-suited alternative for Partially Observable RL, with empirical results highlighting the sub-optimal performance of the Transformer and considerable strength of LRU.
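
To make "point-wise recurrent structure" concrete, here is a minimal sketch of a diagonal linear recurrence, where each hidden channel is updated independently as h_t[i] = lam[i] * h_{t-1}[i] + b[i] * x_t. The abstract does not give the LRU equations, so the complex diagonal form, the scalar input, and the names `diagonal_linear_recurrence`, `lam`, and `b` are assumptions made for illustration.

```python
import cmath


def diagonal_linear_recurrence(xs, lam, b):
    """Run h_t[i] = lam[i] * h_{t-1}[i] + b[i] * x_t over a scalar sequence.

    Each channel i evolves on its own (point-wise), with no mixing between
    channels inside the recurrence. Returns the list of hidden states.
    """
    h = [0j] * len(lam)
    states = []
    for x in xs:
        h = [l * hi + bi * x for l, hi, bi in zip(lam, h, b)]
        states.append(list(h))
    return states


# Two channels: a real decay of 0.9 and a rotating decay of modulus 0.5.
lam = [cmath.rect(0.9, 0.0), cmath.rect(0.5, cmath.pi / 2)]
b = [1.0, 1.0]

# An impulse followed by zeros shows how each channel remembers the input.
states = diagonal_linear_recurrence([1.0, 0.0, 0.0], lam, b)
print(states[-1])
```

Keeping every modulus below 1 makes the state decay rather than blow up; unlike attention, the memory of past inputs is carried in a fixed-size state that is updated step by step.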

Updated: 2024-05-30 07:54:40

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17358v3

From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers

Instruction tuning -- tuning large language models on instruction-output pairs -- is a promising technique for making models better adapted to the real world. Yet, the key factors driving the model's capability to understand and follow instructions not seen during training remain under-explored. Our investigation begins with a series of synthetic experiments within the theoretical framework of a Turing-complete algorithm called Markov algorithm, which allows fine-grained control over the instruction-tuning data. Generalization and robustness with respect to the training distribution emerge once a diverse enough set of tasks is provided, even though very few examples are provided for each task. We extend these initial results to a real-world application scenario of code generation and find that a more diverse instruction set, extending beyond code-related tasks, improves the performance of code generation. Our observations suggest that a more diverse semantic space for instruction-tuning sets greatly improves the model's ability to follow instructions and perform tasks.
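
For readers unfamiliar with Markov algorithms, the sketch below is a small interpreter for the standard definition: an ordered list of string-rewriting rules is scanned from the top, the first rule whose pattern occurs is applied at the leftmost occurrence, and the process repeats until no rule applies or a terminal rule fires. The function name `run_markov` and the example rule set are illustrative and are not taken from the paper's experiments.

```python
def run_markov(rules, text, max_steps=1000):
    """Execute a Markov algorithm.

    rules: ordered list of (pattern, replacement, is_terminal) triples.
    At each step the first applicable rule rewrites the leftmost match.
    """
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            if pattern in text:
                text = text.replace(pattern, replacement, 1)
                if is_terminal:
                    return text
                break  # restart the scan from the first rule
        else:
            return text  # no rule applies: the algorithm halts
    raise RuntimeError("step limit reached without halting")


# Classic example: convert a binary numeral to unary tally marks.
binary_to_unary = [
    ("|0", "0||", False),  # moving a mark past a 0 doubles it
    ("1", "0|", False),    # a 1 contributes one mark
    ("0", "", False),      # finally erase the zeros
]

result = run_markov(binary_to_unary, "101")
print(result)
```

Running the rules on "101" yields "|||||", five tally marks. Each distinct rule set defines a distinct task, which is what gives fine-grained control over the diversity of synthetic instruction-tuning data.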

Updated: 2024-05-30 07:54:07

Categories: cs.CL,cs.AI,cs.LG,cs.LO,cs.PL

Download: http://arxiv.org/abs/2405.19787v1

Recurrent Deep Kernel Learning of Dynamical Systems

Digital twins require computationally-efficient reduced-order models (ROMs) that can accurately describe complex dynamics of physical assets. However, constructing ROMs from noisy high-dimensional data is challenging. In this work, we propose a data-driven, non-intrusive method that utilizes stochastic variational deep kernel learning (SVDKL) to discover low-dimensional latent spaces from data and a recurrent version of SVDKL for representing and predicting the evolution of latent dynamics. The proposed method is demonstrated with two challenging examples -- a double pendulum and a reaction-diffusion system. Results show that our framework is capable of (i) denoising and reconstructing measurements, (ii) learning compact representations of system states, (iii) predicting system evolution in low-dimensional latent spaces, and (iv) quantifying modeling uncertainties.

Updated: 2024-05-30 07:49:02

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.19785v1

PixelsDB: Serverless and Natural-Language-Aided Data Analytics with Flexible Service Levels and Prices

Serverless query processing has become increasingly popular due to its advantages, including automated hardware and software management, high elasticity, and pay-as-you-go pricing. For users who are not system experts, serverless query processing greatly reduces the cost of owning a data analytic system. However, it is still a significant challenge for non-expert users to transform their complex and evolving data analytic needs into proper SQL queries and select a serverless query engine that delivers satisfactory performance and price for each type of query. This paper presents PixelsDB, an open-source data analytic system that allows users who lack system or SQL expertise to explore data efficiently. It allows users to generate and debug SQL queries using a natural language interface powered by fine-tuned language models. The queries are then executed by a serverless query engine that offers varying prices for different service levels on query urgency. The service levels are natively supported by dedicated architecture design and heterogeneous resource scheduling that can apply cost-efficient resources to process non-urgent queries. We envision that the combination of a serverless paradigm, a natural-language-aided interface, and flexible service levels and prices will substantially improve the user experience in data analysis.

Updated: 2024-05-30 07:48:43

Categories: cs.DB,cs.AI,cs.DC,cs.HC,cs.LG

Download: http://arxiv.org/abs/2405.19784v1

Instruction-Guided Visual Masking

Instruction following is crucial for contemporary LLMs. However, when extended to multimodal settings, it often suffers from misalignment between a specific textual instruction and the targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models. By constructing visual masks for instruction-irrelevant regions, IVM-enhanced multimodal models can effectively focus on task-relevant image regions to better align with complex instructions. Specifically, we design a visual masking data generation pipeline and create an IVM-Mix-1M dataset with 1 million image-instruction pairs. We further introduce a new learning technique, Discriminator Weighted Supervised Learning (DWSL), for preferential IVM training that prioritizes high-quality data samples. Experimental results on generic multimodal tasks such as VQA and embodied robotic control demonstrate the versatility of IVM, which, as a plug-and-play tool, significantly boosts the performance of diverse multimodal models, yielding new state-of-the-art results across challenging multimodal benchmarks. Code is available at https://github.com/2toinf/IVM.

Updated: 2024-05-30 07:48:32

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.19783v1

Automatic Graph Topology-Aware Transformer

Existing efforts are dedicated to designing many topologies and graph-aware strategies for the graph Transformer, which greatly improve the model's representation capabilities. However, manually determining the suitable Transformer architecture for a specific graph dataset or task requires extensive expert knowledge and laborious trials. This paper proposes an evolutionary graph Transformer architecture search framework (EGTAS) to automate the construction of strong graph Transformers. We build a comprehensive graph Transformer search space with the micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. Furthermore, a surrogate model based on generic architectural coding is proposed to directly predict the performance of graph Transformers, substantially reducing the evaluation cost of evolutionary search. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks, encompassing both small-scale and large-scale graph datasets. Experimental results and ablation studies show that EGTAS can construct high-performance architectures that rival state-of-the-art manual and automated baselines.

Updated: 2024-05-30 07:44:31

Categories: cs.NE,cs.GR,cs.LG

Download: http://arxiv.org/abs/2405.19779v1

Enhancing Consistency and Role-Specific Knowledge Capturing by Rebuilding Fictional Character's Persona

With the recent introduction of the Assistants API, it is expected that document-based language models will be actively used in various domains, especially role-playing. However, a key challenge lies in utilizing the protagonist's persona: the Assistants API often fails to achieve this with its search, because the information it extracts differs each time and it often omits important details such as the protagonist's backstory or relationships. It is hard to maintain a consistent persona simply by using the persona document as input to the Assistants API. To address the challenge of achieving stable persona consistency, we propose CharacterGPT, a novel persona reconstruction framework to alleviate the shortcomings of the Assistants API. Our method involves Character Persona Training (CPT), an effective persona rebuilding process that updates each character's persona by extracting the character's traits from a given summary of the novel, as if the story in the novel were progressing. In our experiments, we ask each character to take the Big Five Inventory personality test in various settings and analyze the results. To assess whether it can think outside the box, we let each character generate short novels. Extensive experiments and human evaluation demonstrate that CharacterGPT presents new possibilities for role-playing agent research.

Updated: 2024-05-30 07:44:16

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.19778v1

Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions

Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables with frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work establishes a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.
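
The GP-to-LDS correspondence that the abstract builds on can be seen in its simplest textbook form: a GP with the exponential (Ornstein-Uhlenbeck) kernel sampled on a regular grid has exactly the same covariance as a scalar first-order linear dynamical system. The sketch below checks this numerically. It covers only this single-output special case, not the paper's multi-output construction with frequencies and phase delays, and all function names are hypothetical.

```python
import math


def ou_kernel(tau, sigma2, length):
    """Exponential (Ornstein-Uhlenbeck) GP kernel k(tau)."""
    return sigma2 * math.exp(-abs(tau) / length)


def lds_params(dt, sigma2, length):
    """Transition a and process-noise variance q of the matching LDS
    x_{t+1} = a * x_t + w_t, with w_t ~ N(0, q)."""
    a = math.exp(-dt / length)
    q = sigma2 * (1.0 - a * a)
    return a, q


def lds_covariance(lag_steps, a, q):
    """Stationary covariance Cov(x_t, x_{t+lag}) of the scalar LDS."""
    stationary_var = q / (1.0 - a * a)
    return stationary_var * a ** lag_steps


dt, sigma2, length = 0.1, 2.0, 0.5
a, q = lds_params(dt, sigma2, length)

# Covariance three time steps apart, computed both ways.
gp_cov = ou_kernel(3 * dt, sigma2, length)
lds_cov = lds_covariance(3, a, q)
print(gp_cov, lds_cov)
```

Because the LDS form admits Kalman-style filtering, inference cost grows linearly with the number of time points, which is the efficiency benefit the abstract refers to.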

Updated: 2024-05-30 07:35:58

Categories: q-bio.NC,cs.LG

Download: http://arxiv.org/abs/2402.02686v3

SolNet: Open-source deep learning models for photovoltaic power forecasting across the globe

Deep learning models have gained increasing prominence in recent years in the field of solar photovoltaic (PV) forecasting. One drawback of these models is that they require a lot of high-quality data to perform well. This is often infeasible in practice, due to poor measurement infrastructure in legacy systems and the rapid build-up of new solar systems across the world. This paper proposes SolNet: a novel, general-purpose, multivariate solar power forecaster, which addresses these challenges by using a two-step forecasting pipeline which incorporates transfer learning from abundant synthetic data generated from PVGIS, before fine-tuning on observational data. Using actual production data from hundreds of sites in the Netherlands, Australia and Belgium, we show that SolNet improves forecasting performance over data-scarce settings as well as baseline models. We find transfer learning benefits to be the strongest when only limited observational data is available. At the same time we provide several guidelines and considerations for transfer learning practitioners, as our results show that weather data, seasonal patterns, amount of synthetic data and possible mis-specification in source location, can have a major impact on the results. The SolNet models created in this way are applicable for any land-based solar photovoltaic system across the planet where simulated and observed data can be combined to obtain improved forecasting capabilities.

Updated: 2024-05-30 07:29:58

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2405.14472v2

Towards Unified Multi-granularity Text Detection with Interactive Attention

Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands. In this paper, we introduce "Detect Any Text" (DAT), an advanced paradigm that seamlessly unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model. This design enables DAT to efficiently manage text instances at different granularities, including *word*, *line*, *paragraph* and *page*. A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances at varying granularities by correlating structural information across different text queries. As a result, it enables the model to achieve mutually beneficial detection performances across multiple text granularities. Additionally, a prompt-based segmentation module refines detection outcomes for texts of arbitrary curvature and complex layouts, thereby improving DAT's accuracy and expanding its real-world applicability. Experimental results demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks, including multi-oriented/arbitrarily-shaped scene text detection, document layout analysis and page detection tasks.

Updated: 2024-05-30 07:25:23

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.19765v1

Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making

The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate to capture the inherent local dependence pattern in trajectories of RL modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure to process multiple entities in parallel and understand the interrelationship among the multiple entities. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
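
To illustrate what "local convolution filtering as the token mixer" means, the sketch below mixes each token only with its few most recent predecessors using a short causal filter, in contrast to attention, which mixes over the whole context. It is a simplified stand-in, not the DC architecture: the name `causal_conv_mixer` is hypothetical, and one filter is shared across all feature dimensions here, which is an assumption of this sketch.

```python
def causal_conv_mixer(tokens, weights):
    """Mix each token with its predecessors using a short causal filter.

    tokens: list of feature vectors (lists of floats), one per time step.
    weights: weights[j] scales the token j steps in the past.
    Steps before the start of the sequence are treated as zeros.
    """
    dim = len(tokens[0])
    out = []
    for t in range(len(tokens)):
        mixed = [0.0] * dim
        for j, w in enumerate(weights):
            if t - j >= 0:
                for d in range(dim):
                    mixed[d] += w * tokens[t - j][d]
        out.append(mixed)
    return out


# Three tokens with two features each; the filter spans three steps.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed = causal_conv_mixer(tokens, [0.5, 0.3, 0.2])
print(mixed)
```

The window length is fixed by the filter size, so the cost per token does not grow with sequence length, which is in line with the abstract's claim that DC requires fewer resources.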

Updated: 2024-05-30 07:19:34

Categories: cs.LG

Download: http://arxiv.org/abs/2310.03022v3

The Kosmosis Use-Case of Crypto Rug Pull Detection and Prevention

Current methods to prevent crypto asset fraud are based on the analysis of transaction graphs within blockchain networks. While effective for identifying transaction patterns indicative of fraud, it does not capture the semantics of transactions and is constrained to blockchain data. Consequently, preventive methods based on transaction graphs are inherently limited. In response to these limitations, we propose the Kosmosis approach, which aims to incrementally construct a knowledge graph as new blockchain and social media data become available. During construction, it aims to extract the semantics of transactions and connect blockchain addresses to their real-world entities by fusing blockchain and social media data in a knowledge graph. This enables novel preventive methods against rug pulls as a form of crypto asset fraud. To demonstrate the effectiveness and practical applicability of the Kosmosis approach, we examine a series of real-world rug pulls from 2021. Through this case, we illustrate how Kosmosis can aid in identifying and preventing such fraudulent activities by leveraging the insights from the constructed knowledge graph.

Updated: 2024-05-30 07:17:57

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2405.19762v1

Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min \left\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega \left(\frac{\ln(K)}{\min \left\{\Delta_{\min}, \epsilon \right\}} \right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
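
To get a feel for how the privacy parameter enters these bounds, the sketch below evaluates the two expressions from the abstract with all constant factors dropped. The function names are hypothetical, and the returned numbers are only the terms inside the O and Omega notation, not actual regret values.

```python
import math


def dp_bandit_regret_term(gaps, horizon, epsilon):
    """Sum over suboptimal arms of ln(T) / min(gap_j, epsilon)."""
    return sum(
        math.log(horizon) / min(gap, epsilon) for gap in gaps if gap > 0
    )


def dp_full_info_minimax_term(horizon, num_actions, epsilon):
    """sqrt(T ln K) + ln(K) / epsilon, the minimax lower-bound expression."""
    return (
        math.sqrt(horizon * math.log(num_actions))
        + math.log(num_actions) / epsilon
    )


# One optimal arm (gap 0) and two suboptimal arms.
gaps = [0.0, 0.1, 0.5]
bandit_term = dp_bandit_regret_term(gaps, horizon=10_000, epsilon=0.2)
minimax_term = dp_full_info_minimax_term(10_000, num_actions=10, epsilon=1.0)
print(bandit_term, minimax_term)
```

For the arm with gap 0.5 the denominator is the privacy parameter 0.2 rather than the gap, so privacy is the binding constraint for that arm; for the arm with gap 0.1 the gap itself dominates and privacy adds nothing to that term.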

Updated: 2024-05-30 07:17:20

Categories: cs.LG

Download: http://arxiv.org/abs/2102.07929v3

Revisiting CNNs for Trajectory Similarity Learning

Similarity search is a fundamental but expensive operator in querying trajectory data, due to its quadratic complexity of distance computation. To mitigate the computational burden for long trajectories, neural networks have been widely employed for similarity learning and each trajectory is encoded as a high-dimensional vector for similarity search with linear complexity. Given the sequential nature of trajectory data, previous efforts have been primarily devoted to the utilization of RNNs or Transformers. In this paper, we argue that the common practice of treating trajectories as sequential data results in excessive attention to capturing long-term global dependency between two sequences. Instead, our investigation reveals the pivotal role of local similarity, prompting a revisit of simple CNNs for trajectory similarity learning. We introduce ConvTraj, incorporating both 1D and 2D convolutions to capture sequential and geo-distribution features of trajectories, respectively. In addition, we conduct a series of theoretical analyses to justify the effectiveness of ConvTraj. Experimental results on three real-world large-scale datasets demonstrate that ConvTraj achieves state-of-the-art accuracy in trajectory similarity search. Owing to the simple network structure of ConvTraj, the training and inference speed on the Porto dataset with 1.6 million trajectories are increased by at least 240x and 2.16x, respectively. The source code and dataset can be found at https://github.com/Proudc/ConvTraj.
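
The two views of a trajectory that the abstract mentions can be sketched in a few lines: a 1D convolution runs along the point sequence and picks up local movement patterns, while a 2D grid rasterization turns the trajectory into an image-like occupancy map on which a 2D convolution could operate. This is only an illustration of the input representations. The helper names, the difference kernel, and the unit-square coordinate range are assumptions, and no learned network is included.

```python
def conv1d(seq, kernel):
    """Valid (no padding) 1D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def rasterize(points, size):
    """Binary occupancy grid for points with coordinates in [0, 1)."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] = 1
    return grid


# A short hypothetical trajectory in normalized coordinates.
trajectory = [(0.1, 0.1), (0.3, 0.2), (0.6, 0.4), (0.9, 0.8)]

# Sequential view: a [-1, 1] kernel yields step-to-step displacements in x.
xs = [x for x, _ in trajectory]
x_steps = conv1d(xs, [-1.0, 1.0])

# Geo-distribution view: which grid cells the trajectory passes through.
grid = rasterize(trajectory, size=4)
print(x_steps, grid)
```

Both operations look only at small neighborhoods, in time or in space, which is the local-similarity emphasis the abstract argues for over long-range sequence modeling.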

Updated: 2024-05-30 07:16:03

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19761v1

Medication Recommendation via Dual Molecular Modalities and Multi-Substructure Distillation

Medication recommendation combines patient medical history with biomedical knowledge to assist doctors in determining medication combinations more accurately and safely. Existing approaches based on molecular knowledge overlook the atomic geometric structure of molecules, failing to capture the high-dimensional characteristics and intrinsic physical properties of medications, leading to structural confusion and the inability to extract useful substructures from individual patient visits. To address these limitations, we propose BiMoRec, which overcomes the inherent lack of molecular essential information in 2D molecular structures by incorporating 3D molecular structures and atomic properties. To retain the fast response required of recommendation systems, BiMoRec maximizes the mutual information between the two molecular modalities through bimodal graph contrastive learning, achieving the integration of 2D and 3D molecular graphs, and finally distills substructures through interaction with single patient visits. Specifically, we use deep learning networks to construct a pre-training method to obtain representations of 2D and 3D molecular structures and substructures, and we use contrastive learning to derive mutual information. Subsequently, we generate fused molecular representations through a trained GNN module, re-determining the relevance of substructure representations in conjunction with the patient's clinical history information. Finally, we generate the final medication combination based on the extracted substructure sequences. Our implementation on the MIMIC-III and MIMIC-IV datasets demonstrates that our method achieves state-of-the-art performance. Compared to the next best baseline, our model improves accuracy by 1.8% while maintaining the same level of DDI as the baseline.

Updated: 2024-05-30 07:13:08

标题: 通过双分子模态和多次结构蒸馏进行药物推荐

摘要: 药物推荐将患者的医疗历史与生物医学知识结合，帮助医生更准确、更安全地确定药物组合。现有的基于分子知识的方法忽略了分子的原子几何结构，无法捕捉药物的高维特征和固有物理属性，导致结构混乱和无法从个体患者访问中提取有用的亚结构。为了解决这些局限性，我们提出了BiMoRec，通过整合3D分子结构和原子属性，克服了2D分子结构中固有的分子基本信息不足。为了保留推荐系统所需的快速响应，BiMoRec通过双模图对比学习最大化两种分子模态之间的互信息，实现了2D和3D分子图的集成，最终通过与单个患者访问的交互提炼了亚结构。具体而言，我们使用深度学习网络构建一个预训练方法，以获得2D和3D分子结构和亚结构的表示，并使用对比学习来推导互信息。随后，我们通过经过训练的GNN模块生成融合的分子表示，再次确定与患者临床历史信息相关的亚结构表示。最后，我们基于提取的亚结构序列生成最终的药物组合。我们在MIMIC-III和MIMIC-IV数据集上的实现表明，我们的方法实现了最先进的性能。与次优基线相比，我们的模型将准确性提高了1.8％，同时保持了与基线相同水平的DDI。

更新时间: 2024-05-30 07:13:08

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2405.20358v1

Identifiability of a statistical model with two latent vectors: Importance of the dimensionality relation and application to graph embedding

Identifiability of statistical models is a key notion in unsupervised representation learning. Recent work on nonlinear independent component analysis (ICA) employs auxiliary data and has established identifiability conditions. This paper proposes a statistical model of two latent vectors with single auxiliary data, generalizing nonlinear ICA, and establishes various identifiability conditions. Unlike previous work, the two latent vectors in the proposed model can have arbitrary dimensions, and this property enables us to reveal an insightful dimensionality relation among the two latent vectors and the auxiliary data in the identifiability conditions. Furthermore, surprisingly, we prove that under certain conditions the indeterminacies of the proposed model are the same as those of linear ICA: the elements of the latent vector can be recovered up to permutation and scaling. Next, we apply the identifiability theory to a statistical model for graph data. As a result, one of the identifiability conditions carries an appealing implication: identifiability of the statistical model can depend on the maximum value of link weights in the graph data. Then, we propose a practical method for identifiable graph embedding. Finally, we numerically demonstrate that the proposed method recovers the latent vectors well and that model identifiability clearly depends on the maximum value of link weights, which supports the implication of our theoretical results.

Updated: 2024-05-30 07:11:20

标题: 一个具有两个潜变量的统计模型的可辨认性：维度关系的重要性及其在图嵌入中的应用

摘要: 统计模型的可识别性是无监督表示学习中的关键概念。最近的非线性独立分量分析（ICA）工作利用辅助数据并建立了可识别条件。本文提出了一个包含单个辅助数据的两个潜在向量的统计模型，泛化了非线性ICA，并建立了各种可识别条件。与先前的工作不同，所提出模型中的两个潜在向量可以具有任意维度，这一特性使我们能够在可识别条件中揭示两个潜在向量和辅助数据之间的有意义的维度关系。此外，令人惊讶的是，我们证明了在某些条件下，所提出模型的不确定性与\emph{线性} ICA相同：潜在向量中的元素可以恢复到它们的排列和比例。接下来，我们将可识别性理论应用于图数据的统计模型。结果表明，可识别性条件之一包括一个吸引人的含义：统计模型的可识别性可能取决于图数据中链接权重的最大值。然后，我们提出了一种用于可识别图嵌入的实用方法。最后，我们通过数值演示表明，所提出的方法能够很好地恢复潜在向量，并且模型的可识别性明显取决于链接权重的最大值，这支持我们理论结果的含义。

更新时间: 2024-05-30 07:11:20

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19760v1

Domain Generalisation via Imprecise Learning

Out-of-distribution (OOD) generalisation is challenging because it involves not only learning from empirical data, but also deciding among various notions of generalisation, e.g., optimising the average-case risk, the worst-case risk, or interpolations thereof. While this choice should in principle be made by the model operator (e.g., a medical doctor), this information might not always be available at training time. Because of these deployment uncertainties, the institutional separation between machine learners and model operators leads machine learners to commit arbitrarily to specific generalisation strategies. We introduce the Imprecise Domain Generalisation framework to mitigate this, featuring an imprecise risk optimisation that allows learners to stay imprecise by optimising against a continuous spectrum of generalisation strategies during training, and a model framework that allows operators to specify their generalisation preference at deployment. Supported by both theoretical and empirical evidence, our work showcases the benefits of integrating imprecision into domain generalisation.
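To make the phrase "average-case risk, worst-case risk, or interpolations thereof" concrete, here is a minimal sketch. It assumes the simplest possible interpolation, a convex combination of the mean and the maximum of per-domain risks; the function name `aggregated_risk` and the parameter `lam` are illustrative and not taken from the paper, whose actual risk family may differ.

```python
def aggregated_risk(domain_risks, lam):
    """Interpolate between average-case (lam=0) and worst-case (lam=1) risk.

    Assumption: a plain convex combination of mean and max; this is only
    one of many possible ways to span a spectrum of generalisation notions.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    average = sum(domain_risks) / len(domain_risks)
    worst = max(domain_risks)
    return (1.0 - lam) * average + lam * worst


# Hypothetical per-domain risks for three training domains.
risks = [0.10, 0.20, 0.60]
avg_case = aggregated_risk(risks, 0.0)    # operator cares about the average
worst_case = aggregated_risk(risks, 1.0)  # operator cares about the worst domain
in_between = aggregated_risk(risks, 0.5)  # a preference between the two
```

In this reading, a learner that stays imprecise would train against many values of `lam`, and the operator would fix one value only at deployment.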

Updated: 2024-05-30 07:11:03

标题: 领域泛化通过不精确学习

摘要: 超出分布（OOD）泛化是具有挑战性的，因为它不仅涉及从经验数据中学习，还涉及在各种泛化概念之间进行决策，例如，优化平均风险、最坏情况风险，或者它们的插值。虽然这种选择原则上应该由模型操作员像医生一样做出，但这些信息在训练时可能并不总是可用。机器学习者和模型操作员之间的机构分离导致机器学习者由于这些部署不确定性而对特定泛化策略做出任意承诺。我们引入了不精确领域泛化框架来缓解这一问题，该框架具有不精确风险优化，允许学习者在训练过程中通过优化连续谱的泛化策略来保持不精确，并且模型框架允许操作员在部署时指定他们的泛化偏好。通过理论和实证证据的支持，我们的工作展示了将不精确性整合到领域泛化中的好处。

更新时间: 2024-05-30 07:11:03

领域: cs.LG

下载: http://arxiv.org/abs/2404.04669v2

How the Future Works at SOUPS: Analyzing Future Work Statements and Their Impact on Usable Security and Privacy Research

Extending knowledge by identifying and investigating valuable research questions and problems is a core function of research. Research publications often suggest avenues for future work to extend and build upon their results. Considering these suggestions can contribute to developing research ideas that build upon previous work and produce results that tie into existing knowledge. Usable security and privacy researchers commonly add future work statements to their publications. However, our community lacks an in-depth understanding of their prevalence, quality, and impact on future research. Our work aims to address this gap in the research literature. We reviewed all 27 papers from the 2019 SOUPS proceedings and analyzed their future work statements. Additionally, we analyzed 978 publications that cite any paper from SOUPS 2019 proceedings to assess their future work statements' impact. We find that most papers from the SOUPS 2019 proceedings include future work statements. However, they are often unspecific or ambiguous, and not always easy to find. Therefore, the citing publications often matched the future work statements' content thematically, but rarely explicitly acknowledged them, indicating a limited impact. We conclude with recommendations for the usable security and privacy community to improve the utility of future work statements by making them more tangible and actionable, and avenues for future work.

Updated: 2024-05-30 07:07:18

标题: SOUPS会议上未来工作的运作方式：分析未来工作声明及其对可用安全和隐私研究的影响

摘要: 通过识别和研究有价值的研究问题和问题来扩展知识是研究的核心功能。研究出版物通常提出未来工作的途径，以延伸并建立在其结果之上。考虑这些建议可以有助于发展建立在先前工作基础之上并产生与现有知识联系的研究思想。可用安全性和隐私研究人员通常会在其出版物中添加未来工作声明。然而，我们的社区缺乏对这些声明在未来研究中的普及性、质量和影响力的深入理解。我们的工作旨在填补研究文献中的这一空白。我们审查了2019年SOUPS会议的27篇论文，并分析了它们的未来工作声明。此外，我们分析了引用2019年SOUPS会议中任何论文的978篇出版物，以评估它们对未来工作声明的影响。我们发现大多数来自2019年SOUPS会议的论文都包含未来工作声明。然而，它们通常缺乏具体性或含糊不清，并且并不总是容易找到。因此，引用的出版物通常在主题上与未来工作声明的内容相匹配，但很少明确承认它们，表明影响有限。我们总结了有关可用安全性和隐私社区改进未来工作声明的实用性的建议，使其更具体和可操作，并提出未来工作的途径。

更新时间: 2024-05-30 07:07:18

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2405.20785v1

Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering

Recent advances in generative neural network models have extended the development of data augmentation methods. However, augmentation methods based on modern generative models fail to achieve notable performance on class-imbalanced data compared with the conventional method, SMOTE. We investigate the problems of generative models for imbalanced classification and introduce a framework that enhances the SMOTE algorithm using Variational Autoencoders (VAE). Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty. Then, the data points that potentially degrade the augmentation are systematically excluded, and the neighboring observations are augmented directly in the data space. Empirical studies on several imbalanced datasets show that this simple process improves the conventional SMOTE algorithm beyond the deep learning models. Consequently, we conclude that the selection of minority data and the interpolation in the data space are beneficial for imbalanced classification problems with a relatively small number of data points.
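The data-space interpolation that SMOTE performs can be sketched in a few lines. This is a minimal illustration of plain SMOTE only: the VAE-based density filtering described in the abstract is not reproduced, and the input list is simply assumed to contain the minority points that survived such a filter. The function name `smote_sample` is hypothetical.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic minority point by SMOTE-style interpolation.

    Assumption: `minority` already holds the points kept after filtering;
    the filtering step itself is outside this sketch.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # k nearest neighbours of the base point (squared Euclidean distance).
    neighbours = sorted(
        others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment base -> neighbour
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sample(points, k=2, rng=random.Random(42))
```

Because the new point lies on a segment between two real minority points, it stays inside their convex hull, which is why removing noisy points before interpolation matters.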

Updated: 2024-05-30 07:06:02

标题: 通过融合条件变分自动编码器改进SMOTE算法，实现数据自适应噪声过滤

摘要: 最近关于生成神经网络模型的研究取得了进展，扩展了数据增强方法的发展。然而，基于现代生成模型的增强方法在处理类别不平衡数据时与传统模型SMOTE相比表现不佳。我们研究了生成模型在不平衡分类中的问题，并介绍了一种利用变分自编码器(VAE)增强SMOTE算法的框架。我们的方法通过VAE系统地量化低维潜在空间中数据点的密度，同时结合了类别标签和分类困难度的信息。然后，系统地排除可能降低增强效果的数据点，并直接在数据空间上增强邻近观测。对几个不平衡数据集的实证研究表明，这一简单过程创新性地改进了传统的SMOTE算法，超越了深度学习模型。因此，我们得出结论，对于相对较少的数据点的不平衡分类问题，选择少数数据和在数据空间中进行插值是有益的。

更新时间: 2024-05-30 07:06:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19757v1

Mitigating annotation shift in cancer classification using single image generative models

Artificial Intelligence (AI) has emerged as a valuable tool for assisting radiologists in breast cancer detection and diagnosis. However, the success of AI applications in this domain is restricted by the quantity and quality of available data, posing challenges due to limited and costly data annotation procedures that often lead to annotation shifts. This study simulates, analyses and mitigates annotation shifts in cancer classification in the breast mammography domain. First, a high-accuracy cancer risk prediction model is developed, which effectively distinguishes benign from malignant lesions. Next, model performance is used to quantify the impact of annotation shift. We uncover a substantial impact of annotation shift on multiclass classification performance particularly for malignant lesions. We thus propose a training data augmentation approach based on single-image generative models for the affected class, requiring as few as four in-domain annotations to considerably mitigate annotation shift, while also addressing dataset imbalance. Lastly, we further increase performance by proposing and validating an ensemble architecture based on multiple models trained under different data augmentation regimes. Our study offers key insights into annotation shift in deep learning breast cancer classification and explores the potential of single-image generative models to overcome domain shift challenges.

Updated: 2024-05-30 07:02:50

标题: 使用单图像生成模型缓解癌症分类中的注释偏移

摘要: 人工智能（AI）已经成为乳腺癌检测和诊断中协助放射科医师的宝贵工具。然而，AI在该领域的应用成功受到可用数据的数量和质量的限制，由于常常导致注释偏移的有限且昂贵的数据注释程序，因此面临挑战。本研究模拟、分析和缓解了乳腺钼靶乳腺癌分类中的注释偏移。首先，开发了一个高准确率的癌症风险预测模型，有效区分良性和恶性病变。接下来，利用模型性能量化注释偏移的影响。我们发现注释偏移对多类别分类性能特别是对恶性病变的影响巨大。因此，我们提出了一种基于单图像生成模型的受影响类别的训练数据增强方法，只需要四个领域内的注释即可显著减轻注释偏移，同时也解决了数据集不平衡问题。最后，我们通过提出和验证基于多个模型在不同数据增强方案下训练的集成架构进一步提高性能。我们的研究为深度学习乳腺癌分类中的注释偏移提供了关键见解，并探讨了单图像生成模型克服领域偏移挑战的潜力。

更新时间: 2024-05-30 07:02:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19754v1

Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream and at most $m<n$ arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$.
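The exponents in the regret bound are easier to read for concrete values of $P$. The sketch below evaluates them exactly with rational arithmetic; the function name `regret_exponents` is illustrative. For $P=1$ the bound reads $\tilde O\left((n-m)\, n^{-2/3}\, T^{2/3}\right)$, and as $P$ grows the exponents approach $3/2$, $-1$ and $1/2$.

```python
from fractions import Fraction


def regret_exponents(P):
    """Exponents of (n - m), n and T in the stated regret bound for P passes."""
    denom = 2 ** (P + 1) - 1
    return {
        "n_minus_m": 1 + Fraction(2 ** P - 2, denom),
        "n": Fraction(2 - 2 ** (P + 1), denom),
        "T": Fraction(2 ** P, denom),
    }


one_pass = regret_exponents(1)
two_pass = regret_exponents(2)
many_pass = regret_exponents(30)
```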

Updated: 2024-05-30 06:56:48

标题: 理解内存-后悔权衡在流式随机多臂老虎机中的运用

摘要: 我们在$P$-pass流模型中研究随机多臂老虎机问题。在这个问题中，$n$个臂以流的形式出现，最多$m<n$个臂及其统计数据可以存储在内存中。我们完全刻画了关于$m, n$和$P$的最优遗憾。具体来说，我们设计了一个算法，其遗憾为$\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$，并且在回合数$T$足够大时，我们用一个$\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$的下界来补充它。我们的结果在$n$和$P$中是紧密的，只有对数因子的差异。

更新时间: 2024-05-30 06:56:48

领域: cs.LG,cs.DS,stat.ML

下载: http://arxiv.org/abs/2405.19752v1

HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobile phones. To address these challenges, we introduce Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that uses 4-bit floating-point (FP) precision for both weights and activations in DiT inference. Compared to fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping range selection mechanism, naturally aligns with the data distribution within DiT, resulting in minimal quantization error. Furthermore, HQ-DiT also implements a universal identity mathematical transform to mitigate the serious quantization error caused by outliers. The experimental results demonstrate that DiT can achieve extremely low-precision quantization (i.e., 4 bits) with negligible impact on performance. Our approach marks the first instance where both weights and activations in DiTs are quantized to just 4 bits, with only a 0.12 increase in sFID on ImageNet.
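As an illustration of what 4-bit floating-point quantization with a clipping range looks like, the sketch below rounds values to the nearest point of the E2M1 grid (1 sign bit, 2 exponent bits, 1 mantissa bit). The choice of E2M1 and the fixed `clip_max` argument are assumptions made for this example; the abstract does not specify the FP4 format or how HQ-DiT selects its clipping range.

```python
import math

# Non-negative magnitudes representable in the E2M1 4-bit float layout.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(values, clip_max):
    """Clip to [-clip_max, clip_max], then round to the nearest scaled FP4 value.

    Assumption: clip_max > 0 is supplied by the caller; a real method would
    choose it from the tensor's statistics.
    """
    scale = clip_max / FP4_E2M1_MAGNITUDES[-1]  # map clip_max onto the largest grid point
    quantized = []
    for v in values:
        magnitude = min(abs(v), clip_max) / scale
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda g: abs(g - magnitude))
        quantized.append(math.copysign(nearest * scale, v))
    return quantized


example = quantize_fp4([0.0, 0.6, -2.4, 10.0], clip_max=6.0)
```

The grid is denser near zero than near the clipping bound, which is the property the abstract refers to when it says FP quantization aligns with the data distribution better than a uniform fixed-point grid.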

Updated: 2024-05-30 06:56:11

标题: HQ-DiT: 高效扩散变压器与FP4混合量化

摘要: 最近，扩散变压器（DiTs）在工业和学术领域都受到了广泛关注，因为它们具有卓越的视觉生成能力，胜过使用U-Net的传统扩散模型。然而，DiTs的增强性能也伴随着高参数数量和实施成本，严重限制了它们在资源有限的设备上（如手机）的使用。为了解决这些挑战，我们引入了适用于DiT的混合浮点量化（HQ-DiT），这是一种高效的后训练量化方法，利用4位浮点（FP）精度对DiT推断中的权重和激活进行量化。与固定点量化（例如INT8）相比，FP量化结合我们提出的剪切范围选择机制，自然地与DiT内的数据分布相一致，导致最小的量化误差。此外，HQ-DiT还实现了一个通用的身份数学变换，以减轻异常值引起的严重量化误差。实验结果表明，DiT可以实现极低精度量化（即4位），对性能影响微乎其微。我们的方法标志着DiTs中权重和激活量化仅为4位的首次实例，在ImageNet上仅增加了0.12的sFID。

更新时间: 2024-05-30 06:56:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19751v1

Understanding and mitigating difficulties in posterior predictive evaluation

Predictive posterior densities (PPDs) are of interest in approximate Bayesian inference. Typically, these are estimated by simple Monte Carlo (MC) averages using samples from the approximate posterior. We observe that the signal-to-noise ratio (SNR) of such estimators can be extremely low. An analysis for exact inference reveals that the SNR decays exponentially with increases in (a) the mismatch between training and test data, (b) the dimensionality of the latent space, or (c) the size of the test data relative to the training data. Further analysis extends these results to approximate inference. To remedy the low SNR problem, we propose replacing simple MC sampling with importance sampling using a proposal distribution optimized at test time on a variational proxy for the SNR, and we demonstrate that this yields greatly improved estimates.
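The low-SNR effect can be reproduced on a one-dimensional toy model. The sketch assumes a standard normal posterior over a latent z and a Gaussian likelihood with standard deviation 0.5, and it uses a hand-picked proposal centred on the test point; the paper instead optimizes the proposal at test time, which is not reproduced here. All function names are illustrative.

```python
import math
import random
import statistics


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def snr(terms):
    """Empirical signal-to-noise ratio of a single-sample estimator."""
    return statistics.fmean(terms) / statistics.stdev(terms)


def simple_mc_terms(y_test, n, rng, lik_std=0.5):
    # Draw z from the (assumed) posterior N(0, 1); each term is p(y_test | z).
    return [normal_pdf(y_test, rng.gauss(0.0, 1.0), lik_std) for _ in range(n)]


def importance_terms(y_test, n, rng, proposal_mean, lik_std=0.5):
    terms = []
    for _ in range(n):
        z = rng.gauss(proposal_mean, 1.0)  # draw from the proposal N(proposal_mean, 1)
        weight = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, proposal_mean, 1.0)
        terms.append(weight * normal_pdf(y_test, z, lik_std))
    return terms


rng = random.Random(0)
snr_matched = snr(simple_mc_terms(0.0, 20000, rng))     # test point near the posterior mass
snr_mismatched = snr(simple_mc_terms(4.0, 20000, rng))  # test point far from it
snr_importance = snr(importance_terms(4.0, 20000, rng, proposal_mean=4.0))
```

With simple MC, almost no posterior samples land where the likelihood of the mismatched test point is non-negligible, so a few rare samples dominate the average; shifting the proposal toward that region spreads the contribution across many samples.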

Updated: 2024-05-30 06:50:28

标题: 理解和减轻后验预测评估中的困难

摘要: 预测后验概率密度（PPDs）在近似贝叶斯推断中是感兴趣的。通常，这些是通过使用来自近似后验的样本来估计简单的蒙特卡洛（MC）平均值来估计的。我们观察到这些估计器的信噪比（SNR）可能非常低。对于精确推断的分析显示，随着（a）训练和测试数据之间的不匹配程度增加，（b）潜在空间的维度增加，或（c）测试数据相对于训练数据的大小增加，SNR将呈指数衰减。进一步的分析将这些结果扩展到近似推断。为了解决低SNR问题，我们提出用在测试时间优化的提议分布替换简单的MC抽样，并在一个代理信噪比上进行重要性抽样，证明这样可以产生大大改进的估计。

更新时间: 2024-05-30 06:50:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19747v1

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration, which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these heuristics with the intelligence and internalized human notions of interestingness captured by giant foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g. discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting and previously impossible opportunity to recognize and capitalize on serendipitous discoveries that cannot be predicted ahead of time. We evaluate IGE on a range of language-based tasks that require search and exploration. In Game of 24, a multistep mathematical reasoning problem, IGE reaches 100% success rate 70.8% faster than the best classic graph search baseline. Next, in BabyAI-Text, a challenging partially observable gridworld, IGE exceeds the previous SOTA with orders of magnitude fewer online samples. Finally, in TextWorld, we show the unique ability of IGE to succeed in settings requiring long-horizon exploration where prior SOTA FM agents like Reflexion completely fail. Overall, IGE combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
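The archive-and-return principle can be sketched on a toy problem. Here the `score` callable stands in for the judgment of how promising a state is; in IGE that judgment comes from a foundation model, whereas this example uses a hypothetical distance heuristic so that it runs without any model. The environment (integers with "+1" and "*2" actions) and all names are illustrative.

```python
def go_explore(start, target, actions, score, max_iterations=100):
    """Minimal archive-based exploration loop in the spirit of Go-Explore."""
    archive = {start: []}  # state -> action names that reach it from start
    expanded = set()
    for _ in range(max_iterations):
        frontier = [s for s in archive if s not in expanded]
        if not frontier:
            return None
        # Select the most promising archived state. The environment is
        # deterministic, so "returning" to it means replaying the stored path.
        state = max(frontier, key=score)
        expanded.add(state)
        # Explore from the selected state and archive anything new.
        for name, step in actions.items():
            new_state = step(state)
            if new_state not in archive:
                archive[new_state] = archive[state] + [name]
            if new_state == target:
                return archive[new_state]
    return None


ACTIONS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}
TARGET = 10


def toy_score(state):
    # Hypothetical stand-in for a learned or FM-provided interestingness score.
    return -abs(TARGET - state)


path = go_explore(1, TARGET, ACTIONS, toy_score)
```

Swapping `toy_score` for a call that asks a model to rank archived states is the only change this sketch would need to move from a hand-written heuristic toward the setting the abstract describes.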

Updated: 2024-05-30 06:48:44

标题: 智能Go-Explore：站在巨人基础模型的肩膀上

摘要: Go-Explore是一组强大的算法家族，旨在解决难以探索的问题，其基于存档发现状态并迭代地回到并从最有前途的状态进行探索的原则构建。这种方法已经在包括Atari游戏和机器人控制在内的各种具有挑战性的问题上取得了超人类表现，但需要手动设计启发式来引导探索，这在一般情况下耗时且不可行。为了解决这个问题，我们提出了智能Go-Explore（IGE），通过用巨大的基础模型（FMs）捕捉的有趣性的智能和内化的人类概念代替这些启发式，从而极大地扩展了原始Go-Explore的范围。这为IGE提供了类似于人类的能力，可以本能地识别任何新状态的有趣程度或前景（例如发现新物体、位置或行为），即使在难以定义启发式的复杂环境中也可以做到。此外，IGE提供了一个令人兴奋且以前不可能的机会，即识别和利用无法提前预测的意外发现。我们在一系列需要搜索和探索的基于语言的任务上评估了IGE。在24点游戏中，一个多步数学推理问题，IGE的成功率达到100%，比最佳经典图搜索基线快了70.8%。接下来，在BabyAI-Text中，一个具有挑战性的部分可观察网格世界，IGE以数量级较少的在线样本超过了先前的SOTA。最后，在TextWorld中，我们展示了IGE在需要长期探索的环境中成功的独特能力，之前的SOTA FM代理（如Reflexion）完全失败。总的来说，IGE结合了FMs的巨大优势和强大的Go-Explore算法，开辟了一个新的研究领域，即创建更具有令人印象深刻的探索能力的普遍能力代理。

更新时间: 2024-05-30 06:48:44

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.15143v2

X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions

Large language models respond well in high-resource languages like English but struggle in low-resource languages. This may stem from the lack of high-quality instruction-following data in these languages. Directly translating English samples into these languages is one solution, but it is unreliable, leading to responses with translation errors that lack language-specific or cultural knowledge. To address this issue, we propose a novel method to construct cross-lingual instruction-following samples with instructions in English and responses in low-resource languages. Specifically, the language model first learns to generate appropriate English instructions, taking natural web texts in other languages as the responses. The candidate cross-lingual instruction tuning samples are further refined and diversified. We have employed this method to build a large-scale cross-lingual instruction tuning dataset on 10 languages, namely X-Instruction. The instruction data built using our method incorporate more language-specific knowledge than data built with the naive translation method. Experimental results show that the response quality of the model tuned on X-Instruction greatly exceeds that of the model distilled from a powerful teacher model, reaching or even surpassing that of ChatGPT. In addition, we find that models tuned on cross-lingual instruction-following samples can follow instructions in the output language without further tuning.

Updated: 2024-05-30 06:45:23

标题: X-指导：用自我策划的跨语言指导对低资源语言中的语言模型进行对齐

摘要: 大型语言模型在英语等高资源语言中表现良好，但在低资源语言中表现不佳。这可能是由于这些语言中缺乏高质量的指导数据所致。直接将英语样本翻译成这些语言可能是一个解决方案，但不可靠，会导致带有翻译错误并缺乏特定语言或文化知识的回应。为了解决这个问题，我们提出了一种新方法，用英语指导和低资源语言回应构建跨语言指导样本。具体来说，语言模型首先学习根据其他语言的自然网络文本生成适当的英语指导。候选的跨语言指导调整样本进一步得到改进和多样化。我们已经采用这种方法在10种语言上构建了一个大规模的跨语言指导调整数据集，即X-Instruction。使用我们的方法构建的指导数据相比于朴素翻译方法包含更多的特定语言知识。实验结果表明，在X-Instruction上调整的模型的响应质量大大超过了从强大的教师模型中提炼出的模型，甚至达到或超越了ChatGPT的模型。此外，我们发现，在跨语言指导样本上调整的模型可以在输出语言中遵循指导而无需进一步调整。

更新时间: 2024-05-30 06:45:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19744v1

May the Dance be with You: Dance Generation Framework for Non-Humanoids

We hypothesize that dance is a motion that forms a visual rhythm from music, where the visual rhythm can be perceived from optical flow. If an agent can recognize the relationship between visual rhythm and music, it will be able to dance by generating a motion that creates a visual rhythm matching the music. Based on this, we propose a framework with which any kind of non-humanoid agent can learn how to dance from human videos. Our framework works in two stages: (1) training a reward model that perceives the relationship between optical flow (visual rhythm) and music from human dance videos, and (2) training the non-humanoid dancer with reinforcement learning based on that reward model. Our reward model consists of two feature encoders, one for optical flow and one for music. They are trained with contrastive learning, which increases the similarity between concurrent optical flow and music features. With this reward model, the agent learns to dance by receiving a higher reward when its action creates an optical flow whose feature has a higher similarity with the given music feature. Experimental results show that the generated dance motion aligns properly with the music beat, and user study results indicate that humans prefer our framework to the baselines. To the best of our knowledge, our work on non-humanoid agents that learn to dance from human videos is unprecedented. An example video can be found at https://youtu.be/dOUPvo-O3QY.
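The reward described above, higher when the optical-flow feature is more similar to the music feature, can be sketched with cosine similarity. The use of cosine similarity and the plain-list feature vectors are assumptions for illustration; the two encoders that would produce these features are not shown, and `dance_reward` is a hypothetical name.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero feature as carrying no rhythm information
    return dot / (norm_u * norm_v)


def dance_reward(flow_feature, music_feature):
    """Reward for one step: similarity of concurrent flow and music features."""
    return cosine_similarity(flow_feature, music_feature)


# Hypothetical encoder outputs for one time step.
music = [0.2, 0.9, 0.1]
on_beat_flow = [0.4, 1.8, 0.2]    # same direction as the music feature
off_beat_flow = [0.9, -0.2, 0.0]  # orthogonal to the music feature
reward_on = dance_reward(on_beat_flow, music)
reward_off = dance_reward(off_beat_flow, music)
```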

Updated: 2024-05-30 06:43:55

标题: 愿舞蹈与你同在：非人形舞蹈生成框架

摘要: 我们假设舞蹈是一种从音乐中形成视觉节奏的运动，其中视觉节奏可以从光流中感知。如果一个代理能够识别视觉节奏和音乐之间的关系，它将能够通过生成动作来跳舞，从而创造出与音乐相匹配的视觉节奏。基于此，我们提出了一个框架，用于让任何类型的非人形代理从人类视频中学习跳舞。我们的框架在两个过程中运作：（1）训练一个奖励模型，从人类舞蹈视频中感知光流（视觉节奏）和音乐之间的关系，（2）基于该奖励模型训练非人形舞者，并进行强化学习。我们的奖励模型包括两个光流和音乐的特征编码器。它们是基于对比学习进行训练的，使得同时发生的光流和音乐特征之间的相似性更高。通过这个奖励模型，代理学习跳舞，当其动作产生的光流特征与给定音乐特征的相似性更高时，会获得更高的奖励。实验结果显示生成的舞蹈动作可以与音乐节拍正确对齐，用户研究结果表明，与基准相比，我们的框架更受人类喜爱。据我们所知，我们从人类视频中学习跳舞的非人形代理的工作是前所未有的。示例视频可在https://youtu.be/dOUPvo-O3QY找到。

更新时间: 2024-05-30 06:43:55

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.19743v1

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Expert-designed close-ended benchmarks serve as vital tools in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of transition analyses that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity. Six state-of-the-art LLMs are re-evaluated using PertEval. Results reveal significantly inflated performance of the LLMs on raw benchmarks, including an absolute 21% overestimation for GPT-4. Additionally, through a nuanced response pattern analysis, we discover that PertEval retains LLMs' uncertainty to specious knowledge, potentially being resolved through rote memorization and leading to inflated performance. We also find that the detailed transition analyses by PertEval could illuminate weaknesses in existing LLMs' knowledge mastery and guide the development of refinement. Given these insights, we posit that PertEval can act as an essential tool that, when applied alongside any close-ended benchmark, unveils the true knowledge capacity of LLMs, marking a significant step toward more trustworthy LLM evaluation.
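One simple way to compare performance on raw and perturbed test sets is to count, per question, how correctness changes between the two. The sketch below is an assumption about what such a transition analysis could look like, not PertEval's actual implementation; the labels and function names are illustrative.

```python
from collections import Counter


def transition_counts(raw_correct, perturbed_correct):
    """Count per-question correctness transitions from raw to perturbed items."""
    label = {True: "correct", False: "wrong"}
    return Counter(
        f"{label[r]}->{label[p]}" for r, p in zip(raw_correct, perturbed_correct)
    )


def accuracy_drop(raw_correct, perturbed_correct):
    """Absolute accuracy difference between raw and perturbed test sets."""
    n = len(raw_correct)
    return sum(raw_correct) / n - sum(perturbed_correct) / n


# Hypothetical results for five benchmark questions.
raw = [True, True, True, False, True]
perturbed = [True, False, False, False, True]
counts = transition_counts(raw, perturbed)
drop = accuracy_drop(raw, perturbed)
```

Questions that move from correct to wrong under a knowledge-invariant rewording are the ones where the raw score most plausibly overstated what the model knows.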

Updated: 2024-05-30 06:38:32

标题: PertEval：通过知识不变扰动揭示LLMs的真实知识容量

摘要: 专家设计的封闭式基准测试在评估大型语言模型（LLMs）的知识容量方面起着至关重要的作用。尽管它们被广泛使用，但由于测试情景有限和数据污染的风险，人们对它们的可靠性产生了担忧。为了纠正这一问题，我们提出了PertEval，这是一个专为通过知识不变扰动深入探索LLMs知识容量的工具包。这些扰动利用类似于人类的重新陈述技术，从静态基准测试中生成即时测试样本，精心保留知识关键内容同时改变无关紧要的细节。我们的工具包还包括一套过渡分析，比较原始和扰动测试集的表现，以准确评估LLMs的真实知识容量。使用PertEval重新评估了六种最先进的LLMs。结果显示LLMs在原始基准测试上的表现明显夸大，其中对于GPT-4的绝对过度估计达到21%。此外，通过细致的响应模式分析，我们发现PertEval保留了LLMs对虚假知识的不确定性，可能通过机械记忆得以解决，并导致表现夸大。我们还发现，PertEval的详细过渡分析可以揭示现有LLMs知识掌握的弱点，并指导改进的发展。鉴于这些见解，我们认为PertEval可以作为一个重要工具，当与任何封闭式基准测试一起应用时，揭示LLMs的真实知识容量，标志着更可信赖的LLMs评估迈出了重要一步。

更新时间: 2024-05-30 06:38:32

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.19740v1

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, employing the LLaVA paradigm for modal alignment. The result, which we call Xmodel-VLM, is a lightweight yet powerful multimodal vision language model. Extensive testing across numerous classic multimodal benchmarks has revealed that despite its smaller size and faster execution, Xmodel-VLM delivers performance comparable to that of larger models. Our model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/XmodelVLM.

Updated: 2024-05-30 06:33:03

标题: Xmodel-VLM：一种用于多模态视觉语言模型的简单基线

摘要: 我们介绍了Xmodel-VLM，这是一种尖端的多模态视觉语言模型。它专为在消费级GPU服务器上高效部署而设计。我们的工作直接面对一个关键的行业问题，即应对阻碍大规模多模态系统广泛采用的昂贵服务成本。通过严格的训练，我们从零开始开发了一个10亿规模的语言模型，采用了LLaVA范式进行模态对齐。我们称之为Xmodel-VLM的结果是一种轻量级但功能强大的多模态视觉语言模型。通过在许多经典多模态基准测试中进行广泛测试，我们发现尽管其体积较小且执行速度更快，Xmodel-VLM的性能仍可与较大模型相媲美。我们的模型检查点和代码可在GitHub上公开获取，网址为https://github.com/XiaoduoAILab/XmodelVLM。

更新时间: 2024-05-30 06:33:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.09215v2

Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation

As Large Language Models (LLMs) scale up and gain powerful Chain-of-Thoughts (CoTs) reasoning abilities, practical resource constraints drive efforts to distill these capabilities into more compact Smaller Language Models (SLMs). We find that CoTs consist mainly of simple reasoning forms, with a small proportion ($\approx 4.7\%$) of key reasoning steps that truly impact conclusions. However, previous distillation methods typically involve supervised fine-tuning of student SLMs only on correct CoTs data produced by teacher LLMs, so students struggle to learn the key reasoning steps and instead imitate the teacher's reasoning forms, making errors or omissions on these steps. To address these issues, drawing an analogy to human learning, where analyzing mistakes against correct solutions often reveals the crucial steps leading to successes or failures, we propose mistakE-Driven key reasonIng step distillaTion (EDIT), a novel method that helps SLMs learn key reasoning steps rather than relying on simple fine-tuning alone. Firstly, to expose these crucial steps in CoTs, we design specific prompts to generate dual CoTs data with similar reasoning paths but divergent conclusions. Then, we apply the minimum edit distance algorithm to the dual CoTs data to locate these key steps and optimize the likelihood of these steps. Extensive experiments validate the effectiveness of EDIT across both in-domain and out-of-domain benchmark reasoning datasets. Further analysis shows that EDIT can generate high-quality CoTs with more correct key reasoning steps. Notably, we also explore how different mistake patterns affect performance and find that EDIT benefits more from logical errors than from knowledge or mathematical calculation errors in dual CoTs. Code can be found at https://github.com/C-W-D/EDIT.
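Locating the steps where two otherwise similar CoTs diverge can be sketched with a standard minimum edit distance alignment. The example aligns whole reasoning steps (strings) rather than tokens, which is a simplification chosen for readability; the granularity used by EDIT is not stated in the abstract, and the function name `differing_steps` is hypothetical.

```python
def differing_steps(correct, wrong):
    """Align two step lists by minimum edit distance and return the mismatches.

    Returns (distance, pairs) where each pair is (correct_step, wrong_step);
    None marks a step that exists on only one side.
    """
    n, m = len(correct), len(wrong)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if correct[i - 1] == wrong[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # step missing from the wrong CoT
                dist[i][j - 1] + 1,         # extra step in the wrong CoT
                dist[i - 1][j - 1] + cost,  # same step, or a substituted one
            )
    # Walk back through the table to recover which steps differ.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and correct[i - 1] == wrong[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            if not same:
                pairs.append((correct[i - 1], wrong[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((correct[i - 1], None))
            i -= 1
        else:
            pairs.append((None, wrong[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


good_cot = ["a = 3 + 4", "a = 7", "b = a * 2", "b = 14"]
bad_cot = ["a = 3 + 4", "a = 8", "b = a * 2", "b = 16"]
distance, key_steps = differing_steps(good_cot, bad_cot)
```

The shared steps drop out of the alignment, leaving only the positions where the conclusions diverge, which are the candidates for the key reasoning steps the abstract describes.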

Updated: 2024-05-30 06:32:11

标题: 超越模仿：从推理精炼中的双重思维链学习关键推理步骤

摘要: 随着大型语言模型（LLMs）规模扩大并具有强大的思维链（CoTs）推理能力，实际资源限制驱使努力将这些能力提炼为更紧凑的较小语言模型（SLMs）。我们发现，CoTs主要由简单的推理形式组成，其中仅有一小部分（约4.7％）的关键推理步骤真正影响结论。然而，先前的提炼方法通常涉及仅在由教师LLMs生成的正确CoTs数据上对学生SLMs进行监督微调，导致学生难以学习关键推理步骤，而是模仿教师的推理形式并在这些步骤上出现错误或遗漏。为了解决这些问题，借鉴于人类学习，根据正确解决方案分析错误通常会揭示导致成功或失败的关键步骤，我们提出了基于错误驱动的关键推理步骤提炼（EDIT）方法，这是一种新颖的方法，可以帮助SLMs学习关键推理步骤而不仅仅是简单微调。首先，为了揭示CoTs中的关键步骤，我们设计了特定提示以生成具有类似推理路径但不同结论的双重CoTs数据。然后，我们在双重CoTs数据上应用最小编辑距离算法来定位这些关键步骤并优化这些步骤的可能性。广泛的实验证实了EDIT在领域内和领域外基准推理数据集上的有效性。进一步的分析显示，EDIT可以生成具有更多正确关键推理步骤的高质量CoTs。值得注意的是，我们还探讨了不同的错误模式如何影响性能，并发现EDIT更多受益于逻辑错误而不是知识或数学计算错误在双重CoTs中。

更新时间: 2024-05-30 06:32:11

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19737v1

Learning Task-relevant Sequence Representations via Intrinsic Dynamics Characteristics in Reinforcement Learning

Learning task-relevant state representations is crucial to solving the problem of scene generalization in visual deep reinforcement learning. Prior work typically establishes a self-supervised auxiliary learner, introducing elements (e.g., rewards and actions) to extract task-relevant state information from observations through behavioral similarity metrics. However, these methods often ignore the inherent relationships between the elements (e.g., dynamics relationships) that are essential for learning accurate representations, and they are also limited to single-step metrics, which impedes the discrimination of short-term similar task/behavior information in long-term dynamics transitions. To address these issues, we propose an intrinsic dynamic characteristics-driven sequence representation learning method (DSR) built on a common DRL framework. Concretely, inspired by the state transitions of the underlying system, it constrains the optimization of the encoder by modeling the dynamics equations related to the state transition, which prompts the latent encoding information to satisfy the state transition process and thereby separates the state space from the noise space. Further, to refine the ability to encode similar tasks based on dynamics constraints, DSR also sequentially models inherent dynamics equation relationships from the perspective of the sequence elements' frequency domain and multi-step prediction. Finally, experimental results show that DSR achieves a significant performance boost on the Distracting DMControl Benchmark, with an average improvement of 78.9% over the backbone baseline. Further results indicate that it also achieves the best performance in real-world autonomous driving tasks in the CARLA simulator. Moreover, the qualitative analysis results of t-SNE visualization validate that our method possesses superior representation ability on visual tasks.

Updated: 2024-05-30 06:31:03

标题: 通过强化学习中的内在动力学特征学习任务相关的序列表示

摘要: 学习与任务相关的状态表示对于解决视觉深度强化学习中的场景泛化问题至关重要。先前的工作通常建立了一个自监督的辅助学习器，引入元素（例如奖励和行动），通过行为相似性度量从观察中提取与任务相关的状态信息。然而，这些方法通常忽视了元素之间的固有关系（例如动态关系），这些关系对于学习准确的表示是必不可少的，它们也仅限于单步度量，这妨碍了对长期动态转换中的短期相似任务/行为信息的区分。为了解决这些问题，我们提出了一种基于内在动态特征驱动的序列表示学习方法（DSR）在通用的DRL框架上。具体地，受到底层系统中状态转移的事实的启发，它通过建模与状态转移有关的动力学方程来约束编码器的优化，促使潜在编码信息满足状态转移过程，从而区分状态空间和噪声空间。此外，为了改进基于动态约束的编码相似任务的能力，DSR还从序列元素的频域和多步预测的角度顺序建模固有的动态方程关系。最后，实验结果显示，DSR在分散DMControl基准测试中取得了显著的性能提升，平均提高了78.9％以上的基线。进一步的结果表明，它还在CARLA模拟器中的真实世界自动驾驶任务中取得了最佳表现。此外，t-SNE可视化的定性分析结果验证了我们的方法在视觉任务上具有优越的表示能力。

更新时间: 2024-05-30 06:31:03

领域: cs.AI

下载: http://arxiv.org/abs/2405.19736v1

HLOB -- Information Persistence and Structure in Limit Order Books

We introduce a novel large-scale deep learning model for forecasting Limit Order Book mid-price changes, and we name it 'HLOB'. This architecture (i) exploits the information encoded by an Information Filtering Network, namely the Triangulated Maximally Filtered Graph, to unveil deeper and non-trivial dependency structures among volume levels; and (ii) guarantees deterministic design choices to handle the complexity of the underlying system by drawing inspiration from the groundbreaking class of Homological Convolutional Neural Networks. We test our model against 9 state-of-the-art deep learning alternatives on 3 real-world Limit Order Book datasets, each including 15 stocks traded on the NASDAQ exchange, and we systematically characterize the scenarios where HLOB outperforms state-of-the-art architectures. Our approach sheds new light on the spatial distribution of information in Limit Order Books and on its degradation over increasing prediction horizons, narrowing the gap between microstructural modeling and deep learning-based forecasting in high-frequency financial markets.

Updated: 2024-05-30 06:29:36

标题: HLOB -- 限价订单簿中的信息持久性和结构

摘要: 我们介绍了一种新颖的用于预测限价订单簿中价位变化的大规模深度学习模型，我们将其命名为`HLOB'。这种架构（i）利用信息过滤网络编码的信息，即三角形最大过滤图，以揭示交易量水平之间更深层次且非平凡的依赖结构；（ii）通过借鉴开创性的同调卷积神经网络类设计选择来处理底层系统的复杂性。我们在3个真实的限价订单簿数据集上测试我们的模型，每个数据集包括在NASDAQ交易所交易的15只股票，并系统地表征了HLOB在何种情景下优于最先进的架构。我们的方法为限价订单簿中信息的空间分布以及在不断增加的预测时间跨度下的恶化提供了新的见解，缩小了微观结构建模和基于深度学习的高频金融市场预测之间的差距。

更新时间: 2024-05-30 06:29:36

领域: q-fin.TR,cs.LG

下载: http://arxiv.org/abs/2405.18938v2

Vocabulary Attack to Hijack Large Language Model Applications

The fast advancements in Large Language Models (LLMs) are driving an increasing number of applications. Together with the growing number of users, we also see an increasing number of attackers who try to outsmart these systems. They want the model to reveal confidential information, state specific false information, or exhibit offensive behavior. To this end, they manipulate their instructions for the LLM by inserting separators or rephrasing them systematically until they reach their goal. Our approach is different. It inserts words from the model vocabulary. We find these words using an optimization procedure and embeddings from another LLM (the attacker LLM). We demonstrate our approach by goal hijacking two popular open-source LLMs from the Llama2 and the Flan-T5 families, respectively. We present two main findings. First, our approach creates inconspicuous instructions and is therefore hard to detect. For many attack cases, we find that even a single word insertion is sufficient. Second, we demonstrate that the attack can be conducted using a model different from the target model.

Updated: 2024-05-30 06:28:31

标题: 大型语言模型应用的词汇攻击

摘要: 大型语言模型（LLMs）的快速发展推动着越来越多的应用。随着用户数量的增长，我们也看到越来越多的攻击者试图智胜这些系统。他们希望模型透露机密信息、特定虚假信息或冒犯性行为。为此，他们通过插入分隔符或系统性地重新表述指令来操纵LLM。我们的方法是不同的。我们从模型词汇中插入词语。我们使用优化程序和来自另一个LLM（攻击者LLM）的嵌入来找到这些词语。我们通过目标劫持Llama2和Flan-T5家族中的两个流行开源LLM，证明了我们的方法。我们提出了两个主要发现。首先，我们的方法创建了不引人注目的指令，因此很难检测。对于许多攻击案例，我们发现即使插入一个单词也足够了。其次，我们证明我们可以使用不同的模型进行攻击。

更新时间: 2024-05-30 06:28:31

领域: cs.CR,cs.AI,cs.DC

下载: http://arxiv.org/abs/2404.02637v2

Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial data intelligent large models and their applications in urban environments, aerospace remote sensing, geography, transportation, and other scenarios. Additionally, it summarizes the latest application cases of spatial data intelligent large models in themes such as urban development, multimodal systems, remote sensing, smart transportation, and resource environments. Finally, the report concludes with an overview and outlook on the development prospects of spatial data intelligent large models.

Updated: 2024-05-30 06:21:34

标题: 空间数据智能基础模型研究：中国2024年关于空间数据智能战略发展的白皮书

摘要: 这份报告侧重于空间数据智能大模型，深入探讨这些模型的原理、方法和前沿应用。报告对空间数据智能大模型的定义、发展历史、当前状况和趋势进行了深入讨论，以及它们面临的挑战。报告系统地阐明了空间数据智能大模型的关键技术及其在城市环境、航空航天遥感、地理学、交通运输和其他场景中的应用。此外，报告总结了空间数据智能大模型在城市发展、多模式系统、遥感、智能交通和资源环境等主题的最新应用案例。最后，报告总结了空间数据智能大模型发展前景的概述和展望。

更新时间: 2024-05-30 06:21:34

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19730v1

Dynamic feature selection in medical predictive monitoring by reinforcement learning

In this paper, we investigate dynamic feature selection in the multivariate time-series setting, a common occurrence in clinical prediction monitoring where each feature corresponds to a bio-test result. Many existing feature selection methods fall short in effectively leveraging time-series information, primarily because they are designed for static data. Our approach addresses this limitation by enabling the selection of time-varying feature subsets for each patient. Specifically, we employ reinforcement learning to optimize a policy under maximum cost restrictions. The prediction model is subsequently updated using synthetic data generated by the trained policy. Our method can seamlessly integrate with non-differentiable prediction models. We conducted experiments on a sizable clinical dataset encompassing regression and classification tasks. The results demonstrate that our approach outperforms strong feature selection baselines, particularly when subjected to stringent cost limitations. Code will be released once the paper is accepted.

Updated: 2024-05-30 06:21:11

标题: 强化学习在医疗预测监测中的动态特征选择

摘要: 在本文中，我们研究了多变量时间序列情景中的动态特征选择，这在临床预测监测中是一个常见现象，其中每个特征对应于生物测试结果。许多现有的特征选择方法在有效利用时间序列信息方面存在不足，主要是因为它们是为静态数据设计的。我们的方法通过允许为每个患者选择动态特征子集来解决这一限制。具体来说，我们使用强化学习来在最大成本限制下优化策略。预测模型随后使用训练策略生成的合成数据进行更新。我们的方法可以与不可微分的预测模型无缝集成。我们在一个包含回归和分类任务的大型临床数据集上进行了实验。结果表明，我们的方法胜过强特征选择基线，尤其是在面对严格的成本限制时。一旦论文被接受，代码将被发布。

更新时间: 2024-05-30 06:21:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19729v1

FedNoisy: Federated Noisy Label Learning Benchmark

Federated learning has gained popularity for distributed learning without aggregating sensitive data from clients. Meanwhile, the distributed and isolated nature of the data may be complicated by data quality issues, making federated learning more vulnerable to noisy labels. Many efforts exist to defend against the negative impacts of noisy labels in centralized or federated settings. However, there is a lack of a benchmark that comprehensively considers the impact of noisy labels in a wide variety of typical FL settings. In this work, we present the first standardized benchmark that can help researchers fully explore potential federated noisy settings. We also conduct comprehensive experiments to explore the characteristics of these data settings and unravel challenging scenarios in federated noisy label learning, which may guide method development in the future. We highlight the 20 basic settings for more than 5 datasets proposed in our benchmark and the standardized simulation pipeline for federated noisy label learning. We hope this benchmark can facilitate idea verification in federated learning with noisy labels. FedNoisy is available at https://github.com/SMILELab-FL/FedNoisy.

Updated: 2024-05-30 06:16:19

标题: FedNoisy: 联邦式嘈杂标签学习基准

摘要: 联邦学习因在不聚合客户敏感数据的情况下进行分布式学习而备受欢迎。但与此同时，数据隔离的分布式和孤立性可能会受到数据质量的复杂影响，使其更容易受到噪声标签的影响。存在许多努力来抵御中央化或联邦设置中噪声标签的负面影响。然而，缺乏一个全面考虑各种典型联邦学习设置中噪声标签影响的基准。在这项工作中，我们提供了第一个标准化的基准，可帮助研究人员充分探索潜在的联邦学习噪声设置。此外，我们进行了全面实验，探索这些数据设置的特征，并揭示联邦学习噪声标签学习中的挑战性场景，这可能指导未来的方法发展。我们强调了我们的基准中提出的超过5个数据集的20种基本设置，以及联邦学习噪声标签学习的标准化仿真流程。我们希望这个基准可以促进在带有噪声标签的联邦学习中的想法验证。FedNoisy可在https://github.com/SMILELab-FL/FedNoisy 上找到。

更新时间: 2024-05-30 06:16:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.11650v3

Density Descent for Diversity Optimization

Diversity optimization seeks to discover a set of solutions that elicit diverse features. Prior work has proposed Novelty Search (NS), which, given a current set of solutions, seeks to expand the set by finding points in areas of low density in the feature space. However, to estimate density, NS relies on a heuristic that considers the k-nearest neighbors of the search point in the feature space, which yields a weaker stability guarantee. We propose Density Descent Search (DDS), an algorithm that explores the feature space via CMA-ES on a continuous density estimate of the feature space that also provides a stronger stability guarantee. We experiment with DDS and two density estimation methods: kernel density estimation (KDE) and continuous normalizing flow (CNF). On several standard diversity optimization benchmarks, DDS outperforms NS, the recently proposed MAP-Annealing algorithm, and other state-of-the-art baselines. Additionally, we prove that DDS with KDE provides stronger stability guarantees than NS, making it more suitable for adaptive optimizers. Furthermore, we prove that NS is a special case of DDS that descends a KDE of the feature space.
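
As a rough illustration of the contrast drawn above, the sketch below compares a k-nearest-neighbor novelty score (the kind of heuristic Novelty Search relies on) with a Gaussian kernel density estimate of the same feature archive. The function names, the toy archive, `k`, and the bandwidth are all assumptions made for this example. It does not reproduce DDS itself, which additionally uses CMA-ES to search for low-density regions.

```python
import numpy as np


def knn_novelty(point, archive, k=3):
    """Novelty-style score: mean distance from `point` to its k nearest archive features."""
    dists = np.linalg.norm(archive - point, axis=1)
    return float(np.sort(dists)[:k].mean())


def kde_density(point, archive, bandwidth=0.5):
    """Gaussian kernel density estimate of the archive, evaluated at `point`."""
    dim = archive.shape[1]
    sq_dists = np.sum((archive - point) ** 2, axis=1)
    normalizer = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return float(np.mean(np.exp(-sq_dists / (2.0 * bandwidth ** 2))) / normalizer)


# Hypothetical 2-D feature archive: a tight cluster near the origin.
archive = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
crowded = np.array([0.05, 0.05])  # inside the cluster
empty = np.array([2.0, 2.0])      # far from every stored feature

print("novelty:", knn_novelty(crowded, archive), knn_novelty(empty, archive))
print("density:", kde_density(crowded, archive), kde_density(empty, archive))
```

In this toy setup the point far from the archive has the higher novelty score and the lower estimated density, so both signals point the search toward the empty region.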

Updated: 2024-05-30 06:13:44

标题: 密度下降用于多样性优化

摘要: 多样性优化旨在发现一组能引出多样性特征的解决方案。先前的研究提出了新颖性搜索（NS），它在给定当前解决方案集的情况下，通过在特征空间中低密度区域找到点来扩展集合。然而，为了估计密度，NS依赖于一种启发式方法，考虑搜索点在特征空间中的k个最近邻居，这导致了较弱的稳定性保证。我们提出了密度下降搜索（DDS）算法，该算法通过CMA-ES在特征空间上进行探索，利用特征空间的连续密度估计，同时提供了更强的稳定性保证。我们对DDS和两种密度估计方法进行实验：核密度估计（KDE）和连续正规化流（CNF）。在几个标准的多样性优化基准测试中，DDS表现优于NS，最近提出的MAP-Annealing算法以及其他最新技术基准。此外，我们证明DDS与KDE提供更强的稳定性保证，比NS更适合自适应优化器。此外，我们证明NS是DDS的一个特例，它降低了特征空间的KDE。

更新时间: 2024-05-30 06:13:44

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2312.11331v2

Encoding and Controlling Global Semantics for Long-form Video Question Answering

Seeking answers effectively for long videos is essential to build video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computations. However, this fails to reason over the whole sequence of video, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into a multi-modal Transformer to efficiently integrate global semantics of the video, which mitigates the video information loss caused by frame and region selection modules. Our SSL includes a gating unit to enable controllability over the flow of global semantics into visual representations. To further enhance the controllability, we introduce a cross-modal compositional congruence (C^3) objective to encourage global semantics to align with the question. To rigorously evaluate long-form videoQA capacity, we construct two new benchmarks, Ego-QA and MAD-QA, featuring videos of considerably long length, i.e., 17.5 minutes and 1.9 hours, respectively. Extensive experiments demonstrate the superiority of our framework on these new as well as existing datasets.

Updated: 2024-05-30 06:10:10

标题: 对于长篇视频问答，编码和控制全局语义

摘要: 为了有效地在长视频中寻找答案，建立视频问答（videoQA）系统至关重要。先前的方法通过自适应地选择长视频中的帧和区域来节省计算。然而，这种方法未能对整个视频序列进行推理，导致性能次优。为了解决这个问题，我们在多模态Transformer中引入了一个状态空间层（SSL），以有效地整合视频的全局语义，从而减轻帧和区域选择模块引起的视频信息丢失。我们的SSL包括一个门控单元，以实现对全局语义流入视觉表示的可控性。为了进一步增强可控性，我们引入了一个跨模态组合一致性（C^3）目标，以鼓励全局语义与问题对齐。为了严格评估长形视频QA的能力，我们构建了两个新的基准测试Ego-QA和MAD-QA，分别展示了相当长的视频长度，即17.5分钟和1.9小时。大量实验证明了我们的框架在这些新的和现有数据集上的优越性。

更新时间: 2024-05-30 06:10:10

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19723v1

Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

Randomly-hashed item ids are used ubiquitously in recommendation models. However, the representations learned from random hashing prevent generalization across similar items, causing problems in learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random ids. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance of memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item ids. As with content embeddings, the compactness of Semantic IDs poses a problem for easy adaptation in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models by hashing sub-pieces of the Semantic-ID sequences. In particular, we find that the SentencePiece model that is commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. To this end, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs, improving the generalization ability on new and long-tail item slices without sacrificing overall model quality.
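
To make the idea of hashing sub-pieces of a Semantic-ID sequence more concrete, here is a minimal sketch of the N-gram variant mentioned above. Everything in it is an assumption for illustration: the function name, the example code sequences, the table size, the use of CRC32 as the hash, and the choice to include the n-gram position in the hash key. The abstract reports that SentencePiece-style pieces work better than such hand-crafted N-grams, and that variant is not shown here.

```python
import zlib


def ngram_buckets(semantic_id, n=2, table_size=1000):
    """Map each contiguous n-gram of codes to a row of a hypothetical embedding table."""
    buckets = []
    for start in range(len(semantic_id) - n + 1):
        piece = semantic_id[start:start + n]
        # Assumption: the key combines the n-gram position with its codes.
        key = f"{start}:" + "-".join(str(code) for code in piece)
        buckets.append(zlib.crc32(key.encode("utf-8")) % table_size)
    return buckets


# Two hypothetical items whose Semantic IDs share the coarse prefix (12, 7).
item_a = (12, 7, 200, 3)
item_b = (12, 7, 91, 45)

print(ngram_buckets(item_a))
print(ngram_buckets(item_b))
```

Because both items start with the same two codes, their first bigram lands in the same table row, which is one way similar items could share learned parameters while later, finer-grained codes still distinguish them.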

Updated: 2024-05-30 05:53:39

标题: 使用语义ID获得更好的泛化性能：推荐排名的案例研究

摘要: 随机哈希的项目id在推荐模型中被广泛使用。然而，从随机哈希学到的表示会阻止类似项目之间的泛化，导致学习未见和长尾项目时出现问题，特别是当项目语料库庞大、符合幂律分布且动态演化时。在本文中，我们提出使用基于内容的特征替代随机id。我们展示了仅仅用基于内容的嵌入替换ID特征可能会因为减少了记忆能力而导致质量下降。为了在记忆和泛化之间取得良好平衡，我们提出使用语义ID -- 从冻结的内容嵌入中学习到的紧凑离散项目表示，使用RQ-VAE捕捉项目中概念的层次结构 -- 替代随机项目id。与内容嵌入类似，语义ID的紧凑性会带来易于在推荐模型中适应的问题。我们提出了在行业规模排名模型中调整语义ID的新方法，通过对语义ID序列的哈希子片段。特别是，我们发现在LLM标记化中常用的SentencePiece模型优于手动制作的片段，如N-gram。最终，我们在YouTube推荐的实际排名模型中评估了我们的方法。我们的实验表明，语义ID可以通过提高对新和长尾项目片段的泛化能力来替代直接使用视频ID，而不牺牲整体模型质量。

更新时间: 2024-05-30 05:53:39

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2306.08121v2

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Speculative decoding reduces the inference latency of a target large language model via utilizing a smaller and faster draft model. Its performance depends on a hyperparameter K -- the candidate length, i.e., the number of candidate tokens for the target model to verify in each round. However, previous methods often use simple heuristics to choose K, which may result in sub-optimal performance. We study the choice of the candidate length K and formulate it as a Markov Decision Process. We theoretically show that the optimal policy of this Markov decision process takes the form of a threshold policy, i.e., the current speculation should stop and be verified when the probability of getting a rejection exceeds a threshold value. Motivated by this theory, we propose SpecDec++, an enhanced version of speculative decoding that adaptively determines the candidate length on the fly. We augment the draft model with a trained acceptance prediction head to predict the conditional acceptance probability of the candidate tokens. SpecDec++ will stop the current speculation when the predicted probability that at least one token gets rejected exceeds a threshold. We implement SpecDec++ and apply it to the llama-2-chat 7B & 70B model pair. Our adaptive method achieves a 2.04x speedup on the Alpaca dataset (an additional 7.2% improvement over the baseline speculative decoding). On the GSM8K and HumanEval datasets, our method achieves a 2.26x speedup (9.4% improvement) and 2.23x speedup (11.1% improvement), respectively.
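
The threshold rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-token conditional acceptance probabilities are passed in as a plain list (in SpecDec++ they would come from the trained acceptance prediction head), the token that triggers the stop is counted as part of the round, and the function name, threshold values, and `max_len` cap are hypothetical.

```python
def candidate_length(accept_probs, threshold=0.7, max_len=16):
    """Return how many draft tokens to propose before handing over to verification.

    accept_probs[i] is the predicted probability that token i is accepted,
    given that all earlier tokens in the round were accepted.
    """
    p_all_accepted = 1.0
    for k, p in enumerate(accept_probs[:max_len], start=1):
        p_all_accepted *= p
        # P(at least one rejection so far) = 1 - P(all accepted so far).
        if 1.0 - p_all_accepted > threshold:
            return k
    return min(len(accept_probs), max_len)


probs = [0.9, 0.8, 0.5, 0.9]
print(candidate_length(probs, threshold=0.5))  # stops after the third token
print(candidate_length(probs, threshold=0.7))  # never crosses the threshold, uses all four
```

With these numbers the running probability that all tokens are accepted is 0.9, 0.72, 0.36, and 0.324, so the rejection probability first exceeds 0.5 at the third token and never exceeds 0.7.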

Updated: 2024-05-30 05:49:38

标题: SpecDec++：通过自适应候选长度增强推测解码

摘要: 推测解码通过利用一个更小更快的草稿模型，减少了目标大型语言模型的推理延迟。其性能取决于一个超参数K -- 候选长度，即目标模型在每一轮中需要验证的候选标记的数量。然而，先前的方法通常使用简单的启发式方法来选择K，这可能导致次优的性能。我们研究了候选长度K的选择，并将其形式化为一个马尔可夫决策过程。我们在理论上表明，这个马尔可夫决策过程的最优策略采用了一个阈值策略，即当获得拒绝的概率超过一个阈值时，当前的推测应该停止并进行验证。受到这个理论的启发，我们提出了SpecDec++，这是一种增强版本的推测解码，可以动态确定候选长度。我们通过训练一个接受预测头来增强草稿模型，以预测候选标记的条件接受概率。当预测的至少有一个标记被拒绝的概率超过阈值时，SpecDec++将停止当前的推测。我们实现了SpecDec++并将其应用于llama-2-chat 7B & 70B模型对。我们的自适应方法在Alpaca数据集上实现了2.04倍的加速（比基准推测解码额外提高了7.2%）。在GSM8K和HumanEval数据集上，我们的方法分别实现了2.26倍（9.4%改进）和2.23倍（11.1%改进）的加速。

更新时间: 2024-05-30 05:49:38

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19715v1

Signal-noise separation using unsupervised reservoir computing

Removing noise from a signal without knowing the characteristics of the noise is a challenging task. This paper introduces a signal-noise separation method based on time series prediction. We use Reservoir Computing (RC) to extract the maximum portion of "predictable information" from a given signal. Reproducing the deterministic component of the signal using RC, we estimate the noise distribution from the difference between the original signal and the reconstructed one. The method is based on a machine learning approach and requires no prior knowledge of either the deterministic signal or the noise distribution. It provides a way to identify whether the noise is additive or multiplicative and to estimate the signal-to-noise ratio (SNR) indirectly. The method works successfully for combinations of various signals and noise, including chaotic signals and highly oscillating sinusoidal signals corrupted by non-Gaussian additive/multiplicative noise. The separation performance is robust and notably outstanding for signals with strong noise, even for those with negative SNR.
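
For readers unfamiliar with negative SNR, the following sketch shows how an SNR estimate could be derived from a reconstructed signal and the residual, assuming purely additive noise. The reconstruction is simply given as an array here; in the paper it would be produced by the reservoir computer, which this example does not implement. The function name and the toy values are assumptions.

```python
import numpy as np


def snr_db(reconstructed, original):
    """Estimate SNR in decibels, treating (original - reconstructed) as additive noise."""
    reconstructed = np.asarray(reconstructed, dtype=float)
    noise = np.asarray(original, dtype=float) - reconstructed
    signal_power = np.mean(reconstructed ** 2)
    noise_power = np.mean(noise ** 2)
    return float(10.0 * np.log10(signal_power / noise_power))


# Toy case: the residual has twice the amplitude of the reconstructed signal,
# so the noise power is four times the signal power.
reconstructed = np.array([1.0, -1.0, 1.0, -1.0])
original = np.array([3.0, -3.0, 3.0, -3.0])

print(snr_db(reconstructed, original))  # about -6.02 dB
```

A negative value in decibels simply means that the estimated noise power exceeds the signal power, which is the regime the abstract singles out.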

Updated: 2024-05-30 05:47:45

标题: 使用无监督储层计算进行信噪分离

摘要: 在不了解噪声特征的情况下消除信号中的噪声是一项具有挑战性的任务。本文介绍了一种基于时间序列预测的信号-噪声分离方法。我们使用储层计算（RC）从给定信号中提取“可预测信息”的最大部分。通过使用RC重现信号的确定性成分，我们通过原始信号和重构信号之间的差异来估计噪声分布。该方法基于机器学习方法，不需要事先了解确定性信号或噪声分布。它提供了一种间接识别噪声的加性/乘性以及估计信噪比（SNR）的方法。该方法成功地适用于各种信号和噪声的组合，包括由非高斯加性/乘性噪声污染的混沌信号和高度振荡的正弦信号。分离性能稳健，尤其对于受强噪声影响的信号表现出显著优异的性能，甚至对于信噪比为负的信号也是如此。

更新时间: 2024-05-30 05:47:45

领域: cs.LG,eess.SP,nlin.CD

下载: http://arxiv.org/abs/2404.04870v2

Automated Multi-Task Learning for Joint Disease Prediction on Electronic Health Records

In the realm of big data and digital healthcare, Electronic Health Records (EHR) have become a rich source of information with the potential to improve patient care and medical research. In recent years, machine learning models have proliferated for analyzing EHR data to predict patients' future health conditions. Among them, some studies advocate for multi-task learning (MTL) to jointly predict multiple target diseases and thereby improve prediction performance over single-task learning. Nevertheless, current MTL frameworks for EHR data have significant limitations due to their heavy reliance on human experts to identify task groups for joint training and to design model architectures. To reduce human intervention and improve the framework design, we propose an automated approach named AutoDP, which can search for the optimal configuration of task grouping and architectures simultaneously. To tackle the vast joint search space encompassing task combinations and architectures, we employ surrogate model-based optimization, enabling us to efficiently discover the optimal solution. Experimental results on real-world EHR data demonstrate the efficacy of the proposed AutoDP framework. It achieves significant performance improvements over both hand-crafted and automated state-of-the-art methods while maintaining a feasible search cost.

Updated: 2024-05-30 05:44:00

标题: 自动化多任务学习用于电子健康记录上共同疾病预测

摘要: 在大数据和数字医疗领域，电子健康记录（EHR）已成为一个信息丰富的来源，有潜力改善患者护理和医学研究。近年来，机器学习模型在分析EHR数据以预测患者未来健康状况方面得到了广泛应用。其中，一些研究主张使用多任务学习（MTL）来共同预测多个目标疾病，以提高预测性能，相对于单一任务学习。然而，目前针对EHR数据的MTL框架存在显著限制，因为它们严重依赖人类专家来识别任务组，进行联合训练和设计模型架构。为了减少人为干预并改进框架设计，我们提出了一种名为AutoDP的自动化方法，可以同时搜索任务分组和架构的最佳配置。为了解决涵盖任务组合和架构的广泛联合搜索空间，我们采用基于替代模型的优化方法，使我们能够高效地发现最佳解决方案。对真实世界的EHR数据的实验结果证明了所提出的AutoDP框架的有效性。它在性能方面明显优于手工制作和自动化最先进方法，同时也保持了可行的搜索成本。

更新时间: 2024-05-30 05:44:00

领域: cs.LG

下载: http://arxiv.org/abs/2403.04086v2

Enhancing Adversarial Robustness in SNNs with Sparse Gradients

Spiking Neural Networks (SNNs) have attracted great attention for their energy-efficient operations and biologically inspired structures, offering potential advantages over Artificial Neural Networks (ANNs) in terms of energy efficiency and interpretability. Nonetheless, similar to ANNs, the robustness of SNNs remains a challenge, especially when facing adversarial attacks. Existing techniques, whether adapted from ANNs or specifically designed for SNNs, exhibit limitations in training SNNs or defending against strong attacks. In this paper, we propose a novel approach to enhance the robustness of SNNs through gradient sparsity regularization. We observe that SNNs exhibit greater resilience to random perturbations compared to adversarial perturbations, even at larger scales. Motivated by this, we aim to narrow the gap between SNNs under adversarial and random perturbations, thereby improving their overall robustness. To achieve this, we theoretically prove that this performance gap is upper bounded by the gradient sparsity of the probability associated with the true label concerning the input image, laying the groundwork for a practical strategy to train robust SNNs by regularizing the gradient sparsity. We validate the effectiveness of our approach through extensive experiments on both image-based and event-based datasets. The results demonstrate notable improvements in the robustness of SNNs. Our work highlights the importance of gradient sparsity in SNNs and its role in enhancing robustness.

Updated: 2024-05-30 05:39:27

标题: 使用稀疏梯度增强SNNs中的对抗性稳健性

摘要: 脉冲神经网络（SNNs）因其高效能操作和生物启发结构而引起了广泛关注，相较于人工神经网络（ANNs），在能效和可解释性方面具有潜在优势。然而，类似于ANNs，SNNs的鲁棒性仍然是一个挑战，特别是面对对抗性攻击时。现有的技术，无论是改编自ANNs还是专门设计用于SNNs，都在训练SNNs或对抗强攻击方面存在局限。在本文中，我们提出了一种通过梯度稀疏正则化来增强SNNs鲁棒性的新方法。我们观察到，相较于对抗性扰动，SNNs对随机扰动具有更大的弹性，即使在较大尺度上也是如此。在此基础上，我们旨在缩小SNNs在对抗性和随机扰动下的差距，从而提高其整体鲁棒性。为了实现这一目标，我们在理论上证明了该性能差距被与输入图像关联的真实标签概率的梯度稀疏上限所限制，为通过正则化梯度稀疏来训练鲁棒性SNNs奠定了基础。我们通过对基于图像和基于事件的数据集进行广泛实验，验证了我们方法的有效性。结果表明，SNNs的鲁棒性得到了显著提高。我们的工作突出了梯度稀疏在SNNs中的重要性及其在增强鲁棒性方面的作用。

更新时间: 2024-05-30 05:39:27

领域: cs.NE,cs.CR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.20355v1

Text Guided Image Editing with Automatic Concept Locating and Forgetting

With the advancement of image-to-image diffusion models guided by text, significant progress has been made in image editing. However, a persistent challenge remains in seamlessly incorporating objects into images based on textual instructions, without relying on extra user-provided guidance. Text and images are inherently distinct modalities, which makes it difficult to fully capture the semantic intent conveyed through language and accurately translate it into the desired visual modifications. As a result, text-guided image editing models often produce outputs with residual object attributes that do not fully align with human expectations. To address this challenge, the models should comprehend the image content effectively, avoiding a disconnect between the provided textual editing prompts and the actual modifications made to the image. In our paper, we propose a novel method called Locate and Forget (LaF), which effectively locates potential target concepts in the image for modification by comparing the syntactic trees of the target prompt and of the scene descriptions of the input image, with the aim of forgetting the clues to their existence in the generated image. Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.

Updated: 2024-05-30 05:36:32

标题: 文本引导的图像编辑：自动概念定位和遗忘

摘要: 随着由文本指导的图像到图像扩散模型的进展，图像编辑取得了显著进展。然而，一个持续的挑战在于根据文本指令将对象无缝地整合到图像中，而不依赖于额外的用户提供的指导。文本和图像本质上是不同的模态，这给完全捕捉通过语言传达的语义意图并准确将其转化为所需的视觉修改带来了困难。因此，以文本为导向的图像编辑模型通常生成具有残余对象属性的生成物，这些属性与人类期望不完全一致。为了解决这一挑战，模型应该有效地理解图像内容，避免在提供的文本编辑提示和实际对图像进行的修改之间存在脱节。在我们的论文中，我们提出了一种称为“定位和遗忘”（LaF）的新方法，通过比较目标提示的句法树和输入图像中的场景描述，有效地定位图像中的潜在目标概念以进行修改，意图忘记它们在生成图像中的存在线索。与基线相比，我们的方法在定性和定量上都显示出在文本指导的图像编辑任务中的优越性。

更新时间: 2024-05-30 05:36:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19708v1

Deep Latent Variable Modeling of Physiological Signals

A deep latent variable model is a powerful method for capturing complex distributions. These models assume that underlying but unobserved structures are present within the data. In this dissertation, we explore high-dimensional problems related to physiological monitoring using latent variable models. First, we present a novel deep state-space model to generate electrical waveforms of the heart using optically obtained signals as inputs. This can bring about clinical diagnoses of heart disease via simple assessment through wearable devices. Second, we present a brain signal modeling scheme that combines the strengths of probabilistic graphical models and deep adversarial learning. The structured representations can provide interpretability and encode inductive biases to reduce the data complexity of neural oscillations. The efficacy of the learned representations is further studied in epilepsy seizure detection formulated as an unsupervised learning problem. Third, we propose a framework for the joint modeling of physiological measures and behavior. Existing methods for combining multiple sources of brain data are limited. Direct analysis of the relationship between different types of physiological measures usually does not involve behavioral data. Our method can identify the unique and shared contributions of brain regions to behavior and can be used to discover new functions of brain regions. The success of these innovative computational methods would allow the translation of biomarker findings across species and provide insight into neurocognitive analysis in numerous biological studies and clinical diagnoses, as well as emerging consumer applications.

Updated: 2024-05-30 05:36:30

标题: 生理信号的深度潜变量建模

摘要: 深度潜变量模型是捕捉复杂分布的强大方法。这些模型假设数据中存在但未观察到的潜在结构。在本文中，我们通过潜变量模型探索与生理监测相关的高维问题。首先，我们提出了一种新颖的深度状态空间模型，利用光学获取的信号作为输入来生成心脏的电波形。这可以通过可穿戴设备简单评估从而进行临床心脏病的诊断。其次，我们提出了一种结合概率图模型和深度对抗学习优势的脑信号建模方案。这种结构化表示可以提供可解释性，并编码归纳偏见以减少神经振荡数据的复杂性。进一步研究了学习表示的效果，将其应用于癫痫发作检测，构建成为一个无监督学习问题。第三，我们提出了一个联合建模生理测量和行为的框架。目前结合多种脑数据来源的方法有限。通常直接分析不同类型生理测量之间的关系并不涉及行为数据。我们的方法可以识别大脑区域对行为的独特和共享贡献，并可用于发现大脑区域的新功能。这些创新的计算方法的成功将有助于跨物种翻译生物标志物发现，并为众多生物学研究和临床诊断中的神经认知分析提供见解，以及新兴的消费应用。

更新时间: 2024-05-30 05:36:30

领域: cs.LG

下载: http://arxiv.org/abs/2405.19277v2

Universal Online Convex Optimization with $1$ Projection per Round

To address the uncertainty in function types, recent progress in online convex optimization (OCO) has spurred the development of universal algorithms that simultaneously attain minimax rates for multiple types of convex functions. However, for a $T$-round online problem, state-of-the-art methods typically conduct $O(\log T)$ projections onto the domain in each round, a process that can be time-consuming for complicated feasible sets. In this paper, inspired by the black-box reduction of Cutkosky and Orabona (2018), we employ a surrogate loss defined over simpler domains to develop universal OCO algorithms that only require $1$ projection. Embracing the framework of prediction with expert advice, we maintain a set of experts for each type of function and aggregate their predictions via a meta-algorithm. The crux of our approach lies in a uniquely designed expert-loss for strongly convex functions, stemming from an innovative decomposition of the regret into the meta-regret and the expert-regret. Our analysis sheds new light on the surrogate loss, facilitating a rigorous examination of the discrepancy between the regret of the original loss and that of the surrogate loss, and carefully controlling meta-regret under the strong convexity condition. In this way, with only $1$ projection per round, we establish optimal regret bounds for general convex, exponentially concave, and strongly convex functions simultaneously. Furthermore, we enhance the expert-loss to exploit the smoothness property, and demonstrate that our algorithm can attain small-loss regret for multiple types of convex and smooth functions.

Updated: 2024-05-30 05:29:40

标题: 每轮仅需一个投影的通用在线凸优化

摘要: 为解决函数类型不确定性，最近在线凸优化（OCO）领域取得的进展推动了同时实现多种凸函数的极小-极大速率的通用算法的发展。然而，对于一个$T$轮在线问题，当前最先进的方法通常在每一轮中执行$O(\log T)$次在定义域上的投影，这个过程在复杂的可行集上可能耗时较长。在本文中，受Cutkosky和Orabona（2018）的黑盒化降维的启发，我们使用在简单定义域上定义的替代损失来开发仅需要$1$次投影的通用OCO算法。我们采用预测与专家建议的框架，为每种类型的函数维护一组专家，并通过元算法聚合他们的预测。我们方法的关键在于为强凸函数设计了一种独特的专家损失，这源自对遗憾进行元遗憾和专家遗憾的创新分解。我们的分析为替代损失投下了新的光，促进了对原始损失和替代损失之间遗憾差异的严格检查，并在强凸条件下仔细控制了元遗憾。通过这种方式，每轮仅需要$1$次投影，我们同时建立了一般凸函数、指数凹函数和强凸函数的最优遗憾界。此外，我们改进了专家损失以利用平滑性质，并展示了我们的算法可以对多种类型的凸和光滑函数实现小损失遗憾。

更新时间: 2024-05-30 05:29:40

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.19705v1

Enhancing Sufficient Dimension Reduction via Hellinger Correlation

In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting the dimension reduction subspace, complete with theoretical justification. Through extensive numerical experiments, we demonstrate that our proposed method significantly enhances and outperforms existing SDR methods. This improvement is largely attributed to our proposed method's deeper understanding of data dependencies and the refinement of existing SDR techniques.

Updated: 2024-05-30 05:29:12

标题: 通过Hellinger相关性增强充分维度减少

摘要: 在这项工作中，我们发展了一种新的理论和方法，用于单指数模型中的充分维度缩减（SDR），其中SDR是基于条件独立性的监督维度缩减的一个子领域。我们的工作主要受到最近引入的Hellinger相关性作为依赖度量的启发。利用这个度量，我们开发了一种能够有效检测维度缩减子空间的方法，并提供了理论上的证明。通过大量的数值实验，我们证明了我们提出的方法明显地提升并优于现有的SDR方法。这种改进很大程度上归因于我们提出的方法对数据依赖性的更深入理解以及对现有SDR技术的完善。

更新时间: 2024-05-30 05:29:12

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2405.19704v1

Significance of Chain of Thought in Gender Bias Mitigation for English-Dravidian Machine Translation

Gender bias in machine translation (MT) systems poses a significant challenge to achieving accurate and inclusive translations. This paper examines gender bias in machine translation systems for languages such as Telugu and Kannada from the Dravidian family, analyzing how gender inflections affect translation accuracy and neutrality using Google Translate and ChatGPT. It finds that while plural forms can reduce bias, individual-centric sentences often maintain the bias due to historical stereotypes. The study evaluates Chain of Thought processing, noting that bias drops from 80% to 4% in Telugu and from 40% to 0% in Kannada. It also compares Telugu and Kannada translations, emphasizing the need for language-specific strategies to address these challenges and suggesting directions for future research to enhance fairness in both data preparation and prompts during inference.

Updated: 2024-05-30 05:26:57

标题: 《思维链在英度拉维底安机器翻译中性别偏见缓解中的重要性》

摘要: 机器翻译系统中的性别偏见对于实现准确和包容的翻译构成了重大挑战。本文研究了德拉维达语系诸如特鲁古语和卡纳达语的机器翻译系统中的性别偏见，分析了性别词尾如何影响翻译的准确性和中立性，使用了谷歌翻译和ChatGPT。研究发现，虽然复数形式可以减少偏见，但个体中心的句子往往保持了由历史刻板印象造成的偏见。该研究评估了思维链处理，在特鲁古语中偏见减少了80%至4%，在卡纳达语中从40%降至0%。它还比较了特鲁古语和卡纳达语的翻译，强调了需要针对性的语言策略来解决这些挑战，并提出了未来研究中改善数据准备和推理过程中提示公平性的方向。

更新时间: 2024-05-30 05:26:57

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19701v1

Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity

Bilevel reinforcement learning (RL), which features intertwined two-level problems, has attracted growing interest recently. The inherent non-convexity of the lower-level RL problem is, however, an impediment to developing bilevel optimization methods. By employing the fixed point equation associated with the regularized RL, we characterize the hyper-gradient via fully first-order information, thus circumventing the assumption of lower-level convexity. This, remarkably, distinguishes our development of the hyper-gradient from general AID-based bilevel frameworks, since we take advantage of the specific structure of RL problems. Moreover, we propose both model-based and model-free bilevel reinforcement learning algorithms, facilitated by access to the fully first-order hyper-gradient. Both algorithms provably enjoy the convergence rate $\mathcal{O}(\epsilon^{-1})$. To the best of our knowledge, this is the first time that AID-based bilevel RL gets rid of additional assumptions on the lower-level problem. In addition, numerical experiments demonstrate that the hyper-gradient indeed serves as an integration of exploitation and exploration.

Updated: 2024-05-30 05:24:20

标题: 双层强化学习中的超梯度发展，无需低层次凸性

摘要: 双层强化学习（RL）具有相互交织的两个层次问题，近年来引起了越来越多的关注。然而，底层RL问题的固有非凸性阻碍了双层优化方法的发展。通过利用与正则化RL相关的不动点方程，我们通过完全一阶信息表征超梯度，从而避免了对底层凸性的假设。这一点显著地区别于基于AID的一般双层框架，因为我们利用了RL问题的特定结构。此外，我们提出了基于模型和无模型的双层强化学习算法，通过访问完全一阶超梯度实现。这两种算法均被证明具有收敛速率$\mathcal{O}(\epsilon^{-1})$。据我们所知，这是AID-based双层RL首次摆脱了对底层问题的额外假设。此外，数值实验表明，超梯度确实作为开发和探索的集成。

更新时间: 2024-05-30 05:24:20

领域: math.OC,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19697v1

A Multi-Perspective Analysis of Memorization in Large Language Models

Large Language Models (LLMs), trained on massive corpora with billions of parameters, show unprecedented performance in various fields. While impressed by their excellent performance, researchers have also noticed some special behaviors of these LLMs. One of these behaviors is memorization, in which LLMs can generate the same content used to train them. Though previous research has discussed memorization, the memorization of LLMs still lacks explanation, especially regarding its causes and the dynamics of generating memorized content. In this research, we comprehensively discuss memorization from various perspectives and extend the scope of discussion beyond memorized content to less-memorized and unmemorized content. Through various studies, we found that: (1) Through experiments, we revealed the relation of memorization to model size, continuation size, and context size. Further, we showed how unmemorized sentences transition to memorized sentences. (2) Through embedding analysis, we showed the distribution and decoding dynamics across model size in embedding space for sentences with different memorization scores. (3) An analysis of n-gram and entropy decoding dynamics discovered a boundary effect when the model starts to generate memorized sentences or unmemorized sentences. (4) We trained a Transformer model to predict the memorization of different models, showing that it is possible to predict memorization from context.

Updated: 2024-05-30 05:13:19

标题: 大型语言模型中记忆力的多角度分析

摘要: 大型语言模型（LLMs）在海量语料库上训练，具有数十亿参数，在各个领域展现了前所未有的性能。尽管研究人员对它们的出色表现感到惊讶，但他们也注意到了这些LLMs的一些特殊行为。其中之一是记忆，LLMs可以生成用于训练它们的相同内容。尽管先前的研究已经讨论了记忆，但LLMs的记忆仍然缺乏解释，尤其是记忆的原因和生成它们的动态。在这项研究中，我们全面讨论了记忆的各个方面，并将讨论范围扩展到不仅仅是记忆的内容，还包括较少和未记忆的内容。通过各种研究，我们发现：（1）通过实验，我们揭示了模型大小、延续大小和上下文大小之间的记忆关系。此外，我们展示了未记忆的句子如何过渡为记忆的句子。(2) 通过嵌入分析，我们展示了在嵌入空间中不同记忆分数句子的模型大小分布和解码动态。n-gram统计分析呈现了（3）在n-gram和熵解码动态分析中发现了当模型开始生成记忆的句子或未记忆的句子时的边界效应。(4) 我们训练了一个Transformer模型来预测不同模型的记忆，表明通过上下文可以预测记忆。

更新时间: 2024-05-30 05:13:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.11577v3

Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds

Among various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, it has been shown that a randomized variant of GP-UCB achieves a tighter BCR bound than GP-UCB, which we call the tighter BCR bound for brevity. Inspired by this study, this paper first shows that TS achieves the tighter BCR bound. On the other hand, in practice GP-UCB and TS often suffer from manual hyperparameter tuning and over-exploration issues, respectively. Therefore, we analyze yet another AF called probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and, unlike GP-UCB, avoids hyperparameter tuning. Furthermore, we report a wide range of experiments showing that PIMS mitigates the practical issues of GP-UCB and TS.
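
The selection rule suggested by the name PIMS can be sketched on a finite candidate set. This is a minimal illustration, not the authors' implementation: it assumes the posterior mean `mu`, posterior standard deviation `sigma` (all positive), and one posterior sample path `sample_path` have already been computed by some GP library, and the function name `pims_choice` is hypothetical.

```python
from statistics import NormalDist


def pims_choice(mu, sigma, sample_path):
    """Pick the candidate with the highest probability of exceeding
    the maximum of one posterior sample path (assumed rule, see lead-in)."""
    target = max(sample_path)  # maximum of the sampled function
    std_normal = NormalDist()
    # Probability of improvement over `target` under the Gaussian posterior.
    scores = [1.0 - std_normal.cdf((target - m) / s) for m, s in zip(mu, sigma)]
    return max(range(len(mu)), key=scores.__getitem__)


# Thompson sampling would pick index 1 (the argmax of the sample path);
# the PI-based rule can pick a different point because it also weighs sigma.
choice = pims_choice([0.0, 0.5, 0.2], [1.0, 0.1, 1.0], [0.3, 1.4, 0.6])
```

Only the maximum value of the sample path is used, so no confidence-width hyperparameter appears in the rule.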

Updated: 2024-05-30 05:12:33

标题: 基于后验抽样的贝叶斯优化与更紧密的贝叶斯遗憾界

摘要: 在贝叶斯优化（BO）中，高斯过程上界置信度（GP-UCB）和汤普森抽样（TS）是已知的选择，具有关于贝叶斯累积遗憾（BCR）的建立了理论性质。最近，已经表明GP-UCB的随机变体相对于GP-UCB实现了更严格的BCR界限，我们简称为更严格的BCR界限。受到这项研究的启发，本文首先展示了TS实现了更严格的BCR界限。另一方面，GP-UCB和TS在实践中经常遇到手动超参数调整和过度探索问题。因此，我们分析了另一个称为样本路径中最大值的改进概率（PIMS）的AF。我们展示了PIMS实现了更严格的BCR界限，并避免了类似GP-UCB的超参数调整问题。此外，我们展示了一系列广泛的实验，重点关注PIMS的有效性，以缓解GP-UCB和TS的实践问题。

更新时间: 2024-05-30 05:12:33

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2311.03760v2

Grade Like a Human: Rethinking Automated Assessment with Large Language Models

While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps, such as rubric design and post-grading review. There has been a lack of systematic research exploring the potential of LLMs to enhance the entire grading process. In this paper, we propose an LLM-based grading system that addresses the entire grading procedure, including the following key components: 1) Developing grading rubrics that consider not only the questions but also the student answers, which can more accurately reflect students' performance. 2) Under the guidance of grading rubrics, providing accurate and consistent scores for each student, along with customized feedback. 3) Conducting post-grading review to better ensure accuracy and fairness. Additionally, we collected a new dataset named OS from a university operating system course and conducted extensive experiments on both our new dataset and the widely used Mohler dataset. Experiments demonstrate the effectiveness of our proposed approach, providing some new insights for developing automated grading systems based on LLMs.

Updated: 2024-05-30 05:08:15

标题: 像人一样评分：重新思考大型语言模型的自动评估

摘要: 尽管大型语言模型（LLMs）已被用于自动评分，但它们尚未达到与人类相同的性能水平，特别是在评分复杂问题时。现有研究集中在评分程序的一个特定步骤上：使用预定义的评分标准进行评分。然而，评分是一个多方面的程序，涵盖了其他关键步骤，如评分标准设计和评分后审查。目前缺乏系统性研究探讨LLMs增强整个评分过程的潜力。在本文中，我们提出了一个基于LLMs的评分系统，涵盖了整个评分程序，包括以下关键组成部分：1）制定评分标准，不仅考虑问题，还考虑学生答案，从而更准确地反映学生的表现。2）在评分标准的指导下，为每个学生提供准确一致的分数，同时提供定制化的反馈。3）进行评分后审查以更好地确保准确性和公平性。此外，我们收集了一个名为OS的新数据集，来自一所大学的操作系统课程，并在我们的新数据集和广泛使用的Mohler数据集上进行了广泛实验。实验证明了我们提出的方法的有效性，为基于LLMs开发自动评分系统提供了一些新的见解。

更新时间: 2024-05-30 05:08:15

领域: cs.AI

下载: http://arxiv.org/abs/2405.19694v1

Diffusion Policies creating a Trust Region for Offline Reinforcement Learning

Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. While several recent attempts have tried to accelerate diffusion-QL, the improvement in training and/or inference speed often results in degraded performance. In this paper, we introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy. We bridge the two policies with a newly introduced diffusion trust region loss. The diffusion policy maintains expressiveness, while the trust region loss directs the one-step policy to explore freely and seek modes within the region defined by the diffusion policy. DTQL eliminates the need for iterative denoising sampling during both training and inference, making it remarkably computationally efficient. We evaluate its effectiveness and algorithmic characteristics against popular Kullback-Leibler (KL) based distillation methods in 2D bandit scenarios and gym tasks. We then show that DTQL not only outperforms other methods on the majority of the D4RL benchmark tasks but is also efficient in training and inference speed. The PyTorch implementation will be made available.

Updated: 2024-05-30 05:04:33

标题: 创建脱机强化学习信任区域的扩散策略

摘要: 离线强化学习（RL）利用预先收集的数据集来训练最优策略。扩散Q学习（DQL）引入扩散模型作为一个强大和表达力强的策略类，显著提升了离线RL的性能。然而，它依赖于迭代去噪采样来生成动作，这既减慢了训练又减慢了推断速度。虽然最近有几次尝试加速扩散QL，但训练和/或推断速度的改进通常导致性能下降。在本文中，我们引入了一种双策略方法，即扩散信任Q学习（DTQL），它包括一个用于纯行为克隆的扩散策略和一个实用的一步策略。我们通过新引入的扩散信任区损失将这两种策略连接起来。扩散策略保持了表达力，而信任区损失指导一步策略自由探索，并在扩散策略定义的区域内寻找模式。DTQL消除了在训练和推断过程中迭代去噪采样的需要，使其在计算上非常高效。我们在2D赌博场景和gym任务中评估了其有效性和算法特性，并与流行的基于Kullback-Leibler（KL）的蒸馏方法进行比较。然后我们展示了DTQL不仅在大多数D4RL基准任务上胜过其他方法，而且在训练和推断速度上也表现出效率。PyTorch实现将会提供。

更新时间: 2024-05-30 05:04:33

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19690v1

Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue

Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between the two speakers and the multi-round interactive nature of dialogue. This often leads to unsatisfactory chat consistency in the resulting agent. In this work, we emphasize the interactive, communicative nature of dialogue and argue that it is more feasible to model the speaker roles of agent and user separately, enabling the agent to adhere to its role consistently. With this in mind, we propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework. It models the agent and user individually with two adapters built upon large language models. The adapters process their respective utterances round by round in alternating order and are tuned via a round-level memory caching mechanism. Extensive experiments demonstrate that our framework outperforms traditional fine-tuning and has substantial potential for improving dialogue consistency.

Updated: 2024-05-30 04:57:36

标题: 一次指导，多轮一致交流：对话的高效调节框架

摘要: 调整语言模型以用于对话生成一直是构建能力对话代理的一种流行范式。然而，传统的调整方法狭隘地将对话生成视为类似其他语言生成任务，忽略了两个发言者之间的角色差异以及对话应该具有的多轮互动过程。这种方式经常导致构建代理的聊天一致性不尽人意。在这项工作中，我们强调对话的互动性和交流性质，并认为更有可能单独建模代理和用户的发言者角色，使代理能够始终保持其角色。基于这一理念，我们提出了一种高效的多轮互动对话调整（Midi-Tuning）框架。该框架使用建立在大型语言模型之上的两个适配器分别对代理和用户进行建模。这些适配器交替使用每轮的各自话语，并通过轮级记忆缓存机制进行调整。大量实验证明，我们的框架比传统的微调效果更好，并具有极大的潜力来提高对话一致性。

更新时间: 2024-05-30 04:57:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.06967v2

Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback

Large language models (LLMs) have demonstrated remarkable proficiency in a range of natural language processing tasks. Once deployed, LLMs encounter users with personalized factual knowledge, and such personalized knowledge is consistently reflected through users' interactions with the LLMs. To enhance user experience, real-time model personalization is essential, allowing LLMs to adapt to user-specific knowledge based on user feedback during human-LLM interactions. Existing methods mostly require back-propagation to fine-tune the model parameters, which incurs high computational and memory costs. In addition, these methods suffer from low interpretability, which can cause unforeseen impacts on model performance during long-term use, where the user's personalized knowledge is accumulated extensively. To address these challenges, we propose Knowledge Graph Tuning (KGT), a novel approach that leverages knowledge graphs (KGs) to personalize LLMs. KGT extracts personalized factual knowledge triples from users' queries and feedback and optimizes KGs without modifying the LLM parameters. Our method improves computational and memory efficiency by avoiding back-propagation and ensures interpretability by making the KG adjustments comprehensible to humans. Experiments with state-of-the-art LLMs, including GPT-2, Llama2, and Llama3, show that KGT significantly improves personalization performance while reducing latency and GPU memory costs. Ultimately, KGT offers a promising solution for effective, efficient, and interpretable real-time LLM personalization during user interactions with the LLMs.

Updated: 2024-05-30 04:57:03

标题: 知识图谱调整：基于人类反馈的实时大型语言模型个性化

摘要: 大型语言模型（LLMs）在各种自然语言处理任务中展现出了非凡的熟练度。一旦部署，LLMs会遇到具有个性化事实知识的用户，这种个性化知识通过用户与LLMs的互动不断反映出来。为了提升用户体验，实时模型个性化至关重要，使LLMs能够根据用户在人-LLM互动过程中的反馈调整用户特定的知识。现有方法大多需要反向传播来微调模型参数，这会导致高计算和内存成本。此外，这些方法存在低可解释性的问题，在长期使用过程中会对模型性能产生意想不到的影响，因为用户的个性化知识被广泛积累。为了解决这些挑战，我们提出了知识图调整（KGT），这是一种利用知识图（KGs）来个性化LLMs的新方法。KGT从用户的查询和反馈中提取个性化的事实知识三元组，并在不修改LLM参数的情况下优化KGs。我们的方法通过避免反向传播来提高计算和内存效率，并确保可解释性，使KG调整能够被人类理解。对包括GPT-2、Llama2和Llama3在内的最先进的LLMs进行实验表明，KGT显著提高了个性化性能，同时降低了延迟和GPU内存成本。最终，KGT提供了一种有效、高效和可解释的实时LLM个性化解决方案，使用户在与LLMs互动时能够更好地个性化。

更新时间: 2024-05-30 04:57:03

领域: cs.AI

下载: http://arxiv.org/abs/2405.19686v1

Breaking Indistinguishability with Transfer Learning: A First Look at SPECK32/64 Lightweight Block Ciphers

In this research, we introduce MIND-Crypt, a novel attack framework that uses deep learning (DL) and transfer learning (TL) to challenge the indistinguishability of block ciphers, specifically the SPECK32/64 encryption algorithm in CBC (Cipher Block Chaining) mode, under Known Plaintext Attacks (KPA). Our methodology includes training a DL model with ciphertexts of two messages encrypted using the same key. The selected messages have the same byte length and differ by only one bit at the binary level. This DL model employs a residual network architecture. For the TL, we use the trained DL model as a feature extractor, and these features are then used to train a shallow machine learning model, such as XGBoost. This dual strategy aims to distinguish ciphertexts of two encrypted messages, addressing traditional cryptanalysis challenges. Our findings demonstrate that the DL model achieves an accuracy of approximately 99% under consistent cryptographic conditions (same key or rounds) with the SPECK32/64 cipher. However, performance degrades to random guessing levels (50%) when tested with ciphertext generated from different keys or different encryption rounds of SPECK32/64. To improve the results, the DL model requires retraining with different keys or encryption rounds using larger datasets (10^7 samples). To overcome this limitation, we implement TL, achieving an accuracy of about 53% with just 10,000 samples, which is better than random guessing. Further training with 580,000 samples increases accuracy to nearly 99%, a reduction in data requirements of over 94%. This shows that an attacker can use machine learning models to break indistinguishability by accessing pairs of plaintexts and their corresponding ciphertexts encrypted with the same key, without directly interacting with the communicating parties.

Updated: 2024-05-30 04:40:13

标题: 使用迁移学习打破不可区分性：首次探究SPECK32/64轻量级分组密码

摘要: 在这项研究中，我们介绍了MIND-Crypt，这是一个利用深度学习（DL）和迁移学习（TL）挑战块密码不可区分性的攻击框架，特别是针对已知明文攻击（KPA）的SPECK32/64加密算法在CBC模式（Cipher Block Chaining）下的情况。我们的方法包括使用相同密钥加密的两条消息的密文来训练DL模型。所选消息具有相同的字节长度，在二进制级别上只有一个位的不同。这个DL模型采用了残差网络架构。对于TL，我们使用训练过的DL模型作为特征提取器，然后利用这些特征来训练一个浅层机器学习模型，如XGBoost。这种双重策略旨在区分两条加密消息的密文，解决传统的密码分析挑战。我们的研究结果表明，在一致的加密条件（相同密钥或轮数）下，DL模型在SPECK32/64密码中实现了约99%的准确率。然而，当使用从不同密钥生成的密文进行测试时，性能会降至随机猜测水平（50%），或者在不同轮数的SPECK32/64加密中进行测试时也是如此。为了提高结果，DL模型需要使用更大的数据集（10^7个样本）重新训练不同密钥或加密轮数。为了克服这一限制，我们实施了TL，仅使用1万个样本就实现了约53%的准确率，比随机猜测好。进一步使用58万个样本进行训练，将准确率提高到近99%，显示出数据需求降低了94%以上。这表明，攻击者可以利用机器学习模型来破解不可区分性，而不必直接与通信双方进行交互，只需访问使用相同密钥加密的明文对及其相应的密文。

更新时间: 2024-05-30 04:40:13

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.19683v1

Bayesian Online Natural Gradient (BONG)

We propose a novel approach to sequential Bayesian inference based on variational Bayes. The key insight is that, in the online setting, we do not need to add the KL term to regularize to the prior (which comes from the posterior at the previous timestep); instead we can optimize just the expected log-likelihood, performing a single step of natural gradient descent starting at the prior predictive. We prove this method recovers exact Bayesian inference if the model is conjugate, and empirically outperforms other online VB methods in the non-conjugate setting, such as online learning for neural networks, especially when controlling for computational costs.
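
The conjugate-case claim can be checked by hand in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes a one-dimensional Gaussian belief over a mean parameter, a Gaussian likelihood with known noise variance, and a natural-gradient step of size 1. The function names are hypothetical.

```python
def natural_gradient_step(prior_mean, prior_var, y, noise_var):
    """One unit-size natural-gradient step on the expected log-likelihood,
    started at the prior, with no KL term (assumed setting, see lead-in)."""
    # Natural parameters of the prior N(prior_mean, prior_var).
    eta1 = prior_mean / prior_var
    eta2 = -0.5 / prior_var
    # Gradient of E_q[log N(y | theta, noise_var)] with respect to the
    # expectation parameters (E[theta], E[theta^2]); for an exponential
    # family this equals the natural gradient in natural parameters.
    eta1 += y / noise_var
    eta2 += -0.5 / noise_var
    post_var = -0.5 / eta2
    return eta1 * post_var, post_var


def exact_posterior(prior_mean, prior_var, y, noise_var):
    """Closed-form conjugate Gaussian update, for comparison."""
    precision = 1.0 / prior_var + 1.0 / noise_var
    mean = (prior_mean / prior_var + y / noise_var) / precision
    return mean, 1.0 / precision


step = natural_gradient_step(0.0, 4.0, 1.5, 0.5)
exact = exact_posterior(0.0, 4.0, 1.5, 0.5)
```

In this toy case the single step adds the observation's precision and precision-weighted value to the prior's natural parameters, which is exactly the conjugate update.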

Updated: 2024-05-30 04:27:36

标题: 贝叶斯在线自然梯度（BONG）

摘要: 我们提出了一种基于变分贝叶斯的序贝叶斯推断的新方法。关键洞察是，在在线设置中，我们不需要将KL项添加到正则化以对先验进行调整（该先验来自上一个时间步的后验）；相反，我们可以仅优化期望对数似然，执行从先验预测开始的自然梯度下降的单步操作。我们证明了如果模型是共轭的，这种方法将恢复精确的贝叶斯推断，并在非共轭设置中在实证上优于其他在线VB方法，如用于神经网络的在线学习，特别是在控制计算成本时。

更新时间: 2024-05-30 04:27:36

领域: stat.ML,cs.LG,stat.CO

下载: http://arxiv.org/abs/2405.19681v1

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FnCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We have made the code publicly available at https://github.com/facebookresearch/FnCTOD

Updated: 2024-05-30 04:19:54

标题: 大型语言模型通过函数调用作为零样本对话状态追踪器

摘要: 大型语言模型（LLMs）在对话系统中越来越普遍，因为它们在一般情境下具有先进的理解和生成能力。然而，在需要不仅生成响应还需要在特定任务和领域内有效地跟踪对话状态（DST）的任务导向对话（TOD）中，它们的有效性仍然不尽人意。在这项工作中，我们提出了一种通过函数调用解决LLMs中DST的新方法FnCTOD。这种方法改进了零样本DST，使其能够适应各种领域，而无需大量数据收集或模型调整。我们的实验结果表明，我们的方法在中等规模的开源和专有LLMs上都取得了出色的表现：通过上下文提示，它使各种7B或13B参数模型超越了先前由ChatGPT实现的最新技术水平（SOTA），并改进了ChatGPT的表现，平均联合目标准确度（JGA）提高了5.6%。对于GPT-3.5和GPT-4，个别模型的结果分别提高了4.8%和14%。我们还展示了通过在少量多样化的任务导向对话集合上微调，我们可以为中等规模的模型，特别是13B参数LLaMA2-Chat模型，装备函数调用能力，并且具有与ChatGPT相当的DST性能，同时保持其聊天能力。我们已经将代码公开发布在https://github.com/facebookresearch/FnCTOD。

更新时间: 2024-05-30 04:19:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.10466v4

Efficient Trajectory Inference in Wasserstein Space Using Consecutive Averaging

Capturing data from dynamic processes through cross-sectional measurements is common in many fields, such as computational biology. Trajectory inference deals with the challenge of reconstructing continuous processes from such observations. In this work, we propose methods for B-spline approximation and interpolation of point clouds through consecutive averaging that is intrinsic to the Wasserstein space. Combining subdivision schemes with optimal transport-based geodesics, our methods carry out trajectory inference at a chosen level of precision and smoothness, and can automatically handle scenarios where particles undergo division over time. We rigorously evaluate our method by providing convergence guarantees and testing it on simulated cell data characterized by bifurcations and merges, comparing its performance against state-of-the-art trajectory inference and interpolation methods. The results not only underscore the effectiveness of our method in inferring trajectories, but also highlight the benefit of performing interpolation and approximation that respect the inherent geometric properties of the data.
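
The idea of subdivision by consecutive averaging along Wasserstein geodesics can be illustrated in the simplest possible setting. The sketch below is a simplification chosen for illustration, not the paper's method: it assumes one-dimensional point clouds of equal size, where the optimal transport map simply matches sorted points, and it uses Chaikin corner cutting (the quadratic B-spline subdivision rule) as the averaging scheme. Function names are hypothetical.

```python
def w2_average(cloud_a, cloud_b, t):
    """Point at time t on the 1-D Wasserstein geodesic between two
    equal-size point clouds (sorted points are matched in order)."""
    a, b = sorted(cloud_a), sorted(cloud_b)
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size clouds")
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def chaikin_refine(clouds):
    """One round of corner cutting, averaging along geodesics
    instead of straight lines between individual points."""
    refined = []
    for left, right in zip(clouds, clouds[1:]):
        refined.append(w2_average(left, right, 0.25))
        refined.append(w2_average(left, right, 0.75))
    return refined


snapshots = [[0.0, 1.0], [4.0, 5.0], [8.0, 9.0]]
finer = chaikin_refine(snapshots)  # 4 clouds after one round
```

Repeating `chaikin_refine` yields progressively smoother sequences of clouds; the general multi-dimensional case with mass splitting needs a real optimal transport solver.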

Updated: 2024-05-30 04:19:20

标题: 在Wasserstein空间中使用连续平均的高效轨迹推断

摘要: 通过横截面测量捕获动态过程的数据在许多领域中都有应用，比如计算生物学。轨迹推断处理重建连续过程的挑战，从这些观察中。在这项工作中，我们提出了一种通过连续平均进行B样条逼近和插值点云的方法，该方法是Wasserstein空间的固有特性。将分割方案与基于最优传输的测地线结合起来，我们的方法以所选精度和平滑度执行轨迹推断，可以自动处理随时间发生分裂的粒子的情况。我们通过提供收敛保证并在模拟的细胞数据上测试我们的方法，该数据具有分叉和合并特征，将其性能与最先进的轨迹推断和插值方法进行比较，严格评估我们的方法。结果不仅强调了我们的方法在推断轨迹方面的有效性，还突出了进行插值和逼近的好处，这些插值和逼近尊重数据的固有几何属性。

更新时间: 2024-05-30 04:19:20

领域: cs.LG,cs.NA,math.NA,math.OC

下载: http://arxiv.org/abs/2405.19679v1

InferCept: Efficient Intercept Support for Augmented Large Language Model Inference

Large language models are increasingly integrated with external environments, tools, and agents like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today's LLM inference systems are designed for standalone LLMs. They treat each external interaction as the end of LLM generation and form a new request when the interaction finishes, causing unnecessary recomputation of already computed contexts, which accounts for 37-40% of total model forwarding time. This paper presents InferCept, the first LLM inference framework targeting augmented LLMs and supporting the efficient interception of LLM generation. InferCept minimizes the GPU resource waste caused by LLM interceptions and dedicates saved memory for serving more requests. InferCept improves the overall serving throughput by 1.6x-2x and completes 2x more requests per second compared to the state-of-the-art LLM inference systems.

Updated: 2024-05-30 04:18:03

标题: InferCept：用于增强大型语言模型推理的高效拦截支持

摘要: 大型语言模型越来越多地与外部环境、工具和代理集成，如ChatGPT插件，以扩展其能力超出语言中心任务。然而，今天的LLM推理系统设计用于独立的LLMs。它们将每次外部交互视为LLM生成的结束，并在交互完成时形成新请求，导致不必要的重新计算已经计算的上下文，占总模型前向时间的37-40%。本文介绍了InferCept，这是第一个针对增强LLM的推理框架，支持对LLM生成的高效拦截。InferCept最小化了由LLM拦截引起的GPU资源浪费，并将节省的内存用于提供更多请求。相比于最先进的LLM推理系统，InferCept将整体服务吞吐量提高了1.6倍至2倍，并每秒完成了2倍更多的请求。

更新时间: 2024-05-30 04:18:03

领域: cs.LG,cs.CL,cs.DC

下载: http://arxiv.org/abs/2402.01869v2

View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields

Large-scale vision foundation models such as Segment Anything (SAM) demonstrate impressive performance in zero-shot image segmentation at multiple levels of granularity. However, these zero-shot predictions are rarely 3D-consistent. As the camera viewpoint changes in a scene, so do the segmentation predictions, as well as the characterizations of "coarse" or "fine" granularity. In this work, we address the challenging task of lifting multi-granular and view-inconsistent image segmentations into a hierarchical and 3D-consistent representation. We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene, whose segmentation structure can be revealed at different scales by simply using different thresholds on feature distance. Our key idea is to learn an ultrametric feature space, which unlike a Euclidean space, exhibits transitivity in distance-based grouping, naturally leading to a hierarchical clustering. Put together, our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output. We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency. We additionally provide qualitative examples of our model's 3D hierarchical segmentations in real world scenes. The code and dataset are available.
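
The transitivity property mentioned above can be seen on a small hand-built example. The sketch below is illustrative only: the 4-point distance matrix `DIST` is made up, and `groups_at` is a hypothetical helper. In an ultrametric, d(i, k) <= max(d(i, j), d(j, k)), so "distance at most threshold" is an equivalence relation and comparing against a single representative per group is sufficient.

```python
def is_ultrametric(dist):
    """Check the strong triangle inequality for every triple."""
    n = len(dist)
    return all(
        dist[i][k] <= max(dist[i][j], dist[j][k])
        for i in range(n) for j in range(n) for k in range(n)
    )


def groups_at(dist, threshold):
    """Group indices whose distance is at most `threshold`.
    One representative per group suffices when `dist` is ultrametric."""
    groups = []
    for i in range(len(dist)):
        for group in groups:
            if dist[group[0]][i] <= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Made-up ultrametric: points 0 and 1 merge at 1, points 2 and 3 at 2, all at 3.
DIST = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
fine = groups_at(DIST, 1)
coarse = groups_at(DIST, 2)
```

Raising the threshold only merges existing groups and never splits them, which is why thresholding at different scales yields a hierarchy.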

Updated: 2024-05-30 04:14:58

标题: 使用超度量特征场的视图一致的分层三维分割

摘要: 大规模视觉基础模型，例如Segment Anything (SAM)，在多个细粒度级别的零样本图像分割中展现出令人印象深刻的性能。然而，这些零样本预测很少是3D一致的。当摄像机视角在场景中改变时，分割预测以及"粗糙"或"精细"粒度的特征描述也会发生变化。在这项工作中，我们解决了将多粒度和视角不一致的图像分割提升到分层和3D一致表示的挑战性任务。我们在表示3D场景的神经辐射场（NeRF）中学习了一个新颖的特征场，其分割结构可以通过简单地在特征距离上使用不同的阈值来在不同尺度上展示。我们的关键想法是学习一个超度量特征空间，与欧氏空间不同，它在基于距离的分组中表现出传递性，自然地导致分层聚类。总的来说，我们的方法将视角不一致的多粒度2D分割作为输入，并产生一系列3D一致的分割作为输出。我们在具有多视图图像和多粒度分割的合成数据集上评估了我们的方法和几种基线，展示了改进的准确性和视角一致性。此外，我们还提供了模型在真实场景中的3D分层分割的定性示例。

更新时间: 2024-05-30 04:14:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19678v1

Large Language Model Watermark Stealing With Mixed Integer Programming

The Large Language Model (LLM) watermark is a newly emerging technique that shows promise in addressing concerns surrounding LLM copyright, monitoring AI-generated text, and preventing its misuse. An LLM watermark scheme commonly generates secret keys to partition the vocabulary into green and red lists and applies a perturbation to the logits of green-list tokens to increase their sampling likelihood; text is then identified as AI-generated if its proportion of green tokens exceeds a threshold. However, recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks, such as token editing, synonym substitution, and paraphrasing, with robustness declining as the number of keys increases. Therefore, the state-of-the-art watermark schemes that employ fewer keys or a single key have been demonstrated to be more robust against text editing and paraphrasing. In this paper, we propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme and systematically examine its vulnerability to this attack. We formalize the attack as a mixed integer programming problem with constraints. We evaluate our attack under a comprehensive threat model, including an extreme scenario where the attacker has no prior knowledge, lacks access to the watermark detector API, and possesses no information about the LLM's parameter settings or watermark injection/detection scheme. Extensive experiments on LLMs, such as OPT and LLaMA, demonstrate that our attack can successfully steal the green list and remove the watermark across all settings.
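
The detection side of a green/red list watermark can be sketched in a few lines. This is a simplified single-key, context-free variant written for illustration; the hashing construction, the function names, and the green fraction `gamma` are assumptions, not the scheme attacked in the paper.

```python
import hashlib
import math


def is_green(token, key, gamma=0.5):
    """Assign a token to the green list with probability about gamma,
    determined by a keyed hash (hypothetical construction)."""
    digest = hashlib.sha256(f"{key}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def detection_z_score(tokens, key, gamma=0.5):
    """Standardized excess of green tokens over the count expected
    in unwatermarked text."""
    n = len(tokens)
    green = sum(is_green(t, key, gamma) for t in tokens)
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


vocab = [f"tok{i}" for i in range(200)]
green_only = [t for t in vocab if is_green(t, "secret")]
z = detection_z_score(green_only, "secret")  # large for all-green text
```

A detector flags text whose z-score exceeds a chosen threshold. An attacker who recovers which tokens are green can replace them with red ones and push the score back toward zero, which is the removal scenario the abstract describes.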

Updated: 2024-05-30 04:11:17

标题: 大型语言模型水印窃取与混合整数规划

摘要: 大语言模型（LLM）水印是一种新兴的技术，显示出在解决LLM版权、监控AI生成文本和防止其滥用方面具有潜力。LLM水印方案通常包括生成秘钥将词汇划分为绿色和红色列表，对绿色列表中的令牌的逻辑应用扰动以增加其采样概率，从而促进水印检测以识别AI生成的文本，如果绿色令牌的比例超过阈值。然而，最近的研究表明，使用许多秘钥的水印方法容易受到删除攻击，如令牌编辑、同义词替换和改写，随着秘钥数量的增加，其鲁棒性下降。因此，采用较少或单个秘钥的最先进水印方案已被证明对文本编辑和改写更具鲁棒性。在本文中，我们提出了一种新颖的针对最先进LLM水印方案的绿色列表窃取攻击，并系统地检验其对这种攻击的脆弱性。我们将攻击形式化为一个带有约束的混合整数规划问题。我们在一个全面的威胁模型下评估我们的攻击，包括一个极端情景，即攻击者没有先验知识，无法访问水印检测器API，并且没有关于LLM参数设置或水印注入/检测方案的信息。对LLM进行的广泛实验，如OPT和LLaMA，证明了我们的攻击可以成功窃取绿色列表并在所有设置中删除水印。

更新时间: 2024-05-30 04:11:17

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2405.19677v1

Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

AI-driven design problems, such as DNA/protein sequence design, are commonly tackled from two angles: generative modeling, which efficiently captures the feasible design space (e.g., natural images or biological sequences), and model-based optimization, which utilizes reward models for extrapolation. To combine the strengths of both approaches, we adopt a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL. Although prior work has explored similar avenues, they primarily focus on scenarios where accurate reward models are accessible. In contrast, we concentrate on an offline setting where a reward model is unknown, and we must learn from static offline datasets, a common scenario in scientific domains. In offline scenarios, existing approaches tend to suffer from overoptimization, as they may be misled by the reward model in out-of-distribution regions. To address this, we introduce a conservative fine-tuning approach, BRAID, by optimizing a conservative reward model, which includes additional penalization outside of offline data distributions. Through empirical and theoretical analysis, we demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models while avoiding the generation of invalid designs through pre-trained diffusion models.

Updated: 2024-05-30 03:57:29

标题: 通过对扩散模型进行保守微调，构建基于模型的优化和生成建模的桥梁

摘要: AI驱动的设计问题，如DNA/蛋白质序列设计，通常从两个角度来解决：生成建模，有效地捕捉可行的设计空间（例如自然图像或生物序列），以及基于模型的优化，利用奖励模型进行外推。为了结合这两种方法的优势，我们采用一种混合方法，通过RL优化奖励模型来微调最先进的扩散模型。尽管先前的工作已经探索了类似的途径，但它们主要集中在准确的奖励模型可访问的情景。相反，我们专注于离线设置，其中奖励模型是未知的，我们必须从静态离线数据集中学习，这在科学领域中是常见的情况。在离线情景中，现有方法往往会因为在分布外区域被奖励模型误导而过度优化。为了解决这个问题，我们引入了一种保守的微调方法BRAID，通过优化一个保守的奖励模型，该模型在离线数据分布之外增加了额外的惩罚。通过经验和理论分析，我们展示了我们的方法在离线数据中胜过最佳设计的能力，利用奖励模型的外推能力，同时避免通过预训练扩散模型生成无效的设计。

更新时间: 2024-05-30 03:57:29

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.19673v1

CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation

Accurate detection of colorectal cancer and early prevention heavily rely on precise polyp identification during gastrointestinal colonoscopy. Due to limited data, many current state-of-the-art deep learning methods for polyp segmentation often rely on post-processing of masks to reduce noise and enhance results. In this study, we propose an approach that integrates mask refinement and binary semantic segmentation, leveraging a novel collaborative training strategy that surpasses current widely-used refinement strategies. We demonstrate the superiority of our approach through comprehensive evaluation on established benchmark datasets and its successful application across various medical image segmentation architectures.

Updated: 2024-05-30 03:56:01

标题: CRIS：集成分割的协同细化用于息肉分割

摘要: 结直肠癌的准确检测和早期预防在很大程度上依赖于在胃肠镜检查中准确识别息肉。由于数据有限，许多目前最先进的深度学习方法用于息肉分割通常依赖于对掩模进行后处理以减少噪音并增强结果。在本研究中，我们提出了一种整合了掩模细化和二进制语义分割的方法，利用一种新颖的协作训练策略，超越了当前广泛使用的细化策略。我们通过在已建立的基准数据集上进行全面评估，并在各种医学图像分割架构中成功应用，展示了我们方法的优越性。

更新时间: 2024-05-30 03:56:01

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.19672v1

Single-loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions

In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)]$, where both $\Phi(x) = \max_{y\in Y}\phi(x, y)$ and $\Psi(x)=\max_{z\in Z}\psi(x, z)$ are weakly convex functions, and $\phi(x, y), \psi(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are missing single-loop stochastic algorithms, i.e., difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $\Phi, \Psi$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.
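
For readers unfamiliar with the Moreau envelope, the standard definitions are recalled here as background; the symbols $\rho$, $\gamma$, and $u$ are introduced for this note and do not appear in the abstract. For a $\rho$-weakly convex $\Phi$ and $0 < \gamma < 1/\rho$:

```latex
\Phi_\gamma(x) = \min_{u}\Big\{\Phi(u) + \tfrac{1}{2\gamma}\|u - x\|^2\Big\},
\qquad
\operatorname{prox}_{\gamma\Phi}(x) = \arg\min_{u}\Big\{\Phi(u) + \tfrac{1}{2\gamma}\|u - x\|^2\Big\},
\qquad
\nabla\Phi_\gamma(x) = \tfrac{1}{\gamma}\big(x - \operatorname{prox}_{\gamma\Phi}(x)\big).

% With the same gamma for both envelopes, the x terms cancel in the difference:
\nabla\big(\Phi_\gamma - \Psi_\gamma\big)(x)
  = \tfrac{1}{\gamma}\big(\operatorname{prox}_{\gamma\Psi}(x) - \operatorname{prox}_{\gamma\Phi}(x)\big).
```

Under this reading, approximating the gradient reduces to tracking the two proximal points, which is plausibly what the single-step primal and dual updates are for; the abstract does not spell out the exact update rule.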

Updated: 2024-05-30 03:46:44

标题: 单循环随机算法用于最大结构弱凸函数之差

摘要: 在本文中，我们研究了一类非光滑非凸问题，形式为$\min_{x}[\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)]$，其中$\Phi(x) = \max_{y\in Y}\phi(x, y)$和$\Psi(x)=\max_{z\in Z}\psi(x, z)$都是弱凸函数，并且$\phi(x, y), \psi(x, z)$分别是关于$y$和$z$的强凹函数。它涵盖了两类已被研究但缺少单循环随机算法的问题，即弱凸函数差和弱凸强凹极小-极大问题。我们提出了一种名为SMAG的随机Moreau包络近似梯度方法，这是解决这些问题的第一个单循环算法，并提供了最新的非渐近收敛速度。设计的关键思想是仅使用一步随机梯度更新原始和对偶变量来计算$\Phi, \Psi$的Moreau包络的近似梯度。在实证方面，我们进行了正-无标记（PU）学习和部分ROC曲线下面积（pAUC）优化的实验，并引入了一个对抗公平性正则化项，以验证我们提出的算法的有效性。

更新时间: 2024-05-30 03:46:44

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.18577v2

Reconciling Model Multiplicity for Downstream Decision Making

We consider the problem of model multiplicity in downstream decision-making, a setting where two predictive models of equivalent accuracy cannot agree on the best-response action for a downstream loss function. We show that even when the two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to differ on a substantial portion of the population. We address this issue by proposing a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction. Specifically, leveraging tools from multi-calibration, we provide an algorithm that, at each time-step, first reconciles the differences in individual probability prediction, then calibrates the updated models such that they are indistinguishable from the true probability distribution to the decision-maker. We extend our results to the setting where one does not have direct access to the true probability distribution and instead relies on a set of i.i.d. data to form the empirical distribution. Finally, we provide a set of experiments to empirically evaluate our methods: compared to existing work, our proposed algorithm produces a pair of predictive models that both achieve improved downstream decision-making losses and agree on their best-response actions almost everywhere.
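
A toy example makes the disagreement concrete. The numbers, the binary action space, and the symmetric cost structure below are all made up for illustration and are not taken from the paper.

```python
def best_response(p, cost_false_alarm=1.0, cost_miss=1.0):
    """Return 1 (act) or 0 (do not act), whichever has the lower
    expected loss when the event has predicted probability p."""
    loss_act = (1 - p) * cost_false_alarm
    loss_wait = p * cost_miss
    return 1 if loss_act < loss_wait else 0


# Two hypothetical models whose predictions never differ by more than 0.05.
preds_a = [0.48, 0.49, 0.90, 0.10]
preds_b = [0.52, 0.51, 0.91, 0.09]

max_gap = max(abs(a - b) for a, b in zip(preds_a, preds_b))
disagree = [
    i for i, (a, b) in enumerate(zip(preds_a, preds_b))
    if best_response(a) != best_response(b)
]
```

The two models disagree on the action for half of these individuals even though their predictions are close, because the close predictions straddle the decision threshold.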

Updated: 2024-05-30 03:36:46

标题: 调和模型多样性以支持下游决策

摘要: 我们考虑在下游决策中的模型多样性问题，这是一个情境，在这个情境中，两个准确度相等的预测模型无法就下游损失函数的最佳响应行动达成一致。我们展示，即使两个预测模型在几乎所有地方的个别预测上基本一致，它们引发的最佳响应行动仍可能在人口的大部分地方有所不同。我们通过提出一个框架来解决这个问题，该框架考虑了预测模型对下游决策问题和个别概率预测的校准。具体来说，利用多校准工具，我们提供了一个算法，每个时间步骤首先调和个别概率预测的差异，然后校准更新的模型，使它们与真实概率分布对决策者不可区分。我们将结果扩展到没有直接访问真实概率分布的情况，而是依赖于一组独立同分布的数据作为经验分布。最后，我们提供了一组实验来实证评估我们的方法：与现有工作相比，我们提出的算法创建了一对预测模型，它们在下游决策损失上均有所改善，并且在几乎所有地方达成一致的最佳响应行动。

更新时间: 2024-05-30 03:36:46

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19667v1

SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging

In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models are one solution: they aim to generalize across the diverse modalities of medical images, yet their effectiveness often diminishes when applied to OOD data modalities and tasks, requiring intricate fine-tuning of the model for optimal performance. Few-shot learning segmentation methods are typically designed for specific modalities of data and cannot be directly transferred for use with another modality. Therefore, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL can employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need for training the model from scratch or fine-tuning for OOD tasks (including OOD modality and dataset). Extensive experiments demonstrate a positive correlation between the number of shots and segmentation performance on OOD tasks. Segmentation performance with three shots is approximately 1.5 times better than in the zero-shot setting. This indicates that SegICL effectively addresses new segmentation tasks based on contextual information. Additionally, SegICL exhibits performance comparable to mainstream models on OOD and in-distribution tasks. Our code will be released after paper review.

Updated: 2024-05-30 03:35:06

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2403.16578v4

A novel fault localization with data refinement for hydroelectric units

Because fault samples are scarce and the data from hydroelectric units exhibit complex non-linear and non-smooth characteristics, most traditional fault localization methods for hydroelectric units struggle to localize faults accurately. To address these problems, a fault localization method for hydroelectric units is proposed, based on a sparse autoencoder (SAE), generative adversarial network (GAN), wavelet noise reduction (WNR), and manifold-boosted deep learning (SG-WMBDL). To overcome data scarcity, an SAE is embedded into the GAN in the data generation module to generate more high-quality samples. Because the signals involve non-linear and non-smooth characteristics, the data preprocessing module applies an improved WNR, which combines soft and hard thresholding, together with local linear embedding (LLE) to reduce noise and effectively capture local features. In addition, to achieve higher performance, a novel Adaptive Boost (AdaBoost) scheme combined with multiple deep learning models is proposed for accurate fault localization. The experimental results show that, given a small number of fault samples with non-linear and non-smooth characteristics, SG-WMBDL locates faults in hydroelectric units with higher precision and accuracy than other frontier methods, which verifies the effectiveness and practicality of the proposed method.

Updated: 2024-05-30 03:33:49

Categories: eess.SY, cs.AI, cs.LG, cs.SY

Download: http://arxiv.org/abs/2405.19665v1

MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series

Multivariate time series prediction is widely used in daily life, but it poses significant challenges because of the complex correlations that exist at multiple granularity levels. Unfortunately, the majority of current time series prediction models fail to simultaneously learn the correlations of multivariate time series at multi-grained levels, resulting in suboptimal performance. To address this, we propose a Multi-Grained Correlations-based Prediction (MGCP) Network, which simultaneously considers the correlations at three granularity levels to enhance prediction performance. Specifically, MGCP utilizes Adaptive Fourier Neural Operators and Graph Convolutional Networks to learn the global spatiotemporal correlations and inter-series correlations, enabling the extraction of potential features from multivariate time series at the fine-grained and medium-grained levels. Additionally, MGCP employs adversarial training with an attention-based predictor and a conditional discriminator to optimize prediction results at the coarse-grained level, ensuring high fidelity between the generated forecasts and the actual data distribution. Finally, we compare MGCP with several state-of-the-art time series prediction algorithms on real-world benchmark datasets, and our results demonstrate the generality and effectiveness of the proposed model.

Updated: 2024-05-30 03:32:44

Categories: cs.LG

Download: http://arxiv.org/abs/2405.19661v1

Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian

3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. However, achieving successful reconstruction from RGB images generally requires multiple input views captured under static conditions. To address the challenge of sparse input views, previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting, using dense predictions from pretrained depth networks as pseudo-ground truth. Nevertheless, depth predictions from monocular depth estimation models inherently exhibit significant uncertainty in specific areas. Relying solely on pixel-wise L2 loss may inadvertently incorporate detrimental noise from these uncertain areas. In this work, we introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates. To address these localized errors in depth predictions, we integrate a patch-wise optimal transport strategy to complement traditional L2 loss in depth supervision. Extensive experiments conducted on the LLFF, DTU, and Blender datasets demonstrate that our approach, UGOT, achieves superior novel view synthesis and consistently outperforms state-of-the-art methods.

Updated: 2024-05-30 03:18:30

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2405.19657v1

Experimental demonstration of magnetic tunnel junction-based computational random-access memory

The conventional computing paradigm struggles to fulfill the rapidly growing demands of emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called "computational random-access memory (CRAM)", has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without the data ever leaving the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, an experimental demonstration and study of CRAM that evaluates its computation accuracy, a realistic and application-critical metric for its technological feasibility and competitiveness, has been lacking. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations as well as 2-, 3-, and 5-input logic operations are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy. With the accuracy of MTJ-based CRAM confirmed, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.

Updated: 2024-05-30 03:17:30

Categories: cs.ET, cond-mat.mes-hall, cs.AI, cs.AR, cs.SY, eess.SY

Download: http://arxiv.org/abs/2312.14264v3

Game Generation via Large Language Models

Recently, the emergence of large language models (LLMs) has unlocked new opportunities for procedural content generation. However, recent attempts mainly focus on level generation for specific games with defined game rules, such as Super Mario Bros. and Zelda. This paper investigates game generation via LLMs. Building on the video game description language, it proposes an LLM-based framework that generates game rules and levels simultaneously. Experiments demonstrate how the framework works with prompts that consider different combinations of context. Our findings extend the current applications of LLMs and offer new insights for generating new games in the area of procedural content generation.

Updated: 2024-05-30 03:17:00

Categories: cs.AI

Download: http://arxiv.org/abs/2404.08706v2

Accurate and Reliable Predictions with Mutual-Transport Ensemble

Deep Neural Networks (DNNs) have achieved remarkable success in a variety of tasks, especially in terms of prediction accuracy. However, in complex real-world scenarios, particularly in safety-critical applications, high accuracy alone is not enough: reliable uncertainty estimates are crucial. Modern DNNs, often trained with cross-entropy loss, tend to be overconfident, especially on ambiguous samples. Many techniques have been developed to improve uncertainty calibration, but they often compromise prediction accuracy. To tackle this challenge, we propose the "mutual-transport ensemble" (MTE). This approach introduces a co-trained auxiliary model and adaptively regularizes the cross-entropy loss using the Kullback-Leibler (KL) divergence between the prediction distributions of the primary and auxiliary models. We conducted extensive studies on various benchmarks to validate the effectiveness of our method. The results show that MTE can simultaneously enhance both accuracy and uncertainty calibration. For example, on the CIFAR-100 dataset, our MTE method on ResNet34/50 achieved significant improvements over the previous state-of-the-art method, with absolute accuracy increases of 2.4%/3.7%, relative reductions in ECE of 42.3%/29.4%, and relative reductions in classwise-ECE of 11.6%/15.3%.

Updated: 2024-05-30 03:15:59

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19656v1
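
The mutual-transport ensemble abstract above reports its calibration gains as reductions in ECE (expected calibration error). As a reading aid, here is a minimal sketch of the standard binned ECE estimator. It is not taken from the paper: the function name, the choice of 10 equal-width bins, and the half-open binning rule are all assumptions made for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, one per sample.
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width confidence bins (an assumed default).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; a confidence of exactly 0 falls in no bin.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```

A perfectly calibrated model has an ECE of zero under this estimator; an overconfident model shows mean confidence above accuracy in the high-confidence bins.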

Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training

Medical vision-language pre-training methods mainly leverage the correspondence between paired medical images and radiological reports. Although multi-view spatial images and temporal sequences of image-report pairs are available in off-the-shelf multi-modal medical datasets, most existing methods have not thoroughly tapped into such extensive supervision signals. In this paper, we introduce the Med-ST framework for fine-grained spatial and temporal modeling to exploit information from multiple spatial views of chest radiographs and temporal historical records. For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views. To achieve a more comprehensive alignment, Med-ST not only establishes the global alignment between whole images and texts but also introduces modality-weighted local alignment between text tokens and spatial regions of images. For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR). By perceiving temporal information from simple to complex, Med-ST can learn temporal semantics. Experimental results across four distinct tasks demonstrate the effectiveness of Med-ST, especially in temporal classification tasks. Our code and model are available at https://github.com/SVT-Yang/MedST.

Updated: 2024-05-30 03:15:09

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19654v1

SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

Data-driven simulation surrogates help computational scientists study complex systems. They can also help inform impactful policy decisions. We introduce a learning framework for surrogate modeling where language is used to interface with the underlying system being simulated. We call a language description of a system a "system caption", or SysCap. To address the lack of datasets of paired natural language SysCaps and simulation runs, we use large language models (LLMs) to synthesize high-quality captions. Using our framework, we train multimodal text and timeseries regression models for two real-world simulators of complex energy systems. Our experiments demonstrate the feasibility of designing language interfaces for real-world surrogate models at comparable accuracy to standard baselines. We qualitatively and quantitatively show that SysCaps unlock text-prompt-style surrogate modeling and new generalization abilities beyond what was previously possible. We will release the generated SysCaps datasets and our code to support follow-on studies.

Updated: 2024-05-30 03:12:04

Categories: cs.LG, cs.CL, cs.SY, eess.SY

Download: http://arxiv.org/abs/2405.19653v1

Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization

Multi-objective optimization arises in many real-world applications where conflicting objectives cannot all be optimized by a single solution. Existing optimization methods often focus on finding a set of Pareto solutions with different optimal trade-offs among the objectives. However, the number of solutions required to approximate the whole Pareto optimal set well could be exponentially large with respect to the number of objectives, which makes these methods unsuitable for handling many optimization objectives. In this work, instead of finding a dense set of Pareto solutions, we propose a novel Tchebycheff set scalarization method to find a few representative solutions (e.g., 5) to cover a large number of objectives (e.g., $>100$) in a collaborative and complementary manner. In this way, each objective can be well addressed by at least one solution in the small solution set. In addition, we further develop a smooth Tchebycheff set scalarization approach for efficient optimization with good theoretical guarantees. Experimental studies on different problems with many optimization objectives demonstrate the effectiveness of our proposed method.

Updated: 2024-05-30 03:04:57

Categories: cs.LG, cs.AI, cs.NE, math.OC

Download: http://arxiv.org/abs/2405.19650v1
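
The idea in the Tchebycheff set scalarization abstract above, that each objective should be handled well by at least one solution in a small set, can be sketched as a max-over-objectives of a min-over-solutions. The code below is an illustrative guess at that structure, not the paper's definition: the weighting, the ideal point, the log-sum-exp smoothing, and all function and parameter names are assumptions.

```python
import numpy as np

def _logsumexp(x, axis=None):
    # Numerically stable log(sum(exp(x))) along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def _scaled(F, weights, ideal):
    F = np.asarray(F, dtype=float)  # shape (K, m): K solutions, m objectives
    w = np.ones(F.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    z = np.zeros(F.shape[1]) if ideal is None else np.asarray(ideal, dtype=float)
    return w * (F - z)

def tch_set_scalarization(F, weights=None, ideal=None):
    """Worst objective value after each objective picks its best solution."""
    scaled = _scaled(F, weights, ideal)
    best_per_objective = scaled.min(axis=0)  # min over the K solutions
    return best_per_objective.max()          # max over the m objectives

def smooth_tch_set_scalarization(F, mu=0.1, weights=None, ideal=None):
    """Smooth surrogate: log-sum-exp replaces both the min and the max."""
    scaled = _scaled(F, weights, ideal)
    soft_min = -mu * _logsumexp(-scaled / mu, axis=0)  # smooth min over solutions
    return mu * _logsumexp(soft_min / mu)              # smooth max over objectives
```

In this sketch `F[k, i]` holds objective `i` evaluated at solution `k` (minimization). As `mu` shrinks, the smooth value approaches the non-smooth one, which is what makes a gradient-based optimizer applicable to the surrogate.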

Towards Deeper Understanding of PPR-based Embedding Approaches: A Topological Perspective

Node embedding learns low-dimensional vectors for the nodes of a graph. Recent state-of-the-art embedding approaches take Personalized PageRank (PPR) as the proximity measure and factorize the PPR matrix or an adaptation of it to generate embeddings. However, little previous work analyzes what information these approaches encode, and how that information correlates with their superb performance in downstream tasks. In this work, we first show that state-of-the-art embedding approaches that factorize a PPR-related matrix can be unified into a closed-form framework. Then, we study whether the embeddings generated by this strategy can be inverted to recover the graph topology better than random-walk-based embeddings. To this end, we propose two methods for recovering graph topology from PPR-based embeddings: an analytical method and an optimization method. Extensive experimental results demonstrate that the embeddings generated by factorizing a PPR-related matrix retain more topological information, such as common edges and community structures, than those generated by random walks, paving a new way to systematically comprehend why PPR-based node embedding approaches outperform random-walk-based alternatives in various downstream tasks. To the best of our knowledge, this is the first work that focuses on the interpretability of PPR-based node embedding approaches.

Updated: 2024-05-30 03:02:23

Categories: cs.LG, cs.SI, stat.ML

Download: http://arxiv.org/abs/2405.19649v1
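
To make "factorizing the PPR matrix" in the abstract above concrete, the sketch below computes the textbook closed-form PPR matrix for a small graph and factorizes it with a truncated SVD. This is an illustration under stated assumptions, not the paper's unified framework: the teleport probability `alpha`, the plain SVD (the abstract mentions adaptations of the PPR matrix that are not reproduced here), and the function names are all hypothetical choices.

```python
import numpy as np

def ppr_matrix(A, alpha=0.15):
    """Closed-form PPR: Pi = alpha * (I - (1 - alpha) * P)^-1.

    A:     adjacency matrix; every node is assumed to have at least one edge.
    alpha: teleport (restart) probability, an assumed default.
    """
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    n = A.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * P)

def factorize(M, dim):
    """Truncated SVD: returns X, Y with X @ Y.T approximating M at rank dim."""
    U, s, Vt = np.linalg.svd(M)
    root = np.sqrt(s[:dim])
    return U[:, :dim] * root, Vt[:dim].T * root
```

Row `i` of the PPR matrix is the stationary distribution of a random walk that restarts at node `i` with probability `alpha`, so each row sums to one. The rows of `X` then serve as source embeddings and the rows of `Y` as target embeddings.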

Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Concerns regarding the propensity of Large Language Models (LLMs) to produce inaccurate outputs, also known as hallucinations, have escalated. Detecting them is vital for ensuring the reliability of applications relying on LLM-generated content. Current methods often demand substantial resources: they rely on extensive LLMs, or they employ supervised learning with multidimensional features or intricate linguistic and semantic analyses that are difficult to reproduce, and they largely depend on using the same LLM that hallucinated. This paper introduces a supervised learning approach employing two simple classifiers that use only four numerical features derived from token and vocabulary probabilities obtained from other LLM evaluators, which need not be the LLM that generated the text. The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks. Additionally, we provide a comprehensive examination of the strengths and weaknesses of our approach, highlighting the significance of the features utilized and the LLM employed as an evaluator. We have released our code publicly at https://github.com/Baylor-AI/HalluDetect.

Updated: 2024-05-30 03:00:47

Categories: cs.CL, cs.AI, cs.LG, I.2.7

Download: http://arxiv.org/abs/2405.19648v1

FTS: A Framework to Find a Faithful TimeSieve

The field of time series forecasting has garnered significant attention in recent years, prompting the development of advanced models like TimeSieve, which demonstrates impressive performance. However, an analysis reveals certain unfaithfulness issues, including high sensitivity to random seeds and to minute input noise perturbations. Recognizing these challenges, we set out to define the concept of a Faithful TimeSieve (FTS), a model that consistently delivers reliable and robust predictions. To address these issues, we propose a novel framework aimed at identifying and rectifying unfaithfulness in TimeSieve. Our framework is designed to enhance the model's stability and resilience, ensuring that its outputs are less susceptible to the aforementioned factors. Experiments validate the effectiveness of the proposed framework, demonstrating improved faithfulness in the model's behavior. Looking forward, we plan to expand our experimental scope to further validate and optimize our algorithm, ensuring comprehensive faithfulness across a wide range of scenarios. Ultimately, we aspire to apply this framework to enhance the faithfulness of not just TimeSieve but also other state-of-the-art temporal methods, thereby contributing to the reliability and robustness of temporal modeling as a whole.

Updated: 2024-05-30 02:59:49

Categories: cs.LG

Download: http://arxiv.org/abs/2405.19647v1

Literature Filtering for Systematic Reviews with Transformers

Identifying critical research within the growing body of academic work is an essential element of quality research. Systematic review processes, used in evidence-based medicine, formalise this as a procedure that must be followed in a research program. However, the time required to identify the important research articles for a given topic is an increasing burden. In this work, we develop a method for building a general-purpose filtering system that matches a research question, posed as a natural language description of the required content, against a candidate set of articles obtained by applying broad search terms. Our results demonstrate that transformer models, pre-trained on biomedical literature and then fine-tuned for the specific task, offer a promising solution to this problem. The model can remove large volumes of irrelevant articles for most research questions.

Updated: 2024-05-30 02:55:49

Categories: cs.DL, cs.AI, cs.CL, cs.LG

Download: http://arxiv.org/abs/2405.20354v1

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases, all captured using an egocentric camera attached to the surgeon's head. In addition to video, EgoSurgery-Phase provides eye-gaze data. As far as we know, it is the first publicly available real open surgery video dataset for surgical phase recognition. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Because the regions where the surgeon's gaze focuses (e.g., the surgical field) are often critical for surgical phase recognition, in GGMAE the gaze information acts as an empirical semantic-richness prior that guides the masking process, promoting better attention to semantically rich spatial regions. On EgoSurgery-Phase, GGMAE significantly improves on the previous state-of-the-art recognition method (by 6.4% in Jaccard) and on the masked autoencoder-based method (by 3.1% in Jaccard). The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

Updated: 2024-05-30 02:53:19

Categories: cs.CV, cs.AI, cs.LG

Download: http://arxiv.org/abs/2405.19644v1

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

Understanding the internal representations of large language models (LLMs) can help explain models' behavior and verify their alignment with human values. Given the capabilities of LLMs in generating human-understandable text, we propose leveraging the model itself to explain its internal representations in natural language. We introduce a framework called Patchscopes and show how it can be used to answer a wide range of questions about an LLM's computation. We show that many prior interpretability methods based on projecting representations into the vocabulary space and intervening on the LLM computation can be viewed as instances of this framework. Moreover, several of their shortcomings such as failure in inspecting early layers or lack of expressivity can be mitigated by Patchscopes. Beyond unifying prior inspection techniques, Patchscopes also opens up new possibilities such as using a more capable model to explain the representations of a smaller model, and multihop reasoning error correction.

Updated: 2024-05-30 02:52:08

Categories: cs.CL, cs.AI, cs.LG

Download: http://arxiv.org/abs/2401.06102v3

Few-shot fault diagnosis based on multi-scale graph convolution filtering for industry

Industrial equipment fault diagnosis often encounters challenges such as the scarcity of fault data, complex operating conditions, and varied types of failures. Signal analysis, statistical learning, and conventional deep learning techniques are constrained under these conditions by their substantial data requirements and their need for transfer learning to accommodate new failure modes. To effectively leverage information and extract the intrinsic characteristics of faults across different domains under limited sample conditions, this paper introduces a fault diagnosis approach employing Multi-Scale Graph Convolution Filtering (MSGCF). MSGCF enhances the traditional Graph Neural Network (GNN) framework by integrating both local and global information fusion modules within the graph convolution filter block. This design effectively mitigates the over-smoothing issue associated with stacking too many graph convolutional layers while preserving a broad receptive field. It also reduces the risk of overfitting in few-shot diagnosis, thereby augmenting the model's representational capacity. Experiments on the University of Paderborn bearing dataset (PU) demonstrate that the proposed MSGCF method surpasses alternative approaches in accuracy, offering valuable insights for industrial fault diagnosis in few-shot learning scenarios.

Updated: 2024-05-30 02:51:29

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19642v1

Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

Many processes in biology and drug discovery involve various 3D interactions between molecules, such as protein-protein and protein-small-molecule interactions. Because different molecules are usually represented at different granularities, existing methods usually encode each type of molecule independently with a different model, which hampers learning the various underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, which makes it possible to encode all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. Specifically, GET consists of a bilevel attention module, a feed-forward module, and a layer normalization module, where each module is E(3)-equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, GET retains fine-grained information at all levels. Extensive experiments on the interactions between proteins, small molecules, and RNA/DNA verify the effectiveness and generalization capability of our proposed method across different domains.

Updated: 2024-05-30 02:38:01

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2306.01474v6

Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

Social Determinants of Health (SDOH) play a significant role in patient health outcomes. The Centers for Disease Control and Prevention (CDC) introduced a subset of ICD-10 codes called Z-codes in an attempt to officially recognize and measure SDOH in the health care system. However, these codes are rarely annotated in a patient's Electronic Health Record (EHR), and instead, in many cases, need to be inferred from clinical notes. Previous research has shown that large language models (LLMs) show promise in extracting unstructured data from EHRs. However, with thousands of models to choose from, each with a unique architecture and training set, it is difficult to choose one model that performs best on coding tasks. Further, clinical notes contain trusted health information, making the use of closed-source language models from commercial vendors difficult, so the identification of open-source LLMs that can be run within health organizations and exhibit high performance on SDOH tasks is an urgent problem. Here, we introduce an intelligent routing system for SDOH coding that uses a language model router to direct medical record data to open-source LLMs that demonstrate optimal performance on specific SDOH codes. The intelligent routing system exhibits state-of-the-art performance, with 97.4% accuracy averaged across 5 codes, including homelessness and food insecurity, on par with closed models such as GPT-4o. In order to train the routing system and validate models, we also introduce a synthetic data generation and validation paradigm to increase the scale of training data without needing privacy-protected medical records. Together, we demonstrate an architecture for intelligent routing of inputs to task-optimal language models to achieve high performance across a set of medical coding sub-tasks.

Updated: 2024-05-30 02:33:28

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19631v1

The Relative Value of Prediction in Algorithmic Decision Making

Algorithmic predictions are increasingly used to inform the allocations of goods and interventions in the public sphere. In these domains, predictions serve as a means to an end. They provide stakeholders with insights into the likelihood of future events as a means to improve decision making quality and enhance social welfare. However, if maximizing welfare is the ultimate goal, prediction is only a small piece of the puzzle. There are various other policy levers a social planner might pursue in order to improve bottom-line outcomes, such as expanding access to available goods, or increasing the effect sizes of interventions. Given this broad range of design decisions, a basic question to ask is: What is the relative value of prediction in algorithmic decision making? How do the improvements in welfare arising from better predictions compare to those of other policy levers? The goal of our work is to initiate the formal study of these questions. Our main results are theoretical in nature. We identify simple, sharp conditions determining the relative value of prediction vis-à-vis expanding access, within several statistical models that are popular amongst quantitative social scientists. Furthermore, we illustrate how these theoretical insights may be used to guide the design of algorithmic decision making systems in practice.

Updated: 2024-05-30 02:32:27

Categories: cs.CY,cs.LG,econ.TH,stat.ML

Download: http://arxiv.org/abs/2312.08511v2

Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data to one dimension before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees, under milder technical assumptions than the state of the art, for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $p\in[1,\infty)$. Algorithm-wise, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and then we further propose a semidefinite relaxation (SDR) formulation (which can be solved efficiently in polynomial time) and provide a relaxation gap for the SDP solution. We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing.
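
The one-dimensional step mentioned above is easy to make concrete. The Python sketch below computes the $p$-Wasserstein distance between two equal-size empirical samples on the real line, where the optimal coupling simply matches sorted samples. The names `wasserstein_1d` and `project`, and the fixed feature map in the usage lines, are illustrative assumptions; the KMS distance itself maximizes over nonlinear maps, which this sketch does not attempt.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """p-Wasserstein distance between two equal-size 1D empirical samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # On the real line the optimal transport plan pairs the i-th smallest
    # point of one sample with the i-th smallest point of the other.
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return mean_cost ** (1.0 / p)


def project(points, feature_map):
    """Reduce each point to a scalar with a fixed (hypothetical) map."""
    return [feature_map(pt) for pt in points]


# Usage: a hand-picked nonlinear map (squared norm), NOT an optimized one.
source = [(0.0, 0.0), (1.0, 1.0)]
target = [(1.0, 0.0), (2.0, 1.0)]
squared_norm = lambda pt: pt[0] ** 2 + pt[1] ** 2
distance = wasserstein_1d(project(source, squared_norm),
                          project(target, squared_norm))
```

Any fixed map gives a lower bound on the max-sliced value; the hard part addressed in the abstract is the maximization over maps.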

Updated: 2024-05-30 02:23:39

Categories: stat.ML,cs.CC,cs.LG

Download: http://arxiv.org/abs/2405.15441v2

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

Large Language Models (LLMs) have exhibited remarkable proficiency in generating code. However, the misuse of LLM-generated (Synthetic) code has prompted concerns within both educational and industrial domains, highlighting the imperative need for the development of synthetic code detectors. Existing methods for detecting LLM-generated content are primarily tailored for general text and often struggle with code content due to the distinct grammatical structure of programming languages and massive "low-entropy" tokens. Building upon this, our work proposes a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants. Our method relies on the intuition that the differences between the LLM-rewritten and original codes tend to be smaller when the original code is synthetic. We utilize self-supervised contrastive learning to train a code similarity model and assess our approach on two synthetic code detection benchmarks. Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts, with an improvement of 20.5% in the APPS benchmark and 29.1% in the MBPP benchmark.

Updated: 2024-05-30 02:12:47

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2405.16133v2

ORLM: Training Large Language Models for Optimization Modeling

Large Language Models (LLMs) have emerged as powerful tools for tackling complex Operations Research (OR) problems by automating optimization modeling. However, current methodologies rely heavily on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue, we propose training open-source LLMs for optimization modeling. We identify four critical requirements for the training dataset of OR LLMs, and design and implement OR-Instruct, a semi-automated process for creating synthetic data tailored to these requirements. We also introduce the IndustryOR benchmark, the first industrial benchmark for testing LLMs on solving real-world OR problems. We apply the data from OR-Instruct to various open-source LLMs of 7B size (termed ORLMs), resulting in a significantly improved capability for optimization modeling. Our best-performing ORLM achieves state-of-the-art performance on the NL4OPT, MAMO, and IndustryOR benchmarks. Our code and data are available at https://github.com/Cardinal-Operations/ORLM.

Updated: 2024-05-30 02:12:05

Categories: cs.CL,cs.AI,cs.CE,cs.LG

Download: http://arxiv.org/abs/2405.17743v2

Easy Problems That LLMs Get Wrong

We introduce a comprehensive Linguistic Benchmark designed to evaluate the limitations of Large Language Models (LLMs) in domains such as logical reasoning, spatial intelligence, and linguistic understanding, among others. Through a series of straightforward questions, it uncovers the significant limitations of well-regarded models to perform tasks that humans manage with ease. It also highlights the potential of prompt engineering to mitigate some errors and underscores the necessity for better training methodologies. Our findings stress the importance of grounding LLMs with human reasoning and common sense, emphasising the need for human-in-the-loop for enterprise applications. We hope this work paves the way for future research to enhance the usefulness and reliability of new models.

Updated: 2024-05-30 02:09:51

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.19616v1

Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models

In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has gained considerable traction. This study first investigates the limitations of MCQA as an evaluation method for LLMs and then analyzes the fundamental reason for the limitations of MCQA, that while LLMs may select the correct answers, it is possible that they also recognize other wrong options as correct. Finally, we propose a dataset augmenting method for Multiple-Choice Questions (MCQs), MCQA+, that can more accurately reflect the performance of the model, which underscores the need for more robust evaluation mechanisms in assessing the performance of LLMs.

Updated: 2024-05-30 01:57:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.01349v2

Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective

Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement SVAs have some drawbacks, resulting in biased or unreliable explanations that fail to correctly capture the true intrinsic relationships between features and model outputs. Moreover, the mechanism and consequences of these drawbacks have not been discussed systematically. In this paper, we propose a novel error theoretical analysis framework, in which the explanation errors of SVAs are decomposed into two components: observation bias and structural bias. We further clarify the underlying causes of these two biases and demonstrate that there is a trade-off between them. Based on this error analysis framework, we develop two novel concepts: over-informative and under-informative explanations. We demonstrate how these concepts can be effectively used to understand potential errors of existing SVA methods. In particular, for the widely deployed assumption-based SVAs, we find that they can easily be under-informative due to the distribution drift caused by distributional assumptions. We propose a measurement tool to quantify such a distribution drift. Finally, our experiments illustrate how different existing SVA methods can be over- or under-informative. Our work sheds light on how errors arise in the estimation of SVAs and encourages the development of new, less error-prone methods.
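
For reference, the quantity that SVA methods estimate can be computed exactly when there are only a handful of features. The Python sketch below enumerates all coalitions and applies the standard Shapley weighting. The `value` callback is a hypothetical stand-in for however a method scores a feature subset; how a real explainer fills in absent features is the design choice under analysis in the abstract and is not modeled here. The toy game is likewise an assumption for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value):
    """Exact Shapley values for a set function `value(frozenset) -> float`."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that a coalition of this size precedes feature i
            # in a uniformly random ordering of all features.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value(coalition | {i}) - value(coalition)
                phi[i] += weight * marginal
    return phi


def toy_value(subset):
    """Hypothetical game: two useful features that interact, one dummy."""
    score = 0.0
    if 0 in subset:
        score += 2.0
    if 1 in subset:
        score += 3.0
    if 0 in subset and 1 in subset:
        score += 4.0  # interaction term, split evenly by Shapley
    return score


attributions = shapley_values(3, toy_value)
```

The enumeration costs on the order of $2^n$ evaluations per feature, which is why practical SVA methods rely on sampling and on assumptions about absent features.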

Updated: 2024-05-30 01:56:53

Categories: cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2404.13522v2

Enhancing Low-Resource Relation Representations through Multi-View Decoupling

Recently, prompt-tuning with pre-trained language models (PLMs) has significantly improved performance on relation extraction (RE) tasks. However, in low-resource scenarios, where the available training data is scarce, previous prompt-based methods may still perform poorly for prompt-based representation learning due to a superficial understanding of the relation. To this end, we highlight the importance of learning high-quality relation representation in low-resource scenarios for RE, and propose a novel prompt-based relation representation method, named MVRE (Multi-View Relation Extraction), to better leverage the capacity of PLMs to improve the performance of RE within the low-resource prompt-tuning paradigm. Specifically, MVRE decouples each relation into different perspectives to encompass multi-view relation representations for maximizing the likelihood during relation inference. Furthermore, we also design a Global-Local loss and a Dynamic-Initialization method for better alignment of the multi-view relation-representing virtual words, containing the semantics of relation labels during the optimization learning process and initialization. Extensive experiments on three benchmark datasets show that our method can achieve state-of-the-art performance in low-resource settings.

Updated: 2024-05-30 01:56:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2312.17267v4

Factor Augmented Tensor-on-Tensor Neural Networks

This paper studies the prediction task of tensor-on-tensor regression in which both covariates and responses are multi-dimensional arrays (a.k.a. tensors) across time with arbitrary tensor order and data dimension. Existing methods either focused on linear models without accounting for possibly nonlinear relationships between covariates and responses, or directly employed black-box deep learning algorithms that failed to utilize the inherent tensor structure. In this work, we propose a Factor Augmented Tensor-on-Tensor Neural Network (FATTNN) that integrates tensor factor models into deep neural networks. We begin with summarizing and extracting useful predictive information (represented by the "factor tensor") from the complex structured tensor covariates, and then proceed with the prediction task using the estimated factor tensor as input of a temporal convolutional neural network. The proposed methods effectively handle nonlinearity between complex data structures, and improve over traditional statistical models and conventional deep learning approaches in both prediction accuracy and computational cost. By leveraging tensor factor models, our proposed methods exploit the underlying latent factor structure to enhance the prediction, and in the meantime, drastically reduce the data dimensionality, which speeds up the computation. The empirical performances of our proposed methods are demonstrated via simulation studies and real-world applications to three public datasets. Numerical results show that our proposed algorithms achieve substantial increases in prediction accuracy and significant reductions in computational time compared to benchmark methods.

Updated: 2024-05-30 01:56:49

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2405.19610v1

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose a challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduce the inference time by reducing the denoising steps. However, their memory consumption is still excessive. Post Training Quantization (PTQ) replaces high bit-width FP representation with low-bit integer values (INT4/8), which is an effective and efficient technique to reduce the memory cost. However, when applied to few-step diffusion models, existing quantization methods face challenges in preserving both the image quality and text alignment. To address this issue, we propose a mixed-precision quantization framework, MixDQ. First, we design a specialized BOS-aware quantization method for highly sensitive text embedding quantization. Then, we conduct metric-decoupled sensitivity analysis to measure the sensitivity of each layer. Finally, we develop an integer-programming-based method to conduct bit-width allocation. While existing quantization methods fall short at W8A8, MixDQ could achieve W8A8 without performance loss, and W4A8 with negligible visual degradation. Compared with FP16, we achieve 3-4x reduction in model size and memory cost, and 1.45x latency speedup.
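
As background for the W8A8 and W4A8 notation (weight and activation bit-widths), here is a minimal Python sketch of generic symmetric uniform quantization with a single per-tensor scale. It is not MixDQ's BOS-aware or mixed-precision method; the function names and the per-tensor scaling choice are assumptions made purely for illustration.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integer codes in [-q_max, q_max] with one scale."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed code")
    q_max = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero tensor has no dynamic range; any positive scale works.
    scale = max_abs / q_max if max_abs > 0 else 1.0
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


# Usage: the same toy weights at 8 and 4 bits.
weights = [0.5, -1.0, 0.25, 0.0]
codes8, scale8 = quantize_symmetric(weights, bits=8)
codes4, scale4 = quantize_symmetric(weights, bits=4)
error8 = max(abs(w - r) for w, r in zip(weights, dequantize(codes8, scale8)))
error4 = max(abs(w - r) for w, r in zip(weights, dequantize(codes4, scale4)))
```

Round-to-nearest keeps the per-value error within half a quantization step, and the step grows as the bit-width shrinks. Relative to 16-bit storage, 8-bit codes halve the size and 4-bit codes quarter it, so a mix of 4- and 8-bit layers lands between 2x and 4x.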

Updated: 2024-05-30 01:51:10

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.17873v2

Relation Modeling and Distillation for Learning with Noisy Labels

Learning with noisy labels has become an effective strategy for enhancing the robustness of models, which enables models to better tolerate inaccurate data. Existing methods either focus on optimizing the loss function to mitigate the interference from noise, or design procedures to detect potential noise and correct errors. However, their effectiveness is often compromised in representation learning due to the dilemma where models overfit to noisy labels. To address this issue, this paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning and employs knowledge distillation to enhance understanding of latent associations, which mitigates the impact of noisy labels. Specifically, the proposed method, termed RMDNet, includes two main modules. The relation modeling (RM) module applies contrastive learning to learn representations of all data, an unsupervised approach that effectively eliminates the interference of noisy labels on feature extraction. The relation-guided representation learning (RGRL) module utilizes the inter-sample relations learned by the RM module to calibrate the representation distribution for noisy samples, which improves the generalization of the model in the inference phase. Notably, the proposed RMDNet is a plug-and-play framework that can integrate multiple methods to its advantage. Extensive experiments were conducted on two datasets, including performance comparison, ablation study, in-depth analysis and case study. The results show that RMDNet can learn discriminative representations for noisy data, which results in performance superior to existing methods.

Updated: 2024-05-30 01:47:27

Categories: cs.AI

Download: http://arxiv.org/abs/2405.19606v1

Continuously Optimizing Radar Placement with Model Predictive Path Integrals

Continuously optimizing sensor placement is essential for precise target localization in various military and civilian applications. While information theory has shown promise in optimizing sensor placement, many studies oversimplify sensor measurement models or neglect dynamic constraints of mobile sensors. To address these challenges, we employ a range measurement model that incorporates radar parameters and radar-target distance, coupled with Model Predictive Path Integral (MPPI) control to manage complex environmental obstacles and dynamic constraints. We compare the proposed approach against stationary radars or simplified range measurement models based on the root mean squared error (RMSE) of the Cubature Kalman Filter (CKF) estimator for the targets' state. Additionally, we visualize the evolving geometry of radars and targets over time, highlighting areas of highest measurement information gain, demonstrating the strengths of the approach. The proposed strategy outperforms stationary radars and simplified range measurement models in target localization, achieving a 38-74% reduction in mean RMSE and a 33-79% reduction in the upper tail of the 90% Highest Density Interval (HDI) over 500 Monte Carlo (MC) trials across all time steps. Code will be made publicly available upon acceptance.

Updated: 2024-05-30 01:44:38

Categories: stat.AP,cs.AI,cs.RO

Download: http://arxiv.org/abs/2405.18999v2

Do spectral cues matter in contrast-based graph self-supervised learning?

The recent surge in contrast-based graph self-supervised learning has prominently featured an intensified exploration of spectral cues. However, an intriguing paradox emerges, as methods grounded in seemingly conflicting assumptions or heuristic approaches regarding the spectral domain demonstrate notable enhancements in learning performance. This paradox prompts a critical inquiry into the genuine contribution of spectral information to contrast-based graph self-supervised learning. This study undertakes an extensive investigation into this inquiry, conducting a thorough study of the relationship between spectral characteristics and the learning outcomes of contemporary methodologies. Based on this analysis, we claim that the effectiveness and significance of spectral information need to be questioned. Instead, we revisit simple edge perturbation: random edge dropping designed for node-level self-supervised learning and random edge adding intended for graph-level self-supervised learning. Compelling evidence is presented that these simple yet effective strategies consistently yield superior performance while demanding significantly fewer computational resources compared to all prior spectral augmentation methods. The proposed insights represent a significant leap forward in the field, potentially reshaping the understanding and implementation of graph self-supervised learning.
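
The two edge perturbations named above are simple enough to sketch directly. The Python below treats a graph as a list of undirected `(u, v)` pairs; the function names, the per-edge drop probability, and the fixed number of added edges are assumptions for illustration, not the settings used in the paper.

```python
import random


def drop_edges(edges, drop_prob, rng):
    """Independently remove each edge with probability `drop_prob`."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def add_edges(edges, num_nodes, num_new, rng):
    """Add `num_new` random undirected edges that are not already present."""
    existing = {tuple(sorted(edge)) for edge in edges}
    max_edges = num_nodes * (num_nodes - 1) // 2
    if len(existing) + num_new > max_edges:
        raise ValueError("not enough free node pairs for the requested edges")
    added = []
    while len(added) < num_new:
        # Sampling two distinct nodes rules out self-loops.
        u, v = rng.sample(range(num_nodes), 2)
        key = (min(u, v), max(u, v))
        if key not in existing:
            existing.add(key)
            added.append(key)
    return list(edges) + added


# Usage: two augmented views of a small path graph.
rng = random.Random(0)
graph = [(0, 1), (1, 2), (2, 3), (3, 4)]
dropped_view = drop_edges(graph, drop_prob=0.25, rng=rng)
added_view = add_edges(graph, num_nodes=5, num_new=2, rng=rng)
```

Neither function touches the graph spectrum explicitly, which is the point of the comparison: both run in time roughly linear in the number of edges, with no eigendecomposition.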

Updated: 2024-05-30 01:30:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.19600v1

Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have been unexplored. In this paper, we comprehensively scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others exhibit notably lower performance than results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe the real-world tactic of manipulating visual components that phishing attackers employ to circumvent the detection systems. To assess the resilience of existing models against adversarial attacks and robustness, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target. We then evaluate the models' robustness in handling these adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems, encouraging proactive actions. To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, necessitating the development of more effective and robust defenses.

Updated: 2024-05-30 01:28:36

Categories: cs.CR

Download: http://arxiv.org/abs/2405.19598v1

SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights $W$ and inject learnable matrices $\Delta W$. These $\Delta W$ matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on $\Delta W$ depends on the specific weight matrix $W$. Specifically, SVFT updates $W$ as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.
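
One way to write the update described above is sketched below. The notation ($U$, $\Sigma$, $V$, $M$, $\Omega$) is assumed here for illustration and is not taken from the abstract: $U$ and $V$ hold the left and right singular vectors of the frozen pre-trained $W$, $M$ is a sparse matrix of trainable coefficients, and $\Omega$ is its fixed sparsity pattern.

```latex
W = U \Sigma V^{\top} = \sum_{i} \sigma_i \, u_i v_i^{\top},
\qquad
\Delta W = U M V^{\top} = \sum_{(i,j) \in \Omega} m_{ij} \, u_i v_j^{\top},
\qquad
W + \Delta W = U \left( \Sigma + M \right) V^{\top}.
```

Under this reading the number of trainable parameters equals $|\Omega|$, so enlarging the pattern trades parameter count for expressivity. For instance, restricting $\Omega$ to the diagonal would train one coefficient per singular value.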

Updated: 2024-05-30 01:27:43

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.19597v1
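
As a rough illustration of the update described in the SVFT abstract, the formula below spells out "a sparse combination of outer products of singular vectors". The notation is assumed for this sketch and is not quoted from the paper: $W = U \Sigma V^\top$ is the singular value decomposition of the frozen weight, $u_i$ and $v_j$ are its left and right singular vectors, $\Omega$ is a fixed sparse set of index pairs, and $m_{ij}$ are the trainable coefficients.

$$
\Delta W \;=\; \sum_{(i,j)\in\Omega} m_{ij}\, u_i v_j^\top \;=\; U M V^\top,
\qquad
W + \Delta W \;=\; U\,(\Sigma + M)\,V^\top ,
$$

where $M$ holds $m_{ij}$ at the positions in $\Omega$ and zero elsewhere. Under these assumptions only the $|\Omega|$ coefficients are trained, so the size of $\Omega$ is the knob that trades parameter count against expressivity. For example, restricting $\Omega$ to diagonal pairs $(i,i)$ would only rescale the singular values, while adding off-diagonal pairs would also mix singular directions.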

Computational Dualism and Objective Superintelligence

The concept of intelligent software is flawed. The behaviour of software is determined by the hardware that "interprets" it. This undermines claims regarding the behaviour of theorised software superintelligence. Here we characterise this problem as "computational dualism", where instead of mental and physical substance, we have software and hardware. We argue that to make objective claims regarding performance we must avoid computational dualism. We propose a pancomputational alternative wherein every aspect of the environment is a relation between irreducible states. We formalise systems as behaviour (inputs and outputs), and cognition as embodied, embedded, extended and enactive. The result is cognition formalised as a part of the environment, rather than as a disembodied policy interacting with the environment through an interpreter. This allows us to make objective claims regarding intelligence, which we argue is the ability to "generalise", identify causes and adapt. We then establish objective upper bounds for intelligent behaviour. This suggests AGI will be safer, but more limited, than theorised.

Updated: 2024-05-30 01:26:25

Categories: cs.AI,math.LO

Download: http://arxiv.org/abs/2302.00843v5

Why Larger Language Models Do In-context Learning Differently?

Large language models (LLMs) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without any adjustment to the model parameters. One intriguing recent observation is that models of different scales may have different ICL behaviors: larger models tend to be more sensitive to noise in the test context. This work studies this observation theoretically, aiming to improve the understanding of LLMs and ICL. We analyze two stylized settings: (1) linear regression with one-layer single-head linear transformers and (2) parity classification with two-layer multi-head attention transformers (non-linear data and non-linear model). In both settings, we give closed-form optimal solutions and find that smaller models emphasize important hidden features while larger ones cover more hidden features; thus, smaller models are more robust to noise while larger ones are more easily distracted, leading to different ICL behaviors. This sheds light on what transformers attend to and how that affects ICL. Preliminary experimental results on large base and chat models provide positive support for our analysis.

Updated: 2024-05-30 01:11:35

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.19592v1

Weights Augmentation: it has never ever ever ever let her model down

Weights play an essential role in deep learning network models. In contrast to network structure design, this article proposes the concept of weight augmentation, which focuses on weight exploration. The core of the Weight Augmentation Strategy (WAS) is to train with randomly transformed weight coefficients, named Shadow Weights (SW), which the network uses to compute the loss function that drives parameter updates. Stochastic gradient descent, however, is applied to the Plain Weights (PW), the original weights of the network before the random transformation. During training, the numerous SW collectively form a high-dimensional space, while PW is learned directly from the distribution of SW instead of from the data. The accuracy-oriented mode (AOM) relies on PW, which guarantees that the network is highly robust and accurate. The desire-oriented mode (DOM) uses SW, selected according to the properties desired of the network, such as lower computational complexity or lower sensitivity to particular data. The two modes can be switched at any time if needed. WAT extends the augmentation technique from data augmentation to weights; it is easy to understand and implement, yet it can substantially improve almost all networks. Our experimental results show that convolutional neural networks, such as VGG-16, ResNet-18, ResNet-34, GoogleNet, MobileNetV2, and EfficientNet-Lite, can benefit considerably at little or no cost. On the CIFAR100 and CIFAR10 datasets, model accuracy increases by 7.32% and 9.28%, respectively, with the highest gains being 13.42% and 18.93%, respectively. In addition, DOM can reduce floating point operations (FLOPs) by up to 36.33%. The code is available at https://github.com/zlearh/Weight-Augmentation-Technology.

Updated: 2024-05-30 00:57:06

Categories: cs.LG

Download: http://arxiv.org/abs/2405.19590v1

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of scene understanding and action prediction. Current methods employ both 3D representation and multi-view 2D representation to predict the poses of the robot's end-effector. However, they still require a considerable amount of high-quality robot trajectories, and suffer from limited generalization in unseen tasks and inefficient execution in long-horizon reasoning. In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning. Specifically, we adopt Segment Anything (SAM) pre-trained on a huge number of images and promptable masks as the foundation model for extracting task-relevant features, and employ parameter-efficient fine-tuning on robot data for a better understanding of embodied scenarios. To address long-horizon reasoning, we develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass, notably enhancing execution efficiency. Experimental results from various instruction-following tasks demonstrate that SAM-E achieves superior performance with higher execution efficiency compared to the baselines, and also significantly improves generalization in few-shot adaptation to new tasks.

Updated: 2024-05-30 00:32:51

Categories: cs.CV,cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.19586v1

Near-optimal Per-Action Regret Bounds for Sleeping Bandits

We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the minimax $\Omega(\sqrt{TA})$ lower bound, this upper bound contains an extra multiplicative factor of $K\ln{K}$. We address this gap by directly minimizing the per-action regret using generalized versions of EXP3, EXP3-IX and FTRL with Tsallis entropy, thereby obtaining near-optimal bounds of order $O(\sqrt{TA\ln{K}})$ and $O(\sqrt{T\sqrt{AK}})$. We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way. This leads to new proofs for a number of existing adaptive and tracking regret bounds for standard non-sleeping bandits. Extending our results to the bandit version of experts that report their confidences leads to new bounds for the confidence regret that depends primarily on the sum of experts' confidences. We prove a lower bound, showing that for any minimax-optimal algorithm, there exists an action whose regret is sublinear in $T$ but linear in the number of its active rounds.

Updated: 2024-05-30 00:18:21

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2403.01315v2
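
To make the sleeping-bandit setting above more concrete, here is a minimal Python sketch of an EXP3-style learner that renormalizes its exponential weights over the arms available in each round. It is only an illustration under stated assumptions, not the generalized EXP3 analyzed in the paper: the class name `SleepingExp3`, the helper `simulate`, the learning-rate choice, and the toy environment (random availability with Bernoulli losses instead of an adversary) are all hypothetical.

```python
import math
import random


class SleepingExp3:
    """EXP3-style learner that samples only among the arms awake in a round."""

    def __init__(self, num_arms, learning_rate, seed=0):
        self.learning_rate = learning_rate
        # Cumulative importance-weighted loss estimates, one per arm.
        self.loss_estimates = [0.0] * num_arms
        self.rng = random.Random(seed)

    def probabilities(self, available):
        """Exponential-weights distribution renormalized over available arms."""
        # Subtract the smallest estimate so every exponent is <= 0 (no overflow).
        base = min(self.loss_estimates[a] for a in available)
        weights = {
            a: math.exp(-self.learning_rate * (self.loss_estimates[a] - base))
            for a in available
        }
        total = sum(weights.values())
        return {a: w / total for a, w in weights.items()}

    def select(self, available):
        """Sample one available arm; also return the probability it was drawn with."""
        probs = self.probabilities(available)
        arms = list(probs)
        arm = self.rng.choices(arms, weights=[probs[a] for a in arms], k=1)[0]
        return arm, probs[arm]

    def update(self, arm, prob, loss):
        """Add the importance-weighted loss estimate for the pulled arm only."""
        self.loss_estimates[arm] += loss / prob


def simulate(rounds=2000, num_arms=5, max_available=3, seed=1):
    """Toy run: random availability, Bernoulli losses (assumed, not adversarial)."""
    rng = random.Random(seed)
    mean_losses = [0.1 + 0.2 * i for i in range(num_arms)]  # arm 0 has lowest loss
    # Hypothetical learning rate, picked only to mirror the sqrt(T A ln K) scaling.
    rate = math.sqrt(math.log(num_arms) / (rounds * max_available))
    learner = SleepingExp3(num_arms, rate, seed=seed)
    pulls = [0] * num_arms
    for _ in range(rounds):
        available = rng.sample(range(num_arms), max_available)
        arm, prob = learner.select(available)
        loss = 1.0 if rng.random() < mean_losses[arm] else 0.0
        learner.update(arm, prob, loss)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(simulate())
```

The one design choice worth noting is that the probability used in the importance-weighted estimate is the renormalized one over the awake arms, so an arm's estimate only grows in rounds where it was available and actually pulled.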

Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases

Human-Oriented Binary Reverse Engineering (HOBRE) lies at the intersection of binary and source code, aiming to lift binary code to human-readable content relevant to source code, thereby bridging the binary-source semantic gap. Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the groundwork for transfer learning applicable to HOBRE. However, existing approaches for HOBRE rely heavily on uni-modal models like SCFMs for supervised fine-tuning or general LLMs for prompting, resulting in sub-optimal performance. Inspired by recent progress in large multi-modal models, we propose that it is possible to harness the strengths of uni-modal code models from both sides to bridge the semantic gap effectively. In this paper, we introduce a novel probe-and-recover framework that incorporates a binary-source encoder-decoder model and black-box LLMs for binary analysis. Our approach leverages the pre-trained knowledge within SCFMs to synthesize relevant, symbol-rich code fragments as context. This additional context enables black-box LLMs to enhance recovery accuracy. We demonstrate significant improvements in zero-shot binary summarization and binary function name recovery, with a 10.3% relative gain in CHRF and a 16.7% relative gain in a GPT4-based metric for summarization, as well as a 6.7% and 7.4% absolute increase in token-level precision and recall for name recovery, respectively. These results highlight the effectiveness of our approach in automating and improving binary code analysis.

Updated: 2024-05-30 00:17:44

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2405.19581v1

A Quantitative Study of SMS Phishing Detection

With the booming popularity of smartphones, threats related to these devices are on the rise. Smishing, a combination of SMS (Short Message Service) and phishing, has emerged as a treacherous cyber threat used by malicious actors to deceive users, aiming to steal sensitive information or money, or to install malware on their mobile devices. Despite the increase in smishing attacks in recent years, very few studies have aimed to understand the factors that contribute to a user's ability to differentiate real from fake messages. To address this gap in knowledge, we conducted an online survey on smishing detection with 187 participants. In this study, we presented them with 16 SMS screenshots and evaluated how different factors affect their decision-making process in smishing detection. Next, we conducted a post-survey to gather information on the participants' security attitudes, behavior and knowledge. Our results highlight that attention and security behavioral scores had a significant impact on participants' accuracy in identifying smishing messages. We found that participants had more difficulty recognizing real messages than fake ones, with an accuracy of 67.1% on fake messages and 43.6% on real messages. Our study is crucial for developing proactive strategies to counter and mitigate smishing attacks. By understanding what factors influence smishing detection, we aim to bolster users' resilience against such threats and create a safer digital environment for all.

Updated: 2024-05-30 00:03:56

Categories: cs.CR,cs.HC

Download: http://arxiv.org/abs/2311.06911v4