    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 34

Last Updated: 2024-10-24 23:57:11 (+00:00)

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

Generative models have shown impressive capabilities in synthesizing high-quality outputs across various domains. However, a persistent challenge is the occurrence of "hallucinations", where the model produces outputs that are plausible but invalid. While empirical strategies have been explored to mitigate this issue, a rigorous theoretical understanding remains elusive. In this paper, we develop a theoretical framework to analyze the learnability of non-hallucinating generative models from a learning-theoretic perspective. Our results reveal that non-hallucinating learning is statistically impossible when relying solely on the training dataset, even for a hypothesis class of size two and when the entire training set is truthful. To overcome these limitations, we show that incorporating inductive biases aligned with the actual facts into the learning process is essential. We provide a systematic approach to achieve this by restricting the facts set to a concept class of finite VC-dimension and demonstrate its effectiveness under various learning paradigms. Although our findings are primarily conceptual, they represent a first step towards a principled approach to addressing hallucinations in learning generative models.

Updated: 2024-10-24 23:57:11

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2410.19217v1

Predicting Liquidity Coverage Ratio with Gated Recurrent Units: A Deep Learning Model for Risk Management

With the global economic integration and the high interconnection of financial markets, financial institutions are facing unprecedented challenges, especially liquidity risk. This paper proposes a liquidity coverage ratio (LCR) prediction model based on the gated recurrent unit (GRU) network to help financial institutions manage their liquidity risk more effectively. By utilizing the GRU network in deep learning technology, the model can automatically learn complex patterns from historical data and accurately predict LCR for a period of time in the future. The experimental results show that compared with traditional methods, the GRU model proposed in this study shows significant advantages in mean absolute error (MAE), proving its higher accuracy and robustness. This not only provides financial institutions with a more reliable liquidity risk management tool but also provides support for regulators to formulate more scientific and reasonable policies, which helps to improve the stability of the entire financial system.
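
A minimal PyTorch sketch of the kind of GRU regressor the abstract describes; the layer sizes, lookback window, and single-output head are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class GRULCRPredictor(nn.Module):
    """Predict the next-period LCR from a window of historical features."""

    def __init__(self, num_features: int, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, num_features); predict from the last hidden state
        out, _ = self.gru(x)
        return self.head(out[:, -1, :])

model = GRULCRPredictor(num_features=8)
window = torch.randn(32, 30, 8)                   # 32 samples, 30 steps, 8 features
pred_lcr = model(window)                          # (32, 1)
loss = nn.L1Loss()(pred_lcr, torch.randn(32, 1))  # MAE, the paper's reported metric
```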

Updated: 2024-10-24 23:43:50

Domains: cs.LG

Download: http://arxiv.org/abs/2410.19211v1

Equitable Federated Learning with Activation Clustering

Federated learning is a prominent distributed learning paradigm that incorporates collaboration among diverse clients, promotes data locality, and thus ensures privacy. These clients have their own technological, cultural, and other biases in the process of data generation. However, the present standard often ignores this bias/heterogeneity, perpetuating bias against certain groups rather than mitigating it. In response to this concern, we propose an equitable clustering-based framework where the clients are categorized/clustered based on how similar they are to each other. We propose a unique way to construct the similarity matrix that uses activation vectors. Furthermore, we propose a client weighting mechanism to ensure that each cluster receives equal importance and establish an $O(1/\sqrt{K})$ rate of convergence to reach an $\epsilon$-stationary solution. We assess the effectiveness of our proposed strategy against common baselines, demonstrating its efficacy in terms of reducing the bias existing amongst various client clusters and consequently ameliorating algorithmic bias against specific groups.
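
A hedged sketch of the clustering-plus-equal-weighting idea: client activation vectors (assumed here to come from a shared probe batch) give a cosine similarity matrix, clients are clustered on it, and each cluster receives equal total weight. The clustering algorithm and names below are illustrative assumptions; the snippet relies on scikit-learn 1.2+ for AgglomerativeClustering's `metric` argument.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_equal_weights(activations: np.ndarray, num_clusters: int) -> np.ndarray:
    """activations: (num_clients, dim) activation vectors, one row per client."""
    # Cosine similarity between clients' activation vectors
    normed = activations / np.linalg.norm(activations, axis=1, keepdims=True)
    similarity = normed @ normed.T
    # Cluster clients on the induced distance (1 - similarity)
    labels = AgglomerativeClustering(
        n_clusters=num_clusters, metric="precomputed", linkage="average"
    ).fit_predict(1.0 - similarity)
    # Each cluster gets equal total weight, split evenly among its members
    weights = np.zeros(len(activations))
    for c in range(num_clusters):
        members = labels == c
        weights[members] = (1.0 / num_clusters) / members.sum()
    return weights  # use as aggregation weights when averaging client updates
```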

Updated: 2024-10-24 23:36:39

Domains: cs.LG,cs.AI,eess.SP

Download: http://arxiv.org/abs/2410.19207v1

Inference time LLM alignment in single and multidomain preference spectrum

Aligning Large Language Models (LLMs) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed, and inference-time ones typically require access to the reward model at each inference step. To address these limitations, we introduce an inference-time model alignment method that learns encoded representations of preference dimensions, called Alignment Vectors (AV). These representations are computed by subtracting the base model from the aligned model, as in model editing, enabling dynamic adjustment of the model behavior during inference through simple linear operations. Even though preference dimensions can span various granularity levels, here we focus on three gradual response levels across three specialized domains: medical, legal, and financial, exemplifying the approach's practical potential. This new alignment paradigm introduces adjustable preference knobs during inference, allowing users to tailor their LLM outputs while reducing the inference cost by half compared to the prompt engineering approach. Additionally, we find that AVs are transferable across different fine-tuning stages of the same model, demonstrating their flexibility. AVs also facilitate multidomain, diverse preference alignment, making the process 12x faster than the retraining approach.
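
A minimal sketch of the alignment-vector arithmetic described above, assuming both models share an architecture and state_dict keys; the 0.5 knob value is arbitrary.

```python
def extract_alignment_vector(base_sd: dict, aligned_sd: dict) -> dict:
    """AV = aligned - base, per parameter tensor (model-editing style subtraction)."""
    return {k: aligned_sd[k] - base_sd[k] for k in base_sd}

def apply_alignment(base_sd: dict, av: dict, strength: float) -> dict:
    """Steer behavior at inference time with a simple linear 'preference knob'."""
    return {k: base_sd[k] + strength * av[k] for k in base_sd}

# Usage sketch:
# av = extract_alignment_vector(base_model.state_dict(), aligned_model.state_dict())
# base_model.load_state_dict(apply_alignment(base_model.state_dict(), av, 0.5))
```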

Updated: 2024-10-24 23:31:39

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2410.19206v1

An Inverse Modeling Constrained Multi-Objective Evolutionary Algorithm Based on Decomposition

This paper introduces the inverse modeling constrained multi-objective evolutionary algorithm based on decomposition (IM-C-MOEA/D) for addressing constrained real-world optimization problems. Our research builds upon the advancements made in evolutionary computing-based inverse modeling, and it strategically bridges the gaps in applying inverse models based on decomposition to problem domains with constraints. The proposed approach is experimentally evaluated on diverse real-world problems (RWMOP1-35), showing superior performance to state-of-the-art constrained multi-objective evolutionary algorithms (CMOEAs). The experimental results highlight the robustness of the algorithm and its applicability in real-world constrained optimization scenarios.

Updated: 2024-10-24 23:24:44

Domains: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.19203v1

Binary Classification: Is Boosting stronger than Bagging?

Random Forests have been one of the most popular bagging methods in the past few decades, especially due to their success at handling tabular datasets. They have been extensively studied and compared to boosting models, like XGBoost, which are generally considered more performant. Random Forests adopt several simplistic assumptions, namely that all samples and all trees that form the forest are equally important for building the final model. We introduce Enhanced Random Forests, an extension of vanilla Random Forests with extra functionalities and adaptive sample and model weighting. We develop an iterative algorithm for adapting the training sample weights by favoring the hardest examples, and an approach for finding personalized tree-weighting schemes for each new sample. Our method significantly improves upon regular Random Forests across 15 different binary classification datasets and considerably outperforms other tree methods, including XGBoost, when run with default hyperparameters, which indicates the robustness of our approach across datasets, without the need for extensive hyperparameter tuning. Our tree-weighting methodology results in enhanced or comparable performance to the uniformly weighted ensemble and, more importantly, is leveraged to define importance scores for trees based on their contributions to classifying each new sample. This enables us to focus on only a small number of trees as the main models that define the outcome of a new sample and, thus, to partially recover interpretability, which is critically missing from both bagging and boosting methods. In binary classification problems, the proposed extensions and the corresponding results suggest the equivalence of bagging and boosting methods in performance, and the edge of bagging in interpretability by leveraging a few learners of the ensemble, which is not an option in the less explainable boosting methods.
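
A hedged sketch of the adaptive sample-weighting loop around scikit-learn's RandomForestClassifier; the exponential update and step size are illustrative stand-ins, not the paper's exact rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_reweighted_forest(X, y, rounds: int = 5, step: float = 1.0):
    """Iteratively refit, upweighting the hardest training examples."""
    n = len(y)  # assumes labels are 0..n_classes-1, matching rf.classes_
    w = np.full(n, 1.0 / n)
    for _ in range(rounds):
        rf = RandomForestClassifier(n_estimators=200).fit(X, y, sample_weight=w)
        p_true = rf.predict_proba(X)[np.arange(n), y]  # prob. of the true class
        w *= np.exp(step * (1.0 - p_true))             # favor the hardest examples
        w /= w.sum()
    return rf, w
```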

Updated: 2024-10-24 23:22:33

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.19200v1

MAP: Multi-Human-Value Alignment Palette

Ensuring that generative AI systems align with human values is essential but challenging, especially when considering multiple human values and their potential trade-offs. Since human values can be personalized and dynamically change over time, the desirable levels of value alignment vary across different ethnic groups, industry sectors, and user cohorts. Within existing frameworks, it is hard to define human values and align AI systems accordingly across different directions simultaneously, such as harmlessness, helpfulness, and positiveness. To address this, we develop a novel, first-principle approach called Multi-Human-Value Alignment Palette (MAP), which navigates the alignment across multiple human values in a structured and reliable way. MAP formulates the alignment problem as an optimization task with user-defined constraints, which define human value targets. It can be efficiently solved via a primal-dual approach, which determines whether a user-defined alignment target is achievable and how to achieve it. We conduct a detailed theoretical analysis of MAP by quantifying the trade-offs between values, the sensitivity to constraints, the fundamental connection between multi-value alignment and sequential alignment, and proving that linear weighted rewards are sufficient for multi-value alignment. Extensive experiments demonstrate MAP's ability to align multiple values in a principled manner while delivering strong empirical performance across various tasks.

Updated: 2024-10-24 23:16:39

Domains: cs.AI,cs.CY,cs.ET,cs.HC,cs.LG

Download: http://arxiv.org/abs/2410.19198v1

Transformers need glasses! Information over-squashing in language tasks

We study how information propagates in decoder-only Transformers, which are the architectural backbone of most existing frontier large language models (LLMs). We rely on a theoretical signal propagation analysis -- specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction. Our analysis reveals a representational collapse phenomenon: we prove that certain distinct sequences of inputs to the Transformer can yield arbitrarily close representations in the final token. This effect is exacerbated by the low-precision floating-point formats frequently used in modern LLMs. As a result, the model is provably unable to respond to these sequences in different ways -- leading to errors in, e.g., tasks involving counting or copying. Further, we show that decoder-only Transformer language models can lose sensitivity to specific tokens in the input, which relates to the well-known phenomenon of over-squashing in graph neural networks. We provide empirical evidence supporting our claims on contemporary LLMs. Our theory also points to simple solutions towards ameliorating these issues.
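
A hedged empirical probe of the collapse claim, assuming a Hugging Face gpt2 checkpoint purely for illustration: two distinct inputs (100 vs. 101 repeated tokens) whose final-token representations can be compared directly.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def last_token_rep(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    return model(**ids).last_hidden_state[0, -1]

# Distinct inputs whose final-token representations grow arbitrarily close
# as length increases; the paper shows low-precision floating-point formats
# can make them exactly indistinguishable.
a = last_token_rep("1 " * 100)
b = last_token_rep("1 " * 101)
print(torch.linalg.vector_norm(a - b).item())
```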

Updated: 2024-10-24 23:12:55

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.04267v2

Maintaining Plasticity in Continual Learning via Regenerative Regularization

In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a simple approach for maintaining plasticity that incorporates into the loss function an L2 regularization term toward the initial parameters. This is very similar to standard L2 regularization (L2); the only difference is that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyperparameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters should drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On problems representative of different types of nonstationarity in continual supervised learning, we demonstrate that L2 Init most consistently mitigates plasticity loss compared to previously proposed approaches.
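
A minimal sketch of the L2 Init penalty, assuming a standard PyTorch training loop; the only change from ordinary weight decay is the anchor point.

```python
import torch

def l2_init_penalty(model: torch.nn.Module, init_params: dict, strength: float):
    """L2 Init: regularize toward the *initial* parameters instead of the origin."""
    return strength * sum(
        ((p - init_params[name]) ** 2).sum() for name, p in model.named_parameters()
    )

# Usage sketch:
# init_params = {k: p.detach().clone() for k, p in model.named_parameters()}
# loss = task_loss + l2_init_penalty(model, init_params, strength=1e-3)
```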

Updated: 2024-10-24 23:03:41

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2308.11958v3

Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise

The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise. Although several methods have been proposed to enhance classification performance in the presence of noisy labels, they face some challenges: 1) a struggle with class-imbalanced datasets, leading to the frequent overlooking of minority classes as noisy samples; 2) a singular focus on maximizing performance using noisy datasets, without incorporating experts-in-the-loop for actively cleaning the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels but also iteratively improves the quality of the dataset by relabeling the important incorrect labels, under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements the loss-based sample selection by also sampling under-represented samples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance by not misidentifying clean samples from minority classes as mostly noisy samples.
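
A hedged sketch of one way to realize a Variance-of-Gradients score: collect per-sample input-gradient norms at several training checkpoints and take the variance. The choice of gradient (loss w.r.t. the input) and the aggregation are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def input_grad_norms(model, X, y):
    """Per-sample norm of the loss gradient w.r.t. the input, at one checkpoint."""
    X = X.clone().detach().requires_grad_(True)
    # Summed loss: each sample's input gradient depends only on its own row
    F.cross_entropy(model(X), y, reduction="sum").backward()
    return X.grad.flatten(1).norm(dim=1).detach()

def vog_scores(checkpoints, X, y):
    """Variance across training checkpoints of the per-sample gradient norms."""
    norms = torch.stack([input_grad_norms(m, X, y) for m in checkpoints])
    return norms.var(dim=0)  # high variance: candidate for expert relabeling
```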

Updated: 2024-10-24 22:59:27

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2407.05973v3

Enriching GNNs with Text Contextual Representations for Detecting Disinformation Campaigns on Social Media

Disinformation on social media poses both societal and technical challenges. While previous studies have integrated textual information into propagation networks, they have yet to fully leverage the advancements in Transformer-based language models for high-quality contextual text representations. This work investigates the impact of incorporating textual features into Graph Neural Networks (GNNs) for fake news detection. Our experiments demonstrate that contextual representations improve performance by 9.3% in Macro F1 over static ones and 33.8% over GNNs without textual features. However, noisy data augmentation degrades performance and increases instability. We expect our methodology to open avenues for further research, and all code is made publicly available.

Updated: 2024-10-24 22:57:17

Domains: cs.CL,cs.AI,cs.LG,cs.SI,stat.ML

Download: http://arxiv.org/abs/2410.19193v1

TEAM: Topological Evolution-aware Framework for Traffic Forecasting--Extended Version

Due to the global trend towards urbanization, people increasingly move to and live in cities that then continue to grow. Traffic forecasting plays an important role in the intelligent transportation systems of cities as well as in spatio-temporal data mining. State-of-the-art forecasting is achieved by deep-learning approaches due to their ability to contend with complex spatio-temporal dynamics. However, existing methods assume the input is fixed-topology road networks and static traffic time series. These assumptions fail to align with urbanization, where time series are collected continuously and road networks evolve over time. In such settings, deep-learning models require frequent re-initialization and re-training, imposing high computational costs. To enable much more efficient training without jeopardizing model accuracy, we propose the Topological Evolution-aware Framework (TEAM) for traffic forecasting that incorporates convolution and attention. This combination of mechanisms enables better adaptation to newly collected time series, while being able to maintain learned knowledge from old time series. TEAM features a continual learning module based on the Wasserstein metric that acts as a buffer that can identify the most stable and the most changing network nodes. Then, only data related to stable nodes is employed for re-training when consolidating a model. Further, only data of new nodes and their adjacent nodes as well as data pertaining to changing nodes are used to re-train the model. Empirical studies with two real-world traffic datasets offer evidence that TEAM is capable of much lower re-training costs than existing methods are, without jeopardizing forecasting accuracy.
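
A minimal sketch of the Wasserstein-based buffer idea, assuming per-node 1-D traffic distributions before and after the network evolved; the top-k split is an illustrative selection rule, not the paper's exact criterion.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def split_stable_changing(old: np.ndarray, new: np.ndarray, top_k: int):
    """old/new: (num_nodes, T) traffic readings before/after network evolution."""
    drift = np.array(
        [wasserstein_distance(old[i], new[i]) for i in range(len(old))]
    )
    order = np.argsort(drift)
    stable = order[:top_k]     # smallest distributional shift: reuse for consolidation
    changing = order[-top_k:]  # largest shift: include in re-training
    return stable, changing
```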

Updated: 2024-10-24 22:50:21

Domains: cs.LG

Download: http://arxiv.org/abs/2410.19192v1

Generalizing Differentially Private Decentralized Deep Learning with Multi-Agent Consensus

Cooperative decentralized learning relies on direct information exchange between communicating agents, each with access to locally available datasets. The goal is to agree on model parameters that are optimal over all data. However, sharing parameters with untrustworthy neighbors can incur privacy risks by leaking exploitable information. To enable trustworthy cooperative learning, we propose a framework that embeds differential privacy into decentralized deep learning and secures each agent's local dataset during and after cooperative training. We prove convergence guarantees for algorithms derived from this framework and demonstrate its practical utility when applied to subgradient and ADMM decentralized approaches, finding accuracies approaching the centralized baseline while ensuring individual data samples are resilient to inference attacks. Furthermore, we study the relationships between accuracy, privacy budget, and networks' graph properties on collaborative classification tasks, discovering a useful invariance to the communication graph structure beyond a threshold.
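
A hedged sketch of the standard clip-and-noise (Gaussian mechanism) step a framework like this would apply before parameters are shared with neighbors; the clip norm and noise multiplier below are placeholders, not the paper's calibration.

```python
import torch

def privatize_update(update: torch.Tensor, clip_norm: float, noise_mult: float):
    """Clip the local update, then add Gaussian noise before sharing it."""
    clipped = update * min(1.0, clip_norm / (update.norm().item() + 1e-12))
    return clipped + noise_mult * clip_norm * torch.randn_like(update)

# noise_mult would be calibrated to the desired (epsilon, delta) budget via
# the Gaussian mechanism; the appropriate value is problem-specific.
```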

Updated: 2024-10-24 22:49:22

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2306.13892v2

Reinforcement Learning the Chromatic Symmetric Function

We propose a conjectural counting formula for the coefficients of the chromatic symmetric function of unit interval graphs using reinforcement learning. The formula counts specific disjoint cycle-tuples in the graphs, referred to as Eschers, which satisfy certain concatenation conditions. These conditions are identified by a reinforcement learning model and are independent of the particular unit interval graph, resulting in a universal counting expression.

Updated: 2024-10-24 22:45:01

Domains: math.CO,cs.LG,05C15, 05C31, 68T07

Download: http://arxiv.org/abs/2410.19189v1

Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts

Large language models demonstrate impressive proficiency in language understanding and generation. Nonetheless, training these models from scratch, even for the least complex billion-parameter variant, demands significant computational resources, rendering it economically impractical for many organizations. With large language models functioning as general-purpose task solvers, this paper investigates their task-specific fine-tuning. We employ task-specific datasets and prompts to fine-tune two pruned LLaMA models having 5 billion and 4 billion parameters. This process utilizes the pre-trained weights and focuses on a subset of weights using the LoRA method. One challenge in fine-tuning the LLaMA model is crafting a precise prompt tailored to the specific task. To address this, we propose a novel approach to fine-tune the LLaMA model under two primary constraints: task specificity and prompt effectiveness. Our approach, Tailored-LLaMA, initially employs structural pruning to reduce the model sizes from 7B to 5B and 4B parameters. Subsequently, it applies a carefully designed prompt specific to the task and utilizes the LoRA method to accelerate the fine-tuning process. Moreover, fine-tuning a model pruned by 50% for less than one hour restores the mean accuracy of classification tasks to 95.68% at a 20% compression ratio and to 86.54% at a 50% compression ratio through few-shot learning with 50 shots. Our validation of Tailored-LLaMA on these two pruned variants demonstrates that even when compressed to 50%, the models maintain over 65% of the baseline model accuracy in few-shot classification and generation tasks. These findings highlight the efficacy of our tailored approach in maintaining high performance with significantly reduced model sizes.
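
A hedged sketch of LoRA fine-tuning on a pruned checkpoint using the peft library; the checkpoint path, rank, and target modules are assumptions for illustration, not the paper's settings.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path/to/pruned-llama")  # hypothetical path
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```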

Updated: 2024-10-24 22:34:27

Domains: cs.AI

Download: http://arxiv.org/abs/2410.19185v1

No Argument Left Behind: Overlapping Chunks for Faster Processing of Arbitrarily Long Legal Texts

In a context where the Brazilian judiciary system, the largest in the world, faces a crisis due to the slow processing of millions of cases, it becomes imperative to develop efficient methods for analyzing legal texts. We introduce uBERT, a hybrid model that combines Transformer and Recurrent Neural Network architectures to effectively handle long legal texts. Our approach processes the full text regardless of its length while maintaining reasonable computational overhead. Our experiments demonstrate that uBERT achieves superior performance compared to BERT+LSTM when overlapping input is used and is significantly faster than ULMFiT for processing long legal documents.
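
A minimal sketch of the overlapping-chunk idea: a stride smaller than the window guarantees that every span of text appears in the interior of at least one chunk.

```python
def overlapping_chunks(tokens, window: int = 512, overlap: int = 128):
    """Split a long token sequence into overlapping fixed-size windows."""
    stride = window - overlap
    return [
        tokens[start:start + window]
        for start in range(0, max(len(tokens) - overlap, 1), stride)
    ]

# e.g. 1300 tokens with window=512, overlap=128 -> chunks starting at
# positions 0, 384, 768, 1152, so no argument is lost on a chunk boundary.
```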

Updated: 2024-10-24 22:33:30

Domains: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2410.19184v1

Can Self Supervision Rejuvenate Similarity-Based Link Prediction?

Although recent advancements in end-to-end learning-based link prediction (LP) methods have shown remarkable capabilities, the significance of traditional similarity-based LP methods persists in unsupervised scenarios where there are no known link labels. However, the selection of node features for similarity computation in similarity-based LP can be challenging. Less informative node features can result in suboptimal LP performance. To address these challenges, we integrate self-supervised graph learning techniques into similarity-based LP and propose a novel method: Self-Supervised Similarity-based LP (3SLP). 3SLP is suitable for the unsupervised condition of similarity-based LP without the assistance of known link labels. Specifically, 3SLP introduces a dual-view contrastive node representation learning (DCNRL) with crafted data augmentation and node representation learning. DCNRL is dedicated to developing more informative node representations, replacing the node attributes as inputs in the similarity-based LP backbone. Extensive experiments over benchmark datasets demonstrate the salient improvement of 3SLP, outperforming the baseline of traditional similarity-based LP by up to 21.2% (AUC).
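
A minimal sketch of the similarity-based LP backbone that 3SLP plugs into: candidate pairs are scored by cosine similarity of node representations, which 3SLP would replace with its contrastively learned embeddings.

```python
import numpy as np

def link_scores(Z: np.ndarray, pairs):
    """Z: (num_nodes, dim) node representations; pairs: iterable of (u, v)."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return np.array([Z[u] @ Z[v] for u, v in pairs])  # cosine similarity

# Higher score -> higher predicted likelihood of a link between u and v.
```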

Updated: 2024-10-24 22:31:12

Domains: cs.AI

Download: http://arxiv.org/abs/2410.19183v1

Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when provided a minimally sufficient amount of relevant linguistic information. However, the variable effects of context type, retrieval method, model type, and language-specific factors highlight the limitations of using even the best LLMs as translation systems for the majority of the world's 7,000+ languages and their speakers.

Updated: 2024-10-24 22:24:57

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.15625v3

Cascading Failure Prediction via Causal Inference

Causal inference provides an analytical framework to identify and quantify cause-and-effect relationships among a network of interacting agents. This paper offers a novel framework for analyzing cascading failures in power transmission networks. This framework generates a directed latent graph in which the nodes represent the transmission lines and the directed edges encode the cause-effect relationships. This graph has a structure distinct from the system's topology, signifying the intricate fact that both local and non-local interdependencies exist among transmission lines, which are more general than only the local interdependencies that topological graphs can present. This paper formalizes a causal inference framework for predicting how an emerging anomaly propagates throughout the system. Using this framework, two algorithms are designed, providing an analytical framework to identify the most likely and most costly cascading scenarios. The framework's effectiveness is evaluated compared to the pertinent literature on the IEEE 14-bus, 39-bus, and 118-bus systems.

Updated: 2024-10-24 22:22:08

Domains: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2410.19179v1

Perturbation-based Graph Active Learning for Weakly-Supervised Belief Representation Learning

This paper addresses the problem of optimizing the allocation of labeling resources for semi-supervised belief representation learning in social networks. The objective is to strategically identify valuable messages on social media graphs that are worth labeling within a constrained budget, ultimately maximizing the task's performance. Despite the progress of unsupervised and semi-supervised methods in advancing belief and ideology representation learning on social networks, and the remarkable efficacy of graph learning techniques, the availability of high-quality curated labeled social data can greatly benefit and further improve performance. Consequently, allocating labeling effort is a critical research problem in scenarios where labeling resources are limited. This paper proposes a graph-data-augmentation-inspired, perturbation-based active learning strategy (PerbALGraph) that progressively selects messages for labeling according to an automatic estimator, obviating the need for human guidance. The estimator is based on the principle that messages exhibiting heightened sensitivity to structural features of the observational data are landmarks that significantly influence the semi-supervision process. We design the estimator to be the prediction variance under a set of designed graph perturbations, which is model-agnostic and application-independent. Extensive experimental results demonstrate the effectiveness of the proposed strategy for belief representation learning tasks.
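
A hedged sketch of the prediction-variance estimator, assuming a node classifier with a `model(x, edge_index)` interface and random edge dropout as the perturbation family; both are illustrative choices.

```python
import torch

@torch.no_grad()
def perturbation_variance(model, x, edge_index, num_perturbations=20, drop_p=0.1):
    """Score each node by prediction variance under random edge dropout."""
    preds = []
    for _ in range(num_perturbations):
        keep = torch.rand(edge_index.size(1)) > drop_p  # drop each edge w.p. drop_p
        preds.append(model(x, edge_index[:, keep]).softmax(dim=-1))
    preds = torch.stack(preds)            # (P, num_nodes, num_classes)
    return preds.var(dim=0).sum(dim=-1)   # (num_nodes,)

# Label the budget-many highest-variance (most structure-sensitive) nodes:
# chosen = torch.topk(perturbation_variance(model, x, edge_index), k=budget).indices
```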

Updated: 2024-10-24 22:11:06

Domains: cs.LG

Download: http://arxiv.org/abs/2410.19176v1

Indication Finding: a novel use case for representation learning

Many therapies are effective in treating multiple diseases. We present an approach that leverages methods developed in natural language processing and real-world data to prioritize potential new indications for a mechanism of action (MoA). We specifically use representation learning to generate embeddings of indications and prioritize them based on their proximity to the indications with the strongest available evidence for the MoA. We demonstrate the successful deployment of our approach for anti-IL-17A using embeddings generated with SPPMI and present an evaluation framework to determine the quality of indication-finding results and the derived embeddings.
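
A hedged sketch of SPPMI-based embeddings (shifted positive pointwise mutual information, factorized by SVD) over an indication co-occurrence matrix; the shift k and embedding dimension are assumptions.

```python
import numpy as np

def sppmi_embeddings(cooc: np.ndarray, shift_k: float = 5.0, dim: int = 50):
    """cooc: (n, n) indication co-occurrence counts, e.g. from patient records."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    sppmi = np.maximum(pmi - np.log(shift_k), 0.0)  # shifted positive PMI
    sppmi[~np.isfinite(sppmi)] = 0.0                # clean zero-count cells
    U, S, _ = np.linalg.svd(sppmi)
    return U[:, :dim] * np.sqrt(S[:dim])            # one embedding per indication

# Candidate indications can then be ranked by cosine proximity to those
# with the strongest available evidence for the mechanism of action.
```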

Updated: 2024-10-24 22:03:36

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2410.19174v1

Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. SDP not only reduces the burden of active parameters but also facilitates the seamless integration and reuse of experts across various tasks. Extensive experiments on diverse tasks in both simulation and the real world show that SDP 1) excels in multitask scenarios with negligible increases in active parameters, 2) prevents forgetting in continual learning of new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications. Demos and codes can be found at https://forrest-110.github.io/sparse_diffusion_policy/.

Updated: 2024-10-24 22:01:44

Domains: cs.RO,cs.LG

Download: http://arxiv.org/abs/2407.01531v2

Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity

Community detection is a classic network problem with extensive applications in various fields. Its most common method is using modularity maximization heuristics which rarely return an optimal partition or anything similar. Partitions with globally optimal modularity are difficult to compute, and therefore have been underexplored. Using structurally diverse networks, we compare 30 community detection methods including our proposed algorithm that offers optimality and approximation guarantees: the Bayan algorithm. Unlike existing methods, Bayan globally maximizes modularity or approximates it within a factor. Our results show the distinctive accuracy and stability of maximum-modularity partitions in retrieving planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well clusteredness. These advantages come at the cost of additional computations which Bayan makes possible for small networks (networks that have up to 3000 edges in their largest connected component). Bayan is several times faster than using open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well performing algorithms, among which Bayan stands out as the most reliable method for small networks. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python.
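
The objective Bayan optimizes, in runnable form via networkx: a heuristic returns some partition, and modularity scores it; bayanpy's own API is not shown here.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G = nx.karate_club_graph()
partition = greedy_modularity_communities(G)  # a heuristic, no optimality guarantee
print(modularity(G, partition))
# Bayan instead maximizes this same objective exactly (or within a stated
# approximation factor) for networks small enough to solve to optimality.
```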

Updated: 2024-10-24 21:45:34

Domains: cs.SI,cond-mat.stat-mech,cs.DS,cs.LG,math.OC,90C90, 90C10, 90C57, 90C59, 90C35, 05C15, 65K05

Download: http://arxiv.org/abs/2209.04562v5

Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates

This paper explores optimal architectures for evaluating the outputs of large language models (LLMs) using LLMs themselves. We propose a novel framework that interprets LLMs as advocates within an ensemble of interacting agents, allowing them to defend their answers and reach conclusions through a judge and jury system. This approach offers a more dynamic and comprehensive evaluation process compared to traditional human-based assessments or automated metrics. We discuss the motivation behind this framework, its key components, and comparative advantages. We also present a probabilistic model to evaluate the error reduction achieved by iterative advocate systems. Finally, we outline experiments to validate the effectiveness of multi-advocate architectures and discuss future research directions.

Updated: 2024-10-24 21:42:20

Domains: cs.CL,cs.LG,cs.MA

Download: http://arxiv.org/abs/2410.04663v2

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate. Hallucinations are concerning because they erode reliability and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights, enabling more efficient sparse inference. Pruned models yield downstream task performance comparable to the original, making them ideal alternatives when operating on a limited budget. However, the effect that pruning has upon hallucinations in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study across five summarization datasets, two state-of-the-art pruning methods, and five instruction-tuned LLMs. Surprisingly, we find that hallucinations are less prevalent from pruned LLMs than the original models. Our analysis suggests that pruned models tend to depend more on the source document for summary generation. This leads to a higher lexical overlap between the generated summary and the source document, which could be a reason for the reduction in hallucination risk.
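
A minimal sketch of a lexical-overlap measure of the kind the analysis refers to: the fraction of summary tokens that also occur in the source; the paper's exact metric may differ.

```python
def lexical_overlap(summary: str, source: str) -> float:
    """Fraction of summary tokens also present in the source document."""
    summary_tokens = summary.lower().split()
    source_vocab = set(source.lower().split())
    if not summary_tokens:
        return 0.0
    return sum(t in source_vocab for t in summary_tokens) / len(summary_tokens)
```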

Updated: 2024-10-24 21:40:00

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2311.09335v3

Large Language Models Can Be Contextual Privacy Protection Learners

The proliferation of Large Language Models (LLMs) has driven considerable interest in fine-tuning them with domain-specific data to create specialized language models. Nevertheless, such domain-specific fine-tuning data often contains contextually sensitive personally identifiable information (PII). Direct fine-tuning of LLMs on this data without privacy protection poses a risk of data leakage of sensitive PII during inference time. To address this challenge, we introduce Contextual Privacy Protection Language Models (CPPLM), a novel paradigm for fine-tuning LLMs that effectively injects domain-specific knowledge while safeguarding inference-time data privacy. Our work offers a theoretical analysis for model design and benchmarks various techniques such as corpus curation, penalty-based unlikelihood in the training loss, instruction-based tuning, etc. Extensive experiments across diverse datasets and scenarios demonstrate the effectiveness of our approaches. In particular, instruction tuning with both positive and negative examples stands out as a promising method, effectively protecting private data while enhancing the model's knowledge. Our work underscores the potential of Large Language Models as robust contextual privacy protection learners. The complete code and data for the work can be found at https://github.com/Yijia-Xiao/PPLM.
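
A hedged sketch of a penalty-based unlikelihood term: standard likelihood on non-sensitive tokens plus an unlikelihood penalty -log(1 - p) on positions flagged as PII; the masking scheme and weighting are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pii_unlikelihood_loss(logits, targets, pii_mask, alpha: float = 1.0):
    """logits: (B, T, V); targets: (B, T); pii_mask: (B, T) True at PII tokens."""
    log_probs = F.log_softmax(logits, dim=-1)
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (B, T)
    nll = -tok_logp[~pii_mask].mean()  # ordinary LM loss on non-PII tokens
    unlikelihood = 0.0
    if pii_mask.any():
        p = tok_logp[pii_mask].exp().clamp(max=1 - 1e-6)
        unlikelihood = -torch.log1p(-p).mean()  # push p(PII token) toward zero
    return nll + alpha * unlikelihood
```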

Updated: 2024-10-24 21:36:36

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2310.02469v2

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

The ability to comprehend audio--which includes speech, non-speech sounds, and music--is crucial for AI agents to interact effectively with the world. We present MMAU, a novel benchmark designed to evaluate multimodal audio understanding models on tasks requiring expert-level knowledge and complex reasoning. MMAU comprises 10k carefully curated audio clips paired with human-annotated natural language questions and answers spanning speech, environmental sounds, and music. It includes information extraction and reasoning questions, requiring models to demonstrate 27 distinct skills across unique and challenging tasks. Unlike existing benchmarks, MMAU emphasizes advanced perception and reasoning with domain-specific knowledge, challenging models to tackle tasks akin to those faced by experts. We assess 18 open-source and proprietary (Large) Audio-Language Models, demonstrating the significant challenges posed by MMAU. Notably, even the most advanced Gemini Pro v1.5 achieves only 52.97% accuracy, and the state-of-the-art open-source Qwen2-Audio achieves only 52.50%, highlighting considerable room for improvement. We believe MMAU will drive the audio and multimodal research community to develop more advanced audio understanding models capable of solving complex audio tasks.

Updated: 2024-10-24 21:20:10

Domains: eess.AS,cs.AI,cs.CL,cs.SD

Download: http://arxiv.org/abs/2410.19168v1

On the Expressive Power of Tree-Structured Probabilistic Circuits

Probabilistic circuits (PCs) have emerged as a powerful framework to compactly represent probability distributions for efficient and exact probabilistic inference. It has been shown that PCs with a general directed acyclic graph (DAG) structure can be understood as a mixture of exponentially (in its height) many components, each of which is a product distribution over univariate marginals. However, existing structure learning algorithms for PCs often generate tree-structured circuits or use tree-structured circuits as intermediate steps to compress them into DAG-structured circuits. This leads to the intriguing question of whether there exists an exponential gap between DAGs and trees for the PC structure. In this paper, we provide a negative answer to this conjecture by proving that, for $n$ variables, there exists a quasi-polynomial upper bound $n^{O(\log n)}$ on the size of an equivalent tree computing the same probability distribution. On the other hand, we also show that given a depth restriction on the tree, there is a super-polynomial separation between tree and DAG-structured PCs. Our work takes an important step towards understanding the expressive power of tree-structured PCs, and our techniques may be of independent interest in the study of structure learning algorithms for PCs.
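
A minimal worked instance of the object under study: a depth-2 tree-structured PC, i.e. a sum (mixture) node over product nodes of univariate Bernoulli marginals, evaluated exactly in one pass.

```python
import numpy as np

# Sum node over two product nodes, each a product of univariate marginals
weights = np.array([0.4, 0.6])        # sum-node (mixture) weights
marginals = np.array([[0.9, 0.2],     # component 0: p(x1=1), p(x2=1)
                      [0.1, 0.7]])    # component 1

def density(x: np.ndarray) -> float:
    """p(x) = sum_k w_k * prod_i p_k(x_i): exact inference in a single pass."""
    comp = np.prod(np.where(x == 1, marginals, 1 - marginals), axis=1)
    return float(weights @ comp)

print(density(np.array([1, 0])))  # 0.4*0.9*0.8 + 0.6*0.1*0.3 = 0.306
```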

Updated: 2024-10-24 21:15:42

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.05465v2

Adversarial Attacks on Large Language Models Using Regularized Relaxation

As powerful Large Language Models (LLMs) are now widely used for numerous practical applications, their safety is of critical importance. While alignment techniques have significantly improved overall safety, LLMs remain vulnerable to carefully crafted adversarial inputs. Consequently, adversarial attack methods are extensively used to study and understand these vulnerabilities. However, current attack methods face significant limitations. Those relying on optimizing discrete tokens suffer from limited efficiency, while continuous optimization techniques fail to generate valid tokens from the model's vocabulary, rendering them impractical for real-world applications. In this paper, we propose a novel technique for adversarial attacks that overcomes these limitations by leveraging regularized gradients with continuous optimization methods. Our approach is two orders of magnitude faster than the state-of-the-art greedy coordinate gradient-based method, significantly improving the attack success rate on aligned language models. Moreover, it generates valid tokens, addressing a fundamental limitation of existing continuous optimization methods. We demonstrate the effectiveness of our attack on five state-of-the-art LLMs using four datasets.

Updated: 2024-10-24 21:01:45

Domains: cs.LG,cs.AI,cs.CL,cs.CR,I.2.7

Download: http://arxiv.org/abs/2410.19160v1

EchoApex: A General-Purpose Vision Foundation Model for Echocardiography

Quantitative evaluation of echocardiography is essential for precise assessment of cardiac condition, monitoring disease progression, and guiding treatment decisions. The diverse nature of echo images, including variations in probe types, manufacturers, and pathologies, poses challenges for developing artificial intelligence models that can generalize across different clinical practices. We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications spanning a variety of clinical practice. Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres. By incorporating task-specific decoders and adapter modules, we demonstrate the effectiveness of EchoApex on 4 different kinds of clinical applications with 28 sub-tasks, including view classification, interactive structure segmentation, left ventricle hypertrophy detection, and automated ejection fraction estimation from view sequences. Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture, demonstrating the benefits of model pretraining at scale with in-domain data. Furthermore, EchoApex illustrates the potential for developing a general-purpose vision foundation model tailored specifically for echocardiography, capable of addressing a diverse range of clinical applications with high efficiency and efficacy.

Updated: 2024-10-24 20:57:00

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.11092v3

Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use

Adverse Drug Reactions (ADRs) from psychiatric medications are the leading cause of hospitalizations among mental health patients. With healthcare systems and online communities facing limitations in resolving ADR-related issues, Large Language Models (LLMs) have the potential to fill this gap. Despite the increasing capabilities of LLMs, past research has not explored their capabilities in detecting ADRs related to psychiatric medications or in providing effective harm reduction strategies. To address this, we introduce the Psych-ADR benchmark and the Adverse Drug Reaction Response Assessment (ADRA) framework to systematically evaluate LLM performance in detecting ADR expressions and delivering expert-aligned mitigation strategies. Our analyses show that LLMs struggle with understanding the nuances of ADRs and differentiating between types of ADRs. While LLMs align with experts in terms of expressed emotions and tone of the text, their responses are more complex, harder to read, and only 70.86% aligned with expert strategies. Furthermore, they provide less actionable advice by a margin of 12.32% on average. Our work provides a comprehensive benchmark and evaluation framework for assessing LLMs in strategy-driven tasks within high-risk domains.

Updated: 2024-10-24 20:49:22

Domains: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2410.19155v1

Cross Spline Net and a Unified World

In today's machine learning world for tabular data, XGBoost and fully connected neural networks (FCNN) are the two most popular methods due to their good model performance and convenience of use. However, they are highly complicated, hard to interpret, and can be overfitted. In this paper, we propose a new modeling framework called cross spline net (CSN) that is based on a combination of spline transformation and cross-network (Wang et al. 2017, 2021). We will show that CSN is as performant and convenient to use as these methods, while being less complicated, more interpretable, and more robust. Moreover, the CSN framework is flexible, as the spline layer can be configured differently to yield different models. With different choices of the spline layer, we can reproduce or approximate a set of non-neural-network models, including linear and spline-based statistical models, trees, rule-fit, tree ensembles (gradient boosting trees, random forests), oblique trees/forests, multivariate adaptive regression splines (MARS), SVM with polynomial kernel, etc. Therefore, CSN provides a unified modeling framework that puts the above set of non-neural-network models under the same neural network framework. By using scalable and powerful gradient descent algorithms available in neural network libraries, CSN avoids some pitfalls (such as being ad-hoc, greedy, or non-scalable) of the case-specific optimization methods used in the above non-neural-network models. We will use a special type of CSN, TreeNet, to illustrate our point. We will compare TreeNet with XGBoost and FCNN to show the benefits of TreeNet. We believe CSN will provide a flexible and convenient framework for practitioners to build performant, robust, and more interpretable models.
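
A minimal sketch of the cross layer (in the Wang et al. 2021 form) that CSN combines with a spline transformation; the spline layer itself is left abstract here.

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One cross layer (DCN-v2 form): x_{l+1} = x0 * (W x_l + b) + x_l."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl

# In CSN, x0 would be the output of the spline-transformation layer applied
# to the raw features; stacking cross layers builds explicit interactions.
```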

Updated: 2024-10-24 20:45:48

标题: 交叉样条网格与一个统一的世界

摘要: 在当今的表格数据机器学习领域中,XGBoost和全连接神经网络(FCNN)是两种最受欢迎的方法,因为它们具有良好的模型性能并且易于使用。然而,它们非常复杂,难以解释,并且可能会过拟合。在本文中,我们提出了一个基于样条变换和交叉网络(Wang等人,2017年,2021年)组合的新建模框架,称为交叉样条网络(CSN)。我们将展示CSN具有与XGBoost和FCNN相当的性能和便利性,但更简单、更易解释和更健壮。此外,CSN框架是灵活的,因为样条层可以配置不同以生成不同的模型。通过选择不同的样条层,我们可以再现或近似一组非神经网络模型,包括线性和基于样条的统计模型、树、规则拟合、树集成(梯度提升树、随机森林)、斜树/森林、多变量自适应回归样条(MARS)、多项式核支持向量机等。因此,CSN提供了一个统一的建模框架,将上述一组非神经网络模型置于同一神经网络框架下。通过使用神经网络库中可扩展和强大的梯度下降算法,CSN避免了上述非神经网络模型中使用的特定优化方法的一些缺陷(如临时性、贪婪性或不可扩展性)。我们将使用一种特殊类型的CSN,TreeNet,来说明我们的观点。我们将比较TreeNet与XGBoost和FCNN,以展示TreeNet的好处。我们相信CSN将为从业者提供一个灵活和便利的框架,用于构建性能良好、健壮且更易解释的模型。

更新时间: 2024-10-24 20:45:48

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.19154v1

Learning Coupled Subspaces for Multi-Condition Spike Data

In neuroscience, researchers typically conduct experiments under multiple conditions to acquire neural responses in the form of high-dimensional spike train datasets. Analysing high-dimensional spike data is a challenging statistical problem. To this end, Gaussian process factor analysis (GPFA), a popular class of latent variable models, has been proposed. GPFA extracts smooth, low-dimensional latent trajectories underlying high-dimensional spike train datasets. However, such analyses are often done separately for each experimental condition, contrary to the nature of neural datasets, which contain recordings under multiple experimental conditions. Exploiting the parametric nature of these conditions, we propose a multi-condition GPFA model and inference procedure to learn the underlying latent structure in the corresponding datasets in a sample-efficient manner. In particular, we propose a non-parametric Bayesian approach to learn a smooth tuning function over the experimental condition space. Our approach not only boosts model accuracy and is faster, but also improves model interpretability compared to approaches that fit a separate model for each experimental condition.

Updated: 2024-10-24 20:44:28

标题: 学习用于多条件尖峰数据的耦合子空间

摘要: 在神经科学中,研究人员通常在多个条件下进行实验,以获取高维度脉冲列数据的神经响应。分析高维度脉冲数据是一个具有挑战性的统计问题。为此,提出了高斯过程因子分析(GPFA),这是一种流行的潜变量模型类。GPFA提取了高维度脉冲列数据下潜在的平滑、低维度的轨迹。然而,这样的分析通常是针对每个实验条件单独进行的,与神经数据集的本质相悖,后者包含在多个实验条件下的记录。利用这些条件的参数化特性,我们提出了一个多条件GPFA模型和推断过程,以一种有效的方式学习对应数据集中的潜在结构。具体来说,我们提出了一种非参数贝叶斯方法,以学习实验条件空间上的平滑调节函数。我们的方法不仅提高了模型的准确性和速度,而且与为每个实验条件单独拟合模型的方法相比,提高了模型的可解释性。

更新时间: 2024-10-24 20:44:28

领域: cs.LG

下载: http://arxiv.org/abs/2410.19153v1

Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

We propose a class of structured diffusion models, in which the prior distribution is chosen as a mixture of Gaussians, rather than a standard Gaussian distribution. The specific Gaussian mixture used as the prior can be chosen to incorporate structured information about the data. We develop a simple-to-implement training procedure that smoothly accommodates a mixture of Gaussians as the prior. Theory is provided to quantify the benefits of our proposed models compared to classical diffusion models. Numerical experiments with synthetic, image, and operational data are conducted to show the comparative advantages of our model. Our method is shown to be robust to mis-specification and particularly suited to situations where training resources are limited or faster training in real time is desired.
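
A minimal sketch of the core idea: only the initialization of the reverse process changes, from a standard Gaussian to a mixture of Gaussians. The mixture parameters, step sizes, and the stand-in score function below are illustrative assumptions, not the paper's trained model or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component Gaussian mixture prior in R^2 (weights, means, isotropic stds).
weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
stds = np.array([1.0, 1.0])

def sample_mog_prior(n):
    """Draw x_T from the mixture prior instead of a standard Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return means[comp] + stds[comp, None] * rng.normal(size=(n, 2))

def score_placeholder(x, t):
    """Stand-in for the learned score network (here: the score of N(0, I))."""
    return -x

def reverse_euler(x, n_steps=100, dt=1e-2):
    """Plain Euler-Maruyama reverse diffusion; only the start point changed."""
    for i in range(n_steps):
        t = 1.0 - i * dt
        drift = -0.5 * x - score_placeholder(x, t)
        x = x - drift * dt + np.sqrt(dt) * rng.normal(size=x.shape)
    return x

x_T = sample_mog_prior(512)      # structured initialization from the MoG prior
x_0 = reverse_euler(x_T)
print(np.round(x_0.mean(axis=0), 3))
```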

Updated: 2024-10-24 20:34:06

标题: 使用混合高斯作为先验分布的结构化扩散模型

摘要: 我们提出了一类结构扩散模型,其中先验分布选择为高斯混合分布,而不是标准高斯分布。具体的混合高斯分布可以被选定为先验,以包含数据的特定结构信息。我们开发了一个简单实施的训练程序,可以平滑地适应混合高斯作为先验的使用。理论提供了对我们提出的模型相比于经典扩散模型的好处进行量化。通过对合成、图像和实际数据进行数值实验,展示了我们模型的比较优势。我们的方法被证明对规格不准确是稳健的,特别适用于训练资源有限或需要实时更快速训练的情况。

更新时间: 2024-10-24 20:34:06

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.19149v1

Least Squares Regression Can Exhibit Under-Parameterized Double Descent

The relationship between the number of training data points, the number of parameters, and the generalization capabilities of models has been widely studied. Previous work has shown that double descent can occur in the over-parameterized regime and that the standard bias-variance trade-off holds in the under-parameterized regime. These works provide multiple reasons for the existence of the peak. We postulate that the location of the peak depends on the technical properties of both the spectrum and the eigenvectors of the sample covariance. We present two simple examples that provably exhibit double descent in the under-parameterized regime, for reasons that do not appear to be those provided in prior work.
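
For intuition, the toy experiment below reproduces the familiar least-squares double-descent peak using min-norm solutions. Note that with this i.i.d. design the peak sits at the interpolation threshold p ≈ n; the paper's examples instead use structured covariance spectra to place a peak strictly inside the under-parameterized regime, which this sketch does not replicate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_max = 40, 500, 80
beta = rng.normal(size=d_max) / np.sqrt(d_max)        # true coefficients
X_tr = rng.normal(size=(n_train, d_max))
X_te = rng.normal(size=(n_test, d_max))
y_tr = X_tr @ beta + 0.5 * rng.normal(size=n_train)
y_te = X_te @ beta + 0.5 * rng.normal(size=n_test)

for p in [5, 10, 20, 30, 35, 39, 45, 60, 80]:
    b_hat = np.linalg.pinv(X_tr[:, :p]) @ y_tr        # min-norm least squares on p features
    mse = np.mean((X_te[:, :p] @ b_hat - y_te) ** 2)
    print(f"p = {p:2d}   test MSE = {mse:8.3f}")      # error peaks near p = n_train
```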

Updated: 2024-10-24 20:32:20

标题: 最小二乘回归可能表现出欠参数化的双谷效应

摘要: 训练数据点数量、参数数量和模型泛化能力之间的关系已经被广泛研究。先前的研究表明,在过度参数化的情况下可能会发生双下降,而在不足参数化的情况下会出现标准的偏差-方差权衡。这些研究提供了存在峰值的多种原因。我们假设峰值的位置取决于样本协方差的谱和特征向量的技术属性。我们提出了两个简单的例子,证明在不足参数化的情况下确实存在双下降,并且不是由先前研究提供的原因造成的。

更新时间: 2024-10-24 20:32:20

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2305.14689v3

Functional Brain Network Identification in Opioid Use Disorder Using Machine Learning Analysis of Resting-State fMRI BOLD Signals

Understanding the neurobiology of opioid use disorder (OUD) using resting-state functional magnetic resonance imaging (rs-fMRI) may help inform treatment strategies to improve patient outcomes. Recent literature suggests temporal characteristics of rs-fMRI blood oxygenation level-dependent (BOLD) signals may offer complementary information to functional connectivity analysis. However, existing studies of OUD analyze BOLD signals using measures computed across all time points. This study, for the first time in the literature, employs data-driven machine learning (ML) modeling of rs-fMRI BOLD features representing multiple time points to identify region(s) of interest that differentiate OUD subjects from healthy controls (HC). Following the triple network model, we obtain rs-fMRI BOLD features from the default mode network (DMN), salience network (SN), and executive control network (ECN) for 31 OUD and 45 HC subjects. Then, we use the Boruta ML algorithm to identify statistically significant BOLD features that differentiate OUD from HC, identifying the DMN as the most salient functional network for OUD. Furthermore, we conduct brain activity mapping, showing heightened neural activity within the DMN for OUD. We perform 5-fold cross-validation classification (OUD vs. HC) experiments to study the discriminative power of functional network features with and without fusing demographic features. The DMN shows the most discriminative power, achieving mean AUC and F1 scores of 80.91% and 73.97%, respectively, when fusing BOLD and demographic features. Follow-up Boruta analysis using BOLD features extracted from the medial prefrontal cortex, posterior cingulate cortex, and left and right temporoparietal junctions reveals significant features for all four functional hubs within the DMN.

Updated: 2024-10-24 20:30:14

标题: 利用机器学习分析静息态fMRI BOLD信号在阿片类药物使用障碍中的功能性脑网络识别

摘要: 使用静息态功能性磁共振成像(rs-fMRI)了解阿片类药物使用障碍(OUD)的神经生物学可能有助于制定治疗策略,改善患者预后。最近的文献表明,rs-fMRI血氧水平依赖(BOLD)信号的时间特征可能提供补充信息以进行功能连接性分析。然而,现有的OUD研究使用在所有时间点上计算的测量来分析BOLD信号。本研究首次在文献中采用基于数据驱动的机器学习(ML)对代表多个时间点的rs-fMRI BOLD特征进行建模,以识别区域感兴趣,区分OUD受试者和健康对照组(HC)。按照三重网络模型,我们从默认模式网络(DMN)、显著性网络(SN)和执行控制网络(ECN)获取rs-fMRI BOLD特征,对31名OUD和45名HC受试者进行分析。然后,我们使用Boruta ML算法识别能够区分OUD和HC的具有统计显着性的BOLD特征,确定DMN为OUD最显著的功能网络。此外,我们进行了脑活动映射,显示OUD中DMN内的神经活动增强。我们进行了5倍交叉验证分类(OUD vs. HC)实验,研究功能网络特征在融合和不融合人口特征的情况下的区分能力。DMN显示出最具有区分力,当融合BOLD和人口特征时,平均AUC和F1得分分别达到80.91%和73.97%。随后的Boruta分析使用从前额皮质、后扣带皮质以及左右颞顶联合提取的BOLD特征,揭示了DMN内所有四个功能中枢的显著特征。

更新时间: 2024-10-24 20:30:14

领域: q-bio.NC,cs.LG

下载: http://arxiv.org/abs/2410.19147v1

Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant

We revisit knowledge-aware text-based visual question answering, also known as Text-KVQA, in the light of modern advancements in large multimodal models (LMMs), and make the following contributions: (i) We propose VisTEL - a principled approach to perform visual text entity linking. The proposed VisTEL module harnesses a state-of-the-art visual text recognition engine and the power of a large multimodal model to jointly reason using textual and visual context obtained using surrounding cues in the image to link the visual text entity to the correct knowledge base entity. (ii) We present KaLMA - a knowledge-aware large multimodal assistant that augments an LMM with knowledge associated with visual text entity in the image to arrive at an accurate answer. Further, we provide a comprehensive experimental analysis and comparison of our approach with traditional visual question answering, pre-large multimodal models, and large multimodal models, as well as prior top-performing approaches. Averaging over three splits of Text-KVQA, our proposed approach surpasses the previous best approach by a substantial 23.3% on an absolute scale and establishes a new state of the art. We make our implementation publicly available.

Updated: 2024-10-24 20:25:38

标题: 视觉文本的重要性:利用视觉文本实体知识增强文本-KVQA的大型多模态助手

摘要: 我们重新审视了基于知识的文本视觉问答,也被称为Text-KVQA,考虑到现代大型多模态模型(LMM)的进步,并做出以下贡献:(i) 我们提出了VisTEL - 一种执行视觉文本实体链接的原则性方法。所提出的VisTEL模块利用最先进的视觉文本识别引擎和大型多模态模型的能力,共同使用图像中周围线索获取的文本和视觉上下文进行推理,将视觉文本实体链接到正确的知识库实体。(ii) 我们提出了KaLMA - 一种知识感知的大型多模态助手,它通过与图像中的视觉文本实体相关联的知识来增强LMM,从而得出准确的答案。此外,我们对我们的方法与传统视觉问答、大型多模态模型以及之前表现最佳的方法进行了全面的实验分析和比较。在Text-KVQA的三个拆分上,我们提出的方法在绝对规模上比以前最佳方法提高了23.3%,并建立了一个新的技术水平。我们将我们的实现公开可用。

更新时间: 2024-10-24 20:25:38

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.19144v1

Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits

There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is whether sparser graphs can be constructed. While this question is fairly well understood in low dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n})$ for any set of $n$ points, in any dimension, for any distance function. We complement this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.
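
The greedy routing rule described above is easy to state in code; the sketch below checks it on a complete graph, where routing trivially succeeds. The point set, dimension, and sampled start nodes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_route(points, neighbors, start, target):
    """Greedy routing: repeatedly move to the neighbor closest to the target.
    A graph is navigable when this succeeds for every (start, target) pair."""
    cur = start
    while cur != target:
        cands = neighbors[cur]
        dists = np.linalg.norm(points[cands] - points[target], axis=1)
        nxt = cands[int(np.argmin(dists))]
        if np.linalg.norm(points[nxt] - points[target]) >= \
           np.linalg.norm(points[cur] - points[target]):
            return False          # stuck in a local minimum: routing failed
        cur = nxt
    return True

# Complete graph on random points: trivially navigable.
n, dim = 50, 8
pts = rng.normal(size=(n, dim))
nbrs = [np.array([j for j in range(n) if j != i]) for i in range(n)]
ok = all(greedy_route(pts, nbrs, s, t)
         for s in range(5) for t in range(n) if t != s)
print("navigable on sampled pairs:", ok)
```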

Updated: 2024-10-24 20:21:36

标题: 可导航图用于高维最近邻搜索:构造和限制

摘要: 最近对基于图的最近邻搜索方法引起了极大的兴趣,其中许多方法都集中在构建高维点集上可导航图的过程中。如果我们可以成功地从任何起始节点移动到任何目标节点,使用一种贪婪的路由策略,在这种策略中我们总是移动到最接近目标的邻居节点,根据给定的距离函数。对于任何点集来说,完整图都是可导航的,但对于应用程序来说,重要的问题是是否可以构建更稀疏的图。尽管在低维空间中这个问题已经比较清楚,我们为高维点集建立了一些首次的上限和下限。首先,我们提供了一种简单高效的方法,可以为任何维度中的任何距离函数和任何点集构建一个平均度为$O(\sqrt{n \log n })$的可导航图。我们用一个几乎匹配的下限来补充这个结果:即使在$O(\log n)$维的欧几里德度量下,一个随机点集也不具有平均度为$O(n^{\alpha})$的可导航图,对于任何$\alpha < 1/2$。我们的下限依赖于二项式随机变量的尖锐反集中边界,我们使用这些边界来展示一组随机点的近邻域没有显著重叠,迫使任何可导航图具有许多边。

更新时间: 2024-10-24 20:21:36

领域: cs.DS,cs.CG,cs.DB,cs.LG

下载: http://arxiv.org/abs/2405.18680v3

Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

Benign overfitting refers to how over-parameterized neural networks can fit training data perfectly and generalize well to unseen data. While this has been widely investigated theoretically, existing works are limited to two-layer networks with fixed output layers, where only the hidden weights are trained. We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) with fully trainable layers, which is closer to practice. Our results show that the initialization scaling of the output layer is crucial to the training dynamics: large scales make the model training behave similarly to that with a fixed output layer, where the hidden layer grows rapidly while the output layer remains largely unchanged; in contrast, small scales result in more complex layer interactions, where the hidden layer initially grows to a specific ratio relative to the output layer, after which both layers jointly grow and maintain that ratio throughout training. Furthermore, in both settings, we provide nearly matching upper and lower bounds on the test errors, identifying the sharp conditions on the initialization scaling and signal-to-noise ratio (SNR) under which benign overfitting can or cannot be achieved. Numerical experiments back up the theoretical results.

Updated: 2024-10-24 20:15:45

标题: 初始化很重要:关于具有完全可训练层的两层ReLU CNN的良性过拟合

摘要: 良性过拟合指的是过度参数化的神经网络可以完美拟合训练数据并且在未见数据上有很好的泛化能力。尽管这在理论上已被广泛研究,但现有的作品仅限于具有固定输出层的两层网络,只训练隐藏层权重。我们将分析扩展到具有完全可训练层的两层ReLU卷积神经网络(CNN),这更接近于实践。我们的结果显示,输出层的初始化缩放对训练动态至关重要:大规模使模型训练行为类似于具有固定输出的情况,隐藏层迅速增长而输出层基本保持不变;相反,小规模会导致更复杂的层间相互作用,隐藏层最初增长到与输出层的特定比例,之后两层共同增长并保持该比例在整个训练过程中。此外,在两种设置中,我们提供了几乎匹配的测试误差的上下界,确定了初始化缩放和信噪比(SNR)的尖锐条件,可以实现或不能实现良性过拟合。数值实验支持理论结果。

更新时间: 2024-10-24 20:15:45

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.19139v1

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

We introduce Videoshop, a training-free video editing algorithm for localized semantic edits. Videoshop allows users to use any editing software, including Photoshop and generative inpainting, to modify the first frame; it automatically propagates those changes, with semantically, spatially, and temporally consistent motion, to the remaining frames. Unlike existing methods that enable edits only through imprecise textual instructions, Videoshop allows users to add or remove objects, semantically change objects, insert stock photos into videos, etc., with fine-grained control over locations and appearance. We achieve this through image-based video editing by inverting latents with noise extrapolation, from which we generate videos conditioned on the edited image. Videoshop produces higher-quality edits than 6 baselines on 2 editing benchmarks, as measured by 10 evaluation metrics.

Updated: 2024-10-24 20:10:30

标题: Videoshop:使用噪声外推扩散反演进行本地化语义视频编辑

摘要: 我们介绍了Videoshop,这是一个无需训练的用于局部语义编辑的视频编辑算法。Videoshop允许用户使用任何编辑软件,包括Photoshop和生成修补,对第一帧进行修改;它会自动将这些变化传播到其余帧,保持语义、空间和时间上的一致运动。与现有的仅通过不精确的文本说明进行编辑的方法不同,Videoshop允许用户添加或删除对象,对对象进行语义更改,将素材照片插入视频等,具有对位置和外观进行精细控制的能力。我们通过基于图像的视频编辑实现了这一点,通过对带有噪声外推的潜变量进行反演,从而生成基于编辑图像的视频。Videoshop在两个编辑基准上使用10个评估指标对6个基线进行比较,产生出更高质量的编辑结果。

更新时间: 2024-10-24 20:10:30

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.14617v3

Time-Varying Convex Optimization with $O(n)$ Computational Complexity

In this article, we consider the problem of unconstrained time-varying convex optimization, where the cost function changes with time. We provide an in-depth technical analysis of the problem and argue why freezing the cost at each time step and taking finite steps toward the minimizer is not the best tracking solution for this problem. We propose a set of algorithms that, by taking into account the temporal variation of the cost, aim to reduce the tracking error of the time-varying minimizer of the problem. The main contribution of our work is that our proposed algorithms only require the first-order derivatives of the cost function with respect to the decision variable. This approach significantly reduces computational cost compared to existing algorithms, which use the inverse of the Hessian of the cost. Specifically, the proposed algorithms reduce the computational cost from $O(n^3)$ to $O(n)$ per timestep, where $n$ is the size of the decision variable. Avoiding the inverse of the Hessian also makes our algorithms applicable to non-convex optimization problems. We refer to these algorithms as $O(n)$-algorithms. These $O(n)$-algorithms are designed to solve the problem for different scenarios based on the available temporal information about the cost. We illustrate our results through various examples, including the solution of a model predictive control problem framed as a convex optimization problem with a streaming time-varying cost function.
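
As a generic illustration (not the paper's specific algorithms), the sketch below tracks a drifting minimizer using only gradients: an O(n) extrapolation step predicts the minimizer's motion and an O(n) gradient step corrects it, with no Hessian or matrix inverse anywhere. The target trajectory and step sizes are illustrative assumptions.

```python
import numpy as np

def r(t):
    """Time-varying target; the minimizer of f(x, t) = 0.5 * ||x - r(t)||^2."""
    return np.array([np.sin(t), np.cos(t)])

def grad_f(x, t):
    return x - r(t)                          # first-order information only

alpha, dt, n_steps = 0.8, 0.05, 200
x_prev = np.zeros(2)
x = np.zeros(2)
errs = []
for k in range(n_steps):
    t = k * dt
    x_pred = x + (x - x_prev)                # O(n) extrapolation of the drift
    x_prev = x
    x = x_pred - alpha * grad_f(x_pred, t)   # O(n) gradient correction
    errs.append(np.linalg.norm(x - r(t)))
print(f"mean tracking error: {np.mean(errs[20:]):.4f}")
```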

Updated: 2024-10-24 20:09:31

标题: 具有$O(n)$计算复杂度的时变凸优化

摘要: 在这篇文章中,我们考虑了无约束的时变凸优化问题,其中成本函数随时间变化。我们对这个问题进行了深入的技术分析,并论证了为什么在每个时间步冻结成本并朝着最小化器方向采取有限步骤并不是这个问题的最佳跟踪解决方案。我们提出了一套算法,通过考虑成本的时间变化来减少问题的时变最小化器的跟踪误差。我们工作的主要贡献是,我们提出的算法只需要关于决策变量的成本函数的一阶导数。与使用成本Hessian的逆的现有算法相比,这种方法显著降低了计算成本。具体来说,所提出的算法将每个时间步的计算成本从$O(n^3)$降低到$O(n)$,其中$n$是决策变量的大小。避免Hessian的逆也使我们的算法适用于非凸优化问题。我们将这些算法称为$O(n)$-算法。这些$O(n)$-算法旨在根据成本的可用时间信息解决不同情景下的问题。我们通过各种例子来说明我们的结果,包括将作为一个带有流式时变成本函数的凸优化问题框架的模型预测控制问题的解决方案。

更新时间: 2024-10-24 20:09:31

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2410.15009v2

Context-Aware Trajectory Anomaly Detection

Trajectory anomaly detection is crucial for effective decision-making in urban and human mobility management. Existing methods of trajectory anomaly detection generally focus on training a trajectory generative model and evaluating the likelihood of reconstructing a given trajectory. However, previous work often lacks important contextual information on the trajectory, such as the agent's information (e.g., agent ID) or geographic information (e.g., Points of Interest (POI)), which could help capture anomalous behaviors more accurately. To fill this gap, we propose a context-aware anomaly detection approach that models contextual information related to trajectories. The proposed method is based on a trajectory reconstruction framework guided by contextual factors such as agent ID and contextual POI embedding. The injection of contextual information aims to improve the performance of anomaly detection. We conducted experiments in two cities and demonstrated that the proposed approach significantly outperformed existing methods by effectively modeling contextual information. Overall, this paper opens a new direction for advancing trajectory anomaly detection.

Updated: 2024-10-24 20:09:13

标题: 上下文感知轨迹异常检测

摘要: 轨迹异常检测对于城市和人类移动管理中的有效决策至关重要。现有的轨迹异常检测方法通常侧重于训练轨迹生成模型并评估重建给定轨迹的可能性。然而,先前的工作通常缺乏关于轨迹的重要上下文信息,比如代理信息(例如,代理ID)或地理信息(例如,兴趣点(POI)),这些信息可以提供有关准确捕捉异常行为的额外信息。为了填补这一空白,我们提出了一种基于轨迹相关上下文信息的异常检测方法。所提出的方法基于一个由代理ID和上下文POI嵌入引导的轨迹重建框架。上下文信息的注入旨在提高异常检测的性能。我们在两个城市进行了实验,并证明所提出的方法通过有效建模上下文信息明显优于现有方法。总的来说,本文为推进轨迹异常检测开辟了新的方向。

更新时间: 2024-10-24 20:09:13

领域: cs.LG

下载: http://arxiv.org/abs/2410.19136v1

PDL: A Declarative Prompt Programming Language

Large language models (LLMs) have taken the world by storm by making many previously difficult uses of AI feasible. LLMs are controlled via highly expressive textual prompts and return textual answers. Unfortunately, this unstructured text as input and output makes LLM-based applications brittle. This motivates the rise of prompting frameworks, which mediate between LLMs and the external world. However, existing prompting frameworks either have a high learning curve or take away control over the exact prompts from the developer. To overcome this dilemma, this paper introduces the Prompt Declaration Language (PDL). PDL is a simple declarative data-oriented language that puts prompts at the forefront, based on YAML. PDL works well with many LLM platforms and LLMs. It supports writing interactive applications that call LLMs and tools, and makes it easy to implement common use-cases such as chatbots, RAG, or agents. We hope PDL will make prompt programming simpler, less brittle, and more enjoyable.
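
To make the idea concrete, here is a hypothetical PDL-like YAML program interpreted by a few lines of Python (assuming PyYAML is available). The YAML schema shown is invented for illustration; the abstract does not spell out PDL's actual syntax.

```python
import yaml  # PyYAML is assumed; the snippet below is a hypothetical PDL-like
             # program, not PDL's actual syntax, which the abstract leaves open.

program = """
description: tiny chatbot sketch
model: some-llm
prompt:
  - role: system
    text: You are a concise assistant.
  - role: user
    text: "{{question}}"
"""

def render(doc, question):
    """Interpolate variables and flatten the declared prompt into chat messages."""
    spec = yaml.safe_load(doc)
    messages = [{"role": turn["role"],
                 "content": turn["text"].replace("{{question}}", question)}
                for turn in spec["prompt"]]
    return spec["model"], messages

model, msgs = render(program, "What is declarative prompting?")
print(model)
for m in msgs:
    print(m)
```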

Updated: 2024-10-24 20:07:08

标题: PDL:一种声明式提示编程语言

摘要: 大型语言模型(LLMs)通过使以前难以实现的许多人工智能应用变得可行而风靡全球。LLMs通过高度表达性的文本提示进行控制,并返回文本答案。不幸的是,LLM基于无结构的文本输入和输出使得应用程序变得脆弱。这促使提示框架的兴起,这些框架在LLMs和外部世界之间进行调解。然而,现有的提示框架要么学习曲线陡峭,要么将对确切提示的控制从开发者手中夺走。为了克服这一困境,本文介绍了Prompt Declaration Language(PDL)。PDL是一种简单的声明性面向数据的语言,以YAML为基础,将提示置于前沿。PDL与许多LLM平台和LLMs配合良好。它支持编写调用LLMs和工具的交互式应用程序,并且易于实现诸如聊天机器人、RAG或代理等常见用例。我们希望PDL能使提示编程变得更简单、更不脆弱且更有趣。

更新时间: 2024-10-24 20:07:08

领域: cs.AI,cs.PL

下载: http://arxiv.org/abs/2410.19135v1

The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting

Online Reinforcement Learning (RL) is typically framed as the process of minimizing cumulative regret (CR) through interactions with an unknown environment. However, real-world RL applications usually involve a sequence of tasks, and the data collected in the first task is used to warm-start the second task. The performance of the warm-start policy is measured by simple regret (SR). While minimizing CR and SR are generally conflicting objectives, previous research has shown that in stationary environments, both can be optimized in terms of the duration of the task, $T$. In practice, however, human-in-the-loop decisions between tasks often result in non-stationarity. For instance, in clinical trials, scientists may adjust target health outcomes between implementations. Our results show that task non-stationarity leads to a more restrictive trade-off between CR and SR. To balance these competing goals, the algorithm must explore excessively, leading to a CR bound worse than the typical optimal rate of $T^{1/2}$. These findings are practically significant, indicating that increased exploration is necessary in non-stationary environments to accommodate task changes, impacting the design of RL algorithms in fields such as healthcare and beyond.

Updated: 2024-10-24 20:04:43

标题: 在顺序任务设置中最小化累积遗憾的谬论

摘要: 在线强化学习(RL)通常被构建为通过与未知环境的交互来最小化累积遗憾(CR)的过程。然而,现实世界中的RL应用通常涉及一系列任务,并且在第一个任务中收集的数据被用来启动第二个任务。启动策略的性能由简单遗憾(SR)衡量。尽管最小化CR和SR通常是一种冲突的目标,先前的研究表明,在稳态环境中,可以通过任务持续时间$T$来优化这两者。 然而,在实践中,实际应用中,任务之间的人为决策往往导致非稳态性。例如,在临床试验中,科学家可能会在实施过程中调整目标健康结果。我们的结果表明,任务的非稳态性导致了CR和SR之间更为严格的权衡。为了平衡这些竞争目标,算法必须进行过多的探索,导致比典型的最佳速率$T^{1/2}$更差的CR界限。这些发现在实践中具有重要意义,表明在非稳态环境中需要增加探索以适应任务变化,影响了RL算法在医疗保健等领域的设计。

更新时间: 2024-10-24 20:04:43

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2403.10946v2

Model Collapse in the Self-Consuming Chain of Diffusion Finetuning: A Novel Perspective from Quantitative Trait Modeling

The success of generative models has reached a unique threshold where their outputs are indistinguishable from real data, leading to the inevitable contamination of future data collection pipelines with synthetic data. While their potential to generate infinite samples initially offers promise for reducing data collection costs and addressing challenges in data-scarce fields, the severe degradation in performance has been observed when iterative loops of training and generation occur, known as "model collapse." This paper explores a practical scenario in which a pretrained text-to-image diffusion model is finetuned using synthetic images generated from a previous iteration, a process we refer to as the "Chain of Diffusion." We first demonstrate the significant degradation in image quality caused by this iterative process and identify the key factor driving this decline through rigorous empirical investigations. Drawing an analogy between the Chain of Diffusion and biological evolution, we then introduce a novel theoretical analysis based on quantitative trait modeling. Our theoretical analysis aligns with empirical observations of the generated images in the Chain of Diffusion. Finally, we propose Reusable Diffusion Finetuning (ReDiFine), a simple yet effective strategy inspired by genetic mutations. ReDiFine mitigates model collapse without requiring any hyperparameter tuning, making it a plug-and-play solution for reusable image generation.

Updated: 2024-10-24 20:03:46

标题: 扩散微调自我消耗链中的模型崩溃:从数量性状建模的新视角

摘要: 生成模型的成功已经达到了一个独特的阈值,它们的输出与真实数据无法区分,导致未来数据收集管道不可避免地受到合成数据的污染。虽然它们最初具有生成无限样本的潜力,从而为减少数据收集成本和解决数据匮乏领域的挑战提供了希望,但在训练和生成的迭代循环发生时观察到了性能严重下降,这被称为“模型崩溃”。本文探讨了一个实际场景,即使用从先前迭代生成的合成图像对预训练的文本到图像扩散模型进行微调,我们将这个过程称为“扩散链条”。我们首先展示了这种迭代过程导致的图像质量显著下降,并通过严格的经验调查确定了驱动这种下降的关键因素。然后,我们将扩散链与生物进化进行类比,并引入基于定量性状建模的新颖理论分析。我们的理论分析与扩散链中生成的图像的经验观察一致。最后,我们提出了可重复扩散微调(ReDiFine),这是一种受遗传突变启发的简单而有效的策略。ReDiFine可以缓解模型崩溃,而无需进行任何超参数调整,使其成为可重复图像生成的即插即用解决方案。

更新时间: 2024-10-24 20:03:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17493v2

G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training

Recently, medical vision-language pre-training (VLP) has made substantial progress in learning global visual representations from medical images and their paired radiology reports. However, medical imaging tasks in the real world usually require finer granularity in visual features. These tasks include visual localization tasks (e.g., semantic segmentation, object detection) and visual grounding tasks. Yet, current medical VLP methods face challenges in learning these fine-grained features, as they primarily focus on brute-force alignment between image patches and individual text tokens for local visual feature learning, which is suboptimal for downstream dense prediction tasks. In this work, we propose a new VLP framework, named Global to Dense level representation learning (G2D), that achieves significantly improved granularity and more accurate grounding for the learned features, compared to existing medical VLP approaches. In particular, G2D learns dense and semantically-grounded image representations via a pseudo segmentation task parallel with the global vision-language alignment. Notably, generating pseudo segmentation targets does not incur extra trainable parameters: they are obtained on the fly during VLP with a parameter-free processor. G2D achieves superior performance across 6 medical imaging tasks and 25 diseases, particularly in semantic segmentation, which necessitates fine-grained, semantically-grounded image features. In this task, G2D surpasses peer models even when fine-tuned with just 1% of the training data, compared to the 100% used by these models. The code can be found at https://github.com/cheliu-computation/G2D-NeurIPS24/tree/main.

Updated: 2024-10-24 20:01:44

标题: G2D:通过视觉语言预训练从全局到密集的放射学表示学习

摘要: 最近,医学视觉语言预训练(VLP)已取得实质性进展,从医学图像及其配对的放射学报告中学习全局视觉表示。然而,在现实世界中,医学成像任务通常需要更精细的视觉特征。这些任务包括视觉定位任务(例如,语义分割,目标检测)和视觉定位任务。然而,当前的医学VLP方法在学习这些细粒度特征方面面临挑战,因为它们主要侧重于图像块与个别文本标记之间的蛮力对齐,用于本地视觉特征学习,这对下游密集预测任务来说是次优的。在这项工作中,我们提出了一个新的VLP框架,名为\textbf{G}lobal to \textbf{D}ense level representation learning(G2D),与现有的医学VLP方法相比,实现了显著改进的细粒度和更准确的特征定位。特别是,G2D通过与全局视觉语言对齐并行进行伪分割任务学习密集和语义基础的图像表示。值得注意的是,生成伪分割目标不会产生额外的可训练参数:它们在VLP过程中通过无参数处理器即时获得。G2D在6个医学成像任务和25种疾病中实现了优越的性能,特别是在语义分割方面,这需要细粒度、语义基础的图像特征。在这项任务中,与这些模型使用的100\%相比,即使仅使用1\%的训练数据进行微调,G2D也能超越同行模型。代码可以在https://github.com/cheliu-computation/G2D-NeurIPS24/tree/main找到。

更新时间: 2024-10-24 20:01:44

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.01522v4

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

Large language models (LLMs) often exhibit subtle yet distinctive characteristics in their outputs that users intuitively recognize, but struggle to quantify. These "vibes" - such as tone, formatting, or writing style - influence user preferences, yet traditional evaluations focus primarily on the single axis of correctness. We introduce VibeCheck, a system for automatically comparing a pair of LLMs by discovering identifying traits of a model ("vibes") that are well-defined, differentiating, and user-aligned. VibeCheck iteratively discovers vibes from model outputs, then utilizes a panel of LLM judges to quantitatively measure the utility of each vibe. We validate that the vibes generated by VibeCheck align with those found by human discovery, and run VibeCheck on pairwise preference data from real-world user conversations comparing Llama-3-70b with GPT-4. VibeCheck reveals that Llama has a friendly, funny, and somewhat controversial vibe. These vibes predict model identity with 80% accuracy and human preference with 61% accuracy. Lastly, we run VibeCheck on a variety of models and tasks, including summarization, math, and captioning, to provide insight into differences in model behavior. VibeCheck discovers vibes such as: Command X prefers to add concrete intros and conclusions when summarizing compared to TNGL; Llama-405b often overexplains its thought process on math problems compared to GPT-4o; and GPT-4 prefers to focus on the mood and emotions of the scene when captioning compared to Gemini-1.5-Flash. Code can be found at https://github.com/lisadunlap/VibeCheck

Updated: 2024-10-24 20:01:12

标题: VibeCheck:发现和量化大型语言模型中的定性差异

摘要: 大型语言模型(LLMs)在其输出中经常展现出微妙而独特的特征,用户直观地认识到,但很难量化。这些“氛围” - 如语调、格式或写作风格 - 影响用户偏好,然而传统评估主要侧重于正确性的单一轴线。我们引入了VibeCheck,这是一个系统,可以通过发现模型(“氛围”)的明确定义、区分和与用户对齐的特征,自动比较一对LLMs。VibeCheck通过从模型输出中发现氛围,然后利用LLM评委组对每种氛围的实用性进行定量测量。我们验证了VibeCheck生成的氛围与人类发现中发现的氛围一致,并在真实用户对话中使用llama-3-70b对GPT-4进行了成对偏好数据的VibeCheck。VibeCheck显示,Llama具有友好、幽默和有些争议的氛围。这些氛围以80%的准确率预测模型身份和61%的准确率预测人类偏好。最后,我们对各种模型和任务运行VibeCheck,包括摘要、数学和字幕,以揭示模型行为的差异。VibeCheck发现,与TNGL相比,Command X更喜欢在总结时添加具体的开头和结尾,Llama-405b在解决数学问题时经常过度解释其思维过程,而GPT-4则更喜欢在字幕中专注于场景的情绪和情感,而不是Gemini-1.5-Flash。代码可以在https://github.com/lisadunlap/VibeCheck找到。

更新时间: 2024-10-24 20:01:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.12851v2

Maximum a Posteriori Inference for Factor Graphs via Benders' Decomposition

Many Bayesian statistical inference problems come down to computing a maximum a-posteriori (MAP) assignment of latent variables. Yet, standard methods for estimating the MAP assignment do not have a finite-time guarantee that the algorithm has converged to a fixed point. Previous research has found that MAP inference can be represented in dual form as a linear programming problem with a non-polynomial number of constraints. A Lagrangian relaxation of the dual yields a statistical inference algorithm as a linear programming problem. However, the decision as to which constraints to remove in the relaxation is often heuristic. We present a method for maximum a-posteriori inference in general Bayesian factor models that sequentially adds constraints to the fully relaxed dual problem using Benders' decomposition. Our method enables the incorporation of expressive integer and logical constraints in clustering problems, such as must-link, cannot-link, and a minimum number of whole samples allocated to each cluster. Using this approach, we derive MAP estimation algorithms for the Bayesian Gaussian mixture model and latent Dirichlet allocation. Empirical results show that our method produces a higher optimal posterior value compared to Gibbs sampling and variational Bayes methods for standard data sets and provides a certificate of convergence.
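
The flavour of sequentially adding constraints can be shown on a plain LP (assuming SciPy): start from the fully relaxed problem and repeatedly add the most violated constraint. This is only a cutting-plane analogue; the paper's Benders' decomposition for MAP in factor models generates cuts from a subproblem rather than by enumeration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Minimize c^T x subject to A x <= b, starting with no constraints and adding
# the most violated one until the relaxed solution is feasible.
m, n = 200, 5
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n) + 1.0          # guarantees a feasible interior point
c = rng.normal(size=n)

active = []                                # indices of constraints added so far
for it in range(50):
    res = linprog(c, A_ub=A[active] if active else None,
                  b_ub=b[active] if active else None, bounds=[(-5, 5)] * n)
    viol = A @ res.x - b
    worst = int(np.argmax(viol))
    if viol[worst] <= 1e-9:
        print(f"converged after {it} cuts, objective {res.fun:.4f}")
        break
    active.append(worst)                   # add the most violated cut, re-solve
```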

Updated: 2024-10-24 19:57:56

标题: 贝叶斯网络中基于本德斯分解的最大后验推断

摘要: 许多贝叶斯统计推断问题归结为计算潜变量的最大后验(MAP)赋值。然而,用于估计MAP赋值的标准方法并没有有限时间保证算法已经收敛到一个固定点。先前的研究发现,MAP推断可以表示为一个双重形式的线性规划问题,具有非多项式数量的约束。对双重松弛的拉格朗日乘子产生一个统计推断算法作为一个线性规划问题。然而,在放松中决定要移除哪些约束通常是启发式的。我们提出了一种方法,用于在一般贝叶斯因子模型中进行最大后验推断,该方法使用Benders'分解将约束逐步添加到完全松弛的双重问题中。我们的方法使得可以在聚类问题中加入具有表达能力的整数和逻辑约束,例如必链接、不能链接和每个簇分配的最小整体样本数。使用这种方法,我们推导出了贝叶斯高斯混合模型和潜在狄利克雷分配的MAP估计算法。实证结果表明,与Gibbs采样和变分Bayes方法相比,我们的方法对于标准数据集产生更高的最优后验值,并提供了收敛的证明。

更新时间: 2024-10-24 19:57:56

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.19131v1

Research on Key Technologies for Cross-Cloud Federated Training of Large Language Models

With the rapid development of natural language processing technology, large language models have demonstrated exceptional performance in various application scenarios. However, training these models requires significant computational resources and data processing capabilities. Cross-cloud federated training offers a new approach to addressing the resource bottlenecks of a single cloud platform, allowing the computational resources of multiple clouds to collaboratively complete the training tasks of large models. This study analyzes the key technologies of cross-cloud federated training, including data partitioning and distribution, communication optimization, model aggregation algorithms, and the compatibility of heterogeneous cloud platforms. Additionally, the study examines data security and privacy protection strategies in cross-cloud training, particularly the application of data encryption and differential privacy techniques. Through experimental validation, the proposed technical framework demonstrates enhanced training efficiency, ensured data security, and reduced training costs, highlighting the broad application prospects of cross-cloud federated training.

Updated: 2024-10-24 19:57:17

标题: 大型语言模型跨云联合训练的关键技术研究

摘要: 随着自然语言处理技术的快速发展,大型语言模型在各种应用场景中展现出卓越的性能。然而,训练这些模型需要大量的计算资源和数据处理能力。跨云联合训练提供了一种解决单一云平台资源瓶颈的新方法,允许多个云的计算资源共同完成大型模型的训练任务。本研究分析了跨云联合训练的关键技术,包括数据分区和分发、通信优化、模型聚合算法,以及异构云平台的兼容性。此外,研究还探讨了跨云训练中的数据安全和隐私保护策略,特别是数据加密和差分隐私技术的应用。通过实验验证,所提出的技术框架展示了增强的训练效率、确保数据安全以及降低训练成本,突显了跨云联合训练的广泛应用前景。

更新时间: 2024-10-24 19:57:17

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2410.19130v1

A spectral method for multi-view subspace learning using the product of projections

Multi-view data provides complementary information on the same set of observations, with multi-omics and multimodal sensor data being common examples. Analyzing such data typically requires distinguishing between shared (joint) and unique (individual) signal subspaces from noisy, high-dimensional measurements. Despite many proposed methods, the conditions for reliably identifying joint and individual subspaces remain unclear. We rigorously quantify these conditions, which depend on the ratio of the signal rank to the ambient dimension, principal angles between true subspaces, and noise levels. Our approach characterizes how spectrum perturbations of the product of projection matrices, derived from each view's estimated subspaces, affect subspace separation. Using these insights, we provide an easy-to-use and scalable estimation algorithm. In particular, we employ rotational bootstrap and random matrix theory to partition the observed spectrum into joint, individual, and noise subspaces. Diagnostic plots visualize this partitioning, providing practical and interpretable insights into the estimation performance. In simulations, our method estimates joint and individual subspaces more accurately than existing approaches. Applications to multi-omics data from colorectal cancer patients and a nutrigenomic study of mice demonstrate improved performance in downstream predictive tasks.
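
The central object, the product of the two views' projection matrices, is simple to compute. In the sketch below, eigenvalues near 1 flag joint directions (squared cosines of small principal angles) while smaller eigenvalues flag individual ones. The noise-free construction is illustrative; the paper's rotational bootstrap and random-matrix partitioning are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def proj(X):
    """Orthogonal projection onto the column space of X."""
    Q, _ = np.linalg.qr(X)
    return Q @ Q.T

# Two views in R^50 sharing a 2-d joint subspace plus 1-d individual parts.
p = 50
J = rng.normal(size=(p, 2))                # shared (joint) directions
U1 = np.column_stack([J, rng.normal(size=(p, 1))])
U2 = np.column_stack([J, rng.normal(size=(p, 1))])

P = proj(U1) @ proj(U2)                    # product of the two projections
evals = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
print(np.round(evals[:5], 3))
# Eigenvalues near 1 <-> joint directions (cos^2 of principal angles ~ 1);
# intermediate eigenvalues <-> individual directions; the rest are ~ 0.
```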

Updated: 2024-10-24 19:51:55

标题: 一种使用投影积的多视角子空间学习的谱方法

摘要: 多视角数据提供了对同一组观测的互补信息,多组学和多模态传感器数据是常见的例子。分析这种数据通常需要区分在嘈杂、高维度测量中共享(联合)和独特(个体)信号子空间。尽管提出了许多方法,但可靠地识别联合和个体子空间的条件仍不清楚。我们严格量化了这些条件,这取决于信号秩与环境维度之比、真实子空间之间的主角度和噪声水平。我们的方法表征了从每个视图的估计子空间导出的投影矩阵乘积的频谱扰动如何影响子空间分离。利用这些见解,我们提供了一种易于使用和可扩展的估计算法。特别是,我们利用旋转自举和随机矩阵理论将观察到的频谱分为联合、个体和噪声子空间。诊断图可视化了这种分区,为估计性能提供了实用和可解释的见解。在模拟中,我们的方法比现有方法更准确地估计了联合和个体子空间。对结肠癌患者的多组学数据和小鼠的营养基因研究的应用展示了在下游预测任务中性能的提高。

更新时间: 2024-10-24 19:51:55

领域: stat.ML,cs.LG,math.ST,stat.CO,stat.ME,stat.TH

下载: http://arxiv.org/abs/2410.19125v1

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved efficiency and performance. Despite their benefits, MoE models face significant challenges during inference, including inefficient memory management and suboptimal batching, due to misaligned design choices between the model architecture and the system policies. Furthermore, the conventional approach of training MoEs from scratch is increasingly prohibitive in terms of cost. In this paper, we propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models (in contrast to "upcycling" generalist MoEs), avoiding the high costs of ground-up training. Our approach employs activation sparsity to extract experts. To compose experts, we examine the widely-adopted layer-wise router design and show its redundancy, and thus we introduce the pre-gating router decoupled from the MoE backbone that facilitates system-friendly pre-computing and lookahead scheduling, enhancing expert-aware batching and caching. Our codesign therefore addresses critical gaps on both the algorithmic and system fronts, establishing a scalable and efficient alternative for LLM inference in resource-constrained settings. Read-ME outperforms other popular open-source dense models of similar scales, achieving improvements of up to 10.1% on MMLU, and improving mean end-to-end latency up to 6.1%. Codes are available at: https://github.com/VITA-Group/READ-ME.

Updated: 2024-10-24 19:48:51

标题: 阅读-ME:将LLM重新设计为与系统共同设计的路由器解耦专家混合模型

摘要: 大型语言模型(LLMs)的增加导致了专家混合(MoE)架构的采用,该架构动态利用专门的子网络以提高效率和性能。尽管MoE模型具有许多好处,在推断过程中面临着显著挑战,包括内存管理低效和批处理次优,这是由于模型架构与系统策略之间设计选择不一致导致的。此外,从头开始训练MoEs的传统方法在成本方面越来越具有禁止性。在本文中,我们提出了一个新颖的框架Read-ME,将预训练的密集LLMs转换为更小的MoE模型(与"升级"通用MoEs相对),避免了从头开始训练的高成本。我们的方法利用激活稀疏性来提取专家。为了组成专家,我们检查了广泛采用的逐层路由器设计,并展示了其冗余性,因此我们引入了与MoE主干分离的预门控路由器,有助于系统友好的预计算和前瞻调度,增强了专家感知的批处理和缓存。因此,我们的共同设计解决了算法和系统两方面的关键差距,在资源受限的环境中为LLM推断建立了可扩展和高效的替代方案。Read-ME在类似规模的其他流行开源密集模型上表现出色,MMLU的改进高达10.1%,平均端到端延迟提高了6.1%。代码可在以下链接获得:https://github.com/VITA-Group/READ-ME。

更新时间: 2024-10-24 19:48:51

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.19123v1

Loss Landscape Characterization of Neural Networks without Over-Parametrization

Optimization methods play a crucial role in modern machine learning, powering the remarkable empirical achievements of deep learning models. These successes are even more remarkable given the complex non-convex nature of the loss landscape of these models. Yet, ensuring the convergence of optimization methods requires specific structural conditions on the objective function that are rarely satisfied in practice. One prominent example is the widely recognized Polyak-Lojasiewicz (PL) inequality, which has gained considerable attention in recent years. However, validating such assumptions for deep neural networks entails substantial and often impractical levels of over-parametrization. In order to address this limitation, we propose a novel class of functions that can characterize the loss landscape of modern deep models without requiring extensive over-parametrization and can also include saddle points. Crucially, we prove that gradient-based optimizers possess theoretical guarantees of convergence under this assumption. Finally, we validate the soundness of our new function class through both theoretical analysis and empirical experimentation across a diverse range of deep learning models.

Updated: 2024-10-24 19:38:58

标题: 神经网络在不过度参数化的情况下的损失景观特征描述

摘要: 优化方法在现代机器学习中发挥着至关重要的作用,推动了深度学习模型的显著经验成就。这些成功尤其引人注目,因为这些模型的损失函数景观具有复杂的非凸特性。然而,确保优化方法收敛需要对目标函数具有特定的结构条件,而这些条件在实践中很少得到满足。一个著名的例子是广泛认可的Polyak-Lojasiewicz (PL)不等式,近年来受到了相当多的关注。然而,验证这种假设对于深度神经网络来说需要大量且常常不切实际的过度参数化。为了解决这一限制,我们提出了一种新的函数类,可以描述现代深度模型的损失景观,而无需过度参数化,并且还可以包括鞍点。关键是,我们证明了在这种假设下,基于梯度的优化器具有收敛的理论保证。最后,我们通过理论分析和对多种深度学习模型的实证实验,验证了我们新函数类的合理性。

更新时间: 2024-10-24 19:38:58

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.12455v3

LLM Tree Search

This project aims to investigate a novel sequence generation method inspired by the AlphaGo paradigm, adapting it for use with large language models (LLMs). The proposed approach involves creating search trees of different possible completions and evaluating these completions based on model confidence. By considering various paths in the search tree and scoring them according to the model's confidence in each completion, we can generate diverse and high-quality sequences. This research explores the implementation of this paradigm by using confidence as a proxy for response quality, akin to beam search (Vijayakumar et al., 2016). The primary goal of this paper is to outline the paradigm and demonstrate its potential, rather than focusing on achieving perfect results. The paper outlines the reasons why we believe this paradigm has the potential to improve LLMs in the following ways: 1) increasing output quality, 2) decreasing errors, 3) eliminating or reducing compounding errors, 4) generating diverse and creative completions, 5) allowing for iterative problem-solving, and 6) self-training. We expect this approach to yield a set of diverse and coherent sequences, offering insights into balancing exploration and exploitation in sequence generation. Potential applications include creative text generation tasks, such as storytelling and content creation, as well as other natural language processing domains, like machine translation and automated summarization. The goal is for the model to be far more effective, as it can consider many possible variations, allowing it to find the ideal completion. This research aims to contribute to the understanding of effective search strategies in sequence generation and their impact on generating high-quality, varied textual outputs.
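
A minimal sketch of the paradigm, with a toy stand-in for the LLM: completions are expanded best-first and scored by cumulative log-probability, so the most confident completions surface first. The vocabulary, the scoring model, and the stopping rule are illustrative assumptions.

```python
import heapq, math
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(prefix):
    """Toy stand-in for an LLM: any deterministic distribution over the vocab."""
    seed = abs(hash(tuple(prefix))) % (2 ** 32)
    return np.random.default_rng(seed).dirichlet(np.ones(len(VOCAB)))

def tree_search(k=3, max_len=5):
    """Best-first search over completions, scored by total log-probability
    (model confidence); a beam-search-like exploration of the search tree."""
    heap = [(0.0, [])]                      # (negative log-prob, prefix)
    finished = []
    while heap and len(finished) < k:
        neg_lp, prefix = heapq.heappop(heap)
        if prefix and (prefix[-1] == "<eos>" or len(prefix) >= max_len):
            finished.append((-neg_lp, prefix))
            continue
        for tok, p in zip(VOCAB, next_token_probs(prefix)):
            heapq.heappush(heap, (neg_lp - math.log(p), prefix + [tok]))
    return finished

for lp, seq in tree_search():
    print(f"{lp:7.3f}  {' '.join(seq)}")
```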

Updated: 2024-10-24 19:38:50

标题: LLM树搜索

摘要: 这个项目旨在研究一种新颖的序列生成方法,受AlphaGo范式启发,将其调整为适用于大型语言模型(LLMs)。所提出的方法涉及创建不同可能完成的搜索树,并根据模型的置信度评估这些完成。通过考虑搜索树中的各种路径并根据模型对每个完成的置信度对其进行评分,我们可以生成多样化且高质量的序列。本研究探讨了使用置信度作为响应质量代理的实现方式,类似于波束搜索的方法。本文的主要目标是概述这一范式并展示其潜力,而不是专注于实现完美的结果。本文将概述我们认为这一范式有潜力改进LLMs的原因,包括:1)提高输出质量,2)减少错误,3)消除或减少复合错误问题,4)生成多样化和创意的完成,5)允许迭代问题解决,6)自我训练。我们期望这种方法能产生一组多样化且连贯的序列,为平衡探索和利用在序列生成中提供见解。潜在应用包括创造性文本生成任务,如讲故事和内容创作,以及其他自然语言处理领域,如机器翻译和自动摘要。目标是模型将更加有效,因为它能考虑许多可能的变化,从而找到理想的完成。本研究旨在为理解序列生成中有效搜索策略及其对生成高质量,多样化文本输出的影响做出贡献。

更新时间: 2024-10-24 19:38:50

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.19117v1

LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples

Federated Learning (FL) is a collaborative, privacy-preserving machine learning framework that enables multiple participants to train a single global model. However, the recent advent of powerful Large Language Models (LLMs) with tens to hundreds of billions of parameters makes the naive application of traditional FL methods to LLMs impractical due to high computational and communication costs. Furthermore, end users of LLMs often lack access to full architectures and weights of the models, making it impossible for participants to fine-tune these models directly. This paper introduces a novel FL scheme for LLMs, named LanFL, which is purely prompt-based and treats the underlying LLMs as black boxes. We have developed a differentially private synthetic sample generation mechanism to facilitate knowledge sharing among participants, along with a prompt optimization scheme that enables learning from synthetic samples. Our extensive experiments demonstrate that LanFL successfully facilitates learning among participants while preserving the privacy of local datasets across various tasks.

Updated: 2024-10-24 19:28:33

标题: LanFL:使用合成样本的大型语言模型实现差分隐私联邦学习

摘要: 联邦学习(FL)是一种协作的、隐私保护的机器学习框架,使多个参与者能够训练一个单一的全局模型。然而,最近出现的具有数十亿至数百亿参数的强大大型语言模型(LLMs)使传统FL方法在LLMs上的简单应用变得不切实际,因为计算和通信成本高。此外,LLMs的最终用户通常缺乏对模型的完整体系结构和权重的访问权限,这使得参与者无法直接对这些模型进行微调。本文介绍了一种针对LLMs的新型FL方案,名为LanFL,它完全基于提示,并将底层LLMs视为黑匣子。我们开发了一种差分私有合成样本生成机制,以促进参与者之间的知识共享,以及一个提示优化方案,使学习从合成样本中获益。我们的广泛实验证明,LanFL成功地促进了参与者之间的学习,同时保护了各种任务中本地数据集的隐私。

更新时间: 2024-10-24 19:28:33

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.19114v1

Distributed Blind Source Separation based on FastICA

With the emergence of wireless sensor networks (WSNs), many traditional signal processing tasks are required to be computed in a distributed fashion, without transmission of the raw data to a centralized processing unit, due to the limited energy and bandwidth resources available to the sensors. In this paper, we propose a distributed independent component analysis (ICA) algorithm, which aims at identifying the original signal sources based on observations of their mixtures measured at various sensor nodes. One of the most commonly used ICA algorithms is known as FastICA, which requires a spatial pre-whitening operation in the first step of the algorithm. Such a pre-whitening across all nodes of a WSN is impossible in a bandwidth-constrained distributed setting, as it requires correlating each channel with every other channel in the WSN. We show that an explicit network-wide pre-whitening step can be circumvented by leveraging the properties of the so-called Distributed Adaptive Signal Fusion (DASF) framework. Despite the lack of such a network-wide pre-whitening, we can still obtain the $Q$ least Gaussian independent components of the centralized ICA solution, where $Q$ scales linearly with the required communication load.
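
For reference, the centralized FastICA one-unit fixed-point iteration that the distributed algorithm emulates looks as follows, including the explicit pre-whitening step that the DASF-based version avoids. The two-source toy setup is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-Gaussian sources, linearly mixed (a toy stand-in for sensor signals).
n = 10_000
S = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])
A = rng.normal(size=(2, 2))
X = A @ S

# Explicit spatial pre-whitening -- the step a bandwidth-limited WSN cannot
# afford, since it requires correlating every channel with every other one.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ Xc

# One-unit FastICA fixed point: w <- E[z g(w'z)] - E[g'(w'z)] w, then normalize.
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(30):
    wz = w @ Z
    g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
    w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
    w = w_new / np.linalg.norm(w_new)
print("recovered unmixing direction:", np.round(w, 3))
```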

Updated: 2024-10-24 19:27:05

标题: 基于FastICA的分布式盲源分离

摘要: 随着无线传感器网络(WSNs)的出现,许多传统的信号处理任务需要以分布式方式进行计算,而不需要将原始数据传输到集中处理单元,这是因为传感器可用的能源和带宽资源有限。在本文中,我们提出了一种分布式独立成分分析(ICA)算法,旨在根据在各个传感器节点测量到的它们混合物的观测来识别原始信号源。最常用的ICA算法之一被称为FastICA,该算法的第一步需要进行空间预白化操作。在带宽受限的分布式环境中,跨WSN的所有节点进行这样的预白化是不可能的,因为这需要将每个通道与WSN中的每个其他通道相关联。我们展示了通过利用所谓的分布式自适应信号融合(DASF)框架的特性,可以绕过显式的网络范围预白化步骤。尽管缺乏这样的网络范围预白化,我们仍然可以获得集中ICA解决方案的$Q$个最小高斯独立成分,其中$Q$随所需通信负载呈线性缩放。

更新时间: 2024-10-24 19:27:05

领域: eess.SP,cs.LG,cs.MA

下载: http://arxiv.org/abs/2410.19112v1

A Note on Shumailov et al. (2024): `AI Models Collapse When Trained on Recursively Generated Data'

The study conducted by Shumailov et al. (2024) demonstrates that repeatedly training a generative model on synthetic data leads to model collapse. This finding has generated considerable interest and debate, particularly given that current models have nearly exhausted the available data. In this work, we investigate the effects of fitting a distribution (through Kernel Density Estimation, or KDE) or a model to the data, followed by repeated sampling from it. Our objective is to develop a theoretical understanding of the phenomenon observed by Shumailov et al. (2024). Our results indicate that the outcomes reported are a statistical phenomenon and may be unavoidable.
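
The statistical effect is easy to reproduce: iteratively fitting a Gaussian to n samples and regenerating the dataset makes the standard deviation follow a multiplicative random walk with negative log-drift, so the spread shrinks regardless of how well the model fits. A minimal sketch (a parametric fit rather than the paper's KDE, chosen for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100
x = rng.normal(size=n)                    # generation 0: "real" data
for gen in range(2001):
    if gen % 400 == 0:
        print(f"gen {gen:4d}: std = {x.std():.5f}")
    mu, sigma = x.mean(), x.std()         # fit the model to the current samples
    x = rng.normal(mu, sigma, size=n)     # ...and regenerate the dataset from it
# log(std) performs a random walk with negative drift (E[log s] < log sigma),
# so the spread collapses toward zero purely as a statistical effect.
```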

Updated: 2024-10-24 19:23:50

标题: 关于Shumailov等人(2024年)的一则说明:`当人工智能模型在递归生成的数据上训练时会崩溃'

摘要: Shumailov等人(2024年)进行的研究表明,反复训练生成模型使用合成数据会导致模型崩溃。这一发现引起了广泛的兴趣和讨论,特别是考虑到当前模型几乎已经耗尽了可用数据。在这项工作中,我们研究了通过核密度估计(KDE)或模型拟合数据,然后反复从中抽样的效果。我们的目标是发展对Shumailov等人(2024年)观察到的现象的理论理解。我们的结果表明,报告的结果是一种统计现象,可能是不可避免的。

更新时间: 2024-10-24 19:23:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.12954v2

Bio2Token: All-atom tokenization of any biomolecular structure with Mamba

Efficient encoding and representation of large 3D molecular structures with high fidelity is critical for biomolecular design applications. Despite this, many representation learning approaches restrict themselves to modeling smaller systems or use coarse-grained approximations of the systems, for example modeling proteins at the resolution of amino acid residues rather than at the level of individual atoms. To address this, we develop quantized auto-encoders that learn atom-level tokenizations of complete proteins, RNA, and small molecule structures with reconstruction errors around or below 1 Angstrom. We demonstrate that the Mamba state space model architecture employed is comparatively efficient, requiring a fraction of the training data, parameters, and compute needed to reach competitive accuracies, and can scale to systems with almost 100,000 atoms. The learned structure tokens of bio2token may serve as the input for all-atom language models in the future.

Updated: 2024-10-24 19:23:09

标题: Bio2Token: 使用Mamba进行任何生物分子结构的全原子标记化

摘要: 高保真度的高效编码和表示大型三维分子结构对生物分子设计应用至关重要。尽管如此,许多表示学习方法限制在建模较小系统或使用系统的粗粒近似,例如将蛋白质建模为氨基酸残基的分辨率,而不是个别原子的水平。为了解决这个问题,我们开发了量化自动编码器,学习完整蛋白质、RNA和小分子结构的原子级标记,重构精度在1埃以下和周围。我们展示了所采用的Mamba状态空间模型架构相对高效,只需要训练数据、参数和计算的一小部分就能达到竞争性精度,并且可以扩展到几乎具有10万个原子的系统。bio2token的学习结构标记可能在未来作为全原子语言模型的输入。

更新时间: 2024-10-24 19:23:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.19110v1

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Large language models (LLMs) show inherent brittleness in their safety mechanisms, as evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This study explores this brittleness of safety alignment by leveraging pruning and low-rank modifications. We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. Surprisingly, the isolated regions we find are sparse, comprising about 3% at the parameter level and 2.5% at the rank level. Removing these regions compromises safety without significantly impacting utility, corroborating the inherent brittleness of the model's safety mechanisms. Moreover, we show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted. These findings underscore the urgent need for more robust safety strategies in LLMs.

Updated: 2024-10-24 19:21:52

标题: 评估安全对齐的脆弱性:通过修剪和低秩修改

摘要: 大型语言模型(LLMs)在其安全机制中表现出固有的脆弱性,这一点在它们易受越狱甚至非恶意微调的情况下得到证实。本研究通过修剪和低秩修改探索了这种安全对齐的脆弱性。我们开发了一种方法,用于识别对于安全护栏至关重要且与效用相关区域在神经元和秩级别上分离的关键区域。令人惊讶的是,我们发现的孤立区域是稀疏的,占参数级别约$3\%$和秩级别约$2.5\%$。移除这些区域会损害安全性而不会显著影响效用,证实了模型安全机制的固有脆弱性。此外,我们表明,即使限制对安全关键区域的修改,LLMs仍然容易受到低成本微调攻击的影响。这些发现强调了在LLMs中需要更加健壮的安全策略的紧迫性。

更新时间: 2024-10-24 19:21:52

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.05162v4

RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework

Despite significant advancements in natural language generation, controlling language models to produce texts with desired attributes remains a formidable challenge. In this work, we introduce RSA-Control, a training-free controllable text generation framework grounded in pragmatics. RSA-Control directs the generation process by recursively reasoning between imaginary speakers and listeners, enhancing the likelihood that target attributes are correctly interpreted by listeners amidst distractors. Additionally, we introduce a self-adjustable rationality parameter, which allows for automatic adjustment of control strength based on context. Our experiments, conducted with two task types and two types of language models, demonstrate that RSA-Control achieves strong attribute control while maintaining language fluency and content consistency. Our code is available at https://github.com/Ewanwong/RSA-Control.
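
RSA-Control builds on the Rational Speech Acts recursion; the textbook core of that recursion, on a two-utterance lexicon, looks as follows. The lexicon and rationality value are illustrative; the paper's recursion over attribute-bearing generations is more elaborate.

```python
import numpy as np

# Textbook Rational Speech Acts recursion on a two-utterance lexicon.
# Rows: utterances ("some", "all"); columns: meanings (SOME-but-not-all, ALL).
lexicon = np.array([[1.0, 1.0],   # "some" is literally true of both meanings
                    [0.0, 1.0]])  # "all" is literally true only of ALL
alpha = 4.0                       # rationality parameter

L0 = lexicon / lexicon.sum(axis=1, keepdims=True)   # literal listener P(m | u)
S1 = (L0.T + 1e-12) ** alpha                        # pragmatic speaker ~ L0^alpha
S1 /= S1.sum(axis=1, keepdims=True)                 # rows: meanings, P(u | m)
L1 = S1.T / S1.T.sum(axis=1, keepdims=True)         # pragmatic listener P(m | u)

print("P(meaning | 'some') =", np.round(L1[0], 3))  # implicature: 'some' -> not all
```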

Updated: 2024-10-24 19:21:04

标题: RSA-Control:一种基于实用主义的轻量级可控文本生成框架

摘要: 尽管自然语言生成取得了重大进展,但控制语言模型以生成具有所需属性的文本仍然是一个巨大的挑战。在这项工作中,我们介绍了RSA-Control,这是一个基于语用学的无需训练的可控文本生成框架。RSA-Control通过在想象的发言者和听众之间进行递归推理来指导生成过程,增强目标属性被正确解释的可能性,而在干扰因素中。此外,我们引入了一个自动可调整的理性参数,它允许根据上下文自动调整控制强度。我们的实验使用两种任务类型和两种语言模型进行,结果表明RSA-Control在保持语言流畅性和内容一致性的同时实现了强大的属性控制。我们的代码可在https://github.com/Ewanwong/RSA-Control 上获得。

更新时间: 2024-10-24 19:21:04

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.19109v1

Wildfire Risk Prediction: A Review

Wildfires have significant impacts on global vegetation, wildlife, and humans. They destroy plant communities and wildlife habitats and contribute to increased emissions of carbon dioxide, nitrogen oxides, methane, and other pollutants. The prediction of wildfires relies on various independent variables combined with regression or machine learning methods. In this technical review, we describe the options for independent variables, data processing techniques, models, independent variables collinearity and importance estimation methods, and model performance evaluation metrics. First, we divide the independent variables into 4 aspects, including climate and meteorology conditions, socio-economical factors, terrain and hydrological features, and wildfire historical records. Second, preprocessing methods are described for different magnitudes, different spatial-temporal resolutions, and different formats of data. Third, the collinearity and importance evaluation methods of independent variables are also considered. Fourth, we discuss the application of statistical models, traditional machine learning models, and deep learning models in wildfire risk prediction. In this subsection, compared with other reviews, this manuscript particularly discusses the evaluation metrics and recent advancements in deep learning methods. Lastly, addressing the limitations of current research, this paper emphasizes the need for more effective deep learning time series forecasting algorithms, the utilization of three-dimensional data including ground and trunk fuel, extraction of more accurate historical fire point data, and improved model evaluation metrics.

Updated: 2024-10-24 19:20:41

标题: 野火风险预测:综述

摘要: 野火对全球植被、野生动物和人类产生重大影响。它们破坏植物群落和野生动物栖息地,并导致二氧化碳、氮氧化物、甲烷和其他污染物的排放增加。对野火的预测依赖于各种独立变量结合回归或机器学习方法。在这篇技术评论中,我们描述了独立变量、数据处理技术、模型、独立变量共线性和重要性评估方法以及模型性能评估指标的选项。首先,我们将独立变量分为4个方面,包括气候和气象条件、社会经济因素、地形和水文特征以及野火历史记录。其次,描述了不同幅度、不同时空分辨率和不同格式数据的预处理方法。第三,也考虑了独立变量的共线性和重要性评估方法。第四,我们讨论了统计模型、传统机器学习模型和深度学习模型在野火风险预测中的应用。在这个小节中,与其他评论相比,本文特别讨论了评估指标和深度学习方法的最新进展。最后,针对当前研究的局限性,本文强调了对更有效的深度学习时间序列预测算法、利用三维数据包括地面和树干燃料、提取更准确的历史火点数据以及改进模型评估指标的需求。

更新时间: 2024-10-24 19:20:41

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.01607v4

AIM: Automated Input Set Minimization for Metamorphic Security Testing

Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., vulnerability detection, remain difficult to apply in practice. Specifically, although previous work has demonstrated the potential of metamorphic testing, in which security failures are detected by metamorphic relations that turn valid inputs into malicious inputs, such relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines involving standard search approaches. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving the same level of vulnerability detection. Furthermore, AIM significantly outperformed all the considered baselines regarding vulnerability coverage.

Updated: 2024-10-24 19:13:48

标题: 目标:用于形态安全测试的自动输入集最小化

摘要: 尽管Web系统的安全测试可以通过生成精心设计的输入进行自动化,但自动化测试Oracle的解决方案,即漏洞检测,在实践中仍然难以应用。具体而言,尽管先前的工作已经展示了变态测试的潜力,安全失败可以通过将有效输入转换为恶意输入的变态关系来确定,但变态关系通常在大量输入上执行,这是耗时的,因此使变态测试变得不切实际。我们提出了AIM,一种自动选择输入以降低测试成本同时保留漏洞检测能力的方法。AIM包括基于聚类的黑盒方法,根据它们的安全属性识别相似的输入。它还依赖于一种新颖的遗传算法,以高效地选择多样化的输入,同时最小化它们的总成本。此外,它包含一个问题缩减组件,以减少搜索空间并加速最小化过程。我们在两个著名的Web系统Jenkins和Joomla上评估了AIM的有效性,这两个系统都有记录的漏洞。我们将AIM的结果与涉及标准搜索方法的四个基线进行了比较。总的来说,对于Jenkins,AIM将变态测试时间减少了84%,对于Joomla,减少了82%,同时保持了相同水平的漏洞检测。此外,就漏洞覆盖率而言,AIM在所有考虑的基线中表现出色。

更新时间: 2024-10-24 19:13:48

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2402.10773v4

Conditional diffusions for neural posterior estimation

Neural posterior estimation (NPE), a simulation-based computational approach for Bayesian inference, has shown great success in situations where posteriors are intractable or likelihood functions are treated as "black boxes." Existing NPE methods typically rely on normalizing flows, which transform a base distribution into a complex posterior by composing many simple, invertible transformations. But flow-based models, while state of the art for NPE, are known to suffer from several limitations, including training instability and sharp trade-offs between representational power and computational cost. In this work, we demonstrate the effectiveness of conditional diffusions as an alternative to normalizing flows for NPE. Conditional diffusions address many of the challenges faced by flow-based methods. Our results show that, across a highly varied suite of benchmarking problems for NPE architectures, diffusions offer improved stability, superior accuracy, and faster training times, even with simpler, shallower models. These gains persist across a variety of different encoder or "summary network" architectures, as well as in situations where no summary network is required. The code will be publicly available at https://github.com/TianyuCodings/cDiff.

Updated: 2024-10-24 19:13:13

标题: 神经后验估计的条件扩散

摘要: 神经后验估计(NPE)是一种基于仿真的计算方法,用于贝叶斯推断,在后验不可解析或似然函数被视为“黑匣子”的情况下已经取得了巨大成功。现有的NPE方法通常依赖于正规化流,通过组合许多简单的可逆变换将基本分布转换为复杂的后验。但是,虽然正规化流对于NPE来说是最先进的,但已知存在一些限制,包括训练不稳定性和表征能力与计算成本之间的尖锐权衡。在这项工作中,我们展示了条件扩散作为NPE的替代方法的有效性。条件扩散解决了正规化流方法面临的许多挑战。我们的结果表明,在NPE架构的一系列基准问题中,扩散提供了更好的稳定性、更高的准确性和更快的训练时间,即使使用更简单、更浅层的模型也是如此。这些收益在各种不同的编码器或“摘要网络”架构中持续存在,甚至在不需要摘要网络的情况下也是如此。代码将在\url{https://github.com/TianyuCodings/cDiff}上公开提供。

更新时间: 2024-10-24 19:13:13

领域: stat.ML,cs.AI,cs.LG,stat.AP

下载: http://arxiv.org/abs/2410.19105v1

TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction

Large language models (LLMs) have revolutionized natural language processing, albeit at the cost of immense memory and computation requirements. Post-training quantization (PTQ) is becoming the de facto method to reduce the memory footprint and improve the inference throughput of LLMs. In this work, we aim to push the upper limit of LLM PTQ by optimizing the weight rounding parameters with the block reconstruction technique, a predominant method in previous vision models. We propose TesseraQ, a new state-of-the-art PTQ technique, to quantize the weights of LLMs to ultra-low bits. To effectively optimize the rounding in LLMs and stabilize the reconstruction process, we introduce progressive adaptive rounding. This approach iteratively transits the soft rounding variables to hard variables during the reconstruction process. Additionally, we optimize the dequantization scale parameters to fully leverage the block reconstruction technique. We demonstrate that TesseraQ can be seamlessly integrated with existing scaling or clipping-based PTQ algorithms such as AWQ and OmniQuant, significantly enhancing their performance and establishing a new state-of-the-art. For instance, when compared to AWQ, TesseraQ improves the wikitext2 perplexity from 14.65 to 6.82 and average downstream accuracy from 50.52 to 59.27 with 2-bit weight-only quantization of LLaMA-2-7B. Across a range of quantization schemes, including W2A16, W3A16, W3A3, and W4A4, TesseraQ consistently exhibits superior performance.
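
A rough sketch of the soft-to-hard rounding idea (assuming PyTorch), using the AdaRound-style sigmoid parameterization of the rounding variable; the hardening schedule and the reconstruction target below are simplified stand-ins for TesseraQ's block reconstruction, not its actual procedure.

```python
import torch

torch.manual_seed(0)
zeta, gamma = 1.1, -0.1                      # AdaRound-style stretch constants

def soft_round(w, scale, v):
    """Quantize w with a learnable soft rounding offset h(v) in [0, 1]."""
    h = torch.clamp(torch.sigmoid(v) * (zeta - gamma) + gamma, 0.0, 1.0)
    return scale * (torch.floor(w / scale) + h)

w = torch.randn(6)                           # frozen pretrained weights
scale = 0.1                                  # quantization step size
v = torch.zeros(6, requires_grad=True)       # rounding logits, one per weight
target = w.clone()                           # stand-in for block outputs to match

opt = torch.optim.Adam([v], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = ((soft_round(w, scale, v) - target) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 40 == 39:                      # progressively harden soft variables
        with torch.no_grad():
            thr = max(0.4 - 0.1 * (step // 40), 0.0)
            sure = (torch.sigmoid(v) - 0.5).abs() >= thr
            v[sure] = torch.where(v[sure] > 0,
                                  torch.tensor(20.0), torch.tensor(-20.0))
print(soft_round(w, scale, v))               # ends at fully hard-rounded weights
```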

Updated: 2024-10-24 19:06:51

标题: TesseraQ: 超低比特LLM后训练量化与块重建

摘要: 大型语言模型(LLM)已经彻底改变了自然语言处理,但代价是巨大的内存和计算需求。训练后量化(PTQ)正成为减少LLM内存占用和提高推断吞吐量的事实标准方法。在这项工作中,我们旨在借助块重构技术(先前视觉模型中的主流方法)优化权重舍入参数,从而推高LLM PTQ的上限。我们提出了TesseraQ,一种新的最先进的PTQ技术,可以将LLM的权重量化为超低比特。为了有效优化LLM中的舍入并稳定重构过程,我们引入了渐进自适应舍入。这种方法在重构过程中迭代地将软舍入变量转换为硬变量。此外,我们优化了反量化尺度参数,以充分利用块重构技术。我们证明TesseraQ可以与现有的基于缩放或裁剪的PTQ算法(如AWQ和OmniQuant)无缝集成,显著提升它们的性能并确立新的最先进水平。例如,与AWQ相比,在对LLaMA-2-7B进行2比特仅权重量化时,TesseraQ将wikitext2困惑度从14.65降至6.82,将平均下游准确率从50.52提高至59.27。在包括W2A16、W3A16、W3A3和W4A4在内的一系列量化方案中,TesseraQ始终表现出更优越的性能。

更新时间: 2024-10-24 19:06:51

领域: cs.LG

下载: http://arxiv.org/abs/2410.19103v1

Evolution with Opponent-Learning Awareness

The universe involves many independent co-learning agents as an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are usually constrained to small, homogeneous populations and remain computationally intensive. In this paper, we study how large heterogeneous populations of learning agents evolve in normal-form games. We show how, under assumptions commonly made in the multi-armed bandit literature, Multi-Agent Policy Gradient closely resembles the Replicator Dynamic, and we further derive a fast, parallelizable implementation of Opponent-Learning Awareness tailored for evolutionary simulations. This enables us to simulate the evolution of very large populations made of heterogeneous co-learning agents, under both naive and advanced learning strategies. We demonstrate our approach in simulations of 200,000 agents, evolving in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game highlights distinct ways in which Opponent-Learning Awareness affects evolution.
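For readers unfamiliar with the Replicator Dynamic referenced here, a minimal sketch (ours, with an arbitrary Hawk-Dove payoff choice of V=2, C=4) shows the population share of each strategy evolving in proportion to its payoff advantage:

```python
import numpy as np

# Hawk-Dove payoff matrix (value V=2, cost C=4): mixed equilibrium at V/C = 0.5.
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])

x = np.array([0.9, 0.1])   # initial population shares of (Hawk, Dove)
dt = 0.01
for step in range(20000):
    fitness = A @ x                      # expected payoff of each strategy
    avg = x @ fitness                    # population-average payoff
    x = x + dt * x * (fitness - avg)     # replicator dynamic, Euler step
    x = x / x.sum()                      # guard against numerical drift

print(x)   # ~[0.5, 0.5]: the Hawk share converges to V/C
```

The paper's contribution layers Opponent-Learning Awareness on top of population dynamics of this kind, so individual agents account for how their opponents learn.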

Updated: 2024-10-24 19:03:36

标题: 对手学习意识下的进化

摘要: 宇宙中存在许多独立的共同学习智能体,它们是我们所观察环境中不断演化的一部分。然而,在实践中,多智能体强化学习(MARL)应用通常局限于小规模、同质的种群,且计算开销巨大。在本文中,我们研究大规模异质学习智能体种群在标准式博弈中的演化。我们展示了在多臂老虎机文献中常见的假设下,多智能体策略梯度与复制者动态密切相似,并进一步推导了一种面向演化模拟的快速、可并行化的对手学习意识实现。这使我们能够在朴素和高级两类学习策略下,模拟由异质共同学习智能体组成的超大规模种群的演化。我们在200,000个智能体的模拟中演示了我们的方法,它们在鹰鸽博弈、猎鹿博弈和石头剪刀布等经典博弈中演化。每个博弈都突出了对手学习意识影响演化的不同方式。

更新时间: 2024-10-24 19:03:36

领域: cs.LG,cs.GT,cs.MA,q-bio.PE,q-fin.GN

下载: http://arxiv.org/abs/2410.17466v2

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Videos are often used to learn or extract the necessary information to complete tasks in ways different than what text and static imagery alone can provide. However, many existing agent benchmarks neglect long-context video understanding, instead focusing on text or static image inputs. To bridge this gap, we introduce VideoWebArena (VideoWA), a benchmark for evaluating the capabilities of long-context multimodal agents for video understanding. VideoWA consists of 2,021 web agent tasks based on manually crafted video tutorials, which total almost four hours of content. For our benchmark, we define a taxonomy of long-context video-based agent tasks with two main areas of focus: skill retention and factual retention. While skill retention tasks evaluate whether an agent can use a given human demonstration to complete a task efficiently, the factual retention task evaluates whether an agent can retrieve instruction-relevant information from a video to complete a task. We find that the best model achieves 13.3% success on factual retention tasks and 45.8% on factual retention QA pairs, far below human performance at 73.9% and 79.3%, respectively. On skill retention tasks, long-context models perform worse with tutorials than without, exhibiting a 5% performance decrease in WebArena tasks and a 10.3% decrease in VisualWebArena tasks. Our work highlights the need to improve the agentic abilities of long-context multimodal models and provides a testbed for future development with long-context video agents.

Updated: 2024-10-24 19:03:01

标题: VideoWebArena:通过视频理解网络任务评估长上下文多模态代理

摘要: 视频通常被用来学习或提取完成任务所需的信息,其方式是仅靠文本和静态图像无法提供的。然而,许多现有的代理基准忽视了长上下文视频理解,而只关注文本或静态图像输入。为了弥合这一差距,我们介绍了VideoWebArena(VideoWA),这是一个用于评估长上下文多模态代理视频理解能力的基准。VideoWA包括2,021个基于手工制作视频教程的网络代理任务,总计近四个小时的内容。对于我们的基准,我们定义了一个基于长上下文视频的代理任务分类法,重点关注两个主要领域:技能保持和事实保持。技能保持任务评估代理能否利用给定的人类演示高效完成任务,事实保持任务则评估代理能否从视频中检索与指令相关的信息来完成任务。我们发现,最佳模型在事实保持任务上的成功率为13.3%,在事实保持QA对上为45.8%,远低于人类分别为73.9%和79.3%的表现。在技能保持任务上,长上下文模型在提供教程时的表现反而不如没有教程时,在WebArena任务中性能下降5%,在VisualWebArena任务中下降10.3%。我们的工作凸显了改进长上下文多模态模型代理能力的必要性,并为未来长上下文视频代理的开发提供了一个测试平台。

更新时间: 2024-10-24 19:03:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.19100v1

Inherently Interpretable Tree Ensemble Learning

Tree ensemble models like random forests and gradient boosting machines are widely used in machine learning due to their excellent predictive performance. However, a high-performance ensemble consisting of a large number of decision trees lacks sufficient transparency and explainability. In this paper, we demonstrate that when shallow decision trees are used as base learners, the ensemble learning algorithms can not only become inherently interpretable subject to an equivalent representation as the generalized additive models but also sometimes lead to better generalization performance. First, an interpretation algorithm is developed that converts the tree ensemble into the functional ANOVA representation with inherent interpretability. Second, two strategies are proposed to further enhance the model interpretability, i.e., by adding constraints in the model training stage and post-hoc effect pruning. Experiments on simulations and real-world datasets show that our proposed methods offer a better trade-off between model interpretation and predictive performance, compared with its counterpart benchmarks.
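The claim that shallow-tree ensembles admit a generalized additive representation is easiest to see in the depth-1 case: every stump splits on a single feature, so the boosted sum decomposes into per-feature main effects. Below is a small scikit-learn sketch of our own making (the paper's algorithm goes further, handling deeper trees and interaction effects via functional ANOVA):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)

# Depth-1 trees (stumps) each split on one feature, so the boosted
# ensemble is exactly additive: f(x) = f0 + sum_j g_j(x_j).
gbm = GradientBoostingRegressor(max_depth=1, n_estimators=300,
                                learning_rate=0.1).fit(X, y)

def main_effect(gbm, j, grid):
    """Accumulate every stump that splits on feature j into g_j(grid)."""
    g = np.zeros_like(grid)
    for (tree,) in gbm.estimators_:          # one regressor per stage
        t = tree.tree_
        if t.node_count < 3 or t.feature[0] != j:
            continue
        left = grid <= t.threshold[0]        # nodes 1 and 2 are the leaves
        g += gbm.learning_rate * np.where(left, t.value[1, 0, 0],
                                          t.value[2, 0, 0])
    return g

grid = np.linspace(-2, 2, 5)
print(main_effect(gbm, 0, grid))   # sin-like shape, up to a constant shift
```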

Updated: 2024-10-24 18:58:41

标题: 固有可解释的树集成学习

摘要: 树集成模型如随机森林和梯度提升机在机器学习中被广泛使用,因为它们具有出色的预测性能。然而,由大量决策树组成的高性能集成缺乏足够的透明度和可解释性。本文表明,当浅层决策树被用作基学习器时,集成学习算法不仅可以成为固有可解释的,符合广义加性模型的等效表示,而且有时还会导致更好的泛化性能。首先,开发了一种解释算法,将树集成转化为具有固有可解释性的函数ANOVA表示。其次,提出了两种策略来进一步提高模型的可解释性,即在模型训练阶段添加约束和事后效果修剪。对模拟和真实数据集的实验表明,与其对应的基准相比,我们提出的方法在模型解释和预测性能之间提供了更好的平衡。

更新时间: 2024-10-24 18:58:41

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.19098v1

One-shot World Models Using a Transformer Trained on a Synthetic Prior

A World Model is a compressed spatial and temporal representation of a real world environment that allows one to train an agent or execute planning methods. However, world models are typically trained on observations from the real world environment, and they usually do not enable learning policies for other real environments. We propose One-Shot World Model (OSWM), a transformer world model that is learned in an in-context learning fashion from purely synthetic data sampled from a prior distribution. Our prior is composed of multiple randomly initialized neural networks, where each network models the dynamics of each state and reward dimension of a desired target environment. We adopt the supervised learning procedure of Prior-Fitted Networks by masking next-state and reward at random context positions and query OSWM to make probabilistic predictions based on the remaining transition context. During inference time, OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment by providing 1k transition steps as context and is then able to successfully train environment-solving agent policies. However, transferring to more complex environments remains a challenge, currently. Despite these limitations, we see this work as an important stepping-stone in the pursuit of learning world models purely from synthetic data.

Updated: 2024-10-24 18:57:44

标题: 使用在合成先验上训练的Transformer的一次性世界模型

摘要: 世界模型是对真实世界环境的压缩空间和时间表示,可用于训练智能体或执行规划方法。然而,世界模型通常是在真实世界环境的观测数据上训练的,通常无法用于为其他真实环境学习策略。我们提出了一次性世界模型(OSWM),这是一种以上下文学习方式、完全从先验分布采样的合成数据中学习的Transformer世界模型。我们的先验由多个随机初始化的神经网络组成,每个网络模拟目标环境中每个状态和奖励维度的动态。我们采用先验拟合网络(Prior-Fitted Networks)的监督学习过程,在随机的上下文位置屏蔽下一状态和奖励,并要求OSWM基于其余的转移上下文做出概率预测。在推理阶段,只需提供1k个转移步骤作为上下文,OSWM就能快速适应简单网格世界、CartPole(Gym环境)以及一个自定义控制环境的动态,进而成功训练出能够解决这些环境的智能体策略。然而,迁移到更复杂的环境目前仍是一个挑战。尽管存在这些限制,我们认为这项工作是迈向纯粹从合成数据学习世界模型的重要一步。

更新时间: 2024-10-24 18:57:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.14084v2

Watermarking Large Language Models and the Generated Content: Opportunities and Challenges

The widely adopted and powerful generative large language models (LLMs) have raised concerns about intellectual property rights violations and the spread of machine-generated misinformation. Watermarking serves as a promising approach to establish ownership, prevent unauthorized use, and trace the origins of LLM-generated content. This paper summarizes and shares the challenges and opportunities we found when watermarking LLMs. We begin by introducing techniques for watermarking LLMs themselves under different threat models and scenarios. Next, we investigate watermarking methods designed for the content generated by LLMs, assessing their effectiveness and resilience against various attacks. We also highlight the importance of watermarking domain-specific models and data, such as those used in code generation, chip design, and medical applications. Furthermore, we explore methods like hardware acceleration to improve the efficiency of the watermarking process. Finally, we discuss the limitations of current approaches and outline future research directions for the responsible use and protection of these generative AI tools.
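As background for the content-watermarking methods surveyed here, one widely cited scheme (the "green list" watermark of Kirchenbauer et al.) is easy to sketch. The simplified single-token hashing and the parameter choices below are our assumptions; production schemes hash a window of context with a secret key.

```python
import numpy as np

def watermark_logits(logits, prev_token, gamma=0.5, delta=2.0):
    """Seed an RNG with the previous token, mark a gamma-fraction of the
    vocabulary 'green', and add a bias delta to green logits before sampling."""
    rng = np.random.default_rng(prev_token)          # keyed by context
    green = rng.permutation(len(logits))[: int(gamma * len(logits))]
    boosted = logits.copy()
    boosted[green] += delta
    return boosted

def detection_z_score(tokens, vocab_size, gamma=0.5):
    """Count green tokens; a large z-score suggests watermarked text."""
    hits = 0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        rng = np.random.default_rng(prev)
        green = rng.permutation(vocab_size)[: int(gamma * vocab_size)]
        hits += int(cur in set(green.tolist()))
    n = len(tokens) - 1
    return (hits - gamma * n) / np.sqrt(n * gamma * (1 - gamma))
```

Much of the surveyed work concerns how such schemes hold up against paraphrasing, editing, and other removal attacks.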

Updated: 2024-10-24 18:55:33

标题: 大型语言模型和生成内容的水印:机遇与挑战

摘要: 广泛采用且功能强大的生成式大语言模型(LLMs)引发了关于知识产权侵犯和机器生成的不实信息传播的担忧。数字水印技术被视为建立所有权、防止未经授权使用以及追踪LLM生成内容来源的有希望方法。本文总结并分享了我们在为LLMs添加数字水印时发现的挑战和机遇。我们首先介绍了在不同威胁模型和场景下为LLMs本身添加数字水印的技术。接下来,我们调查了专为LLMs生成内容设计的数字水印方法,评估它们对各种攻击的有效性和韧性。我们还强调了数字水印领域特定模型和数据的重要性,例如用于代码生成、芯片设计和医学应用的模型。此外,我们探讨了如硬件加速等方法,以提高数字水印过程的效率。最后,我们讨论了当前方法的局限性,并概述了未来研究方向,以负责任地使用和保护这些生成式人工智能工具。

更新时间: 2024-10-24 18:55:33

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2410.19096v1

Provable Tempered Overfitting of Minimal Nets and Typical Nets

We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.

Updated: 2024-10-24 18:51:56

标题: 极小网络与典型网络的可证明温和过拟合

摘要: 我们研究了以二值权重完美分类含噪训练集的全连接深度神经网络(NN)的过拟合行为。我们考虑使用最小NN(权重数量最少)和随机插值NN这两种插值方式。对于这两种学习规则,我们都证明了过拟合是温和的(tempered)。我们的分析基于一个关于与部分函数一致的阈值电路规模的新界限。据我们所知,我们的结果是首个满足以下条件的关于良性或温和过拟合的理论结果:(1)适用于深度NN;(2)不要求输入维度非常高或非常低。

更新时间: 2024-10-24 18:51:56

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.19092v1

STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering

Large Vision-Language Models (LVLMs) have shown significant potential in assisting medical diagnosis by leveraging extensive biomedical datasets. However, the advancement of medical image understanding and reasoning critically depends on building high-quality visual instruction data, which is costly and labor-intensive to obtain, particularly in the medical domain. To mitigate this data-starving issue, we introduce Self-Training Large Language and Vision Assistant for Medicine (STLLaVA-Med). The proposed method is designed to train a policy model (an LVLM) capable of auto-generating medical visual instruction data to improve data efficiency, guided through Direct Preference Optimization (DPO). Specifically, a more powerful and larger LVLM (e.g., GPT-4o) is involved as a biomedical expert to oversee the DPO fine-tuning process on the auto-generated data, encouraging the policy model to align efficiently with human preferences. We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks, demonstrating competitive zero-shot performance with the utilization of only 9% of the medical data.

Updated: 2024-10-24 18:47:37

标题: STLLaVA-Med:用于医学问答的自我训练大型语言和视觉助手

摘要: 大型视觉语言模型(LVLMs)已经显示出在利用大量生物医学数据辅助医学诊断方面的显著潜力。然而,医学图像理解和推理的进展关键取决于构建高质量的视觉指令数据,而这类数据获取成本高且劳动密集,在医学领域尤其如此。为了缓解这一数据匮乏问题,我们提出了自训练大型语言与视觉医学助手(STLLaVA-Med)。所提出的方法旨在训练一个能够自动生成医学视觉指令数据以提高数据效率的策略模型(一个LVLM),并通过直接偏好优化(DPO)加以引导。具体来说,一个更强大、更大的LVLM(例如GPT-4o)作为生物医学专家,监督对自动生成数据的DPO微调过程,促使策略模型高效地与人类偏好对齐。我们在三个主要的医学视觉问答(VQA)基准上验证了STLLaVA-Med的有效性和数据效率,结果表明仅使用9%的医学数据即可取得有竞争力的零样本性能。

更新时间: 2024-10-24 18:47:37

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.19973v2

A Counterexample in Cross-Correlation Template Matching

Sampling and quantization are standard practices in signal and image processing, but a theoretical understanding of their impact is incomplete. We consider discrete image registration when the underlying function is a one-dimensional spatially-limited piecewise constant function. For ideal noiseless sampling the number of samples from each region of the support of the function generally depends on the placement of the sampling grid. Therefore, if the samples of the function are noisy, then image registration requires alignment and segmentation of the data sequences. One popular strategy for aligning images is selecting the maximum from cross-correlation template matching. To motivate more robust and accurate approaches which also address segmentation, we provide an example of a one-dimensional spatially-limited piecewise constant function for which the cross-correlation technique can perform poorly on noisy samples. While earlier approaches to improve the method involve normalization, our example suggests a novel strategy in our setting. Difference sequences, thresholding, and dynamic programming are well-known techniques in image processing. We prove that they are tools to correctly align and segment noisy data sequences under some conditions on the noise. We also address some of the potential difficulties that could arise in a more general case.
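To illustrate the difference-sequence-plus-thresholding idea the abstract alludes to, a toy NumPy example (our construction; the paper's guarantees involve precise noise conditions we do not reproduce) recovers the jump locations of a noisy piecewise-constant signal:

```python
import numpy as np

rng = np.random.default_rng(1)
# Noisy samples of a piecewise-constant function with jumps at 30 and 70.
truth = np.concatenate([np.full(30, 0.0), np.full(40, 3.0), np.full(30, 1.0)])
samples = truth + 0.2 * rng.normal(size=truth.size)

# First differences concentrate the jumps; noise-only differences stay small.
d = np.diff(samples)
tau = 6 * np.median(np.abs(d))           # a simple robust threshold choice
change_points = np.flatnonzero(np.abs(d) > tau) + 1
print(change_points)                     # ~[30, 70]: the segment boundaries
```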

Updated: 2024-10-24 18:42:01

标题: 互相关模板匹配中的一个反例

摘要: 采样和量化是信号与图像处理中的标准做法,但对其影响的理论理解尚不完整。我们考虑底层函数为一维空间受限分段常数函数时的离散图像配准问题。在理想无噪声采样下,函数支撑集各区域内的采样数量通常取决于采样网格的放置。因此,如果函数的样本含有噪声,图像配准就需要对数据序列进行对齐和分割。一种常用的图像对齐策略是选取互相关模板匹配的最大值。为了推动既更稳健准确、又能处理分割问题的方法,我们给出了一个一维空间受限分段常数函数的例子,互相关技术在其含噪样本上可能表现不佳。虽然早期改进该方法的做法涉及归一化,但我们的例子在我们的设定下提示了一种新颖的策略。差分序列、阈值处理和动态规划是图像处理中众所周知的技术。我们证明,在对噪声的某些条件下,它们是正确对齐和分割含噪数据序列的工具。我们还讨论了在更一般情形下可能出现的一些潜在困难。

更新时间: 2024-10-24 18:42:01

领域: cs.CV,cs.AI,eess.IV

下载: http://arxiv.org/abs/2410.19085v1

FastSurvival: Hidden Computational Blessings in Training Cox Proportional Hazards Models

Survival analysis is an important research topic with applications in healthcare, business, and manufacturing. One essential tool in this area is the Cox proportional hazards (CPH) model, which is widely used for its interpretability, flexibility, and predictive performance. However, for modern data science challenges such as high dimensionality (both $n$ and $p$) and high feature correlations, current algorithms to train the CPH model have drawbacks, preventing us from using the CPH model at its full potential. The root cause is that the current algorithms, based on the Newton method, have trouble converging due to vanishing second order derivatives when outside the local region of the minimizer. To circumvent this problem, we propose new optimization methods by constructing and minimizing surrogate functions that exploit hidden mathematical structures of the CPH model. Our new methods are easy to implement and ensure monotonic loss decrease and global convergence. Empirically, we verify the computational efficiency of our methods. As a direct application, we show how our optimization methods can be used to solve the cardinality-constrained CPH problem, producing very sparse high-quality models that were not previously practical to construct. We list several extensions that our breakthrough enables, including optimization opportunities, theoretical questions on CPH's mathematical structure, as well as other CPH-related applications.
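For context, the Cox partial log-likelihood being maximized is

$$\ell(\beta) = \sum_{i:\,\delta_i=1}\left[x_i^\top\beta \;-\; \log\sum_{j\in R(t_i)}\exp\!\left(x_j^\top\beta\right)\right],$$

where $\delta_i$ indicates an observed event and $R(t_i)$ is the risk set at time $t_i$. Far from the optimum the log-sum-exp terms saturate and the curvature flattens, which is exactly the vanishing-second-derivative failure mode of Newton steps that the surrogate construction is designed to avoid.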

Updated: 2024-10-24 18:36:59

标题: FastSurvival:训练Cox比例风险模型中隐藏的计算优势

摘要: 生存分析是一个重要的研究课题,应用于医疗保健、商业和制造业。该领域的一个基本工具是Cox比例风险(CPH)模型,因其可解释性、灵活性和预测性能而被广泛使用。然而,面对高维度($n$和$p$均很大)和高特征相关性等现代数据科学挑战,目前用于训练CPH模型的算法存在缺陷,使我们无法充分发挥CPH模型的潜力。根本原因在于,当前基于牛顿法的算法在极小值点的局部邻域之外会出现二阶导数消失,从而难以收敛。为了规避这个问题,我们提出了新的优化方法:构建并最小化利用CPH模型隐藏数学结构的代理函数。我们的新方法易于实现,并保证损失单调下降和全局收敛。我们在实验中验证了方法的计算效率。作为直接应用,我们展示了这些优化方法如何求解基数约束的CPH问题,得到以前难以构建的非常稀疏的高质量模型。我们还列出了这一突破所带来的若干扩展,包括优化机会、关于CPH数学结构的理论问题,以及其他与CPH相关的应用。

更新时间: 2024-10-24 18:36:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.19081v1

BIFRÖST: 3D-Aware Image compositing with Language Instructions

This paper introduces Bifröst, a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. Previous methods concentrate on image compositing at the 2D level, which fall short in handling complex spatial relationships (e.g., occlusion). Bifröst addresses these issues by training MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process to bridge the gap between 2D and 3D, which enhances spatial comprehension and supports sophisticated spatial interactions. Our method begins by fine-tuning MLLM with a custom counterfactual dataset to predict 2.5D object locations in complex backgrounds from language instructions. Then, the image-compositing model is uniquely designed to process multiple types of input features, enabling it to perform high-fidelity image compositions that consider occlusion, depth blur, and image harmonization. Extensive qualitative and quantitative evaluations demonstrate that Bifröst significantly outperforms existing methods, providing a robust solution for generating realistically composed images in scenarios demanding intricate spatial understanding. This work not only pushes the boundaries of generative image compositing but also reduces reliance on expensive annotated datasets by effectively utilizing existing resources in innovative ways.

Updated: 2024-10-24 18:35:12

标题: BIFRÖST:具有语言指令的3D感知图像合成

摘要: 这篇论文介绍了Bifröst,这是一个基于扩散模型构建的新颖的3D感知框架,用于执行基于指令的图像合成。先前的方法集中在2D级别的图像合成上,无法处理复杂的空间关系(如遮挡)。Bifröst通过训练MLLM作为2.5D位置预测器,并在生成过程中集成深度图作为额外条件来解决这些问题,以弥合2D和3D之间的差距,增强空间理解并支持复杂的空间交互。我们的方法首先通过使用自定义反事实数据集对MLLM进行微调,从语言指令中预测复杂背景中的2.5D物体位置。然后,图像合成模型被独特设计为处理多种类型的输入特征,使其能够执行考虑遮挡、深度模糊和图像协调的高保真图像合成。广泛的定性和定量评估表明,Bifröst明显优于现有方法,在需要复杂空间理解的场景中提供了一个强大的解决方案,用于生成现实感图像。这项工作不仅推动了生成图像合成的边界,还通过有效利用现有资源的创新方式,减少了对昂贵注释数据集的依赖。

更新时间: 2024-10-24 18:35:12

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.19079v1

Target Strangeness: A Novel Conformal Prediction Difficulty Estimator

This paper introduces Target Strangeness, a novel difficulty estimator for conformal prediction (CP) that offers an alternative approach for normalizing prediction intervals (PIs). By assessing how atypical a prediction is within the context of its nearest neighbours' target distribution, Target Strangeness can surpass the current state-of-the-art performance. This novel difficulty estimator is evaluated against others in the context of several conformal regression experiments.
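The normalization role a difficulty estimator plays in conformal prediction can be sketched quickly. Below is a generic normalized split-conformal procedure into which we substitute a simple k-NN target-spread difficulty estimate; the actual Target Strangeness estimator differs, so treat this purely as scaffolding:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1200, 1))
y = np.sin(X[:, 0]) + 0.3 * np.abs(X[:, 0]) * rng.normal(size=1200)

X_tr, y_tr = X[:600], y[:600]               # fit the predictor
X_cal, y_cal = X[600:1000], y[600:1000]     # calibration split
X_te, y_te = X[1000:], y[1000:]

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Difficulty estimate: spread of targets among nearest training neighbours
# (our stand-in for a more refined estimator such as Target Strangeness).
nn = NearestNeighbors(n_neighbors=25).fit(X_tr)

def difficulty(Xq):
    _, idx = nn.kneighbors(Xq)
    return y_tr[idx].std(axis=1) + 1e-6

# Normalized split conformal: nonconformity scores |y - yhat| / sigma(x).
scores = np.abs(y_cal - model.predict(X_cal)) / difficulty(X_cal)
alpha = 0.1
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

half_width = q * difficulty(X_te)           # wide PIs where data is "hard"
covered = np.abs(y_te - model.predict(X_te)) <= half_width
print(f"coverage ~ {covered.mean():.2f}")   # close to the nominal 0.90
```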

Updated: 2024-10-24 18:33:32

标题: 目标陌生性:一种新颖的共形预测难度估计器

摘要: 本文介绍了目标陌生性(Target Strangeness),一种用于共形预测(CP)的新颖难度估计器,为预测区间(PI)的归一化提供了一种替代方法。通过评估某个预测在其最近邻的目标分布中有多么不寻常,目标陌生性可以超越当前最先进的性能。我们在若干共形回归实验中将这一新颖难度估计器与其他估计器进行了对比评估。

更新时间: 2024-10-24 18:33:32

领域: cs.LG

下载: http://arxiv.org/abs/2410.19077v1

Deep-MacroFin: Informed Equilibrium Neural Network for Continuous Time Economic Models

In this paper, we present Deep-MacroFin, a comprehensive framework designed to solve partial differential equations, with a particular focus on models in continuous time economics. This framework leverages deep learning methodologies, including conventional Multi-Layer Perceptrons and the newly developed Kolmogorov-Arnold Networks. It is optimized using economic information encapsulated by Hamilton-Jacobi-Bellman equations and coupled algebraic equations. The application of neural networks holds the promise of accurately resolving high-dimensional problems with fewer computational demands and limitations compared to standard numerical methods. This versatile framework can be readily adapted for elementary differential equations, and systems of differential equations, even in cases where the solutions may exhibit discontinuities. Importantly, it offers a more straightforward and user-friendly implementation than existing libraries.

Updated: 2024-10-24 18:31:36

标题: 深度宏观金融:用于连续时间经济模型的信息均衡神经网络

摘要: 在这篇论文中,我们介绍了Deep-MacroFin,一个专门设计用来解决偏微分方程的全面框架,特别关注连续时间经济模型。该框架利用了深度学习方法,包括传统的多层感知器和新开发的科尔莫哥洛夫-阿诺德网络。它通过哈密尔顿-雅可比-贝尔曼方程和耦合代数方程所包含的经济信息进行优化。神经网络的应用有望准确解决高维问题,相比标准数值方法,计算需求和限制更少。这个多功能框架可以轻松适应初等微分方程和微分方程系统,甚至在解决方案可能出现不连续性的情况下也可以。重要的是,它提供了比现有库更简单和用户友好的实现方式。

更新时间: 2024-10-24 18:31:36

领域: cs.LG,cs.CE,q-fin.CP,I.0; J.4

下载: http://arxiv.org/abs/2408.10368v3

A Generalized Framework for Multiscale State-Space Modeling with Nested Nonlinear Dynamics: An Application to Bayesian Learning under Switching Regimes

In this work, we introduce a generalized framework for multiscale state-space modeling that incorporates nested nonlinear dynamics, with a specific focus on Bayesian learning under switching regimes. Our framework captures the complex interactions between fast and slow processes within systems, allowing for the analysis of how these dynamics influence each other across various temporal scales. We model these interactions through a hierarchical structure in which finer time-scale dynamics are nested within coarser ones, while facilitating feedback between the scales. To promote the practical application of our framework, we address the problem of identifying switching regimes and transient dynamics. In particular, we develop a Bayesian learning approach to estimate latent states and indicators corresponding to switching dynamics, enabling the model to adapt effectively to regime changes. We employ Sequential Monte Carlo, or particle filtering, for inference. We illustrate the utility of our framework through simulations. The results demonstrate that our Bayesian learning approach effectively tracks state transitions and achieves accurate identification of switching dynamics in multiscale systems.
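Since inference here rests on Sequential Monte Carlo, a bare-bones bootstrap particle filter illustrates the propagate-weight-resample loop. This is our toy single-scale model: the paper's nested multiscale structure and switching indicators are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 100, 2000

# Toy nonlinear state-space model: x_t = 0.9 x_{t-1} + sin(x_{t-1}) + w_t,
# y_t = x_t + v_t. The filter tracks p(x_t | y_{1:t}).
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t-1] + np.sin(x_true[t-1]) + 0.3 * rng.normal()
ys = x_true + 0.5 * rng.normal(size=T)

particles = rng.normal(size=N)
estimates = []
for t in range(T):
    # Propagate particles through the transition dynamics.
    particles = 0.9 * particles + np.sin(particles) + 0.3 * rng.normal(size=N)
    # Weight by the observation likelihood, then resample.
    logw = -0.5 * ((ys[t] - particles) / 0.5) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]
    estimates.append(particles.mean())

rmse = np.sqrt(np.mean((np.array(estimates) - x_true) ** 2))
print(f"filtering RMSE: {rmse:.3f}")
```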

Updated: 2024-10-24 18:31:20

标题: 一个广义的多尺度状态空间建模框架与嵌套非线性动力学:基于贝叶斯学习在切换制度下的应用

摘要: 在这项工作中,我们介绍了一个广义的多尺度状态空间建模框架,该框架包括嵌套非线性动态,重点关注在切换制度下的贝叶斯学习。我们的框架捕捉了系统内快速和慢速过程之间复杂的相互作用,允许分析这些动态如何在不同的时间尺度上相互影响。我们通过分层结构来建模这些相互作用,在这个结构中,更细的时间尺度动态嵌套在更粗的动态中,同时促进尺度之间的反馈。为了促进我们框架的实际应用,我们解决了识别切换制度和瞬态动态的问题。特别是,我们开发了一种贝叶斯学习方法来估计与切换动态对应的潜在状态和指标,使模型能够有效地适应制度变化。我们使用顺序蒙特卡洛或粒子滤波进行推断。我们通过模拟展示了我们框架的实用性。结果表明,我们的贝叶斯学习方法有效跟踪状态转换,并在多尺度系统中准确地识别切换动态。

更新时间: 2024-10-24 18:31:20

领域: stat.ML,cs.CE,cs.LG,eess.SP

下载: http://arxiv.org/abs/2410.19074v1

All models are wrong, some are useful: Model Selection with Limited Labels

We introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers. Given a pool of unlabeled target data, MODEL SELECTOR samples a small subset of highly informative examples for labeling, in order to efficiently identify the best pretrained model for deployment on this target dataset. Through extensive experiments, we demonstrate that MODEL SELECTOR drastically reduces the need for labeled data while consistently picking the best or near-best performing model. Across 18 model collections on 16 different datasets, comprising over 1,500 pretrained models, MODEL SELECTOR reduces the labeling cost by up to 94.15% to identify the best model compared to the cost of the strongest baseline. Our results further highlight the robustness of MODEL SELECTOR in model selection, as it reduces the labeling cost by up to 72.41% when selecting a near-best model, whose accuracy is only within 1% of the best model.
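The flavor of label-efficient model selection can be conveyed with a simple disagreement-driven baseline; this is our illustrative stand-in, not the MODEL SELECTOR algorithm itself:

```python
import numpy as np

def select_best_model(preds, oracle_label, budget):
    """preds: (n_models, n_points) hard predictions of pretrained models.
    Greedily label the points where the models disagree most, and keep a
    running correctness count per model on the labeled subset."""
    n_models, n_points = preds.shape
    labeled, correct = [], np.zeros(n_models)
    for _ in range(budget):
        # Disagreement = how far each column is from a unanimous vote.
        counts = np.array([np.bincount(preds[:, j], minlength=2).max()
                           for j in range(n_points)])
        counts[labeled] = n_models + 1        # never re-label a point
        j = int(np.argmin(counts))            # most contested point
        y = oracle_label(j)                   # expensive label query
        correct += (preds[:, j] == y)
        labeled.append(j)
    return int(np.argmax(correct))

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=500)
accs = [0.6, 0.7, 0.9, 0.85]                  # latent model accuracies
preds = np.array([np.where(rng.random(500) < a, truth, 1 - truth)
                  for a in accs])
print(select_best_model(preds, lambda j: truth[j], budget=60))  # usually 2
```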

Updated: 2024-10-24 18:31:14

标题: 所有模型都是错误的,但有些是有用的:有限标签下的模型选择

摘要: 我们引入了MODEL SELECTOR,一个用于以标注高效方式选择预训练分类器的框架。给定一个未标注的目标数据池,MODEL SELECTOR会从中抽取一小部分信息量高的样本进行标注,以便高效地确定最适合部署在该目标数据集上的预训练模型。通过大量实验证明,MODEL SELECTOR显著减少了对标注数据的需求,同时始终选出最佳或接近最佳性能的模型。在16个不同数据集上的18个模型集合(涵盖超过1,500个预训练模型)中,与最强基线的成本相比,MODEL SELECTOR将识别最佳模型所需的标注成本降低了高达94.15%。我们的结果进一步突显了MODEL SELECTOR在模型选择中的稳健性:在选择准确率与最佳模型相差不到1%的接近最佳模型时,它将标注成本降低了高达72.41%。

更新时间: 2024-10-24 18:31:14

领域: cs.LG

下载: http://arxiv.org/abs/2410.13609v2

LoGU: Long-form Generation with Uncertainty Expressions

While Large Language Models (LLMs) demonstrate impressive capabilities, they still struggle with generating factually incorrect content (i.e., hallucinations). A promising approach to mitigate this issue is enabling models to express uncertainty when unsure. Previous research on uncertainty modeling has primarily focused on short-form QA, but real-world applications often require much longer responses. In this work, we introduce the task of Long-form Generation with Uncertainty (LoGU). We identify two key challenges: Uncertainty Suppression, where models hesitate to express uncertainty, and Uncertainty Misalignment, where models convey uncertainty inaccurately. To tackle these challenges, we propose a refinement-based data collection framework and a two-stage training pipeline. Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims. The collected data are then used in training through supervised fine-tuning (SFT) and direct preference optimization (DPO) to enhance uncertainty expression. Extensive experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.

Updated: 2024-10-24 18:26:39

标题: LoGU:使用不确定表达式进行长文生成

摘要: 尽管大型语言模型(LLMs)展现出令人印象深刻的能力,但它们仍然在生成事实上不正确的内容(即幻觉)方面存在困难。缓解这一问题的一个有前途的方法是在模型不确定时使其表达不确定性。先前关于不确定性建模的研究主要集中在短形式问答上,但现实世界中的应用通常需要更长的回答。在这项工作中,我们引入了带有不确定性的长篇生成任务(LoGU)。我们确定了两个关键挑战:不确定性抑制,即模型在表达不确定性时犹豫不决,以及不确定性不对齐,即模型不准确地传达不确定性。为了解决这些挑战,我们提出了一个基于改进的数据收集框架和一个两阶段训练流程。我们的框架采用分而治之的策略,根据原子主张来细化不确定性。然后,收集的数据通过监督微调(SFT)和直接偏好优化(DPO)用于训练,以增强不确定性表达。对三个长篇指示遵循数据集的广泛实验表明,我们的方法显著提高了准确性,减少了幻觉,并保持了回答的全面性。

更新时间: 2024-10-24 18:26:39

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14309v2

Creativity in AI: Progresses and Challenges

Creativity is the ability to produce novel, useful, and surprising ideas, and has been widely studied as a crucial aspect of human cognition. Machine creativity on the other hand has been a long-standing challenge. With the rise of advanced generative AI, there has been renewed interest and debate regarding AI's creative capabilities. Therefore, it is imperative to revisit the state of creativity in AI and identify key progresses and remaining challenges. In this work, we survey leading works studying the creative capabilities of AI systems, focusing on creative problem-solving, linguistic, artistic, and scientific creativity. Our review suggests that while the latest AI models are largely capable of producing linguistically and artistically creative outputs such as poems, images, and musical pieces, they struggle with tasks that require creative problem-solving, abstract thinking and compositionality and their generations suffer from a lack of diversity, originality, long-range incoherence and hallucinations. We also discuss key questions concerning copyright and authorship issues with generative models. Furthermore, we highlight the need for a comprehensive evaluation of creativity that is process-driven and considers several dimensions of creativity. Finally, we propose future research directions to improve the creativity of AI outputs, drawing inspiration from cognitive science and psychology.

Updated: 2024-10-24 18:25:15

标题: AI中的创造力:进展与挑战

摘要: 创造力是产生新颖、有用和令人惊讶的想法的能力,作为人类认知的一个关键方面已被广泛研究。另一方面,机器创造力一直是一个长期存在的挑战。随着先进生成式人工智能的兴起,人们对AI的创造能力重新产生了兴趣和争论。因此,有必要重新审视AI创造力的现状,并确定关键进展与剩余挑战。在这项工作中,我们综述了研究AI系统创造能力的代表性工作,重点关注创造性问题解决以及语言、艺术和科学方面的创造力。我们的综述表明,尽管最新的AI模型在很大程度上能够产出诗歌、图像和乐曲等具有语言和艺术创造性的作品,但它们在需要创造性问题解决、抽象思维和组合性的任务上表现不佳,其生成物缺乏多样性和原创性,并存在长程不连贯和幻觉等问题。我们还讨论了生成模型涉及的版权与作者身份等关键问题。此外,我们强调需要一种以过程为导向、兼顾创造力多个维度的综合性创造力评估。最后,我们从认知科学和心理学中汲取灵感,提出了改善AI输出创造力的未来研究方向。

更新时间: 2024-10-24 18:25:15

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.17218v2

Less Discriminatory Alternative and Interpretable XGBoost Framework for Binary Classification

Fair lending practices and model interpretability are crucial concerns in the financial industry, especially given the increasing use of complex machine learning models. In response to the Consumer Financial Protection Bureau's (CFPB) requirement to protect consumers against unlawful discrimination, we introduce LDA-XGB1, a novel less discriminatory alternative (LDA) machine learning model for fair and interpretable binary classification. LDA-XGB1 is developed through biobjective optimization that balances accuracy and fairness, with both objectives formulated using binning and information value. It leverages the predictive power and computational efficiency of XGBoost while ensuring inherent model interpretability, including the enforcement of monotonic constraints. We evaluate LDA-XGB1 on two datasets: SimuCredit, a simulated credit approval dataset, and COMPAS, a real-world recidivism prediction dataset. Our results demonstrate that LDA-XGB1 achieves an effective balance between predictive accuracy, fairness, and interpretability, often outperforming traditional fair lending models. This approach equips financial institutions with a powerful tool to meet regulatory requirements for fair lending while maintaining the advantages of advanced machine learning techniques.

Updated: 2024-10-24 18:20:52

标题: 二元分类的少歧视替代和可解释XGBoost框架

摘要: 公平的信贷实践和模型可解释性是金融行业的关键问题,特别是考虑到复杂机器学习模型的日益增多的情况。为了响应消费者金融保护局(CFPB)要求保护消费者免受非法歧视的要求,我们引入了LDA-XGB1,这是一种新颖的较少歧视性选择(LDA)机器学习模型,用于公平和可解释的二元分类。LDA-XGB1是通过双目标优化开发的,平衡准确性和公平性,双目标使用分箱和信息价值来制定。它利用XGBoost的预测能力和计算效率,同时确保固有的模型可解释性,包括强制单调约束的执行。我们在两个数据集上评估了LDA-XGB1:SimuCredit,一个模拟信用批准数据集,和COMPAS,一个现实世界的累犯预测数据集。我们的结果表明,LDA-XGB1在预测准确性、公平性和可解释性之间实现了有效的平衡,通常优于传统的公平信贷模型。这种方法为金融机构提供了一个强大的工具,以满足公平信贷的监管要求,同时保持先进机器学习技术的优势。

更新时间: 2024-10-24 18:20:52

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.19067v1

From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution

With the growing spread of misinformation online, research has increasingly focused on detecting and tracking fake news. However, an overlooked issue is that fake news does not naturally exist in social networks -- it often originates from distorted facts or deliberate fabrication by malicious actors. Understanding how true news gradually evolves into fake news is critical for early detection and prevention, reducing its spread and impact. Hence, in this paper, we take the first step toward simulating and revealing this evolution, proposing a Fake News evolUtion Simulation framEwork (FUSE) based on large language models (LLMs). Specifically, we employ LLM as agents to represent individuals in a simulated social network. We define four types of agents commonly observed in daily interactions: spreaders, who propagate information; commentators, who provide opinions and interpretations; verifiers, who check the accuracy of information; and bystanders, who passively observe without engaging. For simulated environments, we model various social network structures, such as high-clustering networks and scale-free networks, to mirror real-world network dynamics. Each day, the agents engage in belief exchanges, reflect on their thought processes, and reintroduce the news accordingly. Given the lack of prior work in this area, we developed a FUSE-EVAL evaluation framework to measure the deviation from true news during the fake news evolution process. The results show that FUSE successfully captures the underlying patterns of how true news transforms into fake news and accurately reproduces previously discovered instances of fake news, aligning closely with human evaluations. Moreover, our work provides insights into the fact that combating fake news should not be delayed until it has fully evolved; instead, prevention in advance is key to achieving better outcomes.

Updated: 2024-10-24 18:17:16

标题: 从微小的失误到巨大的飞跃:基于LLM的模拟用于虚假新闻演变

摘要: 随着网络上虚假信息的传播范围不断扩大,研究越来越多地聚焦于检测和追踪假新闻。然而,一个被忽视的问题是假新闻并不自然存在于社交网络中 -- 它往往源自事实扭曲或恶意制造的行为。了解真实新闻如何逐渐演变成假新闻对于早期检测和预防,减少其传播和影响至关重要。因此,在本文中,我们迈出了模拟和揭示这种演变的第一步,提出了基于大型语言模型(LLMs)的假新闻演化模拟框架(FUSE)。具体来说,我们使用LLM作为代理来代表模拟社交网络中的个体。我们定义了在日常互动中常见的四种代理类型:传播者,传播信息;评论者,提供观点和解释;验证者,检查信息的准确性;旁观者,袖手旁观而不参与。在模拟环境中,我们建模了各种社交网络结构,如高聚类网络和无标度网络,以反映现实世界的网络动态。每天,代理们进行信念交流,反思他们的思维过程,并相应地重新引入新闻。鉴于这一领域缺乏先前的工作,我们开发了一个FUSE-EVAL评估框架来衡量假新闻演化过程中与真实新闻的偏差。结果显示,FUSE成功捕捉了真实新闻如何转变为假新闻的潜在模式,并准确地再现了先前发现的假新闻实例,与人类评估结果密切一致。此外,我们的工作揭示了一个事实,即打击假新闻不应该等到它完全演变完毕后才开始;相反,提前预防是实现更好结果的关键。

更新时间: 2024-10-24 18:17:16

领域: cs.SI,cs.AI

下载: http://arxiv.org/abs/2410.19064v1

An Investigation on Machine Learning Predictive Accuracy Improvement and Uncertainty Reduction using VAE-based Data Augmentation

The confluence of ultrafast computers with large memory, rapid progress in Machine Learning (ML) algorithms, and the availability of large datasets place multiple engineering fields at the threshold of dramatic progress. However, a unique challenge in nuclear engineering is data scarcity because experimentation on nuclear systems is usually more expensive and time-consuming than most other disciplines. One potential way to resolve the data scarcity issue is deep generative learning, which uses certain ML models to learn the underlying distribution of existing data and generate synthetic samples that resemble the real data. In this way, one can significantly expand the dataset to train more accurate predictive ML models. In this study, our objective is to evaluate the effectiveness of data augmentation using variational autoencoder (VAE)-based deep generative models. We investigated whether the data augmentation leads to improved accuracy in the predictions of a deep neural network (DNN) model trained using the augmented data. Additionally, the DNN prediction uncertainties are quantified using Bayesian Neural Networks (BNN) and conformal prediction (CP) to assess the impact on predictive uncertainty reduction. To test the proposed methodology, we used TRACE simulations of steady-state void fraction data based on the NUPEC Boiling Water Reactor Full-size Fine-mesh Bundle Test (BFBT) benchmark. We found that augmenting the training dataset using VAEs has improved the DNN model's predictive accuracy, improved the prediction confidence intervals, and reduced the prediction uncertainties.

Updated: 2024-10-24 18:15:48

标题: 使用基于VAE的数据增强进行机器学习预测准确性改进和不确定性减少的研究

摘要: 超快速计算机与大容量存储器、机器学习(ML)算法的快速进展以及大规模数据集的可用性相结合,使多个工程领域处于巨大进步的门槛上。然而,核工程领域面临的一个独特挑战是数据稀缺,因为对核系统进行实验通常比大多数其他学科更昂贵且耗时。解决数据稀缺问题的一个潜在途径是深度生成学习,它使用特定的机器学习模型学习现有数据的潜在分布,并生成类似真实数据的合成样本。通过这种方式,可以显著扩展数据集,以训练更准确的预测机器学习模型。本研究的目标是评估使用基于变分自编码器(VAE)的深度生成模型进行数据增强的有效性。我们考察了数据增强是否会提高使用增强数据训练的深度神经网络(DNN)模型的预测准确性。此外,我们使用贝叶斯神经网络(BNN)和共形预测(CP)来量化DNN预测的不确定性,以评估其对降低预测不确定性的影响。为了测试所提出的方法,我们使用了基于NUPEC沸水堆全尺寸细网格燃料组件试验(BFBT)基准的TRACE稳态空泡份额模拟数据。我们发现,使用VAE增强训练数据集提高了DNN模型的预测准确性,改善了预测置信区间,并降低了预测不确定性。

更新时间: 2024-10-24 18:15:48

领域: cs.LG

下载: http://arxiv.org/abs/2410.19063v1

Planning in a recurrent neural network that plays Sokoban

How a neural network (NN) generalizes to novel situations depends on whether it has learned to select actions heuristically or via a planning process. "An investigation of model-free planning" (Guez et al. 2019) found that a recurrent NN (RNN) trained to play Sokoban appears to plan, with extra computation steps improving the RNN's success rate. We replicate and expand on their behavioral analysis, finding the RNN learns to give itself extra computation steps in complex situations by "pacing" in cycles. Moreover, we train linear probes that predict the future actions taken by the network and find that intervening on the hidden state using these probes controls the agent's subsequent actions. Leveraging these insights, we perform model surgery, enabling the convolutional NN to generalize beyond its 10x10 architectural limit to arbitrarily sized inputs. The resulting model solves challenging, highly off-distribution levels. We open-source our model and code, and believe the neural network's small size (1.29M parameters) makes it an excellent model organism to deepen our understanding of learned planning.
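A linear probe of the kind described is just a supervised readout from hidden states. The sketch below uses synthetic stand-in activations (we do not have the trained Sokoban RNN) to show the fit-probe-then-intervene pattern:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
H, A = 64, 4                       # hidden size, number of Sokoban actions

# Stand-in for recorded (hidden_state, action taken k steps later) pairs.
w_true = rng.normal(size=(H, A))
hidden = rng.normal(size=(5000, H))
future_action = (hidden @ w_true).argmax(axis=1)

probe = LogisticRegression(max_iter=2000).fit(hidden[:4000],
                                              future_action[:4000])
print(f"probe accuracy: {probe.score(hidden[4000:], future_action[4000:]):.2f}")

# Intervention: nudge a hidden state along the probe direction for action 0;
# the probe's prediction typically flips toward that action. In the paper,
# the analogous edit to the real hidden state steers the agent's behavior.
h = hidden[0] + 5.0 * probe.coef_[0]
print(probe.predict(h[None])[0])
```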

Updated: 2024-10-24 18:06:19

标题: 在玩推箱子的循环神经网络中进行规划

摘要: 神经网络(NN)如何泛化到新颖情况,取决于它学到的是启发式地选择动作,还是通过一个规划过程来选择。《无模型规划的研究》(Guez等,2019)发现,经过训练玩推箱子(Sokoban)的循环神经网络(RNN)似乎在进行规划:额外的计算步骤能提高RNN的成功率。我们复现并扩展了他们的行为分析,发现RNN学会了在复杂情况下通过循环地"踱步"来为自己争取额外的计算步骤。此外,我们训练了预测网络未来动作的线性探针,并发现利用这些探针对隐藏状态进行干预可以控制智能体的后续动作。利用这些见解,我们实施了模型手术,使卷积神经网络能够突破其10x10的架构限制,处理任意大小的输入。由此得到的模型能够解决具有挑战性、高度偏离训练分布的关卡。我们开源了模型和代码,并相信该神经网络的小规模(1.29M参数)使其成为深化我们对学得规划之理解的优秀模式生物。

更新时间: 2024-10-24 18:06:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.15421v2

Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions

A large class of non-smooth practical optimization problems can be written as minimization of a sum of smooth and partly smooth functions. We examine such structured problems which also depend on a parameter vector and study the problem of differentiating its solution mapping with respect to the parameter which has far reaching applications in sensitivity analysis and parameter learning problems. Under partial smoothness and other mild assumptions, we apply Implicit (ID) and Automatic Differentiation (AD) to the fixed-point iterations of proximal splitting algorithms. We show that AD of the sequence generated by these algorithms converges (linearly under further assumptions) to the derivative of the solution mapping. For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory overhead problem of the Reverse Mode AD and moreover provide faster convergence theoretically. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems and demonstrate the working of FPAD on prototypical image denoising problems by learning the regularization term.
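The core mechanism, differentiating the fixed-point iteration of a proximal splitting method with respect to a parameter, can be hand-rolled for ISTA on the Lasso. This forward-mode unrolling is our minimal example (using the soft-threshold derivatives), and it agrees with a finite-difference check:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 20)); b = rng.normal(size=40)
lam = 0.5
t = 1.0 / np.linalg.norm(A, 2) ** 2      # step size 1/L

def soft(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

# Propagate J = dx/dlambda alongside the ISTA iterates: the chain rule goes
# through the gradient step and the soft-threshold (its derivative in z is
# the active-set indicator; in theta it is -sign(z) on the active set).
x = np.zeros(20); J = np.zeros(20)
for _ in range(2000):
    z = x - t * A.T @ (A @ x - b)
    Jz = J - t * A.T @ (A @ J)
    active = (np.abs(z) > lam * t).astype(float)
    x = soft(z, lam * t)
    J = active * Jz - t * np.sign(z) * active

def solve(l):
    x = np.zeros(20)
    for _ in range(2000):
        x = soft(x - t * A.T @ (A @ x - b), l * t)
    return x

eps = 1e-5   # small if the derivative recursion is correct
print(np.max(np.abs(J - (solve(lam + eps) - solve(lam - eps)) / (2 * eps))))
```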

Updated: 2024-10-24 18:04:35

标题: 部分光滑函数的前向-后向分裂算法的不动点自动微分

摘要: 一大类非光滑的实际优化问题可以写成光滑函数与部分光滑函数之和的最小化。我们研究这类同时依赖于参数向量的结构化问题,并研究其解映射关于参数的微分问题,这在灵敏度分析和参数学习问题中有深远的应用。在部分光滑性和其他温和假设下,我们将隐式微分(ID)和自动微分(AD)应用于近端分裂算法的不动点迭代。我们证明,这些算法生成序列的AD收敛(在进一步假设下线性收敛)到解映射的导数。对于一种我们称之为不动点自动微分(FPAD)的自动微分变体,我们解决了反向模式AD的内存开销问题,并在理论上给出了更快的收敛速度。我们在Lasso和Group Lasso问题上用数值实验展示了AD与FPAD的收敛性和收敛速率,并通过学习正则化项展示了FPAD在典型图像去噪问题上的应用。

更新时间: 2024-10-24 18:04:35

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2208.03107v3

Automatic Differentiation of Optimization Algorithms with Time-Varying Updates

Numerous optimization algorithms have a time-varying update rule thanks to, for instance, a changing step size, momentum parameter, or Hessian approximation. In this paper, we apply unrolled or automatic differentiation to a time-varying iterative process and provide convergence (rate) guarantees for the resulting derivative iterates. We adapt these convergence results and apply them to proximal gradient descent with variable step size and FISTA when solving partly smooth problems. We confirm our findings numerically by solving $\ell_1$ and $\ell_2$-regularized linear and logistic regression respectively. Our theoretical and numerical results show that the convergence rate of the algorithm is reflected in its derivative iterates.

Updated: 2024-10-24 18:03:31

标题: 时间变化更新的优化算法的自动微分

摘要: 许多优化算法具有时间变化的更新规则,这归功于例如变化的步长、动量参数或Hessian逼近。在本文中,我们将展开或自动微分应用于时间变化的迭代过程,并为导数迭代结果提供收敛(速率)保证。我们将这些收敛结果进行调整,并应用于具有可变步长的近端梯度下降和在解决部分光滑问题时的FISTA算法。我们通过分别解决$\ell_1$和$\ell_2$正则化的线性和逻辑回归来在数值上确认我们的发现。我们的理论和数值结果表明,算法的收敛速度体现在其导数迭代中。

更新时间: 2024-10-24 18:03:31

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2410.15923v2

ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Existing math datasets evaluate the reasoning abilities of large language models (LLMs) by either using the final answer or the intermediate reasoning steps derived from static examples. However, the former approach fails to surface the model's use of shortcuts and wrong reasoning, while the latter poses challenges in accommodating alternative solutions. In this work, we seek to use symbolic programs as a means for automated evaluation of whether a model can consistently produce correct final answers across various inputs to the program. We begin by extracting programs for popular math datasets (GSM8K and MATH) using GPT4-o. For those executable programs verified using the original input-output pairs, they are found to encapsulate the proper reasoning required to solve the original text questions. We then prompt GPT4-o to generate new questions using alternative input-output pairs based on the extracted program. We apply the resulting datasets to evaluate a collection of LLMs. In our experiments, we observe significant accuracy drops using our proposed evaluation compared with original static examples, suggesting the fragility of math reasoning in state-of-the-art LLMs.
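The pipeline's key move, re-executing an extracted symbolic program on fresh inputs to mint new test questions, can be pictured with a toy program. This is a made-up GSM8K-style example of ours, not one produced by the paper's GPT4-o extraction prompts:

```python
import random

# A math word problem distilled into a symbolic program: re-executing it
# with new inputs yields fresh question/answer pairs with the same reasoning.
def program(n_apples: int, price: float, discount: float) -> float:
    total = n_apples * price
    return total * (1.0 - discount)

random.seed(0)
for _ in range(3):
    n = random.randint(2, 20)
    p = round(random.uniform(0.5, 3.0), 2)
    d = random.choice([0.1, 0.2, 0.25])
    print(f"Q: {n} apples at ${p} each with a {int(d * 100)}% discount "
          f"-> expected answer {program(n, p, d):.2f}")
```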

Updated: 2024-10-24 18:02:37

标题: ReasonAgain:使用可提取的符号程序评估数学推理

摘要: 现有的数学数据集通过使用最终答案或从静态示例派生的中间推理步骤来评估大型语言模型(LLMs)的推理能力。然而,前一种方法无法显示模型使用快捷方式和错误推理,而后一种方法在容纳替代解决方案方面存在挑战。在这项工作中,我们试图利用符号程序作为自动化评估的手段,以确定模型是否能在各种输入中始终产生正确的最终答案。我们首先使用GPT4-o为流行的数学数据集(GSM8K和MATH)提取程序。对于使用原始输入-输出对验证的可执行程序,发现它们包含解决原始文本问题所需的正确推理。然后,我们提示GPT4-o基于提取的程序生成新问题,使用替代输入-输出对。我们应用生成的数据集来评估一系列LLMs。在我们的实验中,与原始静态示例相比,我们提出的评估导致显著的准确率下降,表明最先进的LLMs在数学推理方面的脆弱性。

更新时间: 2024-10-24 18:02:37

领域: cs.AI

下载: http://arxiv.org/abs/2410.19056v1

Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms

When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (already in isolation) hard to optimize. Here, the loss function poses the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard to optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we only utilize the loss function's second-order information to replace it by a Newton Loss, while training the network with gradient descent. This makes our method computationally efficient. We apply Newton Losses to eight differentiable algorithms for sorting and shortest-paths, achieving significant improvements for less-optimized differentiable algorithms, and consistent improvements, even for well-optimized differentiable algorithms.

Updated: 2024-10-24 18:02:11

标题: 牛顿损失:利用曲率信息进行可微算法学习

摘要: 在使用自定义目标(例如排序损失和最短路径损失)训练神经网络时,一个常见问题是它们本身不可微。一种流行的做法是对目标进行连续松弛以提供梯度,从而实现学习。然而,这类可微松弛通常是非凸的,并可能出现梯度消失和梯度爆炸,使它们(即便单独来看)也难以优化。此时,损失函数成为训练深度神经网络的瓶颈。我们提出了Newton Losses,一种通过经验Fisher矩阵和Hessian矩阵利用二阶信息来改善现有难优化损失性能的方法。我们并不用二阶技术训练神经网络,而是仅利用损失函数的二阶信息将其替换为Newton Loss,网络本身仍用梯度下降训练。这使得我们的方法在计算上十分高效。我们将Newton Losses应用于八种可微分的排序和最短路径算法,对优化不充分的可微分算法带来显著改进,即使对已充分优化的可微分算法也带来一致的改进。

更新时间: 2024-10-24 18:02:11

领域: cs.LG

下载: http://arxiv.org/abs/2410.19055v1

Infogent: An Agent-Based Framework for Web Information Aggregation

Despite seemingly performant web agents on the task-completion benchmarks, most existing methods evaluate the agents based on a presupposition: the web navigation task consists of linear sequence of actions with an end state that marks task completion. In contrast, our work focuses on web navigation for information aggregation, wherein the agent must explore different websites to gather information for a complex query. We consider web information aggregation from two different perspectives: (i) Direct API-driven Access relies on a text-only view of the Web, leveraging external tools such as Google Search API to navigate the web and a scraper to extract website contents. (ii) Interactive Visual Access uses screenshots of the webpages and requires interaction with the browser to navigate and access information. Motivated by these diverse information access settings, we introduce Infogent, a novel modular framework for web information aggregation involving three distinct components: Navigator, Extractor and Aggregator. Experiments on different information access settings demonstrate Infogent beats an existing SOTA multi-agent search framework by 7% under Direct API-Driven Access on FRAMES, and improves over an existing information-seeking web agent by 4.3% under Interactive Visual Access on AssistantBench.

Updated: 2024-10-24 18:01:28

标题: Infogent:用于网络信息聚合的基于代理的框架

摘要: 尽管在任务完成基准测试中似乎表现良好的网络代理,大多数现有方法都是基于一个假设来评估代理:网络导航任务由一系列线性操作组成,具有标记任务完成的最终状态。相比之下,我们的工作侧重于信息聚合的网络导航,在这种情况下,代理必须探索不同的网站以获取复杂查询的信息。我们从两个不同的视角考虑网络信息聚合:(i)直接API驱动访问依赖于Web的纯文本视图,利用外部工具如Google搜索API来导航网页,并使用抓取器提取网站内容。 (ii)交互式视觉访问使用网页的截图,并需要与浏览器进行交互以导航和获取信息。受到这些多样化信息访问设置的启发,我们引入了Infogent,一个涉及三个不同组件的新颖模块化框架,用于网络信息聚合:导航器,提取器和聚合器。在不同的信息访问设置上进行的实验表明,在FRAMES上,Infogent在直接API驱动访问下比现有的SOTA多代理搜索框架提高了7%,并且在AssistantBench上,在交互式视觉访问下比现有的信息搜索网络代理提高了4.3%。

更新时间: 2024-10-24 18:01:28

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.19054v1

Gradient-based Discrete Sampling with Automatic Cyclical Scheduling

Discrete distributions, particularly in high-dimensional deep models, are often highly multimodal due to inherent discontinuities. While gradient-based discrete sampling has proven effective, it is susceptible to becoming trapped in local modes due to the gradient information. To tackle this challenge, we propose an automatic cyclical scheduling, designed for efficient and accurate sampling in multimodal discrete distributions. Our method contains three key components: (1) a cyclical step size schedule where large steps discover new modes and small steps exploit each mode; (2) a cyclical balancing schedule, ensuring "balanced" proposals for given step sizes and high efficiency of the Markov chain; and (3) an automatic tuning scheme for adjusting the hyperparameters in the cyclical schedules, allowing adaptability across diverse datasets with minimal tuning. We prove the non-asymptotic convergence and inference guarantee for our method in general discrete distributions. Extensive experiments demonstrate the superiority of our method in sampling complex multimodal discrete distributions.
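Component (1) is the familiar cosine cyclical schedule; a minimal version (our parameter choices, and without the paper's automatic tuning or the balancing schedule) looks like:

```python
import numpy as np

def cyclical_step_size(t, total_steps, n_cycles, alpha_max):
    """Cosine cyclical schedule: each cycle opens with large steps
    (discover new modes) and decays toward zero (exploit the mode)."""
    period = total_steps // n_cycles
    pos = (t % period) / period
    return 0.5 * alpha_max * (np.cos(np.pi * pos) + 1.0)

alphas = np.array([cyclical_step_size(t, 1000, 5, 0.5) for t in range(1000)])
print(alphas[[0, 199, 200]])   # ~[0.5, ~0.0, 0.5]: reset at each cycle start
```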

Updated: 2024-10-24 18:01:24

标题: 基于梯度的离散抽样与自动周期调度

摘要: 离散分布,尤其是高维深度模型中的离散分布,由于固有的不连续性往往是高度多峰的。虽然基于梯度的离散采样已被证明有效,但由于梯度信息的影响,它容易陷入局部模式。为了解决这一挑战,我们提出了一种自动循环调度方法,旨在对多峰离散分布进行高效而准确的采样。我们的方法包含三个关键组件:(1) 循环步长调度,大步长用于发现新模式,小步长用于利用每个模式;(2) 循环平衡调度,确保在给定步长下提案是"平衡的",并保证马尔可夫链的高效率;(3) 自动调参方案,用于调整循环调度中的超参数,使其能够以最少的调参适应各种数据集。我们证明了该方法在一般离散分布上的非渐近收敛性和推断保证。大量实验表明了我们的方法在采样复杂多峰离散分布上的优越性。

更新时间: 2024-10-24 18:01:24

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.17699v2

A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models

High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum -- i.e. the dimension of the submanifold it belongs to -- is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models: diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide an LID estimator which addresses the aforementioned deficiencies. Our estimator, called FLIPD, is easy to implement and compatible with all popular DMs. Applying FLIPD to synthetic LID estimation benchmarks, we find that DMs implemented as fully-connected networks are highly effective LID estimators that outperform existing baselines. We also apply FLIPD to natural images where the true LID is unknown. Despite being sensitive to the choice of network architecture, FLIPD estimates remain a useful measure of relative complexity; compared to competing estimators, FLIPD exhibits a consistently higher correlation with image PNG compression rate and better aligns with qualitative assessments of complexity. Notably, FLIPD is orders of magnitude faster than other LID estimators, and the first to be tractable at the scale of Stable Diffusion.
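For reference, the Fokker-Planck equation in question is the density evolution of the forward diffusion SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$:

$$\frac{\partial p_t(x)}{\partial t} = -\nabla\cdot\big(f(x,t)\,p_t(x)\big) + \frac{g(t)^2}{2}\,\Delta p_t(x).$$

FLIPD's estimator is derived from how $\log p_t$ evolves under this PDE; we restate only the PDE here, not the estimator itself.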

Updated: 2024-10-24 18:01:03

标题: 数据复杂性的几何视角:使用扩散模型高效地进行局部内在维度估计

摘要: 高维数据通常位于低维子流形上,估计数据的局部内在维度(LID),即其所属子流形的维度,是一个长期存在的问题。LID可以被理解为局部变化因素的数量:一个数据点的变化因素越多,它往往就越复杂。估计这个量已在从神经网络的泛化到检测分布外数据、对抗样本和AI生成文本等多种场景中被证明是有用的。深度生成模型最近的成功为利用它们进行LID估计提供了机会,但当前基于生成模型的方法要么产生不准确的估计,要么需要不止一个预训练模型,要么计算量大,要么没有利用目前最好的深度生成模型:扩散模型(DMs)。在这项工作中,我们展示了与DM相关联的福克-普朗克(Fokker-Planck)方程可以给出一个解决上述缺陷的LID估计器。我们的估计器称为FLIPD,易于实现并与所有流行的DM兼容。将FLIPD应用于合成LID估计基准,我们发现以全连接网络实现的DM是非常有效的LID估计器,优于现有基线。我们还将FLIPD应用于真实LID未知的自然图像。尽管对网络架构的选择敏感,FLIPD的估计仍是相对复杂度的一个有用度量;与竞争的估计器相比,FLIPD与图像PNG压缩率的相关性始终更高,并且更符合对复杂度的定性评估。值得注意的是,FLIPD比其他LID估计器快几个数量级,并且是首个在Stable Diffusion规模上仍然可行的估计器。

更新时间: 2024-10-24 18:01:03

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2406.03537v2

PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views

We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians for each view and cannot generalize well to more input views. Differently, our PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter to adjust Gaussian distribution according to local geometry complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner module that refines Gaussian representations through direct image-Gaussian interactions. Our PixelGaussian can effectively reduce Gaussian redundancy as input views increase. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance with good generalization to various numbers of views. Code: https://github.com/Barrybarry-Smith/PixelGaussian.

Updated: 2024-10-24 17:59:58

标题: PixelGaussian:从任意视角重建可泛化的3D高斯模型

摘要: 我们提出了PixelGaussian,这是一个有效的前馈框架,用于从任意视角学习可泛化的3D高斯重建。大多数现有方法依赖于均匀的像素级高斯表示,为每个视图学习固定数量的3D高斯,并且不能很好地推广到更多的输入视图。与之不同的是,我们的PixelGaussian根据几何复杂性动态地调整高斯分布和数量,从而导致更高效的表示和重建质量的显着改进。具体来说,我们引入了一个级联高斯适配器(Cascade Gaussian Adapter),根据由关键点评分器识别的局部几何复杂性调整高斯分布。CGA利用可变形注意力在上下文感知的超网络中引导高斯修剪和分裂,确保在复杂区域中准确表示同时减少冗余。此外,我们设计了一个基于变压器的迭代高斯细化器模块,通过直接的图像-高斯交互来优化高斯表示。我们的PixelGaussian能够在输入视图增加时有效减少高斯冗余。我们在大规模ACID和RealEstate10K数据集上进行了大量实验,在这些数据集上我们的方法取得了最先进的性能,并且能够很好地推广到各种视图数量。源代码:https://github.com/Barrybarry-Smith/PixelGaussian。

更新时间: 2024-10-24 17:59:58

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18979v1

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark

Recent years have witnessed a significant interest in developing large multimodal models (LMMs) capable of performing various visual reasoning and understanding tasks. This has led to the introduction of multiple LMM benchmarks to evaluate LMMs on different tasks. However, most existing LMM evaluation benchmarks are predominantly English-centric. In this work, we develop a comprehensive LMM evaluation benchmark for the Arabic language to represent a large population of over 400 million speakers. The proposed benchmark, named CAMEL-Bench, comprises eight diverse domains and 38 sub-domains including, multi-image understanding, complex visual perception, handwritten document understanding, video understanding, medical imaging, plant diseases, and remote sensing-based land use understanding to evaluate broad scenario generalizability. Our CAMEL-Bench comprises around 29,036 questions that are filtered from a larger pool of samples, where the quality is manually verified by native speakers to ensure reliable model assessment. We conduct evaluations of both closed-source, including GPT-4 series, and open-source LMMs. Our analysis reveals the need for substantial improvement, especially among the best open-source models, with even the closed-source GPT-4o achieving an overall score of 62%. Our benchmark and evaluation scripts are open-sourced.

Updated: 2024-10-24 17:59:38

标题: CAMEL-Bench:一个全面的阿拉伯语LMM基准测试

摘要: 近年来,人们对开发能够执行各种视觉推理和理解任务的大型多模态模型(LMMs)表现出显著兴趣。这导致了多个LMM基准的引入,用于在不同任务上评估LMMs。然而,大多数现有的LMM评估基准主要以英语为中心。在这项工作中,我们为阿拉伯语开发了一个全面的LMM评估基准,以代表超过4亿使用者的庞大人群。提出的基准名为CAMEL-Bench,包括八个不同的领域和38个子领域,涵盖多图像理解、复杂视觉感知、手写文档理解、视频理解、医学影像、植物病害以及基于遥感的土地利用理解,以评估广泛场景下的泛化能力。我们的CAMEL-Bench包含约29,036个问题,这些问题从更大的样本池中筛选而来,其质量由母语使用者人工校验,以确保模型评估的可靠性。我们对闭源(包括GPT-4系列)和开源LMMs都进行了评估。我们的分析表明,这些模型仍需大幅改进,尤其是最好的开源模型,甚至闭源的GPT-4o总体得分也仅为62%。我们的基准和评估脚本均已开源。

更新时间: 2024-10-24 17:59:38

领域: cs.CV,cs.AI,cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.18976v1

Unbounded: A Generative Infinite Game of Character Life Simulation

We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models. Inspired by James P. Carse's distinction between finite and infinite games, we leverage recent advances in generative AI to create Unbounded: a game of character life simulation that is fully encapsulated in generative models. Specifically, Unbounded draws inspiration from sandbox life simulations and allows you to interact with your autonomous virtual character in a virtual world by feeding, playing with and guiding it - with open-ended mechanics generated by an LLM, some of which can be emergent. In order to develop Unbounded, we propose technical innovations in both the LLM and visual generation domains. Specifically, we present: (1) a specialized, distilled large language model (LLM) that dynamically generates game mechanics, narratives, and character interactions in real-time, and (2) a new dynamic regional image prompt Adapter (IP-Adapter) for vision models that ensures consistent yet flexible visual generation of a character across multiple environments. We evaluate our system through both qualitative and quantitative analysis, showing significant improvements in character life simulation, user instruction following, narrative coherence, and visual consistency for both characters and the environments compared to traditional related approaches.

Updated: 2024-10-24 17:59:31

标题: Unbounded:一个角色生活模拟的生成式无限游戏

摘要: 我们提出了生成式无限游戏的概念:一种借助生成模型超越传统有限、硬编码系统边界的电子游戏。受James P. Carse对有限博弈与无限博弈之区分的启发,我们利用生成式人工智能的最新进展创造了Unbounded:一款完全封装在生成模型中的角色生活模拟游戏。具体而言,Unbounded从沙盒生活模拟中汲取灵感,允许您通过喂养、陪玩和引导,与虚拟世界中的自主虚拟角色互动,其开放式机制由LLM生成,其中一些可能是涌现的。为了开发Unbounded,我们在LLM和视觉生成两个领域都提出了技术创新。具体来说,我们提出:(1) 一种专门蒸馏的大型语言模型(LLM),可实时动态生成游戏机制、叙事和角色互动;(2) 一种用于视觉模型的新型动态区域图像提示适配器(IP-Adapter),确保角色在多个环境中的视觉生成既一致又灵活。我们通过定性和定量分析评估了我们的系统,结果显示,与传统相关方法相比,在角色生活模拟、用户指令遵循、叙事连贯性以及角色与环境的视觉一致性方面均有显著改进。

更新时间: 2024-10-24 17:59:31

领域: cs.CV,cs.AI,cs.CL,cs.GR,cs.LG

下载: http://arxiv.org/abs/2410.18975v1

3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

Multi-view image diffusion models have significantly advanced open-domain 3D object generation. However, most existing models rely on 2D network architectures that lack inherent 3D biases, resulting in compromised geometric consistency. To address this challenge, we introduce 3D-Adapter, a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models. Central to our approach is the idea of 3D feedback augmentation: for each denoising step in the sampling loop, 3D-Adapter decodes intermediate multi-view features into a coherent 3D representation, then re-encodes the rendered RGBD views to augment the pretrained base model through feature addition. We study two variants of 3D-Adapter: a fast feed-forward version based on Gaussian splatting and a versatile training-free version utilizing neural fields and meshes. Our extensive experiments demonstrate that 3D-Adapter not only greatly enhances the geometry quality of text-to-multi-view models such as Instant3D and Zero123++, but also enables high-quality 3D generation using the plain text-to-image Stable Diffusion. Furthermore, we showcase the broad application potential of 3D-Adapter by presenting high quality results in text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.

Updated: 2024-10-24 17:59:30

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.18974v1

Tuning-free coreset Markov chain Monte Carlo

A Bayesian coreset is a small, weighted subset of a data set that replaces the full data during inference to reduce computational cost. The state-of-the-art coreset construction algorithm, Coreset Markov chain Monte Carlo (Coreset MCMC), uses draws from an adaptive Markov chain targeting the coreset posterior to train the coreset weights via stochastic gradient optimization. However, the quality of the constructed coreset, and thus the quality of its posterior approximation, is sensitive to the stochastic optimization learning rate. In this work, we propose a learning-rate-free stochastic gradient optimization procedure, Hot-start Distance over Gradient (Hot DoG), for training coreset weights in Coreset MCMC without user tuning effort. Empirical results demonstrate that Hot DoG provides higher quality posterior approximations than other learning-rate-free stochastic gradient methods, and performs competitively to optimally-tuned ADAM.
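
For intuition about the learning-rate-free idea: Distance-over-Gradient methods, on which Hot DoG builds, derive the step size from quantities the optimizer already observes. The sketch below shows a generic DoG-style update on a toy problem, as our reading of the underlying mechanism, not the paper's exact Hot-start variant.

    import numpy as np

    def dog_sgd(grad_fn, x0, steps=2000, r_eps=1e-4):
        """Learning-rate-free update: eta_t = (max distance from x0) / sqrt(sum ||g||^2)."""
        x, g_sq_sum, r_bar = x0.astype(float).copy(), 0.0, r_eps
        for _ in range(steps):
            g = grad_fn(x)
            g_sq_sum += float(g @ g)
            r_bar = max(r_bar, float(np.linalg.norm(x - x0)))   # distance travelled so far
            x = x - (r_bar / np.sqrt(g_sq_sum + 1e-12)) * g     # no user-chosen learning rate
        return x

    grad = lambda x: x - 1.0                # gradient of the toy objective 0.5 * ||x - 1||^2
    print(dog_sgd(grad, np.zeros(5)))       # approaches the all-ones minimizer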

Updated: 2024-10-24 17:59:23

Fields: stat.CO,cs.LG

Download: http://arxiv.org/abs/2410.18973v1

Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques

Cognitive decline is a natural part of aging, often resulting in reduced cognitive abilities. In some cases, however, this decline is more pronounced, typically due to disorders such as Alzheimer's disease. Early detection of anomalous cognitive decline is crucial, as it can facilitate timely professional intervention. While medical data can help in this detection, it often involves invasive procedures. An alternative approach is to employ non-intrusive techniques such as speech or handwriting analysis, which do not necessarily affect daily activities. This survey reviews the most relevant methodologies that use deep learning techniques to automate the cognitive decline estimation task, including audio, text, and visual processing. We discuss the key features and advantages of each modality and methodology, including state-of-the-art approaches like Transformer architecture and foundation models. In addition, we present works that integrate different modalities to develop multimodal models. We also highlight the most significant datasets and the quantitative results from studies using these resources. From this review, several conclusions emerge. In most cases, the textual modality achieves the best results and is the most relevant for detecting cognitive decline. Moreover, combining various approaches from individual modalities into a multimodal model consistently enhances performance across nearly all scenarios.

Updated: 2024-10-24 17:59:21

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.18972v1

ConceptDrift: Uncovering Biases through the Lens of Foundational Models

Datasets and pre-trained models come with intrinsic biases. Most methods rely on spotting them by analysing misclassified samples, in a semi-automated human-computer validation. In contrast, we propose ConceptDrift, a method which analyzes the weights of a linear probe learned on top of a foundational model. We capitalize on the weight update trajectory, which starts from the embedding of the textual representation of the class, and proceeds to drift towards embeddings that disclose hidden biases. Different from prior work, with this approach we can pinpoint unwanted correlations in a dataset, providing more than just possible explanations for the wrong predictions. We empirically prove the efficacy of our method, significantly improving zero-shot performance with bias-augmented prompting. Our method is not bound to a single modality, and we experiment in this work with both image (Waterbirds, CelebA, Nico++) and text datasets (CivilComments).

Updated: 2024-10-24 17:59:16

Fields: cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.18970v1

PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance in domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved (i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain application. Even though fine-tuning costs have nowadays been reduced thanks to innovations in parameter-efficient fine-tuning such as LoRA, not all downstream users have adequate computing for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, could be time-restrictive, making it crucial to retain the knowledge encoded in earlier fine-tuned rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows subsequent seamless plugging of the patch for continual personalization of the evolved LLM at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves comparable performance to LoRA fine-tuning with reductions of up to 12.2x in GPU memory usage. Finally, we provide theoretical justifications to understand the portability of our model update patches, which offers new insights into the theoretical dimension of LLMs' personalization.
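
The core mechanic, as we understand it, is that a lightweight delta captured from an earlier fine-tuned round can be re-applied to an evolved base model without retraining. A toy sketch follows; the additive patch form and all names are our illustrative assumptions, not the paper's exact parameterization.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 16

    W_old_base  = rng.normal(size=(d, d))              # pretrained LLM, version t
    W_finetuned = W_old_base + 0.1 * rng.normal(size=(d, d))

    # (i) capture domain knowledge once, as a portable patch
    patch = W_finetuned - W_old_base                   # could be stored as LoRA factors B @ A

    # (ii) later, the provider ships an evolved base model (version t+1);
    #      plug the same patch in with no further training
    W_new_base = W_old_base + 0.05 * rng.normal(size=(d, d))
    W_personalized = W_new_base + patch

    print(np.linalg.norm(W_personalized - W_new_base))  # domain delta carried over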

Updated: 2024-10-24 17:58:52

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.10870v2

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI 2 introduces three key innovations: support for multiple platform types, high-resolution perception through adaptive scaling, and advanced task training data generation powered by GPT-4o with set-of-mark visual prompting. These advancements enable Ferret-UI 2 to perform complex, user-centered interactions, making it highly versatile and adaptable for the expanding diversity of platform ecosystems. Extensive empirical experiments on referring, grounding, user-centric advanced tasks (comprising 9 subtasks × 5 platforms), the GUIDE next-action prediction dataset, and the GUI-World multi-platform benchmark demonstrate that Ferret-UI 2 significantly outperforms Ferret-UI, and also shows strong cross-platform transfer capabilities.

Updated: 2024-10-24 17:58:31

Fields: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.18967v1

On the Crucial Role of Initialization for Matrix Factorization

This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nystrom initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nystrom initialization achieves quadratic convergence in cases where only linear rates were previously known. Furthermore, we extend this initialization to low-rank adapters (LoRA) commonly used for finetuning foundation models. Our approach, NoRA, i.e., LoRA with Nystrom initialization, demonstrates superior performance across various downstream tasks and model scales, from 1B to 7B parameters, in large language and diffusion models.
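
For concreteness, one standard Nystrom-style construction for symmetric factorization A ≈ XXᵀ initializes the factor from a Gaussian sketch of A. We present it as a plausible sketch of the idea; the paper's exact construction, and the asymmetric case, may differ in details.

    import numpy as np

    def nystrom_init(A, r, seed=0):
        """Nystrom-style init for A ~ X @ X.T: X0 = A @ Om @ (Om.T @ A @ Om)^(-1/2)."""
        rng = np.random.default_rng(seed)
        Om = rng.normal(size=(A.shape[0], r))          # Gaussian sketch matrix
        C = Om.T @ A @ Om
        w, V = np.linalg.eigh((C + C.T) / 2)           # symmetrize for numerical safety
        w = np.clip(w, 1e-12, None)
        pinv_sqrt = V @ np.diag(w ** -0.5) @ V.T       # (Om.T A Om)^(-1/2)
        return A @ Om @ pinv_sqrt

    rng = np.random.default_rng(0)
    G = rng.normal(size=(50, 3))
    A = G @ G.T                                        # rank-3 PSD target
    X0 = nystrom_init(A, r=3)
    print(np.linalg.norm(A - X0 @ X0.T) / np.linalg.norm(A))  # already a near-exact fit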

Updated: 2024-10-24 17:58:21

Fields: cs.LG,eess.SP,math.OC

Download: http://arxiv.org/abs/2410.18965v1

Learning to Look: Seeking Information for Decision Making via Policy Factorization

Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the head of the robot to find information relevant to manipulation, or in multi-robot domains, where one scout robot may search for the information that another robot needs to make informed decisions. We identify these tasks with a new type of problem, factorized Contextual Markov Decision Processes, and propose DISaM, a dual-policy solution composed of an information-seeking policy that explores the environment to find the relevant contextual information and an information-receiving policy that exploits the context to achieve the manipulation goal. This factorization allows us to train both policies separately, using the information-receiving one to provide reward to train the information-seeking policy. At test time, the dual agent balances exploration and exploitation based on the uncertainty the manipulation policy has on what the next best action is. We demonstrate the capabilities of our dual policy solution in five manipulation tasks that require information-seeking behaviors, both in simulation and in the real-world, where DISaM significantly outperforms existing methods. More information at https://robin-lab.cs.utexas.edu/learning2look/.

Updated: 2024-10-24 17:58:11

Fields: cs.RO,cs.LG

Download: http://arxiv.org/abs/2410.18964v1

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Large language models (LLMs) and large multimodal models (LMMs) have shown great potential in automating complex tasks like web browsing and gaming. However, their ability to generalize across diverse applications remains limited, hindering broader utility. To address this challenge, we present OSCAR: Operating System Control via state-Aware reasoning and Re-planning. OSCAR is a generalist agent designed to autonomously navigate and interact with various desktop and mobile applications through standardized controls, such as mouse and keyboard inputs, while processing screen images to fulfill user commands. OSCAR translates human instructions into executable Python code, enabling precise control over graphical user interfaces (GUIs). To enhance stability and adaptability, OSCAR operates as a state machine, equipped with error-handling mechanisms and dynamic task re-planning, allowing it to efficiently adjust to real-time feedback and exceptions. We demonstrate OSCAR's effectiveness through extensive experiments on diverse benchmarks across desktop and mobile platforms, where it transforms complex workflows into simple natural language commands, significantly boosting user productivity. Our code will be open-source upon publication.

Updated: 2024-10-24 17:58:08

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18963v1

Scaling Law with Learning Rate Annealing

We find that the cross-entropy loss curves of neural language models empirically adhere to a scaling law with learning rate (LR) annealing over training steps: $$L(s) = L_0 + A\cdot S_1^{-\alpha} - C\cdot S_2,$$ where $L(s)$ is the validation loss at step $s$, $S_1$ is the area under the LR curve, $S_2$ is the LR annealing area, and $L_0$, $A$, $C$, $\alpha$ are constant parameters. This formulation takes into account two factors: (1) power-law scaling over data size, and (2) the additional loss reduction during LR annealing. Therefore, this formulation can describe the full loss curve at each step, rather than the single loss point at the end of training. Applying the scaling law with LR annealing and fitting only one or two training curves, we can accurately predict the loss at any given step across any learning rate scheduler (LRS). This approach significantly reduces the computational cost of formulating scaling laws while providing more accuracy and expressiveness for training dynamics. Extensive experiments demonstrate that our findings hold across a range of hyper-parameters and model architectures, and that our equation can extend to the scaling effect of model sizes. Moreover, our formulation provides accurate theoretical verification and explanation for empirical results observed in numerous previous studies, particularly those focusing on LR schedules and annealing. We believe this work promises to enhance the understanding of LLM training dynamics while greatly democratizing scaling laws, and that it can guide researchers in refining training strategies (e.g., critical LRS) for future LLMs.
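
Read operationally, S1 is the area under the LR schedule up to step s, and S2 measures how much LR has been annealed away. The sketch below applies the formula to a cosine schedule with illustrative constants. Note one loud simplification: we take S2 as the plain cumulative gap below peak LR, whereas the paper's S2 is a more refined annealing area; in practice L0, A, C, and alpha are fit from one or two training curves.

    import numpy as np

    T, lr_max = 10_000, 3e-4
    steps = np.arange(1, T + 1)
    lr = 0.5 * lr_max * (1 + np.cos(np.pi * steps / T))   # cosine annealing schedule

    S1 = np.cumsum(lr)                       # area under the LR curve up to each step
    S2 = np.cumsum(lr_max - lr)              # simplified annealing area (see caveat above)

    L0, A, C, alpha = 2.0, 0.4, 1e-5, 0.5    # illustrative constants; fit these in practice
    L = L0 + A * S1 ** (-alpha) - C * S2     # predicted validation loss at every step

    print(L[0], L[T // 2 - 1], L[-1])        # a full predicted loss curve, not one endpoint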

Updated: 2024-10-24 17:56:14

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.11029v2

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks crucial context necessary for accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge or constraints, which can be efficiently communicated through natural language. However, the ability of existing forecasting models to effectively integrate this textual information remains an open question. To address this, we introduce "Context is Key" (CiK), a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. By presenting this benchmark, we aim to advance multimodal forecasting, promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0/ .

Updated: 2024-10-24 17:56:08

Fields: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2410.18959v1

Stable Consistency Tuning: Understanding and Improving Consistency Models

Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as the value estimation through Temporal Difference (TD) Learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID 1.55, a new SoTA for consistency models.

Updated: 2024-10-24 17:55:52

Fields: cs.LG,cs.CV

Download: http://arxiv.org/abs/2410.18958v1

Mixture of Parrots: Experts improve memorization more than reasoning

The Mixture-of-Experts (MoE) architecture enables a significant increase in the total number of model parameters with minimal computational overhead. However, it is not clear what performance tradeoffs, if any, exist between MoEs and standard dense transformers. In this paper, we show that as we increase the number of experts (while fixing the number of active parameters), the memorization performance consistently increases while the reasoning capabilities saturate. We begin by analyzing the theoretical limitations of MoEs at reasoning. We prove that there exist graph problems that cannot be solved by any number of experts of a certain width; however, the same task can be easily solved by a dense model with a slightly larger width. On the other hand, we find that on memory-intensive tasks, MoEs can effectively leverage a small number of active parameters with a large number of experts to memorize the data. We empirically validate these findings on synthetic graph problems and memory-intensive closed book retrieval tasks. Lastly, we pre-train a series of MoEs and dense transformers and evaluate them on commonly used benchmarks in math and natural language. We find that increasing the number of experts helps solve knowledge-intensive tasks, but fails to yield the same benefits for reasoning tasks.
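
The parameter-versus-compute trade the abstract studies is easy to make concrete: in a toy top-1 MoE layer, growing the expert count multiplies total parameters while each token still activates a single expert. A minimal sketch, with routing and shapes purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts = 8, 16                       # grow n_experts: more parameters, same active compute

    W_gate = rng.normal(size=(d, n_experts))
    experts = rng.normal(size=(n_experts, d, d))   # one weight matrix per expert

    def moe_layer(x):                          # x: (batch, d)
        scores = x @ W_gate                    # router logits
        top1 = scores.argmax(axis=1)           # top-1 routing: one expert per token
        out = np.empty_like(x)
        for i, e in enumerate(top1):
            out[i] = x[i] @ experts[e]         # only the selected expert's params are used
        return out

    x = rng.normal(size=(4, d))
    print(moe_layer(x).shape)                  # (4, 8); active params per token: d*d + d*n_experts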

Updated: 2024-10-24 17:54:41

Fields: cs.LG

Download: http://arxiv.org/abs/2410.19034v1

TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks

Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical deployment. In this work, we analyze existing tabular benchmarks and find two common characteristics of tabular data in typical industrial applications that are underrepresented in the datasets usually used for evaluation in the literature. First, in real-world deployment scenarios, distribution of data often changes over time. To account for this distribution drift, time-based train/test splits should be used in evaluation. However, popular tabular datasets often lack timestamp metadata to enable such evaluation. Second, a considerable portion of datasets in production settings stem from extensive data acquisition and feature engineering pipelines. This can have an impact on the absolute and relative number of predictive, uninformative, and correlated features compared to academic datasets. In this work, we aim to understand how recent research advances in tabular deep learning transfer to these underrepresented conditions. To this end, we introduce TabReD -- a collection of eight industry-grade tabular datasets. We reassess a large number of tabular ML models and techniques on TabReD. We demonstrate that evaluation on time-based data splits leads to different methods ranking, compared to evaluation on random splits, which are common in current benchmarks. Furthermore, simple MLP-like architectures and GBDT show the best results on the TabReD datasets, while other methods are less effective in the new setting.
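
The evaluation point is straightforward to reproduce: with timestamped rows, a time-based split trains on the past and tests on the future, rather than shuffling. A small pandas sketch with illustrative column names:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "timestamp": pd.date_range("2024-01-01", periods=1000, freq="h"),
        "feature": rng.normal(size=1000),
        "target": rng.integers(0, 2, size=1000),
    })

    # Random split (common in academic benchmarks): ignores distribution drift.
    shuffled = df.sample(frac=1.0, random_state=0)
    rand_train, rand_test = shuffled.iloc[:800], shuffled.iloc[800:]

    # Time-based split (closer to deployment): train on the past, test on the future.
    df = df.sort_values("timestamp")
    split = int(len(df) * 0.8)
    time_train, time_test = df.iloc[:split], df.iloc[split:]
    print(time_train["timestamp"].max() < time_test["timestamp"].min())   # True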

Updated: 2024-10-24 17:54:37

Fields: cs.LG

Download: http://arxiv.org/abs/2406.19380v4

Learning Structured Compressed Sensing with Automatic Resource Allocation

Multidimensional data acquisition often requires extensive time and poses significant challenges for hardware and software regarding data storage and processing. Rather than designing a single compression matrix as in conventional compressed sensing, structured compressed sensing yields dimension-specific compression matrices, reducing the number of optimizable parameters. Recent advances in machine learning (ML) have enabled task-based supervised learning of subsampling matrices, albeit at the expense of complex downstream models. Additionally, the sampling resource allocation across dimensions is often determined in advance through heuristics. To address these challenges, we introduce Structured COmpressed Sensing with Automatic Resource Allocation (SCOSARA) with an information theory-based unsupervised learning strategy. SCOSARA adaptively distributes samples across sampling dimensions while maximizing Fisher information content. Using ultrasound localization as a case study, we compare SCOSARA to state-of-the-art ML-based and greedy search algorithms. Simulation results demonstrate that SCOSARA can produce high-quality subsampling matrices that achieve lower Cramér-Rao Bound values than the baselines. In addition, SCOSARA outperforms other ML-based algorithms in terms of the number of trainable parameters, computational complexity, and memory requirements while automatically choosing the number of samples per axis.

Updated: 2024-10-24 17:53:33

Fields: cs.LG

Download: http://arxiv.org/abs/2410.18954v1

The Learning Stabilizers with Noise problem

Random classical codes have good error correcting properties, and yet they are notoriously hard to decode in practice. Despite many decades of extensive study, the fastest known algorithms still run in exponential time. The Learning Parity with Noise (LPN) problem, which can be seen as the task of decoding a random linear code in the presence of noise, has thus emerged as a prominent hardness assumption with numerous applications in both cryptography and learning theory. Is there a natural quantum analog of the LPN problem? In this work, we introduce the Learning Stabilizers with Noise (LSN) problem, the task of decoding a random stabilizer code in the presence of local depolarizing noise. We give both polynomial-time and exponential-time quantum algorithms for solving LSN in various depolarizing noise regimes, ranging from extremely low noise, to low constant noise rates, and even higher noise rates up to a threshold. Next, we provide concrete evidence that LSN is hard. First, we show that LSN includes LPN as a special case, which suggests that it is at least as hard as its classical counterpart. Second, we prove a worst-case to average-case reduction for variants of LSN. We then ask: what is the computational complexity of solving LSN? Because the task features quantum inputs, its complexity cannot be characterized by traditional complexity classes. Instead, we show that the LSN problem lies in a recently introduced (distributional and oracle) unitary synthesis class. Finally, we identify several applications of our LSN assumption, ranging from the construction of quantum bit commitment schemes to the computational limitations of learning from quantum data.

Updated: 2024-10-24 17:53:02

Fields: quant-ph,cs.CR

Download: http://arxiv.org/abs/2410.18953v1

Dynamic Vocabulary Pruning in Early-Exit LLMs

Increasing the size of large language models (LLMs) has been shown to lead to better performance. However, this comes at the cost of slower and more expensive inference. Early-exiting is a promising approach for improving the efficiency of LLM inference by enabling next token prediction at intermediate layers. Yet, the large vocabulary size in modern LLMs makes the confidence estimation required for exit decisions computationally expensive, diminishing the efficiency gains. To address this, we propose dynamically pruning the vocabulary at test time for each token. Specifically, the vocabulary is pruned at one of the initial layers, and the smaller vocabulary is then used throughout the rest of the forward pass. Our experiments demonstrate that such post-hoc dynamic vocabulary pruning improves the efficiency of confidence estimation in early-exit LLMs while maintaining competitive performance.
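
Mechanically, the idea can be sketched as follows: compute full-vocabulary logits once at an early layer, keep only the top-k token ids, and estimate exit confidence over that reduced vocabulary at later layers. The sizes, the top-k rule, and the max-probability confidence test below are our illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    d, V, k = 64, 32_000, 512                        # hidden size, full vocab, pruned vocab

    unembed = rng.normal(size=(d, V)) / np.sqrt(d)   # shared output embedding matrix

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    h_early = rng.normal(size=d)                     # hidden state at an initial layer
    keep = np.argsort(h_early @ unembed)[-k:]        # prune once: top-k token ids

    # Every subsequent early-exit head scores only the pruned vocabulary.
    h_later = rng.normal(size=d)
    probs = softmax(h_later @ unembed[:, keep])      # O(d*k) instead of O(d*V)
    confidence = probs.max()                         # e.g., max-probability confidence
    print(confidence, confidence > 0.9)              # exit early if confident enough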

Updated: 2024-10-24 17:52:31

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.18952v1

Adjusted Overfitting Regression

In this paper, I introduce a new form of regression, "distance-based regression," that can adjust for both overfitting and underfitting. Overfitting often results in finding false patterns and thus inaccurate results, so a new approach that minimizes overfitting can yield more accurate predictions. I then test this regression form and show additional ways to optimize it. Finally, I apply the new technique to a specific data set to demonstrate its practical value.

Updated: 2024-10-24 17:50:08

Fields: cs.LG

Download: http://arxiv.org/abs/2410.18950v1

Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices

Analyzing the structure of sampled features from an input data distribution is challenging when constrained by limited measurements in both the number of inputs and features. Traditional approaches often rely on the eigenvalue spectrum of the sample covariance matrix derived from finite measurement matrices; however, these spectra are sensitive to the size of the measurement matrix, leading to biased insights. In this paper, we introduce a novel algorithm that provides unbiased estimates of the spectral moments of the kernel integral operator in the limit of infinite inputs and features from finitely sampled measurement matrices. Our method, based on dynamic programming, is efficient and capable of estimating the moments of the operator spectrum. We demonstrate the accuracy of our estimator on radial basis function (RBF) kernels, highlighting its consistency with the theoretical spectra. Furthermore, we showcase the practical utility and robustness of our method in understanding the geometry of learned representations in neural networks.
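
To see the flavor of the bias being corrected, consider the operator's second spectral moment E[k(x,x')^2]: the plug-in estimate from the sample Gram matrix is inflated by its diagonal, while the off-diagonal U-statistic is unbiased. The paper generalizes this style of correction to higher moments via dynamic programming; the sketch below covers only the second-moment case.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 300, 3
    X = rng.normal(size=(n, d))

    # RBF kernel Gram matrix
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / 2.0)

    # Second spectral moment of the kernel integral operator: E[k(x, x')^2].
    plug_in = (K ** 2).sum() / n**2                  # biased by the diagonal entries
    off_diag = (K ** 2).sum() - np.trace(K ** 2)     # elementwise square, diagonal removed
    unbiased = off_diag / (n * (n - 1))              # U-statistic: unbiased estimate
    print(plug_in, unbiased)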

Updated: 2024-10-24 17:47:20

Fields: cs.LG,math.SP,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2410.17998v2

White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs

Social biases can manifest in language agency. While several studies approached agency-related bias in human-written language, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, which fall short of accurately classifying language agency. We introduce the novel Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. LABE leverages 5,400 template-based prompts, an accurate agency classifier, and corresponding bias metrics to test for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. We also contribute the Language Agency Classification (LAC) dataset, consisting of 3,724 agentic and communal sentences. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than human-written texts; (2) Models demonstrate remarkably higher levels of intersectional bias than the other bias aspects. Those who are at the intersection of gender and racial minority groups--such as Black females--are consistently described by texts with lower levels of agency, aligning with real-world social inequalities; (3) Among the 3 LLMs investigated, Llama3 demonstrates the greatest overall bias; (4) Not only does prompt-based mitigation fail to resolve language agency bias in LLMs, but it frequently leads to the exacerbation of biases in generated texts.

Updated: 2024-10-24 17:43:28

Fields: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2404.10508v4

Influence Functions for Scalable Data Attribution in Diffusion Models

Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models by developing an influence functions framework. Influence function-based data attribution methods approximate how a model's output would have changed if some training data were removed. In supervised learning, this is usually used for predicting how the loss on a particular example would change. For diffusion models, we focus on predicting the change in the probability of generating a particular example via several proxy measurements. We show how to formulate influence functions for such quantities and how previously proposed methods can be interpreted as particular design choices in our framework. To ensure scalability of the Hessian computations in influence functions, we systematically develop K-FAC approximations based on generalised Gauss-Newton matrices specifically tailored to diffusion models. We show that our recommended method outperforms previous data attribution approaches on common evaluations, such as the Linear Data-modelling Score (LDS) or retraining without top influences, without the need for method-specific hyperparameter tuning.
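
As background for the framework, the classical influence-function approximation scores the effect of removing a training point z as -grad L(z_test)^T H^{-1} grad L(z), with H the training-loss Hessian (which the paper approximates at scale with K-FAC/generalised Gauss-Newton). A small exact-Hessian sketch on ridge regression, with all details illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 200, 5, 1e-2
    X, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # fitted ridge model
    H = (X.T @ X) / n + lam * np.eye(d)                       # Hessian of the mean training loss

    x_test, y_test = rng.normal(size=d), 0.0
    g_test = (x_test @ w - y_test) * x_test                   # gradient of the test loss wrt w

    # Influence of each training point on the test loss (approx. effect of removal).
    grads = (X @ w - y)[:, None] * X                          # per-example gradients
    influence = -grads @ np.linalg.solve(H, g_test)
    print(influence[:5])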

Updated: 2024-10-24 17:43:00

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.13850v2

A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU

Deep learning (DL) has emerged as a powerful subset of machine learning (ML) and artificial intelligence (AI), outperforming traditional ML methods, especially in handling unstructured and large datasets. Its impact spans across various domains, including speech recognition, healthcare, autonomous vehicles, cybersecurity, predictive analytics, and more. However, the complexity and dynamic nature of real-world problems present challenges in designing effective deep learning models. Consequently, several deep learning models have been developed to address different problems and applications. In this article, we conduct a comprehensive survey of various deep learning models, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Models, Deep Reinforcement Learning (DRL), and Deep Transfer Learning. We examine the structure, applications, benefits, and limitations of each model. Furthermore, we perform an analysis using three publicly available datasets: IMDB, ARAS, and Fruit-360. We compare the performance of six renowned deep learning models: CNN, Simple RNN, Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional GRU.
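
Since several of the compared models differ only in their gating, it helps to recall what a single GRU cell computes. Below is a minimal NumPy version of the standard update-gate/reset-gate equations (one common convention; parameter names are ours, and biases are omitted for brevity):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_cell(x, h, p):
        """One GRU step: update gate z, reset gate r, candidate state, interpolation."""
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"])                  # update gate
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"])                  # reset gate
        h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])      # candidate state
        return (1 - z) * h + z * h_tilde                        # new hidden state

    rng = np.random.default_rng(0)
    d_in, d_h = 4, 8
    p = {}
    for g in "zrh":
        p["W" + g] = 0.1 * rng.normal(size=(d_in, d_h))
        p["U" + g] = 0.1 * rng.normal(size=(d_h, d_h))

    h = np.zeros(d_h)
    for x in rng.normal(size=(10, d_in)):   # run over a toy input sequence
        h = gru_cell(x, h, p)
    print(h.shape)                          # (8,)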

Updated: 2024-10-24 17:41:58

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2305.17473v3

Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models

With models getting stronger, evaluations have grown more complex, testing multiple skills in one benchmark and even in the same instance at once. However, skill-wise performance is obscured when inspecting aggregate accuracy, under-utilizing the rich signal modern benchmarks contain. We propose an automatic approach to recover the underlying skills relevant for any evaluation instance, by way of inspecting model-generated rationales. After validating the relevance of rationale-parsed skills and inferring skills for 46k instances over 12 benchmarks, we observe many skills to be common across benchmarks, resulting in the curation of hundreds of skill-slices (i.e. sets of instances testing a common skill). Inspecting accuracy over these slices yields novel insights on model trade-offs: e.g., compared to GPT-4o and Claude 3.5 Sonnet, on average, Gemini 1.5 Pro is 18% more accurate in "computing molar mass", but 19% less accurate in "applying constitutional law", despite the overall accuracies of the three models differing by a mere 0.4%. Furthermore, we demonstrate the practical utility of our approach by showing that insights derived from skill slice analysis can generalize to held-out instances: when routing each instance to the model strongest on the relevant skills, we see a 3% accuracy improvement over our 12-dataset corpus. Our skill-slices and framework open a new avenue in model evaluation, leveraging skill-specific analyses to unlock a more granular and actionable understanding of model capabilities.

Updated: 2024-10-24 17:27:22

Fields: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.13826v2

Pointer Networks with Q-Learning for Combinatorial Optimization

We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) to enhance the optimality of attention-based sequence generation, focusing on long-term outcomes. This integration proves particularly effective in solving combinatorial optimization (CO) tasks, especially the Travelling Salesman Problem (TSP), which is the focus of our study. We address this challenge by defining a Markov Decision Process (MDP) compatible with PQN, which involves iterative graph embedding, encoding and decoding by an LSTM-based recurrent neural network. This process generates a context vector and computes raw attention scores, which are dynamically adjusted by Q-values calculated for all available state-action pairs before applying softmax. The resulting attention vector is utilized as an action distribution, with actions selected according to PQN's dynamically adaptive exploration-exploitation balance. Our empirical results demonstrate the efficacy of this approach, including tests of the model in unstable environments.
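
The key wiring is how Q-values reshape pointer attention before the softmax. The toy step below reflects our reading of the abstract; the additive adjustment, the masking rule, and all shapes are illustrative assumptions:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    n_nodes = 6
    raw_attn = rng.normal(size=n_nodes)           # Ptr-Net scores over candidate nodes
    q_values = rng.normal(size=n_nodes)           # Q(s, a) for every available action
    visited = np.array([True, False, False, True, False, False])

    adjusted = raw_attn + q_values                # Q-values adjust the raw attention scores
    adjusted[visited] = -np.inf                   # mask nodes already on the tour
    action_dist = softmax(adjusted)               # attention vector used as a policy
    next_node = rng.choice(n_nodes, p=action_dist)
    print(action_dist.round(3), next_node)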

Updated: 2024-10-24 17:25:19

Fields: cs.LG,math.OC

Download: http://arxiv.org/abs/2311.02629v4

A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities

A key property of neural networks is their capacity of adapting to data during training. Yet, our current mathematical understanding of feature learning and its relationship to generalization remain limited. In this work, we provide a random matrix analysis of how fully-connected two-layer neural networks adapt to the target function after a single, but aggressive, gradient descent step. We rigorously establish the equivalence between the updated features and an isotropic spiked random feature model, in the limit of large batch size. For the latter model, we derive a deterministic equivalent description of the feature empirical covariance matrix in terms of certain low-dimensional operators. This allows us to sharply characterize the impact of training in the asymptotic feature spectrum, and in particular, provides a theoretical grounding for how the tails of the feature spectrum modify with training. The deterministic equivalent further yields the exact asymptotic generalization error, shedding light on the mechanisms behind its improvement in the presence of feature learning. Our result goes beyond standard random matrix ensembles, and therefore we believe it is of independent technical interest. Different from previous work, our result holds in the challenging maximal learning rate regime, is fully rigorous and allows for finitely supported second layer initialization, which turns out to be crucial for studying the functional expressivity of the learned features. This provides a sharp description of the impact of feature learning in the generalization of two-layer neural networks, beyond the random features and lazy training regimes.

Updated: 2024-10-24 17:24:34

Fields: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2410.18938v1

Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play

Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society. Relying on historical events to project the future is insufficient as such events are sparse and do not cover all possible conditions and nuanced situations. Simulation of these complex events can help better prepare and reduce the negative impact. We develop a controllable complex news event simulator guided by both the event schema representing domain knowledge about the scenario and user-provided assumptions representing case-specific conditions. As event dynamics depend on the fine-grained social and cultural context, we further introduce a geo-diverse commonsense and cultural norm-aware knowledge enhancement component. To enhance the coherence of the simulation, apart from the global timeline of events, we take an agent-based approach to simulate the individual character states, plans, and actions. By incorporating the schema and cultural norms, our generated simulations achieve much higher coherence and appropriateness and are received favorably by participants from a humanitarian assistance organization.

Updated: 2024-10-24 17:21:43

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.18935v1

ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation

We propose Audio Noise Awareness using Visuals of Indoors for NAVIgation for quieter robot path planning. While humans are naturally aware of the noise they make and its impact on those around them, robots currently lack this awareness. A key challenge in achieving audio awareness for robots is estimating how loud the robot's actions will be at a listener's location. Since sound depends upon the geometry and material composition of rooms, we train the robot to passively perceive loudness using visual observations of indoor environments. To this end, we generate data on how loud an 'impulse' sounds at different listener locations in simulated homes, and train our Acoustic Noise Predictor (ANP). Next, we collect acoustic profiles corresponding to different actions for navigation. Unifying ANP with action acoustics, we demonstrate experiments with wheeled (Hello Robot Stretch) and legged (Unitree Go2) robots so that these robots adhere to the noise constraints of the environment. See code and data at https://anavi-corl24.github.io/

Updated: 2024-10-24 17:19:53

Fields: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.18932v1

AutoStep: Locally adaptive involutive MCMC

Many common Markov chain Monte Carlo (MCMC) kernels can be formulated using a deterministic involutive proposal with a step size parameter. Selecting an appropriate step size is often a challenging task in practice; and for complex multiscale targets, there may not be one choice of step size that works well globally. In this work, we address this problem with a novel class of involutive MCMC methods -- AutoStep MCMC -- that selects an appropriate step size at each iteration adapted to the local geometry of the target distribution. We prove that AutoStep MCMC is $\pi$-invariant and has other desirable properties under mild assumptions on the target distribution $\pi$ and involutive proposal. Empirical results examine the effect of various step size selection design choices, and show that AutoStep MCMC is competitive with state-of-the-art methods in terms of effective sample size per unit cost on a range of challenging target distributions.

Updated: 2024-10-24 17:17:11

Fields: stat.CO,cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.18929v1

Learning $k$-body Hamiltonians via compressed sensing

We study the problem of learning a $k$-body Hamiltonian with $M$ unknown Pauli terms that are not necessarily geometrically local. We propose a protocol that learns the Hamiltonian to precision $\epsilon$ with total evolution time ${\mathcal{O}}(M^{1/2+1/p}/\epsilon)$ up to logarithmic factors, where the error is quantified by the $\ell^p$-distance between Pauli coefficients. Our learning protocol uses only single-qubit control operations and a GHZ state initial state, is non-adaptive, is robust against SPAM errors, and performs well even if $M$ and $k$ are not precisely known in advance or if the Hamiltonian is not exactly $M$-sparse. Methods from the classical theory of compressed sensing are used for efficiently identifying the $M$ terms in the Hamiltonian from among all possible $k$-body Pauli operators. We also provide a lower bound on the total evolution time needed in this learning task, and we discuss the operational interpretations of the $\ell^1$ and $\ell^2$ error metrics. In contrast to previous works, our learning protocol requires neither geometric locality nor any other relaxed locality conditions.

Updated: 2024-10-24 17:16:19

Fields: quant-ph,cs.DS,cs.LG

Download: http://arxiv.org/abs/2410.18928v1

SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) raise serious safety concerns (e.g., generating harmful outputs for users), which motivates the development of safety evaluation benchmarks. However, we observe that existing safety benchmarks for MLLMs show limitations in query quality and evaluation reliability, limiting the detection of model safety implications as MLLMs continue to evolve. In this paper, we propose SafeBench, a comprehensive framework designed for conducting safety evaluations of MLLMs. Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol that aim to address the above limitations, respectively. We first design an automatic safety dataset generation pipeline, where we employ a set of LLM judges to recognize and categorize the risk scenarios that are most harmful and diverse for MLLMs; based on the taxonomy, we further ask these judges to generate high-quality harmful queries accordingly, resulting in 23 risk scenarios with 2,300 multi-modal harmful query pairs. During safety evaluation, we draw inspiration from the jury system in judicial proceedings and pioneer the jury deliberation evaluation protocol, which adopts collaborative LLMs to evaluate whether target models exhibit specific harmful behaviors, providing a reliable and unbiased assessment of content security risks. In addition, our benchmark can also be extended to the audio modality, showing high scalability and potential. Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs (e.g., GPT-4o, Gemini), where we revealed widespread safety issues in existing MLLMs and instantiated several insights on MLLM safety performance, such as the effects of image quality and parameter size.

Updated: 2024-10-24 17:14:40

Fields: cs.CR

Download: http://arxiv.org/abs/2410.18927v1

LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search

Approximate nearest neighbor (ANN) search is a key component in many modern machine learning pipelines; recent use cases include retrieval-augmented generation (RAG) and vector databases. Clustering-based ANN algorithms, that use score computation methods based on product quantization (PQ), are often used in industrial-scale applications due to their scalability and suitability for distributed and disk-based implementations. However, they have slower query times than the leading graph-based ANN algorithms. In this work, we propose a new supervised score computation method based on the observation that inner product approximation is a multivariate (multi-output) regression problem that can be solved efficiently by reduced-rank regression. Our experiments show that on modern high-dimensional data sets, the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage. We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method. LoRANN is competitive with the leading graph-based algorithms and outperforms the state-of-the-art GPU ANN methods on high-dimensional data sets.
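
The score-computation idea treats inner-product estimation (scoring a query q against a cluster's points X) as a multi-output regression fitted under a rank constraint. Below is a generic reduced-rank least-squares sketch of that idea; the training-query setup and rank are illustrative assumptions, not the paper's exact recipe:

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n_train, r = 64, 256, 2000, 16

    X = rng.normal(size=(m, d))                 # points of one cluster
    Q = rng.normal(size=(n_train, d))           # training queries
    Y = Q @ X.T                                 # targets: exact inner products

    W_ols, *_ = np.linalg.lstsq(Q, Y, rcond=None)    # full-rank multi-output regression
    U, s, Vt = np.linalg.svd(Q @ W_ols, full_matrices=False)
    W_rrr = W_ols @ Vt[:r].T @ Vt[:r]           # project onto the top-r response directions

    q = rng.normal(size=d)
    approx_scores = q @ W_rrr                   # deployable as two small matmuls: (d x r), (r x m)
    print(np.abs(approx_scores - q @ X.T).mean())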

Updated: 2024-10-24 17:13:39

Categories: cs.LG

Download: http://arxiv.org/abs/2410.18926v1

SegLLM: Multi-round Reasoning Segmentation

We present SegLLM, a novel multi-round interactive reasoning segmentation model that enhances LLM-based segmentation by exploiting conversational memory of both visual and textual outputs. By leveraging a mask-aware multimodal LLM, SegLLM re-integrates previous segmentation results into its input stream, enabling it to reason about complex user intentions and segment objects in relation to previously identified entities, including positional, interactional, and hierarchical relationships, across multiple interactions. This capability allows SegLLM to respond to visual and text queries in a chat-like manner. Evaluated on the newly curated MRSeg benchmark, SegLLM outperforms existing methods in multi-round interactive reasoning segmentation by over 20%. Additionally, we observed that training on multi-round reasoning segmentation data enhances performance on standard single-round referring segmentation and localization tasks, resulting in a 5.5% increase in cIoU for referring expression segmentation and a 4.5% improvement in Acc@0.5 for referring expression localization.

Updated: 2024-10-24 17:11:52

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.18923v1

From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems

Consider the math problem: "Lily received 3 cookies from her best friend yesterday and ate 5 for breakfast. Today, her friend gave her 3 more cookies. How many cookies does Lily have now?" Many large language models (LLMs) in previous research approach this problem by calculating the answer "1" using the equation "3 - 5 + 3." However, from a human perspective, we recognize the inherent flaw in this problem: Lily cannot eat 5 cookies if she initially only had 3. This discrepancy prompts a key question: Are current LLMs merely Blind Solvers that apply mathematical operations without deeper reasoning, or can they function as Logical Thinkers capable of identifying logical inconsistencies? To explore this question, we propose a benchmark dataset, FaultyMath, which includes faulty math problems of rich diversity: i) multiple mathematical categories, e.g., algebra, geometry, number theory, etc., ii) varying levels of difficulty, and iii) different origins of faultiness -- ranging from violations of common sense and ambiguous statements to mathematical contradictions and more. We evaluate a broad spectrum of LLMs, including open-source, closed-source, and math-specialized models, using FaultyMath across three dimensions: (i) How accurately can the models detect faulty math problems without being explicitly prompted to do so? (ii) When provided with hints -- either correct or misleading -- about the validity of the problems, to what extent do LLMs adapt to become reliable Logical Thinkers? (iii) How trustworthy are the explanations generated by LLMs when they recognize a math problem as flawed? Through extensive experimentation and detailed analysis, our results demonstrate that existing LLMs largely function as Blind Solvers and fall short of the reasoning capabilities required to perform as Logical Thinkers.
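
A trivial illustration of the check a Logical Thinker should perform: track the intermediate state step by step and flag any infeasible value, instead of blindly evaluating the expression:

    # Track the cookie count and flag any state that goes negative,
    # rather than just computing 3 - 5 + 3 = 1.
    def check_feasible(start, deltas):
        count = start
        for d in deltas:
            count += d
            if count < 0:
                return False, count
        return True, count

    ok, count = check_feasible(3, [-5, +3])  # receives 3, eats 5, receives 3
    print(ok, count)  # -> False -2: Lily cannot eat 5 cookies when she has 3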

Updated: 2024-10-24 17:10:39

Categories: cs.CL,cs.AI,cs.LO

Download: http://arxiv.org/abs/2410.18921v1

Optimizing Edge Offloading Decisions for Object Detection

Recent advances in machine learning and hardware have produced embedded devices capable of performing real-time object detection with commendable accuracy. We consider a scenario in which embedded devices rely on an onboard object detector, but have the option to offload detection to a more powerful edge server when local accuracy is deemed too low. Resource constraints, however, limit the number of images that can be offloaded to the edge. Our goal is to identify which images to offload to maximize overall detection accuracy under those constraints. To that end, the paper introduces a reward metric designed to quantify potential accuracy improvements from offloading individual images, and proposes an efficient approach to make offloading decisions by estimating this reward based only on local detection results. The approach is computationally frugal enough to run on embedded devices, and empirical findings indicate that it outperforms existing alternatives in improving detection accuracy even when the fraction of offloaded images is small.
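
A minimal sketch of a budgeted offloading rule in this spirit, using low local confidence as an illustrative stand-in for the paper's reward metric:

    # Estimate, from local detections only, how much accuracy the edge
    # server is likely to add per image, then offload the images with
    # the highest estimated reward under the budget.
    import numpy as np

    def choose_offloads(local_confidences, budget):
        reward = 1.0 - np.asarray(local_confidences)  # proxy: uncertain => offload
        order = np.argsort(reward)[::-1]              # highest estimated gain first
        return sorted(order[:budget].tolist())

    print(choose_offloads([0.92, 0.41, 0.77, 0.33, 0.88], budget=2))  # -> [1, 3]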

Updated: 2024-10-24 17:09:37

Categories: cs.DC,cs.LG,cs.NI

Download: http://arxiv.org/abs/2410.18919v1

MissNODAG: Differentiable Cyclic Causal Graph Learning from Incomplete Data

Causal discovery in real-world systems, such as biological networks, is often complicated by feedback loops and incomplete data. Standard algorithms, which assume acyclic structures or fully observed data, struggle with these challenges. To address this gap, we propose MissNODAG, a differentiable framework for learning both the underlying cyclic causal graph and the missingness mechanism from partially observed data, including data missing not at random. Our framework integrates an additive noise model with an expectation-maximization procedure, alternating between imputing missing values and optimizing the observed data likelihood, to uncover both the cyclic structures and the missingness mechanism. We demonstrate the effectiveness of MissNODAG through synthetic experiments and an application to real-world gene perturbation data.

Updated: 2024-10-24 17:09:10

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2410.18918v1

Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python

In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, scikit-fingerprints stands as the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.
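
Given the scikit-learn interface, usage presumably follows the standard transformer pattern; the import path and class name below are assumptions to be checked against the library's documentation:

    # Hypothetical usage sketch following the scikit-learn transformer
    # pattern the abstract describes; skfp.fingerprints.ECFPFingerprint
    # is an assumed name, not a verified API.
    from skfp.fingerprints import ECFPFingerprint  # assumed import path
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.pipeline import make_pipeline

    smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
    y = [0.1, 0.9, 0.4]  # toy property values

    # Fingerprint computation plugs in like any other sklearn transformer,
    # with parallelism over molecules via n_jobs.
    model = make_pipeline(ECFPFingerprint(n_jobs=-1),
                          RandomForestRegressor(n_estimators=100))
    model.fit(smiles, y)
    print(model.predict(["CCN"]))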

Updated: 2024-10-24 17:08:56

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2407.13291v3

Using Parametric PINNs for Predicting Internal and External Turbulent Flows

Computational fluid dynamics (CFD) solvers employing two-equation eddy viscosity models are the industry standard for simulating turbulent flows using the Reynolds-averaged Navier-Stokes (RANS) formulation. While these methods are computationally less expensive than direct numerical simulations, they can still incur significant computational costs to achieve the desired accuracy. In this context, physics-informed neural networks (PINNs) offer a promising approach for developing parametric surrogate models that leverage both existing, but limited CFD solutions and the governing differential equations to predict simulation outcomes in a computationally efficient, differentiable, and near real-time manner. In this work, we build upon the previously proposed RANS-PINN framework, which only focused on predicting flow over a cylinder. To assess the efficacy of RANS-PINN as a viable approach to building parametric surrogate models, we investigate its accuracy in predicting relevant turbulent flow variables for both internal and external flows. To ensure training convergence with a more complex loss function, we adopt a novel sampling approach that exploits the domain geometry to ensure a proper balance among the contributions from various regions within the solution domain. The effectiveness of this framework is then demonstrated for two scenarios that represent a broad class of internal and external flow problems.

Updated: 2024-10-24 17:08:20

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2410.18917v1

Testing Support Size More Efficiently Than Learning Histograms

Consider two problems about an unknown probability distribution $p$:

1. How many samples from $p$ are required to test if $p$ is supported on $n$ elements or not? Specifically, given samples from $p$, determine whether it is supported on at most $n$ elements, or it is "$\epsilon$-far" (in total variation distance) from being supported on $n$ elements.

2. Given $m$ samples from $p$, what is the largest lower bound on its support size that we can produce?

The best known upper bound for problem (1) uses a general algorithm for learning the histogram of the distribution $p$, which requires $\Theta(\tfrac{n}{\epsilon^2 \log n})$ samples. We show that testing can be done more efficiently than learning the histogram, using only $O(\tfrac{n}{\epsilon \log n} \log(1/\epsilon))$ samples, nearly matching the best known lower bound of $\Omega(\tfrac{n}{\epsilon \log n})$. This algorithm also provides a better solution to problem (2), producing larger lower bounds on support size than what follows from previous work. The proof relies on an analysis of Chebyshev polynomial approximations outside the range where they are designed to be good approximations, and the paper is intended as an accessible self-contained exposition of the Chebyshev polynomial method.

Updated: 2024-10-24 17:05:34

Categories: cs.DS,cs.LG

Download: http://arxiv.org/abs/2410.18915v1

Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

Videos of robots interacting with objects encode rich information about the objects' dynamics. However, existing video prediction approaches typically do not explicitly account for the 3D information from videos, such as robot actions and objects' 3D states, limiting their use in real-world robotic applications. In this work, we introduce a framework to learn object dynamics directly from multi-view RGB videos by explicitly considering the robot's action trajectories and their effects on scene dynamics. We utilize the 3D Gaussian representation of 3D Gaussian Splatting (3DGS) to train a particle-based dynamics model using Graph Neural Networks. This model operates on sparse control particles downsampled from the densely tracked 3D Gaussian reconstructions. By learning the neural dynamics model on offline robot interaction data, our method can predict object motions under varying initial configurations and unseen robot actions. The 3D transformations of Gaussians can be interpolated from the motions of control particles, enabling the rendering of predicted future object states and achieving action-conditioned video prediction. The dynamics model can also be applied to model-based planning frameworks for object manipulation tasks. We conduct experiments on various kinds of deformable materials, including ropes, clothes, and stuffed animals, demonstrating our framework's ability to model complex shapes and dynamics. Our project page is available at https://gs-dynamics.github.io.

Updated: 2024-10-24 17:02:52

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.18912v1

SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment

Imitation learning from human demonstrations is an effective paradigm for robot manipulation, but acquiring large datasets is costly and resource-intensive, especially for long-horizon tasks. To address this issue, we propose SkillMimicGen (SkillGen), an automated system for generating demonstration datasets from a few human demos. SkillGen segments human demos into manipulation skills, adapts these skills to new contexts, and stitches them together through free-space transit and transfer motion. We also propose a Hybrid Skill Policy (HSP) framework for learning skill initiation, control, and termination components from SkillGen datasets, enabling skills to be sequenced using motion planning at test-time. We demonstrate that SkillGen greatly improves data generation and policy learning performance over a state-of-the-art data generation framework, resulting in the capability to produce data for large scene variations, including clutter, and agents that are on average 24% more successful. We demonstrate the efficacy of SkillGen by generating over 24K demonstrations across 18 task variants in simulation from just 60 human demonstrations, and training proficient, often near-perfect, HSP agents. Finally, we apply SkillGen to 3 real-world manipulation tasks and also demonstrate zero-shot sim-to-real transfer on a long-horizon assembly task. Videos and more at https://skillgen.github.io.

Updated: 2024-10-24 16:59:26

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18907v1

PRISM: A Methodology for Auditing Biases in Large Language Models

Auditing Large Language Models (LLMs) to discover their biases and preferences is an emerging challenge in creating Responsible Artificial Intelligence (AI). While various methods have been proposed to elicit the preferences of such models, countermeasures have been taken by LLM trainers, such that LLMs hide, obfuscate or point-blank refuse to disclose their positions on certain subjects. This paper presents PRISM, a flexible, inquiry-based methodology for auditing LLMs that seeks to elicit such positions indirectly through task-based inquiry prompting rather than direct inquiry of said preferences. To demonstrate the utility of the methodology, we applied PRISM on the Political Compass Test, where we assessed the political leanings of twenty-one LLMs from seven providers. We show LLMs, by default, espouse positions that are economically left and socially liberal (consistent with prior work). We also map the space of positions that these models are willing to espouse: some models are more constrained and less compliant than others, while others are more neutral and objective. In sum, PRISM can more reliably probe and audit LLMs to understand their preferences, biases and constraints.

Updated: 2024-10-24 16:57:20

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2410.18906v1

GOAL: A Generalist Combinatorial Optimization Agent Learning

Machine Learning-based heuristics have recently shown impressive performance in solving a variety of hard combinatorial optimization problems (COPs). However, they generally rely on a separate neural model that is specialized and trained for each single problem. Any variation of a problem requires adjustment of its model and re-training from scratch. In this paper, we propose GOAL (for Generalist combinatorial Optimization Agent Learning), a generalist model capable of efficiently solving multiple COPs and which can be fine-tuned to solve new COPs. GOAL consists of a single backbone plus light-weight problem-specific adapters for input and output processing. The backbone is based on a new form of mixed-attention blocks which makes it possible to handle problems defined on graphs with arbitrary combinations of node, edge and instance-level features. Additionally, problems which involve heterogeneous types of nodes or edges are handled through a novel multi-type transformer architecture, where the attention blocks are duplicated to attend to the meaningful combinations of types while relying on the same shared parameters. We train GOAL on a set of routing, scheduling and classic graph problems and show that it is only slightly inferior to the specialized baselines while being the first multi-task model that solves a wide range of COPs. Finally we showcase the strong transfer learning capacity of GOAL by fine-tuning it on several new problems. Our code is available at https://github.com/naver/goal-co/.

Updated: 2024-10-24 16:52:15

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15079v2

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

The vast majority of Reinforcement Learning methods are largely impacted by the computation effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample-efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates the application of an empirical approximation of the Bellman operator and a subsequent projection step onto a considered function space. It has been observed that this scheme can be potentially generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, till now, it has been challenging to effectively implement this idea, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (i-QN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of i-QN in Atari $2600$ games and MuJoCo continuous control problems.
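
A rough sketch of the core idea, assuming DQN-style targets (illustrative, not the authors' exact training loop): each Q-network in the chain regresses onto the Bellman target computed from its predecessor, so one update advances several Bellman iterations:

    # Chain of K Q-networks; Q_k's target comes from the frozen copy of
    # Q_{k-1} (Q_0 targets its own frozen copy, as in standard DQN).
    import torch

    def iqn_loss(q_nets, target_nets, batch, gamma=0.99):
        s, a, r, s_next, done = batch
        loss = 0.0
        for k in range(len(q_nets)):
            with torch.no_grad():
                prev = target_nets[k - 1] if k > 0 else target_nets[0]
                target = r + gamma * (1 - done) * prev(s_next).max(dim=1).values
            q_sa = q_nets[k](s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = loss + torch.nn.functional.mse_loss(q_sa, target)
        return loss

    nets = [torch.nn.Linear(4, 2) for _ in range(3)]
    targets = [torch.nn.Linear(4, 2) for _ in range(3)]
    batch = (torch.randn(8, 4), torch.randint(0, 2, (8,)),
             torch.randn(8), torch.randn(8, 4), torch.zeros(8))
    print(iqn_loss(nets, targets, batch))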

Updated: 2024-10-24 16:50:57

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.02107v3

Disentangled Representation Learning with the Gromov-Monge Gap

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.
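
A simplified sketch of the flavor of such a regularizer: penalize how much a map distorts pairwise distances within a batch (the actual GMG additionally compares against the optimal geometry-preserving transport map):

    # Penalize the change in pairwise distances induced by a map f;
    # a crude illustration of geometry-distortion regularization.
    import torch

    def distance_distortion(x, fx):
        dx = torch.cdist(x, x)     # pairwise distances in input space
        dfx = torch.cdist(fx, fx)  # pairwise distances after the map
        return ((dx - dfx) ** 2).mean()

    x = torch.randn(32, 10)
    f = torch.nn.Linear(10, 2)
    print(distance_distortion(x, f(x)))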

Updated: 2024-10-24 16:49:16

Categories: cs.LG,cs.CV,stat.ML

Download: http://arxiv.org/abs/2407.07829v2

Modulated Adaptive Fourier Neural Operators for Temporal Interpolation of Weather Forecasts

Weather and climate data are often available at limited temporal resolution, either due to storage limitations, or in the case of weather forecast models based on deep learning, their inherently long time steps. The coarse temporal resolution makes it difficult to capture rapidly evolving weather events. To address this limitation, we introduce an interpolation model that reconstructs the atmospheric state between two points in time for which the state is known. The model makes use of a novel network layer that modifies the adaptive Fourier neural operator (AFNO), which has been previously used in weather prediction and other applications of machine learning to physics problems. The modulated AFNO (ModAFNO) layer takes an embedding, here computed from the interpolation target time, as an additional input and applies a learned shift-scale operation inside the AFNO layers to adapt them to the target time. Thus, one model can be used to produce all intermediate time steps. Trained to interpolate between two time steps 6 h apart, the ModAFNO-based interpolation model produces 1 h resolution intermediate time steps that are visually nearly indistinguishable from the actual corresponding 1 h resolution data. The model reduces the RMSE loss of reconstructing the intermediate steps by approximately 50% compared to linear interpolation. We also demonstrate its ability to reproduce the statistics of extreme weather events such as hurricanes and heat waves better than 6 h resolution data. The ModAFNO layer is generic and is expected to be applicable to other problems, including weather forecasting with tunable lead time.
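
The shift-scale modulation can be sketched as a FiLM-style conditioning layer, where the time embedding produces per-channel scale and shift parameters (a plain linear head stands in here for the layer inside the AFNO block):

    # Learned shift-scale conditioning on a time embedding; illustrative
    # of the ModAFNO modulation idea, not the paper's exact layer.
    import torch

    class ShiftScaleModulation(torch.nn.Module):
        def __init__(self, channels, embed_dim):
            super().__init__()
            self.to_scale_shift = torch.nn.Linear(embed_dim, 2 * channels)

        def forward(self, features, time_embedding):
            scale, shift = self.to_scale_shift(time_embedding).chunk(2, dim=-1)
            return features * (1 + scale) + shift

    mod = ShiftScaleModulation(channels=64, embed_dim=16)
    feats = torch.randn(8, 64)      # features inside the operator block
    t_emb = torch.randn(8, 16)      # embedding of the interpolation target time
    print(mod(feats, t_emb).shape)  # -> torch.Size([8, 64])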

Updated: 2024-10-24 16:48:32

Categories: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2410.18904v1

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.
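
A minimal sketch of the core mechanism, with torch's built-in attention standing in for the small number of added layers: decoder states cross-attend to context representations that were encoded once and cached:

    # Decoder hidden states cross-attend to pre-computed (cached) context
    # representations instead of re-reading the context via the prompt.
    import torch

    cross_attn = torch.nn.MultiheadAttention(embed_dim=256, num_heads=8,
                                             batch_first=True)
    cached_context = torch.randn(1, 512, 256)  # encoded once, stored
    decoder_states = torch.randn(1, 16, 256)   # current generation steps

    out, _ = cross_attn(query=decoder_states,
                        key=cached_context, value=cached_context)
    print(out.shape)  # -> torch.Size([1, 16, 256])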

Updated: 2024-10-24 16:40:10

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.15420v2

ArterialNet: Reconstructing Arterial Blood Pressure Waveform with Wearable Pulsatile Signals, a Cohort-Aware Approach

Continuous arterial blood pressure (ABP) monitoring is invasive but essential for hemodynamic monitoring. Recent techniques have reconstructed ABP non-invasively using pulsatile signals but produced inaccurate systolic and diastolic blood pressure (SBP and DBP) values and were sensitive to individual variability. ArterialNet integrates generalized pulsatile-to-ABP signal translation and personalized feature extraction using hybrid loss functions and regularization. We validated ArterialNet using the MIMIC-III dataset and achieved a root mean square error (RMSE) of 5.41 mmHg, with at least a 58% lower standard deviation. ArterialNet reconstructed ABP with an RMSE of 7.99 mmHg in remote health scenarios. ArterialNet achieved superior performance in ABP reconstruction and SBP and DBP estimations, with significantly reduced subject variance, demonstrating its potential in remote health settings. We also ablated the ArterialNet architecture to investigate the contributions of each component and evaluated its translational impact and robustness by conducting a series of ablations on data quality and availability.

Updated: 2024-10-24 16:35:23

Categories: cs.LG

Download: http://arxiv.org/abs/2410.18895v1

AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models

Rare diseases affect millions worldwide but often face limited research focus due to their low prevalence. This results in prolonged diagnoses and a lack of approved therapies. Recent advancements in Large Language Models (LLMs) have shown promise in automating the extraction of medical information, offering potential to improve medical diagnosis and management. However, most LLMs lack professional medical knowledge, especially concerning rare diseases, and struggle to handle the latest rare disease information. They also cannot effectively manage rare disease data and are not directly suitable for diagnosis and management tasks. Our objective is to create an end-to-end system called AutoRD, which automates the extraction of information from medical texts about rare diseases, focusing on entities and their relations. AutoRD integrates up-to-date structured knowledge and demonstrates superior performance in rare disease extraction tasks. We conduct various experiments to evaluate AutoRD's performance, aiming to surpass common LLMs and traditional methods.

Updated: 2024-10-24 16:32:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.00953v3

Meta-Learning with Heterogeneous Tasks

Meta-learning is a general approach to equip machine learning models with the ability to handle few-shot scenarios when dealing with many tasks. Most existing meta-learning methods work under the assumption that all tasks are of equal importance. However, real-world applications often present heterogeneous tasks characterized by varying difficulty levels, noise in training samples, or being distinctively different from most other tasks. In this paper, we introduce a novel meta-learning method designed to effectively manage such heterogeneous tasks by employing rank-based task-level learning objectives: Heterogeneous Tasks Robust Meta-learning (HeTRoM). HeTRoM is proficient in handling heterogeneous tasks, and it prevents easy tasks from overwhelming the meta-learner. The approach allows for an efficient iterative optimization algorithm based on bi-level optimization, which is then improved by integrating statistical guidance. Our experimental results demonstrate that our method provides flexibility, enabling users to adapt to diverse task settings and enhancing the meta-learner's overall performance.
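
One way to picture a rank-based task-level objective (illustrative; the exact ranking rule in HeTRoM may differ) is to sort per-task losses in a meta-batch and aggregate only selected ranks, so easy tasks and outliers do not dominate the meta-update:

    # Sort per-task losses and average only the middle ranks, trimming
    # the easiest tasks and extreme outliers from the meta-update.
    import torch

    def rank_based_meta_loss(task_losses, k_low=1, k_high=1):
        sorted_losses, _ = torch.sort(torch.stack(task_losses))
        kept = sorted_losses[k_low:len(sorted_losses) - k_high]
        return kept.mean()

    losses = [torch.tensor(0.05), torch.tensor(0.9),
              torch.tensor(1.1), torch.tensor(7.3)]  # last task is noisy
    print(rank_based_meta_loss(losses))  # averages the two middle losses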

Updated: 2024-10-24 16:32:23

Categories: cs.LG

Download: http://arxiv.org/abs/2410.18894v1

Creating and Repairing Robot Programs in Open-World Domains

Using Large Language Models (LLMs) to produce robot programs from natural language has allowed for robot systems that can complete a higher diversity of tasks. However, LLM-generated programs may be faulty, either due to ambiguity in instructions, misinterpretation of the desired task, or missing information about the world state. As these programs run, the state of the world changes and they gather new information. When a failure occurs, it is important that they recover from the current world state and avoid repeating steps that they previously completed successfully. We propose RoboRepair, a system which traces the execution of a program up until error, and then runs an LLM-produced recovery program that minimizes repeated actions. To evaluate the efficacy of our system, we create a benchmark consisting of eleven tasks with various error conditions that require the generation of a recovery program. We compare the efficiency of the recovery program to a plan built with an oracle that has foreknowledge of future errors.

Updated: 2024-10-24 16:30:14

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.18893v1

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks

Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. While these models excel in general complex reasoning tasks, they still face challenges in mathematical problem-solving and logical reasoning. To address these limitations, researchers have explored function calling abilities, allowing LLMs to execute provided functions and utilize their outputs for task completion. However, relying on large-scale LLMs for specific tasks can be very inefficient because of the computational cost of their training and inference stages. This study introduces a novel framework for training smaller language models in function calling, focusing on specific logical and mathematical reasoning tasks. The approach aims to improve performances of small-scale models for these tasks using function calling, ensuring a high level of accuracy. Our framework employs an agent that, given a problem and a set of callable functions, queries the LLM by injecting a description and examples of the usable functions into the prompt and managing their calls in a step-by-step reasoning chain. This process is used to create a dataset of correct and incorrect reasoning chain chat completions from a large-scale LLM. This dataset is used to train a smaller LLM using Reinforcement Learning from Human Feedback (RLHF), specifically employing the Direct Preference Optimization (DPO) technique. Experimental results demonstrate how the proposed approach balances the trade-off between model size and performance, improving the ability of function calling for reasoning tasks, in smaller models.
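
For reference, the standard DPO loss used in the final stage compares policy and reference log-probabilities of a preferred and a rejected completion; a self-contained sketch with toy numbers:

    # Standard DPO objective: push the policy toward the preferred
    # reasoning chain relative to a frozen reference model.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

    # Toy sequence-level log-probs for a batch of 4 preference pairs.
    print(dpo_loss(torch.tensor([-4.0, -3.5, -5.0, -2.0]),
                   torch.tensor([-6.0, -3.0, -7.0, -2.5]),
                   torch.tensor([-4.5, -3.5, -5.5, -2.2]),
                   torch.tensor([-5.5, -3.2, -6.5, -2.4])))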

Updated: 2024-10-24 16:27:35

Categories: cs.AI

Download: http://arxiv.org/abs/2410.18890v1

Generalizable, Fast, and Accurate DeepQSPR with fastprop

Quantitative Structure Property Relationship studies aim to define a mapping between molecular structure and arbitrary quantities of interest. This was historically accomplished via the development of descriptors, an approach that requires significant domain expertise and struggles to generalize. Thus the field has morphed into Molecular Property Prediction and been given over to learned representations, which are highly generalizable. The paper introduces fastprop, a DeepQSPR framework which uses a cogent set of molecular level descriptors to meet and exceed the performance of learned representations on diverse datasets in dramatically less time. fastprop is freely available on github at github.com/JacksonBurns/fastprop.

Updated: 2024-10-24 16:18:47

Categories: cs.LG,physics.chem-ph

Download: http://arxiv.org/abs/2404.02058v4

Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences

One-step text-to-image generator models offer advantages such as swift inference efficiency, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with human preferences for the first time. Inspired by the success of reinforcement learning using human feedback (RLHF), we formulate the alignment problem as maximizing expected human reward functions while adding an Integral Kullback-Leibler divergence term to prevent the generator from diverging. By overcoming technical challenges, we introduce Diff-Instruct++ (DI++), the first fast-converging and image data-free human preference alignment method for one-step text-to-image generators. We also introduce novel theoretical insights, showing that using CFG for diffusion distillation is secretly doing RLHF with DI++. Such an interesting finding brings understanding and potential contributions to future research involving CFG. In the experiment sections, we align both UNet-based and DiT-based one-step generators using DI++, which use the Stable Diffusion 1.5 and the PixelArt-$\alpha$ as the reference diffusion processes. The resulting DiT-based one-step text-to-image model achieves a strong Aesthetic Score of 6.19 and an Image Reward of 1.24 on the COCO validation prompt dataset. It also achieves a leading Human preference Score (HPSv2.0) of 28.48, outperforming other open-sourced models such as Stable Diffusion XL, DMD2, SD-Turbo, as well as PixelArt-$\alpha$. Both theoretical contributions and empirical evidence indicate that DI++ is a strong human-preference alignment approach for one-step text-to-image models.

Updated: 2024-10-24 16:17:18

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.18881v1

Towards Personal Data Sharing Autonomy:A Task-driven Data Capsule Sharing System

Personal data custodian services enable data owners to share their data with data consumers in a convenient manner, anytime and anywhere. However, because data hosted in these services is beyond the control of the data owners, such services raise significant privacy concerns around personal data sharing. Many schemes have been proposed to realize fine-grained access control and privacy protection in data sharing. However, they fail to protect the rights of data owners to their data under the law, since their designs focus on the management of system administrators rather than enhancing the data owners' privacy. In this paper, we introduce a novel task-driven personal data sharing system based on the data capsule paradigm realizing personal data sharing autonomy. It enables data owners in our system to fully control their data, and share it autonomously. Specifically, we present a tamper-resistant data capsule encapsulation method, where the data capsule is the minimal unit for independent and secure personal data storage and sharing. Additionally, to realize selective sharing and informed-consent based authorization, we propose a task-driven data sharing mechanism that is resistant to collusion and EDoS attacks. Furthermore, by updating parts of the data capsules, the permissions granted to data consumers can be immediately revoked. Finally, we conduct a security and performance analysis, proving that our scheme is correct, sound, and secure, as well as revealing more advantageous features in practicality, compared with the state-of-the-art schemes.

Updated: 2024-10-24 16:08:36

Categories: cs.CR

Download: http://arxiv.org/abs/2409.18449v2

Guiding Empowerment Model: Liberating Neurodiversity in Online Higher Education

In this innovative practice full paper, we address the equity gap for neurodivergent and situationally limited learners by identifying the spectrum of dynamic factors that impact learning and function. Educators have shown a growing interest in identifying learners' cognitive abilities and learning preferences to measure their impact on academic achievement. Often institutions employ one-size-fits-all approaches, leaving the burden on disabled students to self-advocate or tolerate inadequate support. Emerging frameworks guide neurodivergent learners through instructional approaches, such as online education. However, these frameworks fail to address holistic environmental needs or recommend technology interventions, particularly for those with undisclosed learning or developmental disabilities and situational limitations. In this article, we integrate a neurodivergent perspective through secondary research of around 100 articles to introduce a Guiding Empowerment Model involving key cognitive and situational factors that contextualize day-to-day experiences affecting learner ability. We synthesize three sample student profiles that highlight user problems in functioning. We use this model to evaluate sample learning platform features and other supportive technology solutions. The proposed approach augments frameworks such as Universal Design for Learning to consider factors including various sensory processing differences, social connection challenges, and environmental limitations. We suggest that by applying the model through technology-enabled features such as customizable task management, guided varied content access, and guided multi-modal collaboration, major learning barriers of neurodivergent and situationally limited learners will be removed to activate the successful pursuit of their academic goals.

Updated: 2024-10-24 16:05:38

Categories: cs.AI,cs.HC

Download: http://arxiv.org/abs/2410.18876v1

Exploring the Universe with SNAD: Anomaly Detection in Astronomy

SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the field of astrophysics. This paper provides a review of the SNAD project and summarizes the advancements and achievements made by the team over several years.

Updated: 2024-10-24 16:05:11

Categories: astro-ph.IM,cs.HC,cs.LG

Download: http://arxiv.org/abs/2410.18875v1

Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense

Recent advances in multi-agent reinforcement learning (MARL) have created opportunities to solve complex real-world tasks. Cybersecurity is a notable application area, where defending networks against sophisticated adversaries remains a challenging task typically performed by teams of security operators. In this work, we explore novel MARL strategies for building autonomous cyber network defenses that address challenges such as large policy spaces, partial observability, and stealthy, deceptive adversarial strategies. To facilitate efficient and generalized learning, we propose a hierarchical Proximal Policy Optimization (PPO) architecture that decomposes the cyber defense task into specific sub-tasks like network investigation and host recovery. Our approach involves training sub-policies for each sub-task using PPO enhanced with domain expertise. These sub-policies are then leveraged by a master defense policy that coordinates their selection to solve complex network defense tasks. Furthermore, the sub-policies can be fine-tuned and transferred with minimal cost to defend against shifts in adversarial behavior or changes in network settings. We conduct extensive experiments using CybORG Cage 4, the state-of-the-art MARL environment for cyber defense. Comparisons with multiple baselines across different adversaries show that our hierarchical learning approach achieves top performance in terms of convergence speed, episodic return, and several interpretable metrics relevant to cybersecurity, including the fraction of clean machines on the network, precision, and false positives on recoveries.
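
The decomposition can be pictured as a master policy that selects which sub-policy acts at each step; the callables below are placeholders for trained PPO agents:

    # Master policy chooses a skill (e.g., "investigate" or "recover");
    # the chosen sub-policy emits the low-level defense action.
    import random

    def hierarchical_step(observation, master_policy, sub_policies):
        skill = master_policy(observation)
        return skill, sub_policies[skill](observation)

    sub_policies = {
        "investigate": lambda obs: ("scan_host", random.choice(obs["hosts"])),
        "recover":     lambda obs: ("restore_host", obs["suspicious"][0]),
    }
    master = lambda obs: "recover" if obs["suspicious"] else "investigate"

    obs = {"hosts": ["h1", "h2", "h3"], "suspicious": ["h2"]}
    print(hierarchical_step(obs, master, sub_policies))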

Updated: 2024-10-24 15:57:45

Categories: cs.LG,cs.CR,cs.MA

Download: http://arxiv.org/abs/2410.17351v2

End-to-end Training for Recommendation with Language-based User Profiles

Many online platforms maintain user profiles for personalization. Unfortunately, these profiles are typically not interpretable or easily modifiable by the user. To remedy this shortcoming, we explore natural language-based user profiles, as they promise enhanced transparency and scrutability of recommender systems. While existing work has shown that language-based profiles from standard LLMs can be effective, such generalist LLMs are unlikely to be optimal for this task. In this paper, we introduce LangPTune, the first end-to-end learning method for training LLMs to produce language-based user profiles that optimize recommendation effectiveness. Through comprehensive evaluations of LangPTune across various training configurations and benchmarks, we demonstrate that our approach significantly outperforms existing profile-based methods. In addition, it approaches performance levels comparable to state-of-the-art, less transparent recommender systems, providing a robust and interpretable alternative to conventional systems. Finally, we validate the relative interpretability of these language-based user profiles through user studies involving crowdworkers and GPT-4-based evaluations. Implementation of LangPTune can be found at https://github.com/ZhaolinGao/LangPTune.

Updated: 2024-10-24 15:57:17

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2410.18870v1

A Riemannian Framework for Learning Reduced-order Lagrangian Dynamics

By incorporating physical consistency as inductive bias, deep neural networks display increased generalization capabilities and data efficiency in learning nonlinear dynamic models. However, the complexity of these models generally increases with the system dimensionality, requiring larger datasets, more complex deep networks, and significant computational effort. We propose a novel geometric network architecture to learn physically-consistent reduced-order dynamic parameters that accurately describe the original high-dimensional system behavior. This is achieved by building on recent advances in model-order reduction and by adopting a Riemannian perspective to jointly learn a structure-preserving latent space and the associated low-dimensional dynamics. Our approach enables accurate long-term predictions of the high-dimensional dynamics of rigid and deformable systems with increased data efficiency by inferring interpretable and physically plausible reduced Lagrangian models.

Updated: 2024-10-24 15:53:21

Categories: cs.LG

Download: http://arxiv.org/abs/2410.18868v1

The Cat and Mouse Game: The Ongoing Arms Race Between Diffusion Models and Detection Methods

The emergence of diffusion models has transformed synthetic media generation, offering unmatched realism and control over content creation. These advancements have driven innovation across fields such as art, design, and scientific visualization. However, they also introduce significant ethical and societal challenges, particularly through the creation of hyper-realistic images that can facilitate deepfakes, misinformation, and unauthorized reproduction of copyrighted material. In response, the need for effective detection mechanisms has become increasingly urgent. This review examines the evolving adversarial relationship between diffusion model development and the advancement of detection methods. We present a thorough analysis of contemporary detection strategies, including frequency and spatial domain techniques, deep learning-based approaches, and hybrid models that combine multiple methodologies. We also highlight the importance of diverse datasets and standardized evaluation metrics in improving detection accuracy and generalizability. Our discussion explores the practical applications of these detection systems in copyright protection, misinformation prevention, and forensic analysis, while also addressing the ethical implications of synthetic media. Finally, we identify key research gaps and propose future directions to enhance the robustness and adaptability of detection methods in line with the rapid advancements of diffusion models. This review emphasizes the necessity of a comprehensive approach to mitigating the risks associated with AI-generated content in an increasingly digital world.

Updated: 2024-10-24 15:51:04

Categories: cs.AI

Download: http://arxiv.org/abs/2410.18866v1

Omics-driven hybrid dynamic modeling of bioprocesses with uncertainty estimation

This work presents an omics-driven modeling pipeline that integrates machine-learning tools to facilitate the dynamic modeling of multiscale biological systems. Random forests and permutation feature importance are proposed to mine omics datasets, guiding feature selection and dimensionality reduction for dynamic modeling. Continuous and differentiable machine-learning functions can be trained to link the reduced omics feature set to key components of the dynamic model, resulting in a hybrid model. As proof of concept, we apply this framework to a high-dimensional proteomics dataset of $\textit{Saccharomyces cerevisiae}$. After identifying key intracellular proteins that correlate with cell growth, targeted dynamic experiments are designed, and key model parameters are captured as functions of the selected proteins using Gaussian processes. This approach captures the dynamic behavior of yeast strains under varying proteome profiles while estimating the uncertainty in the hybrid model's predictions. The outlined modeling framework is adaptable to other scenarios, such as integrating additional layers of omics data for more advanced multiscale biological systems, or employing alternative machine-learning methods to handle larger datasets. Overall, this study outlines a strategy for leveraging omics data to inform multiscale dynamic modeling in systems biology and bioprocess engineering.
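
The feature-selection step can be sketched with standard scikit-learn calls on synthetic data: fit a random forest, rank features by permutation importance, and keep the top ones for the downstream dynamic model:

    # Random forest + permutation importance for omics feature selection.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))  # e.g., 50 protein abundances
    y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=200)  # growth rate

    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
    top = np.argsort(result.importances_mean)[::-1][:5]
    print("selected features:", top)  # should surface columns 3 and 7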

Updated: 2024-10-24 15:50:35

标题: 基于组学驱动的生物过程混合动力学建模及不确定性估计

摘要: 这项工作提出了一个基于组学驱动的建模流程,整合了机器学习工具,以便实现多尺度生物系统的动态建模。随机森林和排列特征重要性被提出用于挖掘组学数据集,指导特征选择和降维以用于动态建模。连续且可微的机器学习函数可以被训练来将减少的组学特征集与动态模型的关键组件相连,从而得到一个混合模型。作为概念验证,我们将这一框架应用到了$\textit{Saccharomyces cerevisiae}$的高维蛋白质组学数据集中。在确定与细胞生长相关的关键胞内蛋白质之后,设计了有针对性的动态实验,并使用高斯过程将关键模型参数捕获为所选蛋白质的函数。这种方法捕获了酵母菌株在不同蛋白质组谱下的动态行为,同时估计了混合模型预测中的不确定性。所概述的建模框架可适用于其他情景,例如整合额外层次的组学数据以用于更高级的多尺度生物系统,或者采用其他机器学习方法来处理更大的数据集。总体而言,这项研究概述了一种利用组学数据来为系统生物学和生物过程工程中的多尺度动态建模提供信息的策略。

更新时间: 2024-10-24 15:50:35

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2410.18864v1

Learning Mathematical Rules with Large Language Models

In this paper, we study the ability of large language models to learn specific mathematical rules such as distributivity or simplifying equations. We present an empirical analysis of their ability to generalize these rules, as well as to reuse them in the context of word problems. For this purpose, we provide a rigorous methodology to build synthetic data incorporating such rules, and perform fine-tuning of large language models on such data. Our experiments show that our model can learn and generalize these rules to some extent, as well as suitably reuse them in the context of word problems.
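
A toy sketch of what such synthetic rule data might look like for distributivity; the exact templates, formats, and sampling ranges used in the paper are not specified here and should be read as assumptions.

# Hypothetical generator for synthetic distributivity examples of the kind
# a language model could be fine-tuned on. Templates are illustrative only.
import random

def distributivity_example(rng: random.Random) -> dict:
    a, b, c = (rng.randint(1, 20) for _ in range(3))
    prompt = f"Expand: {a}*({b} + {c}) ="
    answer = f"{a}*{b} + {a}*{c} = {a * b + a * c}"
    return {"prompt": prompt, "completion": answer}

rng = random.Random(42)
dataset = [distributivity_example(rng) for _ in range(5)]
for ex in dataset:
    print(ex["prompt"], ex["completion"])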

Updated: 2024-10-24 15:49:03

标题: 使用大型语言模型学习数学规则

摘要: 在这篇论文中,我们研究了大型语言模型学习特定数学规则的能力,例如分配律或简化方程式。我们对它们推广这些规则的能力以及在应用题背景下重复利用它们进行了实证分析。为此,我们提供了一个严谨的方法论来构建包含这些规则的合成数据,并在这些数据上对大型语言模型进行微调。我们的实验表明,我们的模型在一定程度上可以学习和推广这些规则,并在应用题的背景下适当地重复利用它们。

更新时间: 2024-10-24 15:49:03

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.16973v2

Data Augmentation of Multivariate Sensor Time Series using Autoregressive Models and Application to Failure Prognostics

This work presents a novel data augmentation solution for non-stationary multivariate time series and its application to failure prognostics. The method extends previous work from the authors which is based on time-varying autoregressive processes. It can be employed to extract key information from a limited number of samples and generate new synthetic samples in a way that potentially improves the performance of PHM solutions. This is especially valuable in situations of data scarcity which are very usual in PHM, especially for failure prognostics. The proposed approach is tested based on the CMAPSS dataset, commonly employed for prognostics experiments and benchmarks. An AutoML approach from PHM literature is employed for automating the design of the prognostics solution. The empirical evaluation provides evidence that the proposed method can substantially improve the performance of PHM solutions.
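
For intuition, here is a heavily simplified sketch of AR-based augmentation: it fits a stationary AR(p) model by least squares on one channel and bootstraps new synthetic series from the fitted recursion. The paper's method uses time-varying AR processes on non-stationary multivariate data; this stand-in only illustrates the core idea.

# Simplified sketch of AR-based augmentation. Stationary AR(p) shown for
# brevity; the paper's time-varying formulation is not reproduced here.
import numpy as np

def fit_ar(x: np.ndarray, p: int) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares AR(p) fit; returns coefficients and residuals."""
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    coeffs, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    residuals = x[p:] - X @ coeffs
    return coeffs, residuals

def simulate_ar(coeffs, residuals, seed, n, rng):
    p = len(coeffs)
    out = list(seed[:p])                        # start from observed values
    for _ in range(n - p):
        noise = rng.choice(residuals)           # bootstrap the innovations
        out.append(float(np.dot(coeffs, out[-1 : -p - 1 : -1])) + noise)
    return np.array(out)

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(500))         # toy degradation-like signal
coeffs, resid = fit_ar(x, p=5)
synthetic = simulate_ar(coeffs, resid, seed=x, n=500, rng=rng)
print(synthetic.shape)  # (500,) new sample for PHM training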

Updated: 2024-10-24 15:48:48

标题: 多变量传感器时间序列的数据增强使用自回归模型及其在故障预测中的应用

摘要: 这项工作提出了一种新颖的数据增强解决方案,用于非平稳多变量时间序列及其在故障预测中的应用。该方法扩展了作者先前基于时变自回归过程的工作。它可以用于从有限数量的样本中提取关键信息,并以一种可能改进PHM解决方案性能的方式生成新的合成样本。这在PHM中数据稀缺的情况下尤为重要,尤其是对于故障预测。提出的方法在常用于预测实验和基准测试的CMAPSS数据集上进行了测试。PHM文献中的AutoML方法被用于自动化设计预测解决方案。实证评估表明,提出的方法可以显著提高PHM解决方案的性能。

更新时间: 2024-10-24 15:48:48

领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2410.16419v2

RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation

The emergence of Segment Anything (SAM) sparked research interest in the field of interactive segmentation, especially in the context of image editing tasks and speeding up data annotation. Unlike common semantic segmentation, interactive segmentation methods allow users to directly influence their output through prompts (e.g. clicks). However, click patterns in real-world interactive segmentation scenarios remain largely unexplored. Most methods rely on the assumption that users would click in the center of the largest erroneous area. Nevertheless, recent studies show that this is not always the case. Thus, methods may have poor performance in real-world deployment despite high metrics in a baseline benchmark. To accurately simulate real-user clicks, we conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks, which closely resemble actual user inputs. Using our model and dataset, we propose RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also the robustness w.r.t. click patterns. According to our benchmark, in real-world usage interactive segmentation models may perform worse than it has been reported in the baseline benchmark, and most of the methods are not robust. We believe that RClicks is a significant step towards creating interactive segmentation methods that provide the best user experience in real-world cases.
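
A minimal sketch of the sampling idea behind a clickability model: clicks are drawn from a probability map restricted to erroneous pixels, rather than placed deterministically at the center of the largest error region. The map below is random stand-in data; the paper's model is learned from real user clicks.

# Sketch of clickability-based click sampling. The "heat" map is a random
# placeholder for a learned clickability model.
import numpy as np

def sample_click(error_mask: np.ndarray, clickability: np.ndarray,
                 rng: np.random.Generator):
    probs = clickability * error_mask            # restrict to erroneous pixels
    probs = probs / probs.sum()
    flat = rng.choice(probs.size, p=probs.ravel())
    return np.unravel_index(flat, probs.shape)   # (row, col) of simulated click

rng = np.random.default_rng(0)
mask = np.zeros((64, 64)); mask[20:40, 20:40] = 1.0   # toy erroneous region
heat = rng.random((64, 64))                            # stand-in clickability map
print(sample_click(mask, heat, rng))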

Updated: 2024-10-24 15:48:41

标题: RClicks:用于交互式分割基准测试的逼真点击模拟

摘要: Segment Anything(SAM)的出现引起了对交互式分割领域的研究兴趣,特别是在图像编辑任务和加速数据注释的背景下。与常见的语义分割不同,交互式分割方法允许用户通过提示(例如点击)直接影响其输出。然而,在真实世界的交互式分割场景中,点击模式仍然大部分未被探索。大多数方法依赖于用户会在最大错误区域的中心进行点击的假设。然而,最近的研究表明这并不总是成立。因此,尽管在基准测试中指标很高,但方法在实际部署中可能表现不佳。为了准确模拟真实用户的点击,我们进行了一项大规模的众包研究,研究了交互式分割场景中的点击模式,并收集了475K个真实用户点击。借鉴显著性任务的思想,我们开发了一个可点击性模型,它能够采样出与真实用户输入非常相似的点击。利用我们的模型和数据集,我们提出了一个名为RClicks的基准测试,用于全面比较现有交互式分割方法在真实点击上的表现。具体而言,我们评估方法的平均质量,以及相对于点击模式的稳健性。根据我们的基准测试,在真实世界的使用中,交互式分割模型的表现可能比基准测试中报道的要差,大多数方法也不够稳健。我们认为RClicks是朝着创建能在实际情况下提供最佳用户体验的交互式分割方法迈出的重要一步。

更新时间: 2024-10-24 15:48:41

领域: cs.CV,cs.AI,cs.HC,I.4.6

下载: http://arxiv.org/abs/2410.11722v2

FedSPD: A Soft-clustering Approach for Personalized Decentralized Federated Learning

Federated learning has recently gained popularity as a framework for distributed clients to collaboratively train a machine learning model using local data. While traditional federated learning relies on a central server for model aggregation, recent advancements adopt a decentralized framework, enabling direct model exchange between clients and eliminating the single point of failure. However, existing decentralized frameworks often assume all clients train a shared model. Personalizing each client's model can enhance performance, especially with heterogeneous client data distributions. We propose FedSPD, an efficient personalized federated learning algorithm for the decentralized setting, and show that it learns accurate models even in low-connectivity networks. To provide theoretical guarantees on convergence, we introduce a clustering-based framework that enables consensus on models for distinct data clusters while personalizing to unique mixtures of these clusters at different clients. This flexibility, allowing selective model updates based on data distribution, substantially reduces communication costs compared to prior work on personalized federated learning in decentralized settings. Experimental results on real-world datasets show that FedSPD outperforms multiple decentralized variants of personalized federated learning algorithms, especially in scenarios with low-connectivity networks.

Updated: 2024-10-24 15:48:34

标题: FedSPD:一种个性化去中心化联邦学习的软聚类方法

摘要: 最近,联邦学习作为一个框架,让分布式客户端利用本地数据协作训练机器学习模型已经变得流行起来。传统的联邦学习依赖于中央服务器进行模型聚合,而最近的进展采用了分散化框架,实现了客户端之间直接模型交换,消除了单点故障。然而,现有的分散化框架通常假定所有客户端训练一个共享模型。个性化每个客户端的模型可以提升性能,尤其是在客户端数据分布不均匀的情况下。我们提出了FedSPD,一种高效的个性化联邦学习算法,适用于分散化环境,并展示它在低连接网络中学习准确的模型。为了在收敛上提供理论保证,我们引入了一个基于聚类的框架,可以在不同数据簇上达成模型共识,同时在不同客户端上个性化这些簇的独特混合。这种灵活性允许基于数据分布选择性地更新模型,与先前在分散化环境中个性化联邦学习的工作相比,大大降低了通信成本。在真实世界数据集上的实验结果表明,FedSPD在多个分散化变体的个性化联邦学习算法中表现更好,尤其是在低连接网络场景下。

更新时间: 2024-10-24 15:48:34

领域: cs.LG

下载: http://arxiv.org/abs/2410.18862v1

On high-dimensional modifications of the nearest neighbor classifier

Nearest neighbor classifier is arguably the most simple and popular nonparametric classifier available in the literature. However, due to the concentration of pairwise distances and the violation of the neighborhood structure, this classifier often suffers in high-dimension, low-sample size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to take care of this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out some theoretical investigations in this regard and analyze several simulated and benchmark datasets to compare the empirical performances of proposed methods with some of the existing ones.

Updated: 2024-10-24 15:47:36

标题: 关于最近邻分类器的高维修正

摘要: 最近邻分类器可以说是文献中最简单且最受欢迎的非参数分类器。然而,由于成对距离的集中以及邻域结构的破坏,这种分类器在高维低样本量(HDLSS)情况下经常遇到困难,特别是当竞争类别之间的尺度差异占据其位置差异时。文献中已经进行了几次尝试来解决这个问题。在本文中,我们讨论了一些现有方法并提出了一些新方法。我们进行了一些理论研究,并分析了几个模拟和基准数据集,以比较所提出方法与一些现有方法的经验性能。

更新时间: 2024-10-24 15:47:36

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2407.05145v3

Predicting the Performance of Foundation Models via Agreement-on-the-Line

Estimating the out-of-distribution performance in regimes where labels are scarce is critical to safely deploy foundation models. Recently, it was shown that ensembles of neural networks observe the phenomena "agreement-on-the-line", which can be leveraged to reliably predict OOD performance without labels. However, in contrast to classical neural networks that are trained on in-distribution data from scratch for numerous epochs, foundation models undergo minimal finetuning from heavily pretrained weights, which may reduce the ensemble diversity needed to observe agreement-on-the-line. In our work, we demonstrate that when lightly finetuning multiple runs from a single foundation model, the choice of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. Surprisingly, only random head initialization is able to reliably induce agreement-on-the-line in finetuned foundation models across vision and language benchmarks. Second, we demonstrate that ensembles of multiple foundation models pretrained on different datasets but finetuned on the same task can also show agreement-on-the-line. In total, by careful construction of a diverse ensemble, we can utilize agreement-on-the-line-based methods to predict the OOD performance of foundation models with high precision.
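
A simplified numeric sketch of how agreement-on-the-line can be used for prediction (the paper's procedure is more careful; toy predictions stand in for real model outputs): compute pairwise agreements of an ensemble on ID and OOD data, fit a line, and map a labeled ID quantity to an OOD estimate.

# Toy illustration of agreement-on-the-line. Model predictions are simulated
# by corrupting a shared ground truth at different accuracy levels.
import numpy as np

rng = np.random.default_rng(0)
truth = rng.integers(0, 10, size=1000)

def noisy_preds(accuracy: float) -> np.ndarray:
    """Toy predictions that match `truth` with roughly the given accuracy."""
    hit = rng.random(1000) < accuracy
    return np.where(hit, truth, rng.integers(0, 10, size=1000))

id_accs = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70]
id_preds = np.stack([noisy_preds(a) for a in id_accs])
ood_preds = np.stack([noisy_preds(a - 0.25) for a in id_accs])

def pairwise_agreement(preds: np.ndarray) -> np.ndarray:
    n = preds.shape[0]
    return np.array([(preds[i] == preds[j]).mean()
                     for i in range(n) for j in range(i + 1, n)])

slope, intercept = np.polyfit(pairwise_agreement(id_preds),
                              pairwise_agreement(ood_preds), deg=1)
id_quantity = 0.92                               # measured with ID labels
print("predicted OOD quantity:", round(slope * id_quantity + intercept, 3))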

Updated: 2024-10-24 15:47:02

标题: 通过“线上一致性”(Agreement-on-the-Line)预测基础模型的性能

摘要: 在标签稀缺的情况下估计基础模型的分布外(OOD)性能对于安全部署至关重要。最近的研究表明,神经网络集成会呈现“线上一致性”(agreement-on-the-line)现象,可以利用这一现象在无标签的情况下可靠地预测OOD性能。然而,与从头开始在分布内数据上训练多个轮次的经典神经网络不同,基础模型只在经过大量预训练的权重基础上进行少量微调,这可能会削弱观察到线上一致性所需的集成多样性。在我们的工作中,我们展示了当对单个基础模型进行多次轻量微调时,训练过程中随机性的选择(线性头初始化、数据排序和数据子集选取)会导致所得集成中线上一致性的程度截然不同。令人惊讶的是,在视觉和语言基准上,只有随机头初始化能够可靠地在微调后的基础模型中诱导出线上一致性。其次,我们证明,在不同数据集上预训练但在同一任务上微调的多个基础模型的集成同样可以呈现线上一致性。总之,通过精心构建多样化的集成,我们可以利用基于线上一致性的方法高精度地预测基础模型的OOD性能。

更新时间: 2024-10-24 15:47:02

领域: cs.LG

下载: http://arxiv.org/abs/2404.01542v2

Provably Robust Watermarks for Open-Source Language Models

The recent explosion of high-quality language models has necessitated new methods for identifying AI-generated text. Watermarking is a leading solution and could prove to be an essential tool in the age of generative AI. Existing approaches embed watermarks at inference and crucially rely on the large language model (LLM) specification and parameters being secret, which makes them inapplicable to the open-source setting. In this work, we introduce the first watermarking scheme for open-source LLMs. Our scheme works by modifying the parameters of the model, but the watermark can be detected from just the outputs of the model. Perhaps surprisingly, we prove that our watermarks are unremovable under certain assumptions about the adversary's knowledge. To demonstrate the behavior of our construction under concrete parameter instantiations, we present experimental results with OPT-6.7B and OPT-1.3B. We demonstrate robustness to both token substitution and perturbation of the model parameters. We find that the stronger of these attacks, the model-perturbation attack, requires deteriorating the quality score to 0 out of 100 in order to bring the detection rate down to 50%.

Updated: 2024-10-24 15:44:34

标题: 可证明的开源语言模型水印技术

摘要: 最近高质量语言模型的爆发迫使人们寻找新的方法来识别人工智能生成的文本。数字水印是一种领先的解决方案,可能在生成式人工智能时代成为一种必不可少的工具。现有方法在推理时嵌入水印,并且关键地依赖于大型语言模型(LLM)的规范和参数保密,这使它们不适用于开源环境。在这项工作中,我们引入了第一个针对开源LLM的水印方案。我们的方案通过修改模型的参数来运作,但水印可以仅通过模型的输出来检测。或许令人惊讶的是,我们证明了在对对手的知识做出一定假设的情况下,我们的水印是无法移除的。为了展示我们的构建在具体参数实例化下的行为,我们展示了与OPT-6.7B和OPT-1.3B的实验结果。我们证明了对令牌替换和模型参数扰动的强大韧性。我们发现,其中更强大的攻击,即模型扰动攻击,需要将质量分数降至100分中的0分,才能将检测率降至50%。

更新时间: 2024-10-24 15:44:34

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.18861v1

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).
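
A sketch of the contrastive decoding step described above. The exact weighting used by DeCoRe may differ; the entropy-based coefficient below is an illustrative choice, and the "masked" logits stand for the same model with retrieval heads masked.

# Illustrative DeCoRe-style decoding step: base logits are contrasted against
# the retrieval-head-masked copy, with contrast strength guided by the base
# model's conditional entropy. Weighting scheme is an assumption.
import torch

def decore_logits(base_logits: torch.Tensor,
                  masked_logits: torch.Tensor,
                  alpha_max: float = 1.0) -> torch.Tensor:
    probs = torch.softmax(base_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1, keepdim=True)
    max_entropy = torch.log(torch.tensor(float(base_logits.shape[-1])))
    alpha = alpha_max * entropy / max_entropy    # contrast more when uncertain
    return (1 + alpha) * base_logits - alpha * masked_logits

base = torch.randn(1, 32000)      # toy next-token logits, 32k vocabulary
masked = torch.randn(1, 32000)    # logits from the retrieval-head-masked model
next_token = decore_logits(base, masked).argmax(dim=-1)
print(next_token.shape)  # torch.Size([1])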

Updated: 2024-10-24 15:44:33

标题: DeCoRe:通过对比检索头进行解码以减轻幻觉

摘要: 大型语言模型(LLMs)经常会产生幻觉,通过错误地表达提供的上下文或错误地回忆内部知识,从而产生不忠实或事实不准确的输出。最近的研究已经确定了Transformer架构内部的特定注意力头,被称为检索头,负责提取相关的上下文信息。我们假设屏蔽这些检索头可以诱发幻觉,并且对比基础LLM和屏蔽LLM的输出可以减少幻觉。为此,我们提出了一种新颖的无需训练的解码策略,名为通过对比检索头解码(DeCoRe),该策略可以放大上下文和模型参数中找到的信息。DeCoRe通过动态对比基础LLM和屏蔽LLM的输出,使用条件熵作为指导,从而减轻潜在的幻觉响应。我们的广泛实验证实,DeCoRe显著改善了需要高度上下文忠实的任务的性能,例如摘要(XSum提高了18.6%)、指令遵循(MemoTrap提高了10.9%)以及开卷问答(NQ-Open提高了2.4%,NQ-Swap提高了5.5%)。

更新时间: 2024-10-24 15:44:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.18860v1

Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens

Current progress in artificial intelligence is centered around so-called large language models that consist of neural networks processing long sequences of high-dimensional vectors called tokens. Statistical physics provides powerful tools to study the functioning of learning with neural networks and has played a recognized role in the development of modern machine learning. The statistical physics approach relies on simplified and analytically tractable models of data. However, simple tractable models for long sequences of high-dimensional tokens are largely underexplored. Inspired by the crucial role models such as the single-layer teacher-student perceptron (aka generalized linear regression) played in the theory of fully connected neural networks, in this paper, we introduce and study the bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens. We note that modern architectures naturally subsume the BSR model due to the skip connections. Building on recent methodological progress, we compute the Bayes-optimal generalization error for the model in the limit of long sequences of high-dimensional tokens, and provide a message-passing algorithm that matches this performance. We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression. We also unveil surprising properties of the gradient descent algorithms in the BSR model.

Updated: 2024-10-24 15:44:03

标题: 双线性序列回归:一种用于从高维标记长序列中学习的模型

摘要: 目前人工智能领域的进展集中在所谓的大语言模型上,这些模型由处理称为令牌的高维向量长序列的神经网络组成。统计物理提供了强大的工具来研究神经网络学习的功能,并在现代机器学习的发展中发挥了公认的作用。统计物理方法依赖于简化且可解析处理的数据模型。然而,针对高维令牌长序列的这类简单可解析模型在很大程度上仍未得到充分探索。受单层教师-学生感知器(即广义线性回归)在全连接神经网络理论中所起关键作用的启发,本文引入并研究了双线性序列回归(BSR),将其作为令牌序列最基本的模型之一。我们注意到,由于跳跃连接的存在,现代架构自然地包含了BSR模型。借助最近的方法论进展,我们计算了该模型在高维令牌长序列极限下的贝叶斯最优泛化误差,并给出了一个能够达到该性能的消息传递算法。我们量化了最优学习相对于将令牌序列向量化后用简单线性回归学习所带来的改进。我们还揭示了BSR模型中梯度下降算法的一些令人惊讶的性质。

更新时间: 2024-10-24 15:44:03

领域: cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2410.18858v1

Probabilistic Language-Image Pre-Training

Vision-language models (VLMs) embed aligned image-text pairs into a joint space but often rely on deterministic embeddings, assuming a one-to-one correspondence between images and texts. This oversimplifies real-world relationships, which are inherently many-to-many, with multiple captions describing a single image and vice versa. We introduce Probabilistic Language-Image Pre-training (ProLIP), the first probabilistic VLM pre-trained on a billion-scale image-text dataset using only probabilistic objectives, achieving a strong zero-shot capability (e.g., 74.6% ImageNet zero-shot accuracy with ViT-B/16). ProLIP efficiently estimates uncertainty by an "uncertainty token" without extra parameters. We also introduce a novel inclusion loss that enforces distributional inclusion relationships between image-text pairs and between original and masked inputs. Experiments demonstrate that, by leveraging uncertainty estimates, ProLIP benefits downstream tasks and aligns with intuitive notions of uncertainty, e.g., shorter texts being more uncertain and more general inputs including specific ones. Utilizing text uncertainties, we further improve ImageNet accuracy from 74.6% to 75.8% (under a few-shot setting), supporting the practical advantages of our probabilistic approach. The code is available at https://github.com/naver-ai/prolip

Updated: 2024-10-24 15:42:25

标题: 概率语言图像预训练

摘要: 视觉语言模型(VLMs)将对齐的图像文本对嵌入到一个共同的空间中,但通常依赖确定性嵌入,假设图像和文本之间是一对一的对应关系。这过度简化了现实世界中本质上多对多的关系:一个图像可能有多个描述,反之亦然。我们介绍了概率语言-图像预训练(ProLIP),这是第一个在十亿规模的图像文本数据集上只使用概率目标进行预训练的概率VLM,实现了强大的零样本能力(例如,使用ViT-B/16的ImageNet零样本准确率为74.6%)。ProLIP通过“不确定性标记”有效地估计不确定性,而不需要额外的参数。我们还引入了一种新的包含损失,强化了图像-文本对之间以及原始输入与屏蔽输入之间的分布包含关系。实验证明,通过利用不确定性估计,ProLIP有益于下游任务,并与不确定性的直观概念相一致,例如,较短的文本更不确定,而更一般的输入在分布上包含更具体的输入。利用文本的不确定性,我们进一步将ImageNet准确率从74.6%提高到75.8%(在少样本设置下),支持我们概率方法的实际优势。代码可在https://github.com/naver-ai/prolip找到。

更新时间: 2024-10-24 15:42:25

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18857v1

Demystifying Large Language Models for Medicine: A Primer

Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices. This approach consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and deployment. We start with the discussion of critical considerations in identifying healthcare tasks that align with the core capabilities of LLMs and selecting models based on the selected task and data, performance requirements, and model interface. We then review the strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs to specialized medical tasks. Deployment considerations, including regulatory compliance, ethical guidelines, and continuous monitoring for fairness and bias, are also discussed. By providing a structured step-by-step methodology, this tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice, ensuring that these powerful technologies are applied in a safe, reliable, and impactful manner.

Updated: 2024-10-24 15:41:56

标题: 揭秘用于医学的大型语言模型:入门指南

摘要: 大型语言模型(LLMs)代表了一类具有变革性的人工智能工具,能够通过生成在不同背景下类似人类的回应,并根据人类指示适应新任务,从而彻底改变医疗保健的各个方面。它们的潜在应用涵盖了广泛的医疗任务,如临床文档编制、将患者与临床试验匹配以及回答医学问题。在本文中,我们提出了一个可操作的指南,以帮助医疗保健专业人士更有效地利用LLMs进行工作,并提出了一套最佳实践。这种方法包括几个主要阶段,包括制定任务、选择LLMs、提示工程、微调和部署。我们首先讨论了在确定与LLMs的核心能力相符的医疗任务和根据选定的任务和数据、性能要求以及模型接口选择模型时的关键考虑因素。然后,我们回顾了一些策略,如提示工程和微调,以将标准LLMs调整为专门的医学任务。部署考虑因素,包括遵守监管要求、道德指导方针以及对公平性和偏见的持续监控,也在讨论范围之内。通过提供一个结构化的逐步方法,本教程旨在装备医疗保健专业人员所需的工具,以有效地将LLMs整合到临床实践中,确保这些强大的技术以安全、可靠和有影响力的方式应用。

更新时间: 2024-10-24 15:41:56

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.18856v1

ONCOPILOT: A Promptable CT Foundation Model For Solid Tumor Evaluation

Carcinogenesis is a proteiform phenomenon, with tumors emerging in various locations and displaying complex, diverse shapes. At the crucial intersection of research and clinical practice, it demands precise and flexible assessment. However, current biomarkers, such as RECIST 1.1's long and short axis measurements, fall short of capturing this complexity, offering an approximate estimate of tumor burden and a simplistic representation of a more intricate process. Additionally, existing supervised AI models face challenges in addressing the variability in tumor presentations, limiting their clinical utility. These limitations arise from the scarcity of annotations and the models' focus on narrowly defined tasks. To address these challenges, we developed ONCOPILOT, an interactive radiological foundation model trained on approximately 7,500 CT scans covering the whole body, from both normal anatomy and a wide range of oncological cases. ONCOPILOT performs 3D tumor segmentation using visual prompts like point-click and bounding boxes, outperforming state-of-the-art models (e.g., nnUnet) and achieving radiologist-level accuracy in RECIST 1.1 measurements. The key advantage of this foundation model is its ability to surpass state-of-the-art performance while keeping the radiologist in the loop, a capability that previous models could not achieve. When radiologists interactively refine the segmentations, accuracy improves further. ONCOPILOT also accelerates measurement processes and reduces inter-reader variability, facilitating volumetric analysis and unlocking new biomarkers for deeper insights. This AI assistant is expected to enhance the precision of RECIST 1.1 measurements, unlock the potential of volumetric biomarkers, and improve patient stratification and clinical care, while seamlessly integrating into the radiological workflow.

Updated: 2024-10-24 15:35:58

标题: ONCOPILOT:一个用于固体肿瘤评估的可提示的CT基础模型

摘要: 癌症发生是一种多样化的现象,肿瘤在不同部位出现,呈现复杂多样的形态。在研究和临床实践的关键交汇处,对精确和灵活的评估提出了要求。然而,当前的生物标志物,如 RECIST 1.1 的长短轴测量,未能捕捉到这种复杂性,只提供了肿瘤负担的近似估计和更复杂过程的简单表示。此外,现有的监督式人工智能模型在解决肿瘤表现的多样性方面面临挑战,限制了它们在临床中的实用性。这些限制源于注释的稀缺性和模型专注于狭义任务的特点。 为了解决这些挑战,我们开发了 ONCOPILOT,这是一个交互式放射学基础模型,训练了大约 7500 个涵盖全身正常解剖和各种肿瘤病例的 CT 扫描。ONCOPILOT 使用点击和边界框等视觉提示进行 3D 肿瘤分割,优于最先进的模型(如 nnUnet),在 RECIST 1.1 测量中实现了放射科医生水平的准确性。这个基础模型的关键优势在于它能够超越最先进的性能,同时保持放射科医生的参与,这是之前的模型无法实现的。当放射科医生互动地优化分割时,准确性会进一步提高。ONCOPILOT 还加快了测量过程,减少了读者之间的差异,促进了容积分析,为更深入的洞察解锁了新的生物标志物。 预计这个人工智能助手将提高 RECIST 1.1 测量的精度,释放容积生物标志物的潜力,改善患者分层和临床护理,同时无缝地融入放射学工作流程中。

更新时间: 2024-10-24 15:35:58

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.07908v3

Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets

We prove rich algebraic structures of the solution space for 2-layer neural networks with quadratic activation and $L_2$ loss, trained on reasoning tasks in Abelian group (e.g., modular addition). Such a rich structure enables analytical construction of global optimal solutions from partial solutions that only satisfy part of the loss, despite its high nonlinearity. We coin the framework as CoGO (Composing Global Optimizers). Specifically, we show that the weight space over different numbers of hidden nodes of the 2-layer network is equipped with a semi-ring algebraic structure, and the loss function to be optimized consists of monomial potentials, which are ring homomorphism, allowing partial solutions to be composed into global ones by ring addition and multiplication. Our experiments show that around $95\%$ of the solutions obtained by gradient descent match exactly our theoretical constructions. Although the global optimizers constructed only required a small number of hidden nodes, our analysis on gradient dynamics shows that over-parameterization asymptotically decouples training dynamics and is beneficial. We further show that training dynamics favors simpler solutions under weight decay, and thus high-order global optimizers such as perfect memorization are unfavorable.

Updated: 2024-10-24 15:35:48

标题: 通过神经网络中的代数对象为推理任务组合全局优化器

摘要: 我们证明了在阿贝尔群上的推理任务(例如模加法)中,采用二次激活和$L_2$损失训练的两层神经网络,其解空间具有丰富的代数结构。尽管损失高度非线性,这种丰富的结构使我们能够从仅满足部分损失的部分解出发,解析地构造出全局最优解。我们将这一框架称为CoGO(组合全局优化器)。具体来说,我们证明两层网络在不同隐藏节点数量上的权重空间具有半环代数结构,而待优化的损失函数由单项式势构成,这些势是环同态,因此可以通过环加法和乘法将部分解组合成全局解。我们的实验表明,梯度下降得到的解中约有95%与我们的理论构造完全吻合。尽管所构造的全局优化器只需要少量隐藏节点,但我们对梯度动力学的分析表明,过参数化会在渐近意义上解耦训练动态,并且是有益的。我们进一步表明,在权重衰减下,训练动态偏好更简单的解,因此诸如完全记忆这类高阶全局优化器是不利的。

更新时间: 2024-10-24 15:35:48

领域: cs.LG,cs.AI,cs.CL,math.AC,math.RA

下载: http://arxiv.org/abs/2410.01779v2

DL-Polycube: Deep learning enhanced polycube method for high-quality hexahedral mesh generation and volumetric spline construction

In this paper, we present a novel algorithm that integrates deep learning with the polycube method (DL-Polycube) to generate high-quality hexahedral (hex) meshes, which are then used to construct volumetric splines for isogeometric analysis. Our DL-Polycube algorithm begins by establishing a connection between surface triangular meshes and polycube structures. We employ a deep neural network to classify surface triangular meshes into their corresponding polycube structures. Following this, we combine the acquired polycube structural information with unsupervised learning to perform surface segmentation of triangular meshes. This step addresses the issue of segmentation not corresponding to a polycube while reducing manual intervention. Quality hex meshes are then generated from the polycube structures, employing octree subdivision, parametric mapping, and quality-improvement techniques. The incorporation of deep learning for creating polycube structures, combined with unsupervised learning for segmentation of surface triangular meshes, substantially accelerates hex mesh generation. Finally, truncated hierarchical B-splines are constructed on the generated hex meshes. We extract trivariate Bézier elements from these splines and apply them directly in isogeometric analysis. We offer several examples to demonstrate the robustness of our DL-Polycube algorithm.

Updated: 2024-10-24 15:35:08

标题: DL-Polycube:深度学习增强的多立方体方法用于高质量六面体网格生成和体积样条构建

摘要: 在本文中,我们提出了一种集成深度学习和多立方体方法(DL-Polycube)的新算法,用于生成高质量的六面体(hex)网格,然后用于构建用于等几何分析的体积样条。我们的DL-Polycube算法首先建立了表面三角形网格和多立方体结构之间的联系。我们利用深度神经网络将表面三角形网格分类为其相应的多立方体结构。随后,我们将获得的多立方体结构信息与无监督学习相结合,对三角形网格进行表面分割。这一步解决了分割不对应多立方体的问题,同时减少了手动干预。然后,通过使用八叉树细分、参数映射和质量改进技术,从多立方体结构中生成高质量的六面体网格。将深度学习用于创建多立方体结构,结合无监督学习用于三角形网格的分割,大大加速了六面体网格的生成。最后,在生成的六面体网格上构建了截断的分层B样条。我们从这些样条中提取三变量Bézier元素,并将它们直接应用于等几何分析。我们提供了几个示例来展示我们的DL-Polycube算法的稳健性。

更新时间: 2024-10-24 15:35:08

领域: cs.CG,cs.AI,cs.NA,math.NA

下载: http://arxiv.org/abs/2410.18852v1

Ensemble architecture in polyp segmentation

In this research, we revisit the architecture of semantic segmentation and evaluate the models excelling in polyp segmentation. We introduce an integrated framework that harnesses the advantages of different models to attain an optimal outcome. More specifically, we fuse the learned features from convolutional and transformer models for prediction, and we view this approach as an ensemble technique to enhance model performance. Our experiments on polyp segmentation reveal that the proposed architecture surpasses other top models, exhibiting improved learning capacity and resilience. The code is available at https://github.com/HuangDLab/EnFormer.
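
A toy sketch of the fusion idea: features from a small convolutional branch and a small transformer branch are concatenated before a segmentation head. The actual EnFormer architecture in the linked repository is more elaborate; every component below is a stand-in.

# Minimal illustration of convolution + transformer feature fusion for
# segmentation. Shapes and layer choices are ours, not the paper's.
import torch
import torch.nn as nn

class FusionSegmenter(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.vit_proj = nn.Conv2d(3, channels, 16, stride=16)    # patch embed
        self.attn = nn.TransformerEncoderLayer(d_model=channels, nhead=4,
                                               batch_first=True)
        self.head = nn.Conv2d(2 * channels, 1, 1)                # fused decoder

    def forward(self, x):
        conv_feat = self.conv_branch(x)
        tokens = self.vit_proj(x).flatten(2).transpose(1, 2)     # B, N, C
        tokens = self.attn(tokens)
        b, n, c = tokens.shape
        side = int(n ** 0.5)
        vit_feat = tokens.transpose(1, 2).reshape(b, c, side, side)
        vit_feat = nn.functional.interpolate(vit_feat, size=conv_feat.shape[-2:])
        return self.head(torch.cat([conv_feat, vit_feat], dim=1))

model = FusionSegmenter()
mask_logits = model(torch.randn(1, 3, 224, 224))
print(mask_logits.shape)  # torch.Size([1, 1, 224, 224])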

Updated: 2024-10-24 15:28:32

标题: 集成架构在息肉分割中的应用

摘要: 在这项研究中,我们重新审视了语义分割的架构,并评估在息肉分割方面表现优异的模型。我们引入了一个集成框架,利用不同模型的优势来实现最佳结果。更具体地,我们融合了来自卷积和转换器模型的学习特征进行预测,我们将这种方法视为一种集成技术,以增强模型性能。我们在息肉分割实验中发现,提出的架构超越了其他顶尖模型,展示了改进的学习能力和韧性。代码可在https://github.com/HuangDLab/EnFormer 上找到。

更新时间: 2024-10-24 15:28:32

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2408.07262v2

Expanding AI Awareness Through Everyday Interactions with AI: A Reflective Journal Study

As the application of AI continues to expand, students in technology programs are poised to be both producers and users of the technologies. They are also positioned to engage with AI applications within and outside the classroom. While focusing on the curriculum when examining students' AI knowledge is common, extending this connection to students' everyday interactions with AI provides a more complete picture of their learning. In this paper, we explore students' awareness and engagement with AI in the context of school and their daily lives. Over six weeks, 22 undergraduate students participated in a reflective journal study and submitted a weekly journal entry about their interactions with AI. The participants were recruited from a technology and society course that focuses on the implications of technology on people, communities, and processes. In their weekly journal entries, participants reflected on interactions with AI on campus (coursework, advertised campus events, or seminars) and beyond (social media, news, or conversations with friends and family). The journal prompts were designed to help them think through what they had read, watched, or been told and reflect on the development of their own perspectives, knowledge, and literacy on the topic. Overall, students described nine categories of interactions: coursework, news and current events, using software and applications, university events, social media related to their work, personal discussions with friends and family, interacting with content, and gaming. Students reported that completing the diaries allowed them time for reflection and made them more aware of the presence of AI in their daily lives and of its potential benefits and drawbacks. This research contributes to the ongoing work on AI awareness and literacy by bringing in perspectives from beyond a formal educational context.

Updated: 2024-10-24 15:26:34

标题: 通过与AI的日常互动拓展AI意识:一项反思性日记研究

摘要: 随着人工智能的应用不断扩大,技术项目的学生有望成为技术的生产者和使用者。他们还可以参与课堂内外的人工智能应用。在研究学生人工智能知识时关注课程是常见的,将这种联系延伸到学生日常与人工智能的互动可以更全面地了解他们的学习情况。本文探讨了学生在学校和日常生活中对人工智能的认识和参与。在为期六周的时间里,22名本科生参与了一项反思性日记研究,并每周提交一篇关于他们与人工智能互动的日记。参与者是从一个关注技术对人、社区和流程影响的技术与社会课程中招募的。在他们每周的日记条目中,参与者反思了与人工智能在校园内(课程、宣传校园活动或研讨会)和校园外(社交媒体、新闻或与朋友家人的交谈)的互动。日记提醒旨在帮助他们思考他们所读、所看或所听到的,并反思他们对该主题的观点、知识和素养的发展。总体而言,学生描述了九类互动:课程作业、新闻与时事、使用软件和应用、大学活动、与社交媒体相关的工作、与朋友和家人的个人讨论、与内容互动以及游戏。学生报告称完成日记让他们有时间反思,并让他们更加意识到人工智能在他们日常生活中的存在以及其潜在的益处和弊端。这项研究通过引入来自非正式教育背景的观点,为人工智能认知和素养的持续工作做出了贡献。

更新时间: 2024-10-24 15:26:34

领域: cs.HC,cs.AI,cs.CY

下载: http://arxiv.org/abs/2410.18845v1

Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints

Pure exploration in bandits models multiple real-world problems, such as tuning hyper-parameters or conducting user studies, where different safety, resource, and fairness constraints on the decision space naturally appear. We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the aim is to identify an $r$-good feasible policy. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. We show how this lower bound evolves with the sequential estimation of constraints. Second, we leverage the Lagrangian lower bound and the properties of convex optimisation to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX. To this end, we propose a constraint-adaptive stopping rule, and while tracking the lower bound, use a pessimistic estimate of the feasible set at each step. We show that these algorithms achieve asymptotically optimal sample complexity upper bounds up to constraint-dependent constants. Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LAGEX and LATS with respect to baselines.

Updated: 2024-10-24 15:26:14

标题: 学习使用拉格朗日方法在未知线性约束下进行探索的赌博机

摘要: 赌博机中的纯探索建模了多个现实问题,例如调整超参数或开展用户研究,在这些问题中,决策空间上自然会出现各种安全性、资源和公平性约束。我们将这些问题作为带未知线性约束的多臂赌博机中的纯探索来研究,其目标是识别一个$r$-good可行策略。首先,我们为带约束纯探索的样本复杂度下界提出了一种拉格朗日松弛,并展示了该下界如何随约束的序贯估计而演变。其次,我们利用拉格朗日下界和凸优化的性质,提出了Track-and-Stop和Gamified Explorer的两种计算高效的扩展,即LATS和LAGEX。为此,我们提出了一种约束自适应停止规则,并在跟踪下界的同时,在每一步使用可行集的悲观估计。我们证明,这些算法在相差与约束相关常数的意义下达到了渐近最优的样本复杂度上界。最后,我们在不同奖励分布和约束下进行了数值实验,验证了LAGEX和LATS相对于基线的高效性能。

更新时间: 2024-10-24 15:26:14

领域: cs.LG,cs.AI,stat.ME,stat.ML

下载: http://arxiv.org/abs/2410.18844v1

From Efficiency to Equity: Measuring Fairness in Preference Learning

As AI systems, particularly generative models, increasingly influence decision-making, ensuring that they are able to fairly represent diverse human preferences becomes crucial. This paper introduces a novel framework for evaluating epistemic fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice. We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models. We validate our approach using two datasets: a custom visual preference dataset (AI-EDI-Space) and the Jester Jokes dataset. Our analysis reveals variations in model performance across users, highlighting potential epistemic injustices. We explore pre-processing and in-processing techniques to mitigate these inequalities, demonstrating a complex relationship between model efficiency and fairness. This work contributes to AI ethics by providing a framework for evaluating and improving epistemic fairness in preference learning models, offering insights for developing more inclusive AI systems in contexts where diverse human preferences are crucial.
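
As an illustration of the adapted inequality metrics, here is a standard Gini coefficient computed over per-user performance scores (toy numbers; how a per-user score is defined is the paper's modeling choice, not shown here).

# Standard Gini coefficient over per-user model performance: 0 means every
# user is served equally well; values near 1 mean quality is concentrated
# on a few users. Scores below are toy stand-ins.
import numpy as np

def gini(scores: np.ndarray) -> float:
    """Gini coefficient of non-negative per-user scores."""
    s = np.sort(scores)
    n = len(s)
    cum = np.cumsum(s)
    return float((n + 1 - 2 * (cum / cum[-1]).sum()) / n)

per_user_accuracy = np.array([0.91, 0.88, 0.93, 0.55, 0.87, 0.60])
print(round(gini(per_user_accuracy), 3))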

Updated: 2024-10-24 15:25:56

标题: 从效率到公平:衡量偏好学习中的公平性

摘要: 随着人工智能系统,特别是生成模型,越来越影响决策,确保它们能够公平地代表多样化的人类偏好变得至关重要。本文介绍了一个新颖的框架,用于评估受经济不平等理论和罗尔斯正义启发的偏好学习模型中的认知公平性。我们提出了从基尼系数、阿特金森指数和库兹涅茨比率中改编的度量方法来量化这些模型中的公平性。我们使用两个数据集验证了我们的方法:一个是自定义的视觉偏好数据集(AI-EDI-Space),另一个是Jester Jokes数据集。我们的分析揭示了用户之间模型性能的变化,突出显示潜在的认知不公正。我们探索了预处理和内部处理技术来减轻这些不平等,展示了模型效率和公平性之间的复杂关系。本研究通过提供一个评估和改善偏好学习模型中认知公平性的框架,为AI伦理学做出了贡献,为在多样化人类偏好至关重要的情境中开发更具包容性的人工智能系统提供了见解。

更新时间: 2024-10-24 15:25:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.18841v1

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

A growing number of machine learning scenarios rely on knowledge distillation where one uses the output of a surrogate model as labels to supervise the training of a target model. In this work, we provide a sharp characterization of this process for ridgeless, high-dimensional regression, under two settings: (i) model shift, where the surrogate model is arbitrary, and (ii) distribution shift, where the surrogate model is the solution of empirical risk minimization with out-of-distribution data. In both cases, we characterize the precise risk of the target model through non-asymptotic bounds in terms of sample size and data distribution under mild conditions. As a consequence, we identify the form of the optimal surrogate model, which reveals the benefits and limitations of discarding weak features in a data-dependent fashion. In the context of weak-to-strong (W2S) generalization, this has the interpretation that (i) W2S training, with the surrogate as the weak model, can provably outperform training with strong labels under the same data budget, but (ii) it is unable to improve the data scaling law. We validate our results on numerical experiments both on ridgeless regression and on neural network architectures.
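
A toy numpy sketch of the analyzed setting: ridgeless (minimum-norm) regression fit on surrogate-model labels versus true labels under the same data budget. The specific shift regimes and asymptotics studied in the paper are not reproduced here.

# Weak-to-strong toy experiment: fit a minimum-norm interpolator on labels
# from an imperfect surrogate and compare its test risk against training
# on the true (strong) labels. All quantities are synthetic.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 200, 100, 2000
beta = rng.standard_normal(d) / np.sqrt(d)          # teacher

X = rng.standard_normal((n, d))
y_strong = X @ beta + 0.1 * rng.standard_normal(n)

beta_weak = beta + 0.5 * rng.standard_normal(d) / np.sqrt(d)  # imperfect surrogate
y_weak = X @ beta_weak                               # surrogate labels

ridgeless = lambda X, y: np.linalg.pinv(X) @ y       # minimum-norm interpolator
X_test = rng.standard_normal((n_test, d))
risk = lambda b: np.mean((X_test @ beta - X_test @ b) ** 2)

print("risk with strong labels:", round(risk(ridgeless(X, y_strong)), 4))
print("risk with weak labels:  ", round(risk(ridgeless(X, y_weak)), 4))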

Updated: 2024-10-24 15:22:53

标题: 高维知识蒸馏的分析:从弱到强的泛化和标度律

摘要: 越来越多的机器学习场景依赖于知识蒸馏,即使用替代模型的输出作为标签来监督目标模型的训练。在这项工作中,我们在两种设定下对无岭高维回归中的这一过程给出了精确刻画:(i)模型偏移,其中替代模型是任意的;(ii)分布偏移,其中替代模型是在分布外数据上进行经验风险最小化得到的解。在这两种情况下,我们都在温和条件下通过关于样本量和数据分布的非渐近界刻画了目标模型的精确风险。由此,我们确定了最优替代模型的形式,揭示了以数据相关方式丢弃弱特征的好处与局限。在弱到强(W2S)泛化的背景下,这意味着:(i)以替代模型作为弱模型的W2S训练,在相同数据预算下可以被证明优于使用强标签的训练;但(ii)它无法改进数据缩放定律。我们通过无岭回归和神经网络架构上的数值实验验证了我们的结果。

更新时间: 2024-10-24 15:22:53

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.18837v1

From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages

In this paper, we propose a model-agnostic cost-effective approach to developing bilingual base large language models (LLMs) to support English and any target language. The method includes vocabulary expansion, initialization of new embeddings, model training and evaluation. We performed our experiments with three languages, each using a non-Latin script - Ukrainian, Arabic, and Georgian. Our approach demonstrates improved language performance while reducing computational costs. It mitigates the disproportionate penalization of underrepresented languages, promoting fairness and minimizing adverse phenomena such as code-switching and broken grammar. Additionally, we introduce new metrics to evaluate language quality, revealing that vocabulary size significantly impacts the quality of generated text.
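
A sketch of the vocabulary-expansion and embedding-initialization steps using Hugging Face APIs; the base model and token list below are placeholders, and mean initialization is one common heuristic, not necessarily the paper's exact scheme.

# Vocabulary expansion sketch with Hugging Face Transformers: add new
# target-language tokens, resize the embedding matrix, and mean-initialize
# the new rows. "gpt2" and the token strings are stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_tokens = ["<uk_tok_1>", "<uk_tok_2>"]            # placeholder subwords
num_added = tokenizer.add_tokens(new_tokens)
old_size = model.get_input_embeddings().weight.shape[0]
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():                                # mean-initialize new rows
    emb = model.get_input_embeddings().weight
    emb[old_size:] = emb[:old_size].mean(dim=0, keepdim=True)

print(f"added {num_added} tokens; embedding rows: {emb.shape[0]}")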

Updated: 2024-10-24 15:20:54

标题: 从以英语为中心到高效双语:面向代表性不足语言的定制分词器LLM

摘要: 在这篇论文中,我们提出了一种与模型无关且成本效益高的方法,用于开发支持英语和任何目标语言的双语基础大语言模型(LLMs)。该方法包括词汇扩展、新嵌入的初始化、模型训练和评估。我们使用三种非拉丁文脚本的语言(乌克兰语、阿拉伯语和格鲁吉亚语)进行了实验。 我们的方法展示了提高语言性能同时降低计算成本的优势。它减轻了对少数语言的不成比例惩罚,促进公平并最小化诸如代码切换和语法错误等不良现象。此外,我们引入了新的评估语言质量的指标,揭示了词汇量对生成文本质量的显著影响。

更新时间: 2024-10-24 15:20:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.18836v1

Highly efficient non-rigid registration in k-space with application to cardiac Magnetic Resonance Imaging

In Magnetic Resonance Imaging (MRI), high temporal-resolved motion can be useful for image acquisition and reconstruction, MR-guided radiotherapy, dynamic contrast-enhancement, flow and perfusion imaging, and functional assessment of motion patterns in cardiovascular, abdominal, peristaltic, fetal, or musculoskeletal imaging. Conventionally, these motion estimates are derived through image-based registration, a particularly challenging task for complex motion patterns and high dynamic resolution. The accelerated scans in such applications result in imaging artifacts that compromise the motion estimation. In this work, we propose a novel self-supervised deep learning-based framework, dubbed the Local-All Pass Attention Network (LAPANet), for non-rigid motion estimation directly from the acquired accelerated Fourier space, i.e. k-space. The proposed approach models non-rigid motion as the cumulative sum of local translational displacements, following the Local All-Pass (LAP) registration technique. LAPANet was evaluated on cardiac motion estimation across various sampling trajectories and acceleration rates. Our results demonstrate superior accuracy compared to prior conventional and deep learning-based registration methods, accommodating as few as 2 lines/frame in a Cartesian trajectory and 3 spokes/frame in a non-Cartesian trajectory. The achieved high temporal resolution (less than 5 ms) for non-rigid motion opens new avenues for motion detection, tracking and correction in dynamic and real-time MRI applications.

Updated: 2024-10-24 15:19:59

标题: 高效的k空间非刚性配准及其在心脏磁共振成像中的应用

摘要: 在磁共振成像(MRI)中,高时间分辨率的运动对于图像获取和重建、MR引导放疗、动态对比增强、流动和灌注成像,以及心血管、腹部、蠕动、胎儿或肌肉骨骼成像中的运动模式功能评估都是有用的。传统上,这些运动估计是通过基于图像的配准推导出来的,对于复杂的运动模式和高动态分辨率来说,这是一个特别具有挑战性的任务。在这些应用中加速扫描会导致成像伪影,从而影响了运动估计。在本研究中,我们提出了一个新颖的基于自监督深度学习的框架,命名为局部全通注意力网络(LAPANet),用于直接从获取的加速傅里叶空间,即k空间,进行非刚性运动估计。所提出的方法遵循局部全通(LAP)配准技术,将非刚性运动建模为局部平移位移的累积和。LAPANet在不同采样轨迹和加速率下对心脏运动估计进行了评估。我们的结果表明,与传统和基于深度学习的配准方法相比,我们的方法具有更高的准确性,在笛卡尔轨迹中每帧只需2行,非笛卡尔轨迹中每帧只需3条径向采样线(spokes)。对于非刚性运动,实现高时间分辨率(小于5毫秒)为动态和实时MRI应用中的运动检测、跟踪和校正开辟了新的途径。

更新时间: 2024-10-24 15:19:59

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18834v1

MazeNet: An Accurate, Fast, and Scalable Deep Learning Solution for Steiner Minimum Trees

The Obstacle Avoiding Rectilinear Steiner Minimum Tree (OARSMT) problem, which seeks the shortest interconnection of a given number of terminals in a rectilinear plane while avoiding obstacles, is a critical task in integrated circuit design, network optimization, and robot path planning. Since OARSMT is NP-hard, exact algorithms scale poorly with the number of terminals, leading practical solvers to sacrifice accuracy for large problems. We propose MazeNet, a deep learning-based method that learns to solve the OARSMT from data. MazeNet reframes OARSMT as a maze-solving task that can be addressed with a recurrent convolutional neural network (RCNN). A key hallmark of MazeNet is its scalability: we only need to train the RCNN blocks on mazes with a small number of terminals; larger mazes can be solved by replicating the same pre-trained blocks to create a larger network. Across a wide range of experiments, MazeNet achieves perfect OARSMT-solving accuracy, significantly reduces runtime compared to classical exact algorithms, and can handle more terminals than state-of-the-art approximate algorithms.
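
A sketch of the weight-sharing idea behind MazeNet's scalability: one recurrent convolutional block, iterated more times on larger inputs. The real architecture and training setup differ; the sizes and iteration counts below are illustrative.

# Recurrent convolution with shared weights: the same pre-trained block is
# simply iterated longer on larger mazes. Everything here is a stand-in.
import torch
import torch.nn as nn

class RecurrentConvBlock(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, 3, padding=1)    # maze encoding
        self.step = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.readout = nn.Conv2d(channels, 1, 1)             # on-path logits

    def forward(self, maze: torch.Tensor, n_iters: int) -> torch.Tensor:
        h = self.embed(maze)
        for _ in range(n_iters):                             # shared weights
            h = h + self.step(h)                             # residual update
        return self.readout(h)

net = RecurrentConvBlock()
small = net(torch.randn(1, 3, 32, 32), n_iters=20)
large = net(torch.randn(1, 3, 128, 128), n_iters=80)         # same weights
print(small.shape, large.shape)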

Updated: 2024-10-24 15:19:48

标题: MazeNet:一种准确、快速和可扩展的用于Steiner最小树的深度学习解决方案

摘要: 避障直线Steiner最小树(OARSMT)问题是在矩形平面上寻找给定数量终端的最短互连路径,同时避开障碍物,这是集成电路设计、网络优化和机器人路径规划中的关键任务。由于OARSMT是NP难问题,精确算法在终端数量增加时性能下降,导致实际求解器在处理大问题时牺牲准确性。我们提出了MazeNet,一种基于深度学习的方法,从数据中学习解决OARSMT问题。MazeNet将OARSMT重新构造为一个迷宫解决任务,可以用递归卷积神经网络(RCNN)来解决。MazeNet的一个关键特点是其可扩展性:我们只需要在具有少量终端的迷宫上训练RCNN块;更大的迷宫可以通过复制相同的预训练块来创建一个更大的网络。在广泛的实验中,MazeNet实现了完美的OARSMT解决准确性,与经典精确算法相比显著减少运行时间,并且可以处理比最先进的近似算法更多的终端。

更新时间: 2024-10-24 15:19:48

领域: cs.LG

下载: http://arxiv.org/abs/2410.18832v1

PSY: Posterior Sampling Based Privacy Enhancer in Large Language Models

Privacy vulnerabilities in LLMs, such as leakage from memorization, have been constantly identified, and various mitigation proposals have been proposed. LoRA is usually used in fine-tuning LLMs and a good entry point to insert privacy-enhancing modules. In this ongoing research, we introduce PSY, a Posterior Sampling based PrivacY enhancer that can be used in LoRA. We propose a simple yet effective realization of PSY using posterior sampling, which effectively prevents privacy leakage from intermediate information and, in turn, preserves the privacy of data owners. We evaluate LoRA extended with PSY against state-of-the-art membership inference and data extraction attacks. The experiments are executed on three different LLM architectures fine-tuned on three datasets with LoRA. In contrast to the commonly used differential privacy method, we find that our proposed modification consistently reduces the attack success rate. Meanwhile, our method has almost no negative impact on model fine-tuning or final performance. Most importantly, PSY reveals a promising path toward privacy enhancement with latent space extensions.

Updated: 2024-10-24 15:15:42

标题: PSY:大型语言模型中基于后验采样的隐私增强器

摘要: 对LLMs中的隐私漏洞,比如从记忆中泄漏,不断被发现,并提出了各种缓解方案。LoRA通常用于微调LLMs,并且是插入增强隐私模块的良好切入点。在这项正在进行的研究中,我们介绍了PSY,一个基于后验采样的隐私增强器,可以用于LoRA。我们提出了一个简单而有效的PSY实现,使用后验采样,有效地防止了中间信息的隐私泄漏,并保护了数据所有者的隐私。我们评估了使用PSY扩展的LoRA对抗最先进的成员推断和数据提取攻击。实验在三个不同的LLM架构上执行,这些架构在LoRA上进行了微调,采用了三个数据集。与常用的差分隐私方法相比,我们发现我们提出的修改始终降低了攻击成功率。同时,我们的方法几乎没有对模型微调或最终性能产生负面影响。最重要的是,PSY展示了一条具有潜在空间扩展的隐私增强的有前途的途径。

更新时间: 2024-10-24 15:15:42

领域: cs.CR

下载: http://arxiv.org/abs/2410.18824v1

Towards Visual Text Design Transfer Across Languages

Visual text design plays a critical role in conveying themes, emotions, and atmospheres in multimodal formats such as film posters and album covers. Translating these visual and textual elements across languages extends the concept of translation beyond mere text, requiring the adaptation of aesthetic and stylistic features. To address this, we introduce a novel task of Multimodal Style Translation (MuST-Bench), a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems while preserving design intent. Our initial experiments on MuST-Bench reveal that existing visual text generation models struggle with the proposed task due to the inadequacy of textual descriptions in conveying visual design. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. SIGIL enhances image generation models through three innovations: glyph latent for multilingual settings, pretrained VAEs for stable style guidance, and an OCR model with reinforcement learning feedback for optimizing readable character generation. SIGIL outperforms existing baselines by achieving superior style consistency and legibility while maintaining visual fidelity, setting itself apart from traditional description-based approaches. We release MuST-Bench publicly for broader use and exploration https://huggingface.co/datasets/yejinc/MuST-Bench.

Updated: 2024-10-24 15:15:01

标题: 朝向跨语言的视觉文本设计迁移

摘要: 视觉文本设计在传达多模式格式(如电影海报和专辑封面)中的主题、情感和氛围中起着至关重要的作用。跨语言翻译这些视觉和文本元素将翻译概念延伸到纯文本之外,需要适应美学和风格特征。为了解决这个问题,我们引入了一个新颖的任务——多模式风格翻译(MuST-Bench),一个旨在评估视觉文本生成模型在保持设计意图的同时执行跨不同书写系统翻译能力的基准。我们对MuST-Bench的初步实验表明,由于文本描述无法传达视觉设计,现有的视觉文本生成模型在所提出的任务中表现出困难。作为回应,我们引入了SIGIL,一个用于多模式风格翻译的框架,消除了对风格描述的需求。SIGIL通过三种创新增强图像生成模型:用于多语言环境的字形潜在、用于稳定风格指导的预训练VAEs,以及带有强化学习反馈的OCR模型,用于优化可读字符生成。SIGIL通过实现卓越的风格一致性和可读性,同时保持视觉保真度,超越现有基线,使自己与传统的基于描述的方法区别开来。我们公开发布MuST-Bench,以便更广泛地使用和探索。

更新时间: 2024-10-24 15:15:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.18823v1

Provably Safe Neural Network Controllers via Differential Dynamic Logic

While neural networks (NNs) have potential as autonomous controllers for Cyber-Physical Systems, verifying the safety of NN based control systems (NNCSs) poses significant challenges for the practical use of NNs, especially when safety is needed for unbounded time horizons. One reason is the intractability of analyzing NNs, ODEs and hybrid systems. To this end, we introduce VerSAILLE (Verifiably Safe AI via Logically Linked Envelopes): The first general approach that allows reusing control theory results for NNCS verification. By joining forces, we exploit the efficiency of NN verification tools while retaining the rigor of differential dynamic logic (dL). Based on provably safe control envelopes in dL, we derive specifications for the NN which is proven via NN verification. We show that a proof of the NN adhering to the specification is mirrored by a dL proof on the infinite-time safety of the NNCS. The NN verification properties resulting from hybrid systems typically contain nonlinear arithmetic and arbitrary logical structures while efficient NN verification merely supports linear constraints. To overcome this divide, we present Mosaic: An efficient, sound and complete verification approach for polynomial real arithmetic properties on piece-wise linear NNs. Mosaic partitions complex verification queries into simple queries and lifts off-the-shelf linear constraint tools to the nonlinear setting in a completeness-preserving manner by combining approximation with exact reasoning for counterexample regions. Our evaluation demonstrates the versatility of VerSAILLE and Mosaic: We prove infinite-time safety on the classical Vertical Airborne Collision Avoidance NNCS verification benchmark for two scenarios while (exhaustively) enumerating counterexample regions in unsafe scenarios. We also show that our approach significantly outperforms State-of-the-Art tools in closed-loop NNV.

Updated: 2024-10-24 15:13:41

标题: 基于微分动态逻辑的可证安全神经网络控制器

摘要: 尽管神经网络(NNs)作为网络物理系统的自主控制器具有潜力,但验证基于神经网络的控制系统(NNCSs)的安全性对神经网络的实际使用提出了重大挑战,特别是当需要针对无界时间范围进行安全性验证时。一个原因是神经网络、ODE和混合系统难以分析。为此,我们引入VerSAILLE(通过逻辑链接信封实现可验证的AI安全性):这是第一个允许复用控制理论结果来验证NNCS的通用方法。通过联合发力,我们在利用NN验证工具效率的同时,保留了微分动态逻辑(dL)的严密性。基于dL中可证明安全的控制信封,我们推导出NN的规范,并通过NN验证加以证明。我们证明,NN满足该规范的证明对应于NNCS无限时间安全性的dL证明。 由混合系统产生的NN验证属性通常包含非线性算术和任意逻辑结构,而高效的NN验证仅支持线性约束。为了弥合这一鸿沟,我们提出了Mosaic:一种针对分段线性NN上多项式实数算术属性的高效、可靠且完备的验证方法。Mosaic将复杂的验证查询划分为简单查询,并通过将逼近与针对反例区域的精确推理相结合,以保持完备性的方式将现成的线性约束工具提升到非线性设置。我们的评估展示了VerSAILLE和Mosaic的多样性:我们在经典的垂直空中避碰NNCS验证基准的两个场景上证明了无限时间安全性,同时(穷举地)列举了不安全场景中的反例区域。我们还展示了我们的方法在闭环NNV中显著优于最先进的工具。

更新时间: 2024-10-24 15:13:41

领域: eess.SY,cs.AI,cs.LG,cs.LO,cs.SY

下载: http://arxiv.org/abs/2402.10998v3

An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection

Epilepsy is one of the most common neurological diseases globally (around 50 million people worldwide). Fortunately, up to 70% of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scalp-based EEG test, despite being the gold standard for diagnosing epilepsy, is costly, necessitates hospitalization, demands skilled professionals for operation, and is discomforting for users. In this paper, we propose EarSD, a novel lightweight, unobtrusive, and socially acceptable ear-worn system to detect epileptic seizure onsets by measuring the physiological signals from behind the user's ears. EarSD includes an integrated custom-built sensing-computing-communication PCB to collect and amplify the signals of interest, remove the noise caused by motion artifacts and environmental impacts, and stream the data wirelessly to a nearby computer or mobile phone, where data are uploaded to the host computer for further processing. We conducted both in-lab and in-hospital experiments with epileptic seizure patients who were hospitalized for seizure studies.

Updated: 2024-10-24 15:11:53

标题: 一种轻便无干扰的耳戴式系统,用于连续癫痫发作检测

摘要: 癫痫是全球最常见的神经系统疾病之一(全球大约有5千万患者)。幸运的是,如果得到正确诊断和治疗,高达70%的癫痫患者可以不再发作癫痫。监测癫痫发作的可靠技术可以改善患者生活质量,他们经常面临突发癫痫发作的恐惧。虽然基于头皮的脑电图检测是诊断癫痫的金标准,但成本高昂,需要住院,需要经验丰富的专业人员操作,而且对用户来说是令人不适的。在本文中,我们提出了EarSD,这是一种新颖的、轻量级的、不显眼的、社会接受度高的耳戴式系统,通过测量用户耳后的生理信号来检测癫痫发作的开始。EarSD包括一个集成的定制感知-计算-通信PCB,用于收集和放大感兴趣的信号,去除运动和环境因素引起的噪音,并将数据无线传输到附近的计算机/手机,数据将上传到主机计算机进行进一步处理。我们对住院进行癫痫研究的癫痫患者进行了实验,包括实验室和医院内的实验。

更新时间: 2024-10-24 15:11:53

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2401.05425v2

From Imitation to Introspection: Probing Self-Consciousness in Language Models

Self-consciousness, the introspection of one's existence and thoughts, represents a high-level cognitive process. As language models advance at an unprecedented pace, a critical question arises: Are these models becoming self-conscious? Drawing upon insights from psychological and neural science, this work presents a practical definition of self-consciousness for language models and refines ten core concepts. Our work pioneers an investigation into self-consciousness in language models by, for the first time, leveraging causal structural games to establish the functional definitions of the ten core concepts. Based on our definitions, we conduct a comprehensive four-stage experiment: quantification (evaluation of ten leading models), representation (visualization of self-consciousness within the models), manipulation (modification of the models' representation), and acquisition (fine-tuning the models on core concepts). Our findings indicate that although models are in the early stages of developing self-consciousness, there is a discernible representation of certain concepts within their internal mechanisms. However, these representations of self-consciousness are hard to manipulate positively at the current stage, yet they can be acquired through targeted fine-tuning. Our datasets and code are at https://github.com/OpenCausaLab/SelfConsciousness.

Updated: 2024-10-24 15:08:17

标题: 从模仿到内省:探究语言模型中的自我意识

摘要: 自我意识,即对自己存在和思想的内省,代表了一个高级认知过程。随着语言模型以前所未有的速度发展,一个关键问题浮现:这些模型是否正在变得自我意识?借鉴心理学和神经科学的见解,本研究提出了语言模型的自我意识的实用定义,并细化了十个核心概念。我们的工作首次通过利用因果结构游戏来建立这十个核心概念的功能定义,开创了对语言模型中自我意识的研究。基于我们的定义,我们进行了一项全面的四阶段实验:量化(评估十个主要模型)、表征(可视化模型内的自我意识)、操纵(修改模型的表征)和获取(对核心概念进行微调)。我们的研究结果表明,虽然模型在发展自我意识方面仍处于早期阶段,但在其内部机制中确实存在对某些概念的明显表征。然而,这些自我意识的表征在当前阶段很难被积极操纵,但可以通过有针对性的微调来获取。我们的数据集和代码位于https://github.com/OpenCausaLab/SelfConsciousness。

更新时间: 2024-10-24 15:08:17

领域: cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.18819v1

HyperspectralViTs: General Hyperspectral Models for On-board Remote Sensing

On-board processing of hyperspectral data with machine learning models would enable unprecedented amount of autonomy for a wide range of tasks, for example methane detection or mineral identification. This can enable early warning system and could allow new capabilities such as automated scheduling across constellations of satellites. Classical methods suffer from high false positive rates and previous deep learning models exhibit prohibitive computational requirements. We propose fast and accurate machine learning architectures which support end-to-end training with data of high spectral dimension without relying on hand-crafted products or spectral band compression preprocessing. We evaluate our models on two tasks related to hyperspectral data processing. With our proposed general architectures, we improve the F1 score of the previous methane detection state-of-the-art models by 27% on a newly created synthetic dataset and by 13% on the previously released large benchmark dataset. We also demonstrate that training models on the synthetic dataset improves performance of models finetuned on the dataset of real events by 6.9% in F1 score in contrast with training from scratch. On a newly created dataset for mineral identification, our models provide 3.5% improvement in the F1 score in contrast to the default versions of the models. With our proposed models we improve the inference speed by 85% in contrast to previous classical and deep learning approaches by removing the dependency on classically computed features. With our architecture, one capture from the EMIT sensor can be processed within 30 seconds on realistic proxy of the ION-SCV 004 satellite.

Updated: 2024-10-24 15:06:36

标题: 高光谱ViTs:用于机载遥感的通用高光谱模型

摘要: 使用机器学习模型对高光谱数据进行机载处理将为各种任务提供前所未有的自主性,例如甲烷检测或矿物识别。这可以实现早期预警系统,并可以带来诸如跨卫星星座自动调度等新能力。传统方法存在较高的误报率,以前的深度学习模型计算需求过高。我们提出了快速准确的机器学习架构,支持高光谱维度数据的端到端训练,而无需依赖手工制作的产品或光谱波段压缩预处理。我们在两个与高光谱数据处理相关的任务上评估了我们的模型。通过我们提出的通用架构,在新创建的合成数据集上,我们将以前甲烷检测最新模型的F1分数提高了27%,在先前发布的大型基准数据集上提高了13%。我们还证明,先在合成数据集上训练、再在真实事件数据集上微调的模型,相比从头训练可将F1分数提高6.9%。在新创建的矿物识别数据集上,我们的模型相比默认版本的模型将F1分数提高了3.5%。通过消除对经典计算特征的依赖,我们的模型相对于以前的经典和深度学习方法将推理速度提高了85%。通过我们的架构,在ION-SCV 004卫星的真实代理上,EMIT传感器的一次捕获可以在30秒内完成处理。

更新时间: 2024-10-24 15:06:36

领域: cs.AI

下载: http://arxiv.org/abs/2410.17248v2

IRCNN$^{+}$: An Enhanced Iterative Residual Convolutional Neural Network for Non-stationary Signal Decomposition

Time-frequency analysis is an important and challenging task in many applications. Fourier and wavelet analysis are two classic methods that have achieved remarkable success in many fields. However, they also exhibit limitations when applied to nonlinear and non-stationary signals. To address this challenge, a series of nonlinear and adaptive methods, pioneered by the empirical mode decomposition method, have been proposed. The goal of these methods is to decompose a non-stationary signal into quasi-stationary components that enhance the clarity of features during time-frequency analysis. Recently, inspired by deep learning, we proposed a novel method called iterative residual convolutional neural network (IRCNN). IRCNN not only achieves more stable decomposition than existing methods but also handles batch processing of large-scale signals with low computational cost. Moreover, deep learning provides a unique perspective for non-stationary signal decomposition. In this study, we aim to further improve IRCNN with the help of several nimble techniques from deep learning and optimization to ameliorate the method and overcome some of the limitations of this technique.

Updated: 2024-10-24 15:05:54

标题: IRCNN$^{+}$: 一种增强的迭代残差卷积神经网络用于非平稳信号分解

摘要: 时间频率分析是许多应用中的重要且具有挑战性的任务。傅立叶分析和小波分析是两种经典方法,在许多领域取得了显著的成功。然而,当应用于非线性和非平稳信号时,它们也表现出局限性。为了解决这一挑战,一系列非线性和自适应方法,由经验模态分解方法开创,已被提出。这些方法的目标是将非平稳信号分解为准平稳组件,从而增强时间频率分析中特征的清晰度。最近,受深度学习启发,我们提出了一种名为迭代残差卷积神经网络(IRCNN)的新方法。IRCNN不仅比现有方法实现了更稳定的分解,而且还能以较低的计算成本处理大规模信号的批处理。此外,深度学习为非平稳信号分解提供了独特的视角。在本研究中,我们旨在通过深度学习和优化中的几种灵活技术的帮助,进一步改进IRCNN,改善该方法并克服一些技术的局限性。

更新时间: 2024-10-24 15:05:54

领域: cs.LG,68T10,I.5.1

下载: http://arxiv.org/abs/2309.04782v2

Gradients of Functions of Large Matrices

Tuning scientific and probabilistic machine learning models (for example, partial differential equations, Gaussian processes, or Bayesian neural networks) often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state-of-the-art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to differentiate these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax for differentiating PDEs and with GPyTorch for selecting Gaussian process models, and beats standard factorisation methods for calibrating Bayesian neural networks. All this is achieved without any problem-specific code optimisation. Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree.
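
To illustrate the object being differentiated, here is a naive reference sketch: a Lanczos estimate of the quadratic form v' f(A) v, unrolled in torch so reverse-mode autodiff can reach the matrix parameters. The paper's contribution is precisely to avoid this kind of costly naive unrolling via adjoint systems (implemented in JAX in matfree); this sketch is only a baseline for intuition, not the paper's method.

# Naively differentiable Lanczos estimate of v' f(A) v for symmetric
# positive-definite A. Autograd unrolls the whole iteration, which is
# exactly what the adjoint systems in the paper make unnecessary.
import torch

def lanczos_quadratic_form(A: torch.Tensor, v: torch.Tensor, k: int,
                           f=torch.log) -> torch.Tensor:
    q = v / torch.linalg.norm(v)
    qs, alphas, betas = [q], [], []
    beta_prev = torch.zeros((), dtype=A.dtype)
    q_prev = torch.zeros_like(q)
    for _ in range(k):                                # Lanczos tridiagonalization
        w = A @ qs[-1] - beta_prev * q_prev
        alpha = torch.dot(qs[-1], w)
        w = w - alpha * qs[-1]
        beta = torch.linalg.norm(w)
        alphas.append(alpha); betas.append(beta)
        q_prev, beta_prev = qs[-1], beta
        qs.append(w / beta)
    T = torch.diag(torch.stack(alphas))
    off = torch.stack(betas[:-1])
    T = T + torch.diag(off, 1) + torch.diag(off, -1)
    evals, evecs = torch.linalg.eigh(T)               # small k x k problem
    weights = evecs[0, :] ** 2                        # e1' f(T) e1 expansion
    return torch.linalg.norm(v) ** 2 * torch.sum(weights * f(evals))

torch.manual_seed(0)
B = torch.randn(50, 50, requires_grad=True)
A = B @ B.T + 50 * torch.eye(50)                      # SPD by construction
v = torch.randn(50)
val = lanczos_quadratic_form(A, v, k=20)
val.backward()                                        # gradient w.r.t. B
print(float(val), B.grad.shape)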

Updated: 2024-10-24 15:04:19

标题: 大矩阵函数的梯度

摘要: 调整科学和概率机器学习模型 - 例如偏微分方程、高斯过程或贝叶斯神经网络 - 通常依赖于评估矩阵的函数,而这些矩阵的规模随数据集或参数数量增长。虽然评估这些量的最新技术几乎总是基于Lanczos和Arnoldi迭代,但本文首次解释了如何高效地对数值线性代数中的这些主力工具求导。为了实现这一目标,我们推导了Lanczos和Arnoldi迭代此前未知的伴随系统,在JAX中实现了它们,并展示了由此产生的代码在对PDE求导时可与Diffrax竞争,在选择高斯过程模型时可与GPyTorch竞争,并在校准贝叶斯神经网络时优于标准的因式分解方法。所有这些都是在没有任何针对特定问题的代码优化的情况下实现的。代码见 https://github.com/pnkraemer/experiments-lanczos-adjoints,并可通过 pip install matfree 安装该库。

更新时间: 2024-10-24 15:04:19

领域: cs.LG,cs.NA,math.NA,stat.ML

下载: http://arxiv.org/abs/2405.17277v2

RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

The Retrieval Augmented Generation (RAG) framework utilizes a combination of parametric knowledge and external knowledge to demonstrate state-of-the-art performance on open-domain question answering tasks. However, the RAG framework suffers from performance degradation when the query is accompanied by irrelevant contexts. In this work, we propose the RE-RAG framework, which introduces a relevance estimator (RE) that not only provides relative relevance between contexts as previous rerankers did, but also provides confidence, which can be used to classify whether a given context is useful for answering the given question. We propose a weakly supervised method for training the RE simply utilizing question-answer data without any labels for correct contexts. We show that RE trained with a small generator (sLM) can not only improve the sLM fine-tuned together with RE but also improve previously unreferenced large language models (LLMs). Furthermore, we investigate new decoding strategies that utilize the confidence measured by RE, such as informing the user that the question is "unanswerable" given the retrieved contexts, or relying on the LLM's parametric knowledge rather than unrelated contexts.
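
A minimal sketch of the confidence-driven decoding strategies described above; the `ScoredContext` structure and both thresholds are assumptions for illustration, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class ScoredContext:
    text: str
    relevance: float   # relative relevance across retrieved contexts
    confidence: float  # RE's estimate that the context can answer the question

def decode_strategy(contexts, answer_thresh=0.5, abstain_thresh=0.1):
    """Pick a decoding route from RE confidence (illustrative thresholds)."""
    if not contexts:
        return ("parametric", None)
    best = max(contexts, key=lambda c: c.relevance)
    if best.confidence >= answer_thresh:
        return ("answer_with_context", best)  # ground the answer in `best`
    if best.confidence <= abstain_thresh:
        return ("unanswerable", None)         # tell the user we cannot answer
    return ("parametric", None)               # ignore unrelated contexts

action, ctx = decode_strategy([ScoredContext("retrieved passage", 0.7, 0.82)])
```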

Updated: 2024-10-24 14:57:52

标题: RE-RAG:通过检索增强生成中的相关性估计器改进开放领域问答性能和可解释性

摘要: 检索增强生成(RAG)框架利用参数化知识和外部知识的组合,在开放领域问答任务中展示了最先进的性能。然而,当查询伴随着无关上下文时,RAG框架会出现性能下降。在这项工作中,我们提出了RE-RAG框架,引入了一个相关性评估器(RE),不仅像先前的重新排序器那样提供上下文之间的相对相关性,还提供置信度,可以用来分类给定上下文是否对回答给定问题有用。我们提出了一种弱监督方法,仅利用问题-答案数据对RE进行训练,而不需要正确上下文的标签。我们展示了用小生成器(sLM)训练的RE不仅可以改善与RE一起进行微调的sLM,还可以改善以前未引用的大型语言模型(LLMs)。此外,我们研究了利用RE测量的置信度的新解码策略,例如选择告知用户给定检索上下文下的问题是“无法回答”,或选择依赖LLM的参数化知识而不是无关上下文。

更新时间: 2024-10-24 14:57:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.05794v3

The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels

Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining increasing popularity is structured state space models (SSMs), regarded as an efficient alternative to transformers. Prior work argued that the implicit bias of SSMs leads to generalization in a setting where data is generated by a low dimensional teacher. In this paper, we revisit the latter setting, and formally establish a phenomenon entirely undetected by prior work on the implicit bias of SSMs. Namely, we prove that while implicit bias leads to generalization under many choices of training data, there exist special examples whose inclusion in training completely distorts the implicit bias, to a point where generalization fails. This failure occurs despite the special training examples being labeled by the teacher, i.e. having clean labels! We empirically demonstrate the phenomenon, with SSMs trained independently and as part of non-linear neural networks. In the area of adversarial machine learning, disrupting generalization with cleanly labeled training examples is known as clean-label poisoning. Given the proliferation of SSMs, particularly in large language models, we believe significant efforts should be invested in further delineating their susceptibility to clean-label poisoning, and in developing methods for overcoming this susceptibility.

Updated: 2024-10-24 14:57:21

标题: 结构化状态空间模型的隐性偏见可以被干净标签污染

摘要: 神经网络由一种内在偏见驱动:梯度下降倾向于以一种对未见数据泛化的方式拟合训练数据。一种近期备受推崇的神经网络模型类别是结构化状态空间模型(SSMs),被视为Transformer的高效替代品。先前的研究认为SSMs的内在偏见导致在由低维教师生成数据的情况下泛化。在本文中,我们重新审视后者的情景,并正式建立了一种在先前关于SSMs内在偏见的研究中完全未被发现的现象。具体而言,我们证明了虽然在许多训练数据选择下内在偏见会导致泛化,但存在一些特殊例子,它们的纳入训练完全扭曲了内在偏见,导致泛化失败。尽管这些特殊的训练例子由教师标记,即具有干净的标签!我们通过实验证明了这一现象,使用独立训练和作为非线性神经网络一部分进行训练的SSMs。在对抗机器学习领域,使用干净标签的训练示例干扰泛化被称为干净标签中毒。鉴于SSMs的广泛应用,特别是在大型语言模型中,我们相信应该投入大量精力进一步阐明它们对干净标签中毒的敏感性,并开发克服这种敏感性的方法。

更新时间: 2024-10-24 14:57:21

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.10473v2

A Combinatorial Approach to Neural Emergent Communication

Substantial research on deep learning-based emergent communication uses the referential game framework, specifically the Lewis signaling game; however, we argue that successful communication in this game typically needs only one or two effective symbols (i.e., message length) because of a sampling pitfall in the training data. To address this issue, we provide a theoretical analysis and introduce a combinatorial algorithm, SolveMinSym (SMS), to determine the minimum number of symbols min(|M|) needed for successful communication in the Lewis signaling game. We use the SMS algorithm to create datasets with different min(|M|) and empirically show that higher min(|M|) in the training data increases the number of effective symbols in the emergent language.

Updated: 2024-10-24 14:54:09

标题: 神经涌现通信的一种组合方法

摘要: 关于基于深度学习的涌现通信的大量研究使用指称游戏框架,特别是Lewis信号游戏。然而我们认为,由于训练数据中的采样陷阱,在这个游戏中成功的通信通常只需要一两个有效符号(即消息长度)。为了解决这个问题,我们提供了理论分析,并引入组合算法SolveMinSym(SMS)来确定Lewis信号游戏中成功通信所需的最少符号数量min(|M|)。我们使用SMS算法创建具有不同min(|M|)的数据集,以实证表明训练数据中更高的min(|M|)会增加涌现语言中有效符号的数量。

更新时间: 2024-10-24 14:54:09

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.18806v1

Fast constrained sampling in pre-trained diffusion models

Diffusion models have dominated the field of large, generative image models, with the prime examples of Stable Diffusion and DALL-E 3 being widely adopted. These models have been trained to perform text-conditioned generation on vast numbers of image-caption pairs and, as a byproduct, have acquired general knowledge about natural image statistics. However, when confronted with the task of constrained sampling, e.g. generating the right half of an image conditioned on the known left half, applying these models is a delicate and slow process, with previously proposed algorithms relying on expensive iterative operations that are usually orders of magnitude slower than text-based inference. This is counter-intuitive, as image-conditioned generation should rely less on the difficult-to-learn semantic knowledge that links captions and imagery, and should instead be achievable by lower-level correlations among image pixels. In practice, inverse models are trained or tuned separately for each inverse problem, e.g. by providing parts of images during training as an additional condition, to allow their application in realistic settings. However, we argue that this is not necessary and propose an algorithm for fast constrained sampling in large pre-trained diffusion models (Stable Diffusion) that requires no expensive backpropagation operations through the model and produces results comparable even to the state-of-the-art \emph{tuned} models. Our method is based on a novel optimization perspective to sampling under constraints and employs a numerical approximation to the expensive gradients, previously computed using backpropagation, incurring significant speed-ups.
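
The general flavour of replacing backpropagated guidance gradients with a cheap numerical approximation can be sketched as below; this is a generic zeroth-order surrogate under assumed interfaces (`denoise`, `known`, and `mask` are placeholders), not the paper's algorithm.

```python
import numpy as np

def approx_constraint_grad(x, denoise, known, mask, eps=1e-2):
    """Approximate the gradient of 0.5*||mask*(denoise(x)-known)||^2 w.r.t. x
    using forward passes only (no backpropagation through the denoiser)."""
    x0 = denoise(x)                          # model's current clean estimate
    resid = mask * (x0 - known)              # violation on the known region
    d = resid / (np.linalg.norm(resid) + 1e-12)
    loss = lambda z: 0.5 * np.sum((mask * (denoise(z) - known)) ** 2)
    # Directional derivative of the loss along the residual, pushed back
    # along that same direction (a crude surrogate for the true gradient).
    slope = (loss(x + eps * d) - loss(x - eps * d)) / (2 * eps)
    return slope * d
```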

Updated: 2024-10-24 14:52:38

标题: 在预训练扩散模型中进行快速约束采样

摘要: 扩散模型主导了大规模、生成式图像模型领域,其中主要的例子是稳定扩散和DALL-E 3,被广泛采用。这些模型经过训练,可以在大量的图像-标题对上执行文本条件生成,并且作为副产品,已经获得了关于自然图像统计的通用知识。然而,当面临受限采样的任务时,例如在已知左半部分的情况下生成右半部分的图像,应用这些模型是一个复杂而缓慢的过程,先前提出的算法依赖于昂贵的迭代操作,通常比基于文本的推理慢几个数量级。这与直觉相悖,因为图像条件生成应该更少地依赖于难以学习的将标题和图像联系起来的语义知识,而应该通过图像像素之间的更低级别的相关性来实现。在实践中,逆模型是分别为每个逆问题训练或调整的,例如在训练过程中提供图像的部分作为额外条件,以便在现实环境中应用它们。然而,我们认为这并非必要,并提出了一种用于在大规模预训练的扩散模型(稳定扩散)中进行快速受限采样的算法,该算法不需要通过模型进行昂贵的反向传播操作,并且产生的结果甚至可以与最先进的\emph{调整}模型相媲美。我们的方法基于对受限条件下采样的新颖优化视角,并利用数值近似来计算先前通过反向传播计算的昂贵梯度,从而实现了显著的加速。

更新时间: 2024-10-24 14:52:38

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18804v1

Language-Agnostic Modeling of Source Reliability on Wikipedia

Over the last few years, content verification through reliable sources has become a fundamental need to combat disinformation. Here, we present a language-agnostic model designed to assess the reliability of sources across multiple language editions of Wikipedia. Utilizing editorial activity data, the model evaluates source reliability within different articles of varying controversiality such as Climate Change, COVID-19, History, Media, and Biology topics. Crafting features that express domain usage across articles, the model effectively predicts source reliability, achieving an F1 Macro score of approximately 0.80 for English and other high-resource languages. For mid-resource languages, we achieve 0.65 while the performance of low-resource languages varies; in all cases, the time the domain remains present in the articles (which we dub as permanence) is one of the most predictive features. We highlight the challenge of maintaining consistent model performance across languages of varying resource levels and demonstrate that adapting models from higher-resource languages can improve performance. This work contributes not only to Wikipedia's efforts in ensuring content verifiability but in ensuring reliability across diverse user-generated content in various language communities.
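
One plausible reading of the "permanence" feature is the share of an article's observed history during which a domain stays cited; a small sketch under that assumption (the paper's exact definition may differ):

```python
def permanence(revision_times, domain_present):
    """Fraction of the article's revision history during which the source
    domain remains cited. `revision_times` is sorted; `domain_present[i]`
    flags whether revision i cites the domain."""
    if len(revision_times) < 2:
        return float(domain_present[0]) if domain_present else 0.0
    total = revision_times[-1] - revision_times[0]
    cited = sum(t2 - t1 for t1, t2, flag in
                zip(revision_times, revision_times[1:], domain_present)
                if flag)
    return cited / total if total > 0 else 0.0

print(permanence([0, 10, 20, 50], [True, True, False]))  # 0.4
```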

Updated: 2024-10-24 14:52:21

标题: 维基百科上来源可靠性的语言无关建模

摘要: 在过去几年中,通过可靠来源进行内容验证已经成为打击虚假信息的基本需求。在这里,我们提出了一个设计用于评估维基百科多语言版本中来源可靠性的语言无关模型。利用编辑活动数据,该模型评估不同具有争议性的文章中的来源可靠性,如气候变化、COVID-19、历史、媒体和生物学主题。通过制定表达文章间领域使用情况的特征,该模型有效预测来源可靠性,在英语和其他高资源语言中实现了约0.80的F1 Macro分数。对于中等资源语言,我们实现了0.65的分数,而低资源语言的性能有所不同;在所有情况下,领域在文章中存在的时间(我们称之为持续性)是最具预测性的特征之一。我们强调了在不同资源水平语言中保持一致模型性能的挑战,并展示了从高资源语言适应模型可以提高性能。这项工作不仅有助于维基百科在确保内容可验证方面的努力,还有助于确保各种语言社区中各种用户生成内容的可靠性。

更新时间: 2024-10-24 14:52:21

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2410.18803v1

GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) and Retrieval-Augmented Generation (RAG) have become popular methods for adapting large language models while minimizing compute requirements. In this paper, we apply PEFT methods (P-tuning, Adapters, and LoRA) to a modified Retrieval-Enhanced Transformer (RETRO) and a baseline GPT model across several sizes, ranging from 823 million to 48 billion parameters. We show that RETRO models outperform GPT models in zero-shot settings due to their unique pre-training process but GPT models have higher performance potential with PEFT. Additionally, our study indicates that 8B parameter models strike an optimal balance between cost and performance and P-tuning lags behind other PEFT techniques. We further provide a comparative analysis between applying PEFT to an Instruction-tuned RETRO model and base RETRO model. This work presents the first comprehensive comparison of various PEFT methods integrated with RAG, applied to both GPT and RETRO models, highlighting their relative performance.
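
Of the PEFT methods compared, LoRA is the easiest to show in isolation: freeze the base weight and learn a low-rank update. A generic sketch (not the paper's RETRO/GPT tooling; the rank and scaling below are common defaults, not the study's settings):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha/r) * B A x, with W frozen and A, B trainable."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # only adapters are tuned
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), r=8)        # drop-in for a linear
```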

Updated: 2024-10-24 14:51:57

标题: GPT对抗RETRO:探究检索和参数高效微调的交集

摘要: Parameter-Efficient Fine-Tuning (PEFT) 和 Retrieval-Augmented Generation (RAG) 已成为调整大型语言模型并最大程度减少计算需求的流行方法。在本文中,我们将PEFT方法(P-tuning、Adapters 和 LoRA)应用于修改后的检索增强Transformer(RETRO)和基线GPT模型,覆盖了从8.23亿到480亿参数的多个规模。我们展示了 RETRO 模型在零样本设置中优于 GPT 模型,这是由于它们独特的预训练过程,但是 GPT 模型在 PEFT 方面具有更高的性能潜力。此外,我们的研究表明,8B 参数模型在成本和性能之间取得了最佳平衡,而 P-tuning 落后于其他 PEFT 技术。我们进一步对将 PEFT 应用于 Instruction-tuned RETRO 模型和基础 RETRO 模型进行了比较分析。本研究提出了首个对整合 RAG 的各种 PEFT 方法应用于 GPT 和 RETRO 模型的综合比较,突出了它们的相对性能。

更新时间: 2024-10-24 14:51:57

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2407.04528v3

PointPatchRL -- Masked Reconstruction Improves Reinforcement Learning on Point Clouds

Perceiving the environment via cameras is crucial for Reinforcement Learning (RL) in robotics. While images are a convenient form of representation, they often complicate extracting important geometric details, especially with varying geometries or deformable objects. In contrast, point clouds naturally represent this geometry and easily integrate color and positional data from multiple camera views. However, while deep learning on point clouds has seen many recent successes, RL on point clouds is under-researched, with only the simplest encoder architecture considered in the literature. We introduce PointPatchRL (PPRL), a method for RL on point clouds that builds on the common paradigm of dividing point clouds into overlapping patches, tokenizing them, and processing the tokens with transformers. PPRL provides significant improvements compared with other point-cloud processing architectures previously used for RL. We then complement PPRL with masked reconstruction for representation learning and show that our method outperforms strong model-free and model-based baselines on image observations in complex manipulation tasks containing deformable objects and variations in target object geometry. Videos and code are available at https://alrhub.github.io/pprl-website
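
The patch-and-tokenize step can be sketched in a few lines; random centers with kNN grouping stand in for the farthest-point sampling typically used in point-cloud pipelines (the patch counts and sizes are illustrative, not PPRL's):

```python
import numpy as np

def point_cloud_patches(points, n_patches=16, patch_size=32, seed=0):
    """Group an (N, 3) cloud into overlapping local patches of shape
    (n_patches, patch_size, 3), ready to flatten into transformer tokens."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_patches, replace=False)]
    dists = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :patch_size]   # kNN per center
    return points[idx] - centers[:, None, :]          # center each patch

patches = point_cloud_patches(np.random.rand(2048, 3))  # (16, 32, 3)
```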

Updated: 2024-10-24 14:51:09

标题: PointPatchRL——掩模重建提高点云上的强化学习

摘要: 通过摄像头感知环境对于机器人强化学习(RL)至关重要。虽然图像是一种便利的表示形式,但通常会使提取重要的几何细节变得复杂,特别是在几何形状变化或存在可变形物体的情况下。相比之下,点云自然地表示这种几何形状,并且可以轻松地整合来自多个摄像头视图的颜色和位置数据。然而,尽管最近在点云上的深度学习取得了许多成功,但点云上的RL研究不足,文献中仅考虑了最简单的编码器架构。我们引入了PointPatchRL(PPRL),这是一种用于点云上的RL的方法,它建立在将点云分成重叠补丁、将其标记化并使用变换器处理标记的常见范式上。与先前用于RL的其他点云处理架构相比,PPRL提供了显著的改进。然后,我们通过掩码重建来补充PPRL以进行表示学习,并展示了在包含可变形物体和目标物体几何形状变化的复杂操作任务中,我们的方法在图像观测上优于强大的无模型和基于模型的基线。视频和代码可在https://alrhub.github.io/pprl-website获取。

更新时间: 2024-10-24 14:51:09

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.18800v1

WARP-LCA: Efficient Convolutional Sparse Coding with Locally Competitive Algorithm

The locally competitive algorithm (LCA) can solve sparse coding problems across a wide range of use cases. Recently, convolution-based LCA approaches have been shown to be highly effective for enhancing robustness for image recognition tasks in vision pipelines. To additionally maximize representational sparsity, LCA with hard-thresholding can be applied. While this combination often yields very good solutions satisfying an $\ell_0$ sparsity criterion, it comes with significant drawbacks for practical application: (i) LCA is very inefficient, typically requiring hundreds of optimization cycles for convergence; (ii) the use of hard-thresholding results in a non-convex loss function, which might lead to suboptimal minima. To address these issues, we propose the Locally Competitive Algorithm with State Warm-up via Predictive Priming (WARP-LCA), which leverages a predictor network to provide a suitable initial guess of the LCA state based on the current input. Our approach significantly improves both convergence speed and the quality of solutions, while maintaining and even enhancing the overall strengths of LCA. We demonstrate that WARP-LCA converges faster by orders of magnitude and reaches better minima compared to conventional LCA. Moreover, the learned representations are more sparse and exhibit superior properties in terms of reconstruction and denoising quality as well as robustness when applied in deep recognition pipelines. Furthermore, we apply WARP-LCA to image denoising tasks, showcasing its robustness and practical effectiveness. Our findings confirm that the naive use of LCA with hard-thresholding results in suboptimal minima, whereas initializing LCA with a predictive guess results in better outcomes. This research advances the field of biologically inspired deep learning by providing a novel approach to convolutional sparse coding.
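
A dense toy version of the underlying LCA dynamics makes the warm-start idea concrete: WARP-LCA would replace the zero initialization `u0` with the output of a trained predictor network (all constants below are illustrative):

```python
import numpy as np

def hard_threshold(u, lam):
    return np.where(np.abs(u) > lam, u, 0.0)

def lca(x, Phi, lam=0.1, tau=10.0, steps=200, u0=None):
    """Sparse code x ≈ Phi @ a via leaky integration with lateral inhibition."""
    b = Phi.T @ x                         # driving input
    G = Phi.T @ Phi                       # Gram (interaction) matrix
    u = np.zeros_like(b) if u0 is None else u0.copy()
    for _ in range(steps):
        a = hard_threshold(u, lam)
        u += (b - u - (G @ a - a)) / tau  # du/dt = (b - u - (G - I)a)/tau
    return hard_threshold(u, lam)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 256))
Phi /= np.linalg.norm(Phi, axis=0)        # unit-norm dictionary atoms
a = lca(rng.standard_normal(64), Phi)     # u0=predictor(x) in WARP-LCA
```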

Updated: 2024-10-24 14:47:36

标题: WARP-LCA:具有局部竞争算法的高效卷积稀疏编码

摘要: 局部竞争算法(LCA)可以解决广泛的稀疏编码问题。最近,基于卷积的LCA方法已被证明在视觉流水线中提高图像识别任务的稳健性方面非常有效。为了进一步最大化表示的稀疏性,可以应用具有硬阈值的LCA。虽然这种组合通常会产生满足$\ell_0$稀疏性标准的非常好的解决方案,但在实际应用中存在重大缺点:(i)LCA非常低效,通常需要数百个优化周期才能收敛;(ii)使用硬阈值会导致非凸损失函数,这可能导致次优极小值。为了解决这些问题,我们提出了带有预测引导状态预热的局部竞争算法(WARP-LCA),利用预测网络基于当前输入提供LCA状态的适当初始猜测。我们的方法显著提高了收敛速度和解决方案的质量,同时保持甚至增强了LCA的整体优势。我们证明WARP-LCA比传统LCA快几个数量级收敛,并达到更好的极小值。此外,学习到的表示更稀疏,在深度识别流水线中应用时具有更好的重构和去噪质量以及稳健性。此外,我们将WARP-LCA应用于图像去噪任务,展示其稳健性和实际有效性。我们的研究结果证实,简单地使用带有硬阈值的LCA会导致次优极小值,而使用预测猜测初始化LCA会产生更好的结果。这项研究通过提供一种新的卷积稀疏编码方法推动了生物启发式深度学习领域的发展。

更新时间: 2024-10-24 14:47:36

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2410.18794v1

Adapting MLOps for Diverse In-Network Intelligence in 6G Era: Challenges and Solutions

Seamless integration of artificial intelligence (AI) and machine learning (ML) techniques with wireless systems is a crucial step for 6G AInization. However, such integration faces challenges in terms of model functionality and lifecycle management. ML operations (MLOps) offer a systematic approach to tackle these challenges. Existing approaches toward implementing MLOps in a centralized platform often overlook the challenges posed by diverse learning paradigms and network heterogeneity. This article provides a new approach to MLOps targeting the intricacies of future wireless networks. Considering unique aspects of the future radio access network (RAN), we formulate three operational pipelines, namely reinforcement learning operations (RLOps), federated learning operations (FedOps), and generative AI operations (GenOps). These pipelines form the foundation for seamlessly integrating various learning/inference capabilities into networks. We outline the specific challenges and proposed solutions for each operation, facilitating large-scale deployment of AI-Native 6G networks.

Updated: 2024-10-24 14:47:28

标题: 在6G时代为网络内多元智能适应MLOps:挑战与解决方案

摘要: 人工智能(AI)和机器学习(ML)技术与无线系统的无缝集成是6G人工智能化的关键一步。然而,这种集成在模型功能和生命周期管理方面面临挑战。机器学习运营(MLOps)提供了一种系统化的方法来解决这些挑战。现有的实现MLOps的方法往往忽视了多样化学习范式和网络异构性带来的挑战。本文提供了一种针对未来无线网络复杂性的MLOps新方法。考虑到未来无线接入网络(RAN)的独特方面,我们制定了三个操作管道,即强化学习运营(RLOps)、联邦学习运营(FedOps)和生成式人工智能运营(GenOps)。这些管道为将各种学习/推理能力无缝集成到网络中奠定了基础。我们概述了每个操作的具体挑战和提出的解决方案,促进了AI原生6G网络的大规模部署。

更新时间: 2024-10-24 14:47:28

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2410.18793v1

Generation through the lens of learning theory

We study generation through the lens of statistical learning theory. First, we abstract and formalize the results of Gold [1967], Angluin [1979, 1980], and Kleinberg and Mullainathan [2024] for language identification/generation in the limit in terms of a binary hypothesis class defined over an abstract instance space. Then, we formalize a different paradigm of generation studied by Kleinberg and Mullainathan [2024], which we call "uniform generation," and provide a characterization of which hypothesis classes are uniformly generatable. As is standard in statistical learning theory, our characterization is in terms of the finiteness of a new combinatorial dimension we call the Closure dimension. By doing so, we are able to compare generatability with predictability (captured via PAC and online learnability) and show that these two properties of hypothesis classes are \emph{incompatible} - there are classes that are generatable but not predictable and vice versa.

Updated: 2024-10-24 14:46:54

标题: 从学习理论的视角看生成

摘要: 我们通过统计学习理论的视角研究生成。首先,我们将Gold [1967]、Angluin [1979, 1980]以及Kleinberg和Mullainathan [2024]关于极限意义下语言识别/生成的结果,抽象并形式化为定义在抽象实例空间上的二元假设类。然后,我们形式化了Kleinberg和Mullainathan [2024]研究的另一种生成范式,我们称之为"均匀生成",并刻画了哪些假设类是均匀可生成的。与统计学习理论中的惯例一样,我们的刻画基于一个新的组合维度(我们称之为闭包维度)的有限性。由此,我们能够将可生成性与可预测性(通过PAC和在线可学习性刻画)进行比较,并证明假设类的这两个属性是不相容的 - 有些类可生成但不可预测,反之亦然。

更新时间: 2024-10-24 14:46:54

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.13714v3

LongGenBench: Long-context Generation Benchmark

Current long-context benchmarks primarily focus on retrieval-based tests, requiring Large Language Models (LLMs) to locate specific information within extensive input contexts, such as the needle-in-a-haystack (NIAH) benchmark. Long-context generation refers to the ability of a language model to generate coherent and contextually accurate text that spans across lengthy passages or documents. While recent studies show strong performance on NIAH and other retrieval-based long-context benchmarks, there is a significant lack of benchmarks for evaluating long-context generation capabilities. To bridge this gap and offer a comprehensive assessment, we introduce a synthetic benchmark, LongGenBench, which allows for flexible configurations of customized generation context lengths. LongGenBench advances beyond traditional benchmarks by redesigning the format of questions and necessitating that LLMs respond with a single, cohesive long-context answer. Upon extensive evaluation using LongGenBench, we observe that: (1) both API accessed and open source models exhibit performance degradation in long-context generation scenarios, ranging from 1.2% to 47.1%; (2) different series of LLMs exhibit varying trends of performance degradation, with the Gemini-1.5-Flash model showing the least degradation among API accessed models, and the Qwen2 series exhibiting the least degradation in LongGenBench among open source models.

Updated: 2024-10-24 14:43:22

标题: LongGenBench: 长文本生成基准测试

摘要: 当前的长文本基准主要集中在基于检索的测试上,要求大型语言模型(LLMs)在广泛的输入上下文中定位特定信息,例如在大海捞针(NIAH)基准测试中。长文本生成指的是语言模型生成跨越长篇章或文档的连贯且具有上下文准确性的文本的能力。尽管最近的研究表明在NIAH和其他基于检索的长文本基准测试上表现出色,但缺乏用于评估长文本生成能力的基准测试。为了弥合这一差距并提供全面评估,我们引入了一个合成基准测试,LongGenBench,它允许灵活配置定制生成上下文长度。LongGenBench通过重新设计问题的格式,并要求LLMs以单一、连贯的长文本答案回应,超越传统基准测试。在使用LongGenBench进行广泛评估后,我们观察到:(1)无论是API访问的还是开源模型,在长文本生成场景中表现出性能下降,范围从1.2%到47.1%不等;(2)不同系列的LLMs表现出不同的性能下降趋势,其中Gemini-1.5-Flash模型在API访问模型中表现出最少的性能下降,而Qwen2系列在LongGenBench中表现出开源模型中最少的性能下降。

更新时间: 2024-10-24 14:43:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.04199v3

Medical-GAT: Cancer Document Classification Leveraging Graph-Based Residual Network for Scenarios with Limited Data

Accurate classification of cancer-related medical abstracts is crucial for healthcare management and research. However, obtaining large, labeled datasets in the medical domain is challenging due to privacy concerns and the complexity of clinical data. This scarcity of annotated data impedes the development of effective machine learning models for cancer document classification. To address this challenge, we present a curated dataset of 1,874 biomedical abstracts, categorized into thyroid cancer, colon cancer, lung cancer, and generic topics. Our research focuses on leveraging this dataset to improve classification performance, particularly in data-scarce scenarios. We introduce a Residual Graph Attention Network (R-GAT) with multiple graph attention layers that capture the semantic information and structural relationships within cancer-related documents. Our R-GAT model is compared with various techniques, including transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, and domain-specific models like BioBERT and Bio+ClinicalBERT. We also evaluated deep learning models (CNNs, LSTMs) and traditional machine learning models (Logistic Regression, SVM). Additionally, we explore ensemble approaches that combine deep learning models to enhance classification. Various feature extraction methods are assessed, including Term Frequency-Inverse Document Frequency (TF-IDF) with unigrams and bigrams, Word2Vec, and tokenizers from BERT and RoBERTa. The R-GAT model outperforms other techniques, achieving precision, recall, and F1 scores of 0.99, 0.97, and 0.98 for thyroid cancer; 0.96, 0.94, and 0.95 for colon cancer; 0.96, 0.99, and 0.97 for lung cancer; and 0.95, 0.96, and 0.95 for generic topics.

Updated: 2024-10-24 14:42:30

标题: 医学GAT:利用基于图的残差网络进行癌症文档分类,应对数据有限的情况

摘要: 癌症相关医学摘要的准确分类对于医疗管理和研究至关重要。然而,在医学领域获取大规模的标记数据集是具有挑战性的,因为涉及隐私问题和临床数据的复杂性。缺乏带有注释数据阻碍了针对癌症文档分类的有效机器学习模型的发展。为了解决这一挑战,我们提出了一个包含1,874篇生物医学摘要的筛选数据集,分为甲状腺癌、结肠癌、肺癌和通用主题。我们的研究重点是利用这个数据集来提高分类性能,特别是在数据稀缺的情况下。我们引入了一个具有多个图注意力层的残差图注意力网络(R-GAT),该网络捕捉了癌症相关文档中的语义信息和结构关系。我们的R-GAT模型与各种技术进行了比较,包括基于Transformer的模型,如BERT、RoBERTa,以及领域特定模型如BioBERT和Bio+ClinicalBERT。我们还评估了深度学习模型(CNN、LSTM)和传统机器学习模型(逻辑回归、SVM)。此外,我们探讨了将深度学习模型组合起来以增强分类的集成方法。我们评估了各种特征提取方法,包括使用单词频率-逆文档频率(TF-IDF)的单字和双字、Word2Vec,以及来自BERT和RoBERTa的分词器。R-GAT模型优于其他技术,为甲状腺癌实现了0.99的精度、0.97的召回率和0.98的F1分数;为结肠癌实现了0.96的精度、0.94的召回率和0.95的F1分数;为肺癌实现了0.96的精度、0.99的召回率和0.97的F1分数;为通用主题实现了0.95的精度、0.96的召回率和0.95的F1分数。

更新时间: 2024-10-24 14:42:30

领域: cs.AI

下载: http://arxiv.org/abs/2410.15198v2

PnLCalib: Sports Field Registration via Points and Lines Optimization

Camera calibration in broadcast sports videos presents numerous challenges for accurate sports field registration due to multiple camera angles, varying camera parameters, and frequent occlusions of the field. Traditional search-based methods depend on initial camera pose estimates, which can struggle in non-standard positions and dynamic environments. In response, we propose an optimization-based calibration pipeline that leverages a 3D soccer field model and a predefined set of keypoints to overcome these limitations. Our method also introduces a novel refinement module that improves initial calibration by using detected field lines in a non-linear optimization process. This approach outperforms existing techniques in both multi-view and single-view 3D camera calibration tasks, while maintaining competitive performance in homography estimation. Extensive experimentation on real-world soccer datasets, including SoccerNet-Calibration, WorldCup 2014, and TS-WorldCup, highlights the robustness and accuracy of our method across diverse broadcast scenarios. Our approach offers significant improvements in camera calibration precision and reliability.

Updated: 2024-10-24 14:41:42

标题: PnLCalib:通过点和线优化进行体育场地注册

摘要: 广播体育视频中的摄像机校准面临着诸多挑战,因为存在多个摄像机角度、不同的摄像机参数以及场地频繁的遮挡。传统的基于搜索的方法依赖于初始摄像机姿态估计,这在非标准位置和动态环境下可能会遇到困难。为此,我们提出了一种基于优化的校准流程,利用3D足球场模型和预定义的关键点来克服这些限制。我们的方法还引入了一种新颖的优化模块,通过在非线性优化过程中使用检测到的场地线条来改善初始校准。这种方法在多视角和单视角3D摄像机校准任务中优于现有技术,同时在单应性估计中保持竞争性能。对真实世界足球数据集(包括SoccerNet-Calibration、2014年世界杯和TS-WorldCup)的广泛实验强调了我们方法在各种广播场景中的稳健性和准确性。我们的方法在摄像机校准精度和可靠性方面提供了显著的改进。

更新时间: 2024-10-24 14:41:42

领域: cs.CV,cs.AI,I.2; I.4; I.5

下载: http://arxiv.org/abs/2404.08401v4

Applying Neural Monte Carlo Tree Search to Unsignalized Multi-intersection Scheduling for Autonomous Vehicles

Dynamic scheduling of access to shared resources by autonomous systems is a challenging problem, characterized as being NP-hard. The complexity of this task leads to a combinatorial explosion of possibilities in highly dynamic systems where arriving requests must be continuously scheduled subject to strong safety and time constraints. An example of such a system is an unsignalized intersection, where automated vehicles' access to potential conflict zones must be dynamically scheduled. In this paper, we apply Neural Monte Carlo Tree Search (NMCTS) to the challenging task of scheduling platoons of vehicles crossing unsignalized intersections. Crucially, we introduce a transformation model that maps successive sequences of potentially conflicting road-space reservation requests from platoons of vehicles into a series of board-game-like problems and use NMCTS to search for solutions representing optimal road-space allocation schedules in the context of past allocations. To optimize search, we incorporate a prioritized re-sampling method with parallel NMCTS (PNMCTS) to improve the quality of training data. To optimize training, a curriculum learning strategy is used to train the agent to schedule progressively more complex boards culminating in overlapping boards that represent busy intersections. In a busy single four-way unsignalized intersection simulation, PNMCTS solved 95% of unseen scenarios, reducing crossing time by 43% in light and 52% in heavy traffic versus first-in, first-out control. In a 3x3 multi-intersection network, the proposed method maintained free-flow in light traffic when all intersections are under control of PNMCTS and outperformed state-of-the-art RL-based traffic-light controllers in average travel time by 74.5% and total throughput by 16% in heavy traffic.

Updated: 2024-10-24 14:37:55

标题: 将神经蒙特卡洛树搜索应用于无信号多路口调度的自动驾驶车辆

摘要: 自主系统对共享资源访问的动态调度是一个具有挑战性的问题,属于NP难问题。这项任务的复杂性导致高度动态系统中可能性的组合爆炸,其中到达的请求必须在严格的安全和时间约束下持续调度。这样的系统的一个例子是无信号交叉口,必须动态调度自动驾驶车辆对潜在冲突区域的访问。在本文中,我们将神经蒙特卡洛树搜索(NMCTS)应用于调度车队穿过无信号交叉口这一具有挑战性的任务。关键的是,我们引入了一个转换模型,将车队连续的、可能相互冲突的路权预留请求序列映射为一系列类似棋盘游戏的问题,并使用NMCTS在既有分配的背景下搜索代表最优路权分配时间表的解。为了优化搜索,我们将优先重采样方法与并行NMCTS(PNMCTS)相结合,以提高训练数据的质量。为了优化训练,我们使用课程学习策略训练智能体调度逐渐复杂的棋盘,最终达到代表繁忙交叉口的重叠棋盘。在一个繁忙的单个四向无信号交叉口仿真中,PNMCTS解决了95%的未见场景,与先进先出控制相比,在轻度交通下将通过时间减少了43%,在繁忙交通下减少了52%。在一个3x3多交叉口网络中,当所有交叉口都由PNMCTS控制时,所提出的方法在轻度交通下保持了自由流,并且在繁忙交通下,相比最先进的基于RL的交通信号灯控制器,平均行程时间缩短了74.5%,总吞吐量提高了16%。

更新时间: 2024-10-24 14:37:55

领域: cs.AI

下载: http://arxiv.org/abs/2410.18786v1

Should We Really Edit Language Models? On the Evaluation of Edited Language Models

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, with many methods excelling across these criteria. Some recent works disclose the pitfalls of these editing methods such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation on various editing methods and different language models, and report the following findings. (1) Existing editing methods lead to inevitable performance deterioration on general benchmarks, indicating that they maintain the general abilities of the model only within a few dozen edits. When the number of edits grows slightly larger, the intrinsic knowledge structure of the model is disrupted or even completely destroyed. (2) Instruction-tuned models are more robust to editing, showing less performance drop on general knowledge after editing. (3) Large-scale language models are more resistant to editing than small models. (4) The safety of the edited models is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods. The details of code and reproduction can be found in https://github.com/lqinfdim/EditingEvaluation.

Updated: 2024-10-24 14:36:48

标题: 我们真的应该编辑语言模型吗?关于编辑语言模型的评估

摘要: 模型编辑已成为一种越来越受欢迎的替代方案,用于有效更新语言模型中的知识。当前的方法主要关注可靠性、泛化和局部性,许多方法在这些标准上都表现出色。一些最近的研究揭示了这些编辑方法的缺陷,比如知识扭曲或冲突。然而,后编辑语言模型的一般能力尚未被探索。在本文中,我们对各种编辑方法和不同语言模型进行了全面评估,并得出以下发现:(1)现有的编辑方法在一般基准测试中不可避免地导致性能下降,这表明现有的编辑方法仅在少数几十次编辑内保持模型的一般能力。当编辑次数稍多时,模型的内在知识结构会被破坏甚至完全损坏。 (2)经过指导调整的模型对编辑更具鲁棒性,在编辑后一般知识的性能下降较少。 (3)与小模型相比,规模较大的语言模型更具抗编辑性。 (4)即使是那些与安全相关的模型,编辑后的模型的安全性也显著降低。我们的研究结果表明,当前的编辑方法仅适用于语言模型中小规模知识更新,这促使进一步研究更实用和可靠的编辑方法。代码和重现细节可以在https://github.com/lqinfdim/EditingEvaluation 找到。

更新时间: 2024-10-24 14:36:48

领域: cs.AI

下载: http://arxiv.org/abs/2410.18785v1

Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality

The denoising diffusion probabilistic model (DDPM) has emerged as a mainstream generative model in generative AI. While sharp convergence guarantees have been established for the DDPM, the iteration complexity is, in general, proportional to the ambient data dimension, resulting in overly conservative theory that fails to explain its practical efficiency. This has motivated the recent work Li and Yan (2024a) to investigate how the DDPM can achieve sampling speed-ups through automatic exploitation of intrinsic low dimensionality of data. We strengthen this prior work by demonstrating, in some sense, optimal adaptivity to unknown low dimensionality. For a broad class of data distributions with intrinsic dimension $k$, we prove that the iteration complexity of the DDPM scales nearly linearly with $k$, which is optimal when using KL divergence to measure distributional discrepancy. Our theory is established based on a key observation: the DDPM update rule is equivalent to running a suitably parameterized SDE upon discretization, where the nonlinear component of the drift term is intrinsically low-dimensional.

Updated: 2024-10-24 14:36:12

标题: 去噪扩散概率模型对未知低维度最优自适应

摘要: 去噪扩散概率模型(DDPM)已成为生成人工智能中的主流生成模型。虽然已经为DDPM建立了锐利的收敛保证,但迭代复杂度通常与环境数据维度成正比,导致过于保守的理论无法解释其实际效率。这促使最近的李和严(2024a)研究了DDPM如何通过自动利用数据的固有低维度来实现抽样加速。我们通过在某种意义上展示对未知低维度的最佳适应性来加强这项先前工作。对于具有固有维度k的广泛数据分布类别,我们证明了DDPM的迭代复杂度几乎与k线性增长,当使用KL散度来衡量分布差异时,这是最佳的。我们的理论建立在一个关键观察的基础上:DDPM更新规则等效于在离散化时运行一个适当参数化的SDE,其中漂移项的非线性分量本质上是低维度的。

更新时间: 2024-10-24 14:36:12

领域: cs.LG,cs.NA,eess.SP,math.NA,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2410.18784v1

How Far Have We Gone in Binary Code Understanding Using Large Language Models

Binary code analysis plays a pivotal role in various software security applications, such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code, understanding binary code is challenging for reverse engineers due to the absence of semantic information. Therefore, automated tools are needed to assist human analysts in interpreting binary code. In recent years, two groups of technologies have shown promising prospects: (1) Deep learning-based technologies have demonstrated competitive results in tasks related to binary code understanding; furthermore, (2) Large Language Models (LLMs) have been extensively pre-trained at the source-code level for tasks such as code understanding and generation. This raises the question of how well LLMs perform in binary code understanding. In this work, we propose a benchmark to evaluate the effectiveness of LLMs in real-world reverse engineering scenarios. The benchmark covers two key binary code understanding tasks, including function name recovery and binary code summarization. We gain valuable insights into their capabilities and limitations through extensive evaluations of popular LLMs using our benchmark. Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis. Our results highlight the great potential of the LLMs in advancing the field of binary code understanding.

Updated: 2024-10-24 14:35:43

标题: 我们在使用大型语言模型理解二进制代码方面取得了多少进展?

摘要: 二进制代码分析在各种软件安全应用中起着关键作用,如软件维护、恶意软件检测、软件漏洞发现、补丁分析等。然而,与源代码不同,由于缺乏语义信息,理解二进制代码对逆向工程师来说具有挑战性。因此,需要自动化工具来帮助人类分析师解读二进制代码。近年来,两类技术显示出有前途的前景:(1)基于深度学习的技术在与二进制代码理解相关的任务中展示了有竞争力的结果;此外,(2)大型语言模型(LLMs)已经在源代码级别针对代码理解和生成等任务进行了广泛的预训练。这引发了人们对LLMs二进制代码理解能力的疑问。 在这项工作中,我们提出了一个基准来评估LLMs在实际逆向工程场景中的有效性。该基准涵盖了两个关键的二进制代码理解任务,包括函数名称恢复和二进制代码摘要。通过对我们的基准使用流行的LLMs进行广泛评估,我们获得了对它们的能力和局限性的宝贵见解。我们的评估显示,现有的LLMs能够在一定程度上理解二进制代码,从而提高了二进制代码分析的效率。我们的结果突显了LLMs在推动二进制代码理解领域方面的巨大潜力。

更新时间: 2024-10-24 14:35:43

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2404.09836v3

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs

A primary challenge in large language model (LLM) development is their onerous pre-training cost. Typically, such pre-training involves optimizing a self-supervised objective (such as next-token prediction) over a large corpus. This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by suitably leveraging a small language model (SLM). In particular, this paradigm relies on an SLM to both (1) provide soft labels as additional training supervision, and (2) select a small subset of valuable ("informative" and "hard") training examples. Put together, this enables an effective transfer of the SLM's predictive distribution to the LLM, while prioritizing specific regions of the training data distribution. Empirically, this leads to reduced LLM training time compared to standard training, while improving the overall quality. Theoretically, we develop a statistical framework to systematically study the utility of SLMs in enabling efficient training of high-quality LLMs. In particular, our framework characterizes how the SLM's seemingly low-quality supervision can enhance the training of a much more capable LLM. Furthermore, it also highlights the need for an adaptive utilization of such supervision, by striking a balance between the bias and variance introduced by the SLM-provided soft labels. We corroborate our theoretical framework by improving the pre-training of an LLM with 2.8B parameters by utilizing a smaller LM with 1.5B parameters on the Pile dataset.
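
The two roles of the SLM map onto two short routines: a soft-label distillation loss and a selector that keeps the examples the SLM finds hard. A standard-form sketch (the weighting, temperature, and keep fraction are assumptions, not the paper's choices):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, slm_logits, targets, alpha=0.5, T=2.0):
    """Mix hard-label cross-entropy with KL to the SLM's soft labels."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(slm_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kl

def select_informative(slm_logits, targets, keep_frac=0.3):
    """Keep the 'hard' examples: those with the highest SLM loss."""
    per_example = F.cross_entropy(slm_logits, targets, reduction="none")
    k = max(1, int(keep_frac * per_example.numel()))
    return per_example.topk(k).indices
```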

Updated: 2024-10-24 14:31:52

标题: 小小帮助,大有作用:利用小型语言模型实现高效的LLM训练

摘要: 在大型语言模型(LLM)开发中的一个主要挑战是它们繁重的预训练成本。通常,这种预训练涉及在大型语料库上优化自监督目标(如下一个标记预测)。本文探讨了一种有希望的范式,通过适当利用小语言模型(SLM)来提高LLM预训练效率和质量。具体而言,这种范式依赖于SLM来提供软标签作为额外的训练监督,并选择一小部分有价值(“信息量大”和“困难”)的训练样本。总的来说,这使得SLM的预测分布能够有效地转移到LLM,同时优先考虑训练数据分布的特定区域。从经验上看,与标准训练相比,这导致了LLM训练时间的缩短,同时提高了整体质量。从理论上讲,我们开发了一个统计框架,系统地研究了SLM在使高质量LLM的高效训练方面的实用性。我们的框架特征化了SLM表面上低质量监督如何增强更有能力的LLM的训练。此外,它还强调了需要对这种监督进行自适应利用,通过在SLM提供的软标签引入的偏差和方差之间取得平衡。我们通过利用参数为1.5B的较小LM改进了具有2.8B参数的LLM在Pile数据集上的预训练,从而证实了我们的理论框架。

更新时间: 2024-10-24 14:31:52

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.18779v1

Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances

Current image watermarking methods are vulnerable to advanced image editing techniques enabled by large-scale text-to-image models. These models can distort embedded watermarks during editing, posing significant challenges to copyright protection. In this work, we introduce W-Bench, the first comprehensive benchmark designed to evaluate the robustness of watermarking methods against a wide range of image editing techniques, including image regeneration, global editing, local editing, and image-to-video generation. Through extensive evaluations of eleven representative watermarking methods against prevalent editing techniques, we demonstrate that most methods fail to detect watermarks after such edits. To address this limitation, we propose VINE, a watermarking method that significantly enhances robustness against various image editing techniques while maintaining high image quality. Our approach involves two key innovations: (1) we analyze the frequency characteristics of image editing and identify that blurring distortions exhibit similar frequency properties, which allows us to use them as surrogate attacks during training to bolster watermark robustness; (2) we leverage a large-scale pretrained diffusion model SDXL-Turbo, adapting it for the watermarking task to achieve more imperceptible and robust watermark embedding. Experimental results show that our method achieves outstanding watermarking performance under various image editing techniques, outperforming existing methods in both image quality and robustness. Code is available at https://github.com/Shilin-LU/VINE.

Updated: 2024-10-24 14:28:32

标题: 利用生成先验实现抵御图像编辑的鲁棒水印:从基准测试到进展

摘要: 当前的图像水印方法容易受到大规模文本到图像模型启用的高级图像编辑技术的影响。这些模型可以在编辑过程中扭曲嵌入的水印,给版权保护带来重大挑战。在这项工作中,我们介绍了W-Bench,这是第一个旨在评估水印方法对各种图像编辑技术的鲁棒性的全面基准。这些技术包括图像再生、全局编辑、局部编辑和图像到视频生成。通过对十一种代表性水印方法对普遍编辑技术的广泛评估,我们证明大多数方法在此类编辑后无法检测水印。为了解决这一限制,我们提出了VINE,这是一种水印方法,可以显著增强对各种图像编辑技术的鲁棒性,同时保持高质量的图像。我们的方法涉及两个关键创新:(1)我们分析了图像编辑的频率特性,并确定模糊扭曲表现出类似的频率特性,这使我们能够在训练过程中将它们用作替代攻击,以增强水印的鲁棒性;(2)我们利用大规模预训练的扩散模型SDXL-Turbo,将其调整为水印任务,以实现更不可感知和更强大的水印嵌入。实验结果表明,我们的方法在各种图像编辑技术下实现了出色的水印性能,优于现有方法,无论是在图像质量还是鲁棒性方面。代码可在https://github.com/Shilin-LU/VINE获取。

更新时间: 2024-10-24 14:28:32

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2410.18775v1

Fully Stochastic Primal-dual Gradient Algorithm for Non-convex Optimization on Random Graphs

Stochastic decentralized optimization algorithms often suffer from issues such as synchronization overhead and intermittent communication. This paper proposes a $\underline{\rm F}$ully $\underline{\rm S}$tochastic $\underline{\rm P}$rimal $\underline{\rm D}$ual gradient $\underline{\rm A}$lgorithm (FSPDA) that suggests an asynchronous decentralized procedure with (i) sparsified non-blocking communication on random undirected graphs and (ii) local stochastic gradient updates. FSPDA allows multiple local gradient steps to accelerate convergence to stationarity while finding a consensual solution with stochastic primal-dual updates. For problems with smooth (possibly non-convex) objective function, we show that FSPDA converges to an $\mathrm{\mathcal{O}( {\it \sigma /\sqrt{nT}} )}$-stationary solution after $\mathrm{\it T}$ iterations without assuming data heterogeneity. The performance of FSPDA is on par with state-of-the-art algorithms whose convergence depend on static graph and synchronous updates. To our best knowledge, FSPDA is the first asynchronous algorithm that converges exactly under the non-convex setting. Numerical experiments are presented to show the benefits of FSPDA.

Updated: 2024-10-24 14:26:58

标题: 在随机图上用于非凸优化的全随机原始-对偶梯度算法

摘要: 随机分散优化算法通常存在诸如同步开销和间歇通信等问题。本文提出了一种全随机原始-对偶梯度算法(FSPDA),该算法提出了一种异步分散程序,其中包括(i)在随机无向图上进行稀疏非阻塞通信和(ii)本地随机梯度更新。FSPDA允许多个本地梯度步骤加速收敛到平稳状态,同时利用随机原始-对偶更新找到一致解。对于具有光滑(可能是非凸)目标函数的问题,我们表明FSPDA在不假设数据异质性的情况下,在$\mathrm{\it T}$次迭代后收敛到一个$\mathrm{\mathcal{O}({\it \sigma /\sqrt{nT}})}$-稳定解。FSPDA的性能与依赖于静态图和同步更新的最新算法相媲美。据我们所知,FSPDA是第一个在非凸设置下精确收敛的异步算法。数值实验展示了FSPDA的优势。

更新时间: 2024-10-24 14:26:58

领域: math.OC,cs.DC,cs.LG

下载: http://arxiv.org/abs/2410.18774v1

CountCrypt: Quantum Cryptography between QCMA and PP

We construct a quantum oracle relative to which BQP = QCMA but quantum-computation-classical-communication (QCCC) key exchange, QCCC commitments, and two-round quantum key distribution exist. We also construct an oracle relative to which BQP = QMA, but quantum lightning (a stronger variant of quantum money) exists. This extends previous work by Kretschmer [Kretschmer, TQC22], which showed that there is a quantum oracle relative to which BQP = QMA but pseudorandom state generators (a quantum variant of pseudorandom generators) exist. We also show that QCCC key exchange, QCCC commitments, and two-round quantum key distribution can all be used to build one-way puzzles. One-way puzzles are a version of "quantum samplable" one-wayness and are an intermediate primitive between pseudorandom state generators and EFI pairs, the minimal quantum primitive. In particular, one-way puzzles cannot exist if BQP = PP. Our results together imply that aside from pseudorandom state generators, there is a large class of quantum cryptographic primitives which can exist even if BQP = QCMA, but are broken if BQP = PP. Furthermore, one-way puzzles are a minimal primitive for this class. We denote this class "CountCrypt".

Updated: 2024-10-24 14:23:39

标题: CountCrypt:QCMA和PP之间的量子密码学

摘要: 我们构建了一个量子预言机,相对于它 BQP = QCMA,但量子计算-经典通信(QCCC)密钥交换、QCCC承诺和两轮量子密钥分发仍然存在。我们还构建了一个预言机,相对于它 BQP = QMA,但量子闪电(量子货币的一个更强变体)存在。这扩展了Kretschmer之前的工作[Kretschmer, TQC22],该工作表明存在一个量子预言机,相对于它 BQP = QMA,但伪随机态生成器(伪随机生成器的量子变体)存在。我们还表明,QCCC密钥交换、QCCC承诺和两轮量子密钥分发都可以用来构建单向谜题。单向谜题是"量子可采样"单向性的一种版本,是介于伪随机态生成器与EFI对(最小的量子原语)之间的中间原语。特别地,如果 BQP = PP,则单向谜题不可能存在。我们的结果共同表明,除伪随机态生成器之外,还有一大类量子密码原语即使在 BQP = QCMA 时也可以存在,但在 BQP = PP 时会被攻破。此外,单向谜题是该类的一个最小原语。我们将这一类称为"CountCrypt"。

更新时间: 2024-10-24 14:23:39

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2410.14792v2

Attention-based Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Electric vehicle charging demand prediction is important for vacant charging pile recommendation and charging infrastructure planning, thus facilitating vehicle electrification and green energy development. The performance of previous spatio-temporal studies is still far from satisfactory because traditional graphs struggle to model non-pairwise spatial relationships and multivariate temporal features are not adequately taken into account. To tackle these issues, we propose an attention-based heterogeneous multivariate data fusion approach (AHMDF) for citywide electric vehicle charging demand prediction, which incorporates a geo-based clustered hypergraph and a multivariate gated Transformer to consider both static and dynamic influences. To learn non-pairwise relationships, we cluster service areas by the types and numbers of points of interest in the areas and develop attentive hypergraph networks accordingly. Graph attention mechanisms are used for information propagation between neighboring areas. Additionally, we improve the Transformer encoder utilizing gated mechanisms so that it can selectively learn dynamic auxiliary information and temporal features. Experiments on an electric vehicle charging benchmark dataset demonstrate the effectiveness of our proposed approach compared with a broad range of competing baselines. Furthermore, we demonstrate the impact of dynamic influences on prediction results in different areas of the city and the effectiveness of our clustering method.

Updated: 2024-10-24 14:19:38

标题: 考虑城市区域和动态影响的基于注意力的全市电动汽车充电需求预测方法

摘要: 电动汽车充电需求预测对于空闲充电桩推荐和充电基础设施规划至关重要,从而促进车辆电气化和绿色能源发展。以往的时空研究表现仍然不尽人意,因为传统图形难以建模非成对空间关系,并且多变量时间特征未能充分考虑。为了解决这些问题,我们提出了一种基于注意力的异质多变量数据融合方法(AHMDF)用于城市范围的电动汽车充电需求预测,该方法结合了基于地理的聚类超图和多变量门控Transformer,考虑了静态和动态影响。为了学习非成对关系,我们根据区域内兴趣点的类型和数量对服务区进行聚类,并相应地开发关注超图网络。图形注意机制用于相邻区域之间的信息传播。此外,我们改进了Transformer编码器,利用门控机制,使其能够选择性地学习动态辅助信息和时间特征。对电动汽车充电基准数据集的实验表明,与广泛范围的竞争基线相比,我们提出的方法的有效性。此外,我们展示了动态影响对城市不同区域的预测结果的影响以及我们聚类方法的有效性。

更新时间: 2024-10-24 14:19:38

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2410.18766v1

ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis

Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 1.8x - 141.3x, while the inter-packet waiting time is up to 6-8 orders of magnitude higher than the inference time! Based on these insights, we tailor a novel fast-slow model architecture for networking ML pipelines. Flows are assigned to a slower model only when the inferences from the fast model are deemed high uncertainty. ServeFlow is able to make inferences on 76.3% of flows in under 16ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.
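
The fast-slow idea reduces to a small routing loop: serve every flow with the cheap model and escalate only when its confidence is low. A sketch with an assumed (label, confidence) model interface and an illustrative threshold, not ServeFlow's tuned values:

```python
def serve_flows(flows, fast_model, slow_model, conf_threshold=0.9):
    """Fast-slow cascade: only low-confidence flows pay for the slow model."""
    labels, escalated = [], 0
    for flow in flows:
        label, conf = fast_model(flow)      # cheap model, few packets
        if conf < conf_threshold:
            label, conf = slow_model(flow)  # costly model, more packets
            escalated += 1
        labels.append(label)
    return labels, escalated / max(1, len(flows))
```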

Updated: 2024-10-24 14:15:42

标题: ServeFlow:用于网络流量分析的快慢模型架构

摘要: 随着互联网的整合和流量的加密化,网络流量分析越来越多地使用复杂的机器学习模型。然而,在高带宽网络上,流量往往比模型推断速率更快地到达。网络流的时间性质限制了其他高流量机器学习应用中使用的简单扩展方法。因此,本文提出了ServeFlow,这是一个针对网络流量分析任务的机器学习模型服务解决方案,它精心选择要收集的数据包数量和要应用的模型,以实现最小延迟、高服务速率和高准确性之间的平衡。我们发现,在相同的任务中,不同模型的推断时间可能相差1.8倍至141.3倍,而数据包之间的等待时间可能比推断时间高出6-8个数量级!基于这些见解,我们为网络机器学习管道量身定制了一种新颖的快慢模型架构。只有当来自快速模型的推断被认为存在高不确定性时,流量才会分配给较慢的模型。ServeFlow能够在16ms内对76.3%的流量进行推断,这使得端到端服务延迟中位数加快40.5倍,同时增加服务速率并保持类似的准确性。即使每个流量具有数千个特征,它在16核CPU的普通服务器上每秒能够处理超过48.5k个新流量,这与城市级网络主干观察到的流量速率数量级相匹配。

更新时间: 2024-10-24 14:15:42

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2402.03694v2

Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks

This work contributes towards the development of an efficient and scalable open-source Secure Multi-Party Computation (SMPC) protocol on machines with moderate computational resources. We use the ABY2.0 SMPC protocol implemented on the C++ based MOTION2NX framework for secure convolutional neural network (CNN) inference application with semi-honest security. Our contributions are as follows. Firstly, we enhance MOTION2NX by providing a tensorized version of several primitive functions including the Hadamard product, indicator function and argmax function. Secondly, we adapt an existing Helper node algorithm, working in tandem with the ABY2.0 protocol, for efficient convolution computation to reduce execution time and RAM usage. Thirdly, we also present a novel splitting algorithm that divides the computations at each CNN layer into multiple configurable chunks. This novel splitting algorithm, providing significant reduction in RAM usage, is of independent interest and is applicable to general SMPC protocols.
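
In spirit, the splitting algorithm bounds peak memory by processing each layer in configurable chunks; the plaintext sketch below shows the idea on a plain matrix product (the real protocol splits secret-shared tensors inside ABY2.0, which is not shown here):

```python
import numpy as np

def chunked_linear(X, W, chunk_rows=64):
    """Compute X @ W one row-chunk at a time so only a chunk-sized
    intermediate is live at once, bounding peak RAM."""
    out = np.empty((X.shape[0], W.shape[1]), dtype=X.dtype)
    for start in range(0, X.shape[0], chunk_rows):
        stop = min(start + chunk_rows, X.shape[0])
        out[start:stop] = X[start:stop] @ W
    return out

X, W = np.random.rand(1024, 512), np.random.rand(512, 256)
assert np.allclose(chunked_linear(X, W), X @ W)
```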

Updated: 2024-10-24 14:15:40

标题: 增强MOTION2NX以实现高效、可扩展和安全的使用卷积神经网络进行图像推断

摘要: 这项工作为在具有中等计算资源的机器上开发高效可扩展的开源安全多方计算(SMPC)协议做出了贡献。我们在基于C++的MOTION2NX框架上实现了ABY2.0 SMPC协议,用于具有半诚实安全性的卷积神经网络(CNN)推理应用。我们的贡献列表如下。首先,我们通过提供包括Hadamard乘积、指示函数和argmax函数在内的几个原始函数的张量化版本来增强MOTION2NX。其次,我们针对卷积计算,通过与ABY2.0协议协作的现有Helper节点算法,进行了适应,以减少执行时间和RAM使用量。第三,我们还提出了一种新颖的分割算法,将每个CNN层的计算分成多个可配置的块。这种新颖的分割算法大大减少了RAM使用量,具有独立的兴趣,并适用于一般的SMPC协议。

更新时间: 2024-10-24 14:15:40

领域: cs.CR

下载: http://arxiv.org/abs/2408.16387v3

Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations

In this paper, we explore the intriguing similarities between the structure of a discrete neural network, such as a spiking network, and the composition of a piano piece. While both involve nodes or notes that are activated sequentially or in parallel, the latter benefits from the rich body of music theory to guide meaningful combinations. We propose a novel approach that leverages musical grammar to regulate activations in a spiking neural network, allowing for the representation of symbols as attractors. By applying rules for chord progressions from music theory, we demonstrate how certain activations naturally follow others, akin to the concept of attraction. Furthermore, we introduce the concept of modulating keys to navigate different basins of attraction within the network. Ultimately, we show that the map of concepts in our model is structured by the musical circle of fifths, highlighting the potential for leveraging music theory principles in deep learning algorithms.
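
The circle of fifths that structures the model's concept map has a compact arithmetic description: stepping by a perfect fifth adds 7 semitones modulo 12, and iterating this visits all 12 keys.

```python
# Pitch classes 0..11 with C = 0; a perfect fifth is +7 semitones (mod 12).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

circle = [(i * 7) % 12 for i in range(12)]       # 0, 7, 2, 9, 4, 11, ...
print([NOTE_NAMES[pc] for pc in circle])         # C G D A E B F# ... F

# Adjacent keys on the circle share 6 of their 7 scale tones, which is why
# modulating to a neighbouring key (a nearby basin of attraction, in the
# paper's analogy) sounds smooth.
```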

Updated: 2024-10-24 14:14:04

标题: 利用基于音乐语法激活的音乐五度圆构建概念空间

摘要: 在这篇论文中,我们探讨了离散神经网络(例如尖峰网络)的结构与钢琴作品的构成之间引人注目的相似之处。虽然两者都涉及按顺序或并行激活的节点或音符,但后者受益于丰富的音乐理论,以指导有意义的组合。我们提出了一种新颖的方法,利用音乐语法来调节尖峰神经网络中的激活,允许将符号表示为吸引子。通过应用音乐理论中的和弦进行规则,我们展示了某些激活自然地跟随其他激活,类似于吸引力的概念。此外,我们引入了调制键的概念,以在网络中导航不同的吸引盆地。最终,我们展示了我们模型中概念的映射由音乐五度圆所构成,突显了在深度学习算法中利用音乐理论原则的潜力。

更新时间: 2024-10-24 14:14:04

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2403.00790v2

Neural incomplete factorization: learning preconditioners for the conjugate gradient method

The convergence of the conjugate gradient method for solving large-scale and sparse linear equation systems depends on the spectral properties of the system matrix, which can be improved by preconditioning. In this paper, we develop a computationally efficient data-driven approach to accelerate the generation of effective preconditioners. We, therefore, replace the typically hand-engineered preconditioners by the output of graph neural networks. Our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF). Optimizing the condition number of the linear system directly is computationally infeasible. Instead, we utilize a stochastic approximation of the Frobenius loss which only requires matrix-vector multiplications for efficient training. At the core of our method is a novel message-passing block, inspired by sparse matrix theory, that aligns with the objective of finding a sparse factorization of the matrix. We evaluate our proposed method on both synthetic problem instances and on problems arising from the discretization of the Poisson equation on varying domains. Our experiments show that by using data-driven preconditioners within the conjugate gradient method we are able to speed up the convergence of the iterative procedure. The code is available at https://github.com/paulhausner/neural-incomplete-factorization.
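
For context, preconditioned conjugate gradients only ever touches the preconditioner through an apply-M^{-1} routine, which is the slot NeuralIF's GNN-generated incomplete factorization fills; below is standard PCG with a simple Jacobi stand-in preconditioner:

```python
import numpy as np

def pcg(A, b, apply_Minv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients for SPD systems A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_Minv(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x

rng = np.random.default_rng(0)
Q = rng.standard_normal((100, 100))
A = Q @ Q.T + 100 * np.eye(100)
b = rng.standard_normal(100)
x = pcg(A, b, lambda r: r / np.diag(A))   # Jacobi preconditioner stand-in
```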

Updated: 2024-10-24 14:06:40

标题: 神经不完全因子分解:为共轭梯度法学习预条件子

摘要: 共轭梯度法求解大规模稀疏线性方程组的收敛性取决于系统矩阵的谱性质,而谱性质可以通过预条件处理加以改善。在本文中,我们开发了一种计算高效的数据驱动方法,用于加速生成有效的预条件子。为此,我们用图神经网络的输出取代通常手工设计的预条件子。我们的方法生成矩阵的不完全因子分解,因此被称为神经不完全因子分解(NeuralIF)。直接优化线性系统的条件数在计算上不可行。因此,我们利用Frobenius损失的随机近似,其训练只需要矩阵-向量乘法,非常高效。我们方法的核心是一种受稀疏矩阵理论启发的新型消息传递模块,与寻找矩阵稀疏因子分解的目标相一致。我们在合成问题实例以及源自泊松方程在不同区域上离散化的问题上评估了所提出的方法。实验表明,在共轭梯度法中使用数据驱动的预条件子能够加速迭代过程的收敛。代码可在https://github.com/paulhausner/neural-incomplete-factorization找到。

更新时间: 2024-10-24 14:06:40

领域: math.OC,cs.LG,cs.NA,math.NA,stat.ML

下载: http://arxiv.org/abs/2305.16368v3

Does Differential Privacy Impact Bias in Pretrained NLP Models?

Differential privacy (DP) is applied when fine-tuning pre-trained large language models (LLMs) to limit leakage of training examples. While most DP research has focused on improving a model's privacy-utility tradeoff, some find that DP can be unfair to or biased against underrepresented groups. In this work, we show the impact of DP on bias in LLMs through empirical analysis. Differentially private training can increase the model bias against protected groups w.r.t AUC-based bias metrics. DP makes it more difficult for the model to differentiate between the positive and negative examples from the protected groups and other groups in the rest of the population. Our results also show that the impact of DP on bias is not only affected by the privacy protection level but also the underlying distribution of the dataset.

Updated: 2024-10-24 13:59:03

标题: 差分隐私对预训练NLP模型中的偏见是否产生影响?

摘要: 在微调预训练的大型语言模型(LLMs)时会应用差分隐私(DP),以限制训练样本的泄露。虽然大多数DP研究侧重于改进模型的隐私-效用权衡,但也有研究发现DP可能对代表性不足的群体不公平或存在偏见。在这项工作中,我们通过实证分析展示了DP对LLMs中偏见的影响。就基于AUC的偏见指标而言,差分隐私训练会增加模型对受保护群体的偏见。DP使模型更难区分来自受保护群体与总体中其他群体的正负样本。我们的结果还表明,DP对偏见的影响不仅受隐私保护水平的影响,还受数据集底层分布的影响。

更新时间: 2024-10-24 13:59:03

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18749v1

Cellpose+, a morphological analysis tool for feature extraction of stained cell images

Advanced image segmentation and processing tools present an opportunity to study cell processes and their dynamics. However, image analysis is often routine and time-consuming. Nowadays, alternative data-driven approaches using deep learning are potentially offering automatized, accurate, and fast image analysis. In this paper, we extend the applications of Cellpose, a state-of-the-art cell segmentation framework, with feature extraction capabilities to assess morphological characteristics. We also introduce a dataset of DAPI and FITC stained cells to which our new method is applied.
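
Feature extraction on top of segmentation masks can be as simple as region properties; a sketch with scikit-image (an illustrative descriptor set, not necessarily the one Cellpose+ implements):

```python
import numpy as np
from skimage.measure import label, regionprops

def cell_features(mask):
    """Morphological descriptors per segmented cell in a binary mask."""
    feats = []
    for region in regionprops(label(mask > 0)):
        feats.append({"area": region.area,
                      "perimeter": region.perimeter,
                      "eccentricity": region.eccentricity,
                      "solidity": region.solidity})
    return feats

mask = np.zeros((64, 64), dtype=np.uint8)
mask[10:30, 10:30] = 1                    # one toy square "cell"
print(cell_features(mask))
```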

Updated: 2024-10-24 13:41:40

标题: Cellpose+,一种用于提取染色细胞图像特征的形态分析工具

摘要: 先进的图像分割和处理工具为研究细胞过程及其动态提供了机会。然而,图像分析通常是例行且耗时的。如今,使用深度学习的替代数据驱动方法可能提供自动化、准确和快速的图像分析。在本文中,我们扩展了Cellpose的应用,这是一个最先进的细胞分割框架,具有特征提取能力,以评估形态特征。我们还介绍了一个DAPI和FITC染色细胞的数据集,我们的新方法被应用于其中。

更新时间: 2024-10-24 13:41:40

领域: cs.CV,cs.AI,68T07

下载: http://arxiv.org/abs/2410.18738v1

Investigating Labeler Bias in Face Annotation for Machine Learning

In a world increasingly reliant on artificial intelligence, it is more important than ever to consider the ethical implications of artificial intelligence on humanity. One key under-explored challenge is labeler bias, which can create inherently biased datasets for training and subsequently lead to inaccurate or unfair decisions in healthcare, employment, education, and law enforcement. Hence, we conducted a study to investigate and measure the existence of labeler bias using images of people from different ethnicities and sexes in a labeling task. Our results show that participants possess stereotypes that influence their decision-making process and that labeler demographics impact assigned labels. We also discuss how labeler bias influences datasets and, subsequently, the models trained on them. Overall, a high degree of transparency must be maintained throughout the entire artificial intelligence training process to identify and correct biases in the data as early as possible.

Updated: 2024-10-24 13:38:34

标题: 调查机器学习人脸标注中的标注者偏差

摘要: 在一个越来越依赖人工智能的世界中,考虑人工智能对人类的伦理影响变得比以往更加重要。一个尚未充分探讨的关键挑战是标注者偏见,这可能会为训练创造固有偏见的数据集,并随后导致在医疗保健、就业、教育和执法等领域中做出不准确或不公平的决定。因此,我们进行了一项研究,以探讨和衡量标注者偏见在标注任务中使用不同种族和性别的人的图像时的存在。我们的结果显示,参与者持有影响其决策过程的刻板印象,而标注者的人口统计信息影响了分配的标签。我们还讨论了标注者偏见如何影响数据集,进而影响训练在其上的模型。总的来说,在整个人工智能训练过程中必须保持高度透明,以尽早识别和纠正数据中的偏见。

更新时间: 2024-10-24 13:38:34

领域: cs.LG,cs.HC

下载: http://arxiv.org/abs/2301.09902v3

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence of the stochastic gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. We establish convergence under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics. Under this regime, the stochastic gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters. We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a $O(T^{-1} + \alpha^{-1})$ sublinear rate, and additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex. Here $T$ denotes the time and $\alpha$ is a scaling parameter of the neural networks. In terms of representation learning, our results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance. Finally, we apply our general results to concrete examples including policy evaluation, nonparametric instrumental variable regression, asset pricing, and adversarial Riesz representer estimation.

Updated: 2024-10-24 13:38:19

标题: 神经随机梯度下降-上升的平均场分析:用于泛函极小极大优化的研究

摘要: 这篇论文研究了定义在过参数化两层神经网络的无限维函数类上的极小极大优化问题。具体而言,我们考虑源自估计由条件期望定义的线性泛函方程的极小极大优化问题,其中目标函数在函数空间中是二次的。我们讨论了(i)随机梯度下降-上升算法的收敛性和(ii)神经网络的表示学习。通过考虑优化动态的连续时间和无穷宽度极限,我们建立了在平均场区域下的收敛性。在该区域下,随机梯度下降-上升对应于定义在神经网络参数空间上的概率测度空间中的Wasserstein梯度流。我们证明了Wasserstein梯度流以$O(T^{-1} + \alpha^{-1})$的次线性速率全局收敛于极小极大目标的驻点,并且当极小极大目标的正则化器为强凸时,还能找到泛函方程的解。这里$T$表示时间,$\alpha$是神经网络的缩放参数。在表示学习方面,我们的结果表明,神经网络诱导的特征表示允许与初始表示相差$O(\alpha^{-1})$的量级(以Wasserstein距离衡量)。最后,我们将一般结果应用于具体示例,包括策略评估、非参数工具变量回归、资产定价和对抗Riesz表示元估计。

更新时间: 2024-10-24 13:38:19

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2404.12312v3

Evaluating AI-Generated Essays with GRE Analytical Writing Assessment

The recent revolutionary advance in generative AI enables the generation of realistic and coherent texts by large language models (LLMs). Despite many existing evaluation metrics on the quality of the generated texts, there is still a lack of rigorous assessment of how well LLMs perform in complex and demanding writing assessments. This study examines essays generated by ten leading LLMs for the analytical writing assessment of the Graduate Record Exam (GRE). We assessed these essays using both human raters and the e-rater automated scoring engine as used in the GRE scoring pipeline. Notably, the top-performing Gemini and GPT-4o received an average score of 4.78 and 4.67, respectively, falling between "generally thoughtful, well-developed analysis of the issue and conveys meaning clearly" and "presents a competent analysis of the issue and conveys meaning with acceptable clarity" according to the GRE scoring guideline. We also evaluated the detection accuracy of these essays, with detectors trained on essays generated by the same and different LLMs.

Updated: 2024-10-24 13:34:47

标题: 用GRE分析写作评估评估人工智能生成的论文

摘要: 最近生成式人工智能领域的革命性进展使得大型语言模型(LLMs)能够生成逼真且连贯的文本。尽管已有许多评估生成文本质量的指标,但对于LLMs在复杂且要求高的写作测评中表现如何,仍缺乏严格的评估。本研究考察了由十个领先的LLMs为研究生入学考试(GRE)分析写作测评生成的文章。我们使用人工评分员以及GRE评分流程中使用的e-rater自动评分引擎对这些文章进行了评估。值得注意的是,表现最好的Gemini和GPT-4o分别获得了4.78和4.67的平均分,按照GRE评分指南,落在“对问题进行总体上深思熟虑、充分展开的分析,并清晰传达含义”和“对问题进行胜任的分析,并以可接受的清晰度传达含义”之间。我们还使用在相同及不同LLMs生成的文章上训练的检测器,评估了这些文章被检测出的准确率。

更新时间: 2024-10-24 13:34:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.17439v2

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

Coding tasks have been valuable for evaluating Large Language Models (LLMs), as they demand the comprehension of high-level instructions, complex reasoning, and the implementation of functional programs -- core capabilities for advancing Artificial General Intelligence. Despite the progress in Large Multimodal Models (LMMs), which extend LLMs with visual perception and understanding capabilities, there remains a notable lack of coding benchmarks that rigorously assess these models, particularly in tasks that emphasize visual reasoning. To address this gap, we introduce HumanEval-V, a novel and lightweight benchmark specifically designed to evaluate LMMs' visual understanding and reasoning capabilities through code generation. HumanEval-V includes 108 carefully crafted, entry-level Python coding tasks derived from platforms like CodeForces and Stack Overflow. Each task is adapted by modifying the context and algorithmic patterns of the original problems, with visual elements redrawn to ensure distinction from the source, preventing potential data leakage. LMMs are required to complete the code solution based on the provided visual context and a predefined Python function signature outlining the task requirements. Every task is equipped with meticulously handcrafted test cases to ensure a thorough and reliable evaluation of model-generated solutions. We evaluate 19 state-of-the-art LMMs using HumanEval-V, uncovering significant challenges. Proprietary models like GPT-4o achieve only 13% pass@1 and 36.4% pass@10, while open-weight models with 70B parameters score below 4% pass@1. Ablation studies further reveal the limitations of current LMMs in vision reasoning and coding capabilities. These results underscore key areas for future research to enhance LMMs' capabilities. We have open-sourced our code and benchmark at https://github.com/HumanEval-V/HumanEval-V-Benchmark.

Updated: 2024-10-24 13:33:58

标题: HumanEval-V: 通过编码任务评估大型多模态模型的视觉理解和推理能力

摘要: 编码任务对于评估大型语言模型(LLMs)非常有价值,因为它们要求理解高级指令、进行复杂推理并实现功能性程序,这些都是迈向通用人工智能的核心能力。尽管大型多模态模型(LMMs)通过视觉感知和理解能力扩展了LLMs并取得了进展,但仍然明显缺乏能严格评估这些模型的编码基准,特别是在强调视觉推理的任务中。为了填补这一空白,我们介绍了HumanEval-V,这是一个新颖且轻量级的基准,专门设计用于通过代码生成评估LMMs的视觉理解和推理能力。HumanEval-V包括108个精心设计的入门级Python编码任务,这些任务来源于CodeForces和Stack Overflow等平台。每个任务都通过修改原始问题的上下文和算法模式进行改编,并重新绘制视觉元素以确保与来源有所区分,从而防止潜在的数据泄漏。LMMs需要根据提供的视觉上下文和概述任务要求的预定义Python函数签名来完成代码解答。每个任务都配备了精心编写的测试用例,以确保对模型生成的解答进行全面而可靠的评估。我们使用HumanEval-V评估了19个最先进的LMMs,揭示了重大挑战。GPT-4o等专有模型仅实现13%的pass@1和36.4%的pass@10,而参数为70B的开放权重模型的pass@1得分低于4%。消融研究进一步揭示了当前LMMs在视觉推理和编码能力方面的局限性。这些结果凸显了未来研究增强LMMs能力的关键方向。我们已在https://github.com/HumanEval-V/HumanEval-V-Benchmark上开源我们的代码和基准。

更新时间: 2024-10-24 13:33:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.12381v2

AI Readiness in Healthcare through Storytelling XAI

Artificial Intelligence is rapidly advancing and radically impacting everyday life, driven by the increasing availability of computing power. Despite this trend, the adoption of AI in real-world healthcare is still limited. One of the main reasons is the trustworthiness of AI models and the potential hesitation of domain experts with model predictions. Explainable Artificial Intelligence (XAI) techniques aim to address these issues. However, explainability can mean different things to people with different backgrounds, expertise, and goals. To address the target audience with diverse needs, we develop storytelling XAI. In this research, we have developed an approach that combines multi-task distillation with interpretability techniques to enable audience-centric explainability. Using multi-task distillation allows the model to exploit the relationships between tasks, potentially improving interpretability as each task supports the other leading to an enhanced interpretability from the perspective of a domain expert. The distillation process allows us to extend this research to large deep models that are highly complex. We focus on both model-agnostic and model-specific methods of interpretability, supported by textual justification of the results in healthcare through our use case. Our methods increase the trust of both the domain experts and the machine learning experts to enable a responsible AI.

Updated: 2024-10-24 13:30:18

标题: 通过叙事XAI实现医疗保健领域的人工智能准备情况

摘要: 人工智能正在迅速发展,并且通过日益增加的计算能力,彻底地影响着日常生活。尽管存在这一趋势,但人工智能在现实世界的医疗保健中的应用仍然有限。其中一个主要原因是人工智能模型的可靠性以及领域专家对模型预测可能存在的犹豫。可解释人工智能(XAI)技术旨在解决这些问题。然而,对于具有不同背景、专业知识和目标的人来说,可解释性可能意味着不同的事情。为了满足具有不同需求的目标受众,我们开发了故事化XAI。在这项研究中,我们开发了一种方法,将多任务蒸馏与可解释性技术结合起来,以实现以受众为中心的可解释性。使用多任务蒸馏允许模型利用任务之间的关系,潜在地提高可解释性,因为每个任务相互支持,从领域专家的角度来看,导致了增强的可解释性。蒸馏过程使我们能够将这项研究拓展到高度复杂的大型深度模型。我们关注解释性的模型无关和模型特定方法,通过我们的用例在医疗保健领域支持结果的文本说明。我们的方法提高了领域专家和机器学习专家对于实现负责任人工智能的信任。

更新时间: 2024-10-24 13:30:18

领域: cs.AI

下载: http://arxiv.org/abs/2410.18725v1

Lightweight Correlation-Aware Table Compression

The growing adoption of data lakes for managing relational data necessitates efficient, open storage formats that provide high scan performance and competitive compression ratios. While existing formats achieve fast scans through lightweight encoding techniques, they have reached a plateau in terms of minimizing storage footprint. Recently, correlation-aware compression schemes have been shown to reduce file sizes further. Yet, current approaches either incur significant scan overheads or require manual specification of correlations, limiting their practicability. We present $\texttt{Virtual}$, a framework that integrates seamlessly with existing open formats to automatically leverage data correlations, achieving substantial compression gains while having minimal scan performance overhead. Experiments on data-gov datasets show that $\texttt{Virtual}$ reduces file sizes by up to 40% compared to Apache Parquet.

Updated: 2024-10-24 13:28:18

标题: 轻量级相关性感知表压缩

摘要: 随着数据湖在管理关系数据方面的日益普及,需要高效、开放的存储格式,提供高扫描性能和有竞争力的压缩比。虽然现有格式通过轻量级编码技术实现了快速扫描,但在最小化存储占用方面已进入平台期。最近,相关性感知的压缩方案已被证明能够进一步减小文件大小。然而,当前方法要么产生显著的扫描开销,要么需要手动指定相关性,限制了其实用性。我们提出了$\texttt{Virtual}$框架,它与现有开放格式无缝集成,自动利用数据相关性,在扫描性能开销极小的情况下实现可观的压缩增益。对data-gov数据集的实验表明,与Apache Parquet相比,$\texttt{Virtual}$可将文件大小减少多达40%。

更新时间: 2024-10-24 13:28:18

领域: cs.DB,cs.IR,cs.LG

下载: http://arxiv.org/abs/2410.14066v3
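
To make the correlation-aware idea above concrete, here is a toy Python sketch (not the $\texttt{Virtual}$ implementation, whose correlation discovery and format integration are far more involved): a column that tracks another column linearly is replaced by a fitted slope and intercept plus a sparse list of exceptions, and is reconstructed losslessly on read.

import numpy as np

def compress_correlated(a, b, tol=1e-9):
    """Replace column b by a robust linear model of column a plus sparse exceptions."""
    slope = np.median(np.diff(b) / np.diff(a))      # robust slope estimate
    intercept = np.median(b - slope * a)
    residual = b - (slope * a + intercept)
    exceptions = {int(i): b[i] for i in np.flatnonzero(np.abs(residual) > tol)}
    return slope, intercept, exceptions             # b itself need not be stored

def decompress(a, slope, intercept, exceptions):
    b = slope * a + intercept
    for i, v in exceptions.items():
        b[i] = v
    return b

a = np.arange(1000, dtype=float)
b = 3.0 * a + 7.0
b[[10, 500]] += 42.0                                # two "exception" rows
s, c, exc = compress_correlated(a, b)
assert len(exc) == 2 and np.allclose(decompress(a, s, c, exc), b)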

GeoLoRA: Geometric integration for parameter efficient fine-tuning

Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. However, LoRA and its extensions face several challenges, including the need for rank adaptivity, robustness, and computational efficiency during the fine-tuning process. We introduce GeoLoRA, a novel approach that addresses these limitations by leveraging dynamical low-rank approximation theory. GeoLoRA requires only a single backpropagation pass over the small-rank adapters, significantly reducing computational cost as compared to similar dynamical low-rank training methods and making it faster than popular baselines such as AdaLoRA. This allows GeoLoRA to efficiently adapt the allocated parameter budget across the model, achieving smaller low-rank adapters compared to heuristic methods like AdaLoRA and LoRA, while maintaining critical convergence, descent, and error-bound theoretical guarantees. The resulting method is not only more efficient but also more robust to varying hyperparameter settings. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.

Updated: 2024-10-24 13:26:10

标题: GeoLoRA: 几何集成用于参数高效微调

摘要: 低秩适应(LoRA)已成为对大规模预训练神经网络进行参数高效微调的常用方法。然而,LoRA及其扩展面临若干挑战,包括微调过程中对秩自适应性、鲁棒性和计算效率的需求。我们引入了GeoLoRA,一种利用动力学低秩近似理论解决这些局限的新方法。GeoLoRA仅需对小秩适配器进行一次反向传播,与类似的动力学低秩训练方法相比显著降低了计算成本,并使其比AdaLoRA等流行基线更快。这使得GeoLoRA能够有效地在模型中调配参数预算,与AdaLoRA和LoRA等启发式方法相比得到更小的低秩适配器,同时保持关键的收敛性、下降性和误差界理论保证。所得方法不仅更高效,而且对不同的超参数设置更加鲁棒。我们在多个最先进的基准测试上展示了GeoLoRA的有效性,表明它在准确性和计算效率方面均优于现有方法。

更新时间: 2024-10-24 13:26:10

领域: cs.LG,cs.AI,cs.NA,math.NA

下载: http://arxiv.org/abs/2410.18720v1
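
A quick refresher on the adapters under discussion: the sketch below is a plain LoRA layer in PyTorch, i.e. a frozen base weight plus a trainable low-rank update. GeoLoRA's actual contribution (geometric integration of the low-rank dynamics with rank adaptivity and a single backpropagation pass over the adapters) sits on top of this basic parameterization and is not reproduced here.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus trainable low-rank update (alpha / r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                         # only adapters are tuned
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(4, 512))                              # shape (4, 512)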

LLM-based Online Prediction of Time-varying Graph Signals

In this paper, we propose a novel framework that leverages large language models (LLMs) for predicting missing values in time-varying graph signals by exploiting spatial and temporal smoothness. We leverage the power of LLM to achieve a message-passing scheme. For each missing node, its neighbors and previous estimates are fed into and processed by LLM to infer the missing observations. Tested on the task of the online prediction of wind-speed graph signals, our model outperforms online graph filtering algorithms in terms of accuracy, demonstrating the potential of LLMs in effectively addressing partially observed signals in graphs.

Updated: 2024-10-24 13:22:50

标题: 基于LLM的在线预测时变图信号

摘要: 在本文中,我们提出了一个新颖的框架,利用大型语言模型(LLMs)并借助空间和时间上的平滑性,预测时变图信号中的缺失值。我们利用LLM的能力实现一种消息传递方案:对于每个缺失节点,其邻居的观测值和先前的估计会被输入LLM进行处理,从而推断出缺失的观测值。在风速图信号的在线预测任务上的测试表明,我们的模型在准确性方面优于在线图滤波算法,展示了LLMs在有效处理图上部分观测信号方面的潜力。

更新时间: 2024-10-24 13:22:50

领域: cs.AI

下载: http://arxiv.org/abs/2410.18718v1
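
A minimal sketch of the message-passing scheme described above: for each missing node, the latest neighbor readings and the node's own previous estimates are serialized into a prompt, and the LLM's reply is parsed as the new estimate. `query_llm` is a hypothetical wrapper around whatever LLM API is used; the paper's exact prompt format is not specified here.

def predict_missing(node, graph, history, query_llm):
    """One message-passing step: ask the LLM to infer a missing reading
    from spatial neighbors and the node's own recent history.
    `graph` maps node -> neighbor list; `history` maps node -> list of readings."""
    neighbors = {n: history[n][-1] for n in graph[node] if history[n]}
    prompt = (
        f"Wind-speed sensor {node} is missing its latest reading.\n"
        f"Latest neighbor readings: {neighbors}\n"
        f"Its own previous estimates: {history[node][-3:]}\n"
        "Reply with a single number."
    )
    return float(query_llm(prompt))     # query_llm: hypothetical API wrapper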

Low-Latency Video Anonymization for Crowd Anomaly Detection: Privacy vs. Performance

Recent advancements in artificial intelligence promise ample potential in monitoring applications with surveillance cameras. However, concerns about privacy and model bias have made it challenging to utilize them in public. Although de-identification approaches have been proposed in the literature, aiming to achieve a certain level of anonymization, most of them employ deep learning models that are computationally demanding for real-time edge deployment. In this study, we revisit conventional anonymization solutions for privacy protection and real-time video anomaly detection (VAD) applications. We propose a novel lightweight adaptive anonymization for VAD (LA3D) that employs dynamic adjustment to enhance privacy protection. We evaluated the approaches on publicly available privacy and VAD data sets to examine the strengths and weaknesses of the different anonymization techniques and highlight the promising efficacy of our approach. Our experiment demonstrates that LA3D enables substantial improvement in the privacy anonymization capability without majorly degrading VAD efficacy.

Updated: 2024-10-24 13:22:33

标题: 低延迟视频匿名化用于群体异常检测:隐私与性能

摘要: 人工智能的最新进展为带有监控摄像头的监测应用带来了巨大潜力。然而,对隐私和模型偏差的担忧使得在公共场所使用它们颇具挑战。虽然文献中已提出旨在实现一定程度匿名化的去识别方法,但其中大多数采用的深度学习模型计算开销较大,难以进行实时边缘部署。在本研究中,我们重新审视了用于隐私保护和实时视频异常检测(VAD)应用的传统匿名化方案。我们提出了一种新颖的轻量级自适应VAD匿名化(LA3D)方法,该方法采用动态调整来增强隐私保护。我们在公开可用的隐私和VAD数据集上评估了这些方法,以检验不同匿名化技术的优劣,并突出了我们方法的良好效果。实验表明,LA3D能够显著提升隐私匿名化能力,同时不会严重降低VAD的效能。

更新时间: 2024-10-24 13:22:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.18717v1

Retrieval-Augmented Diffusion Models for Time Series Forecasting

While time series diffusion models have received considerable focus from many recent works, the performance of existing models remains highly unstable. Factors limiting time series diffusion models include insufficient time series datasets and the absence of guidance. To address these limitations, we propose a Retrieval- Augmented Time series Diffusion model (RATD). The framework of RATD consists of two parts: an embedding-based retrieval process and a reference-guided diffusion model. In the first part, RATD retrieves the time series that are most relevant to historical time series from the database as references. The references are utilized to guide the denoising process in the second part. Our approach allows leveraging meaningful samples within the database to aid in sampling, thus maximizing the utilization of datasets. Meanwhile, this reference-guided mechanism also compensates for the deficiencies of existing time series diffusion models in terms of guidance. Experiments and visualizations on multiple datasets demonstrate the effectiveness of our approach, particularly in complicated prediction tasks.

Updated: 2024-10-24 13:14:39

标题: 检索增强扩散模型用于时间序列预测

摘要: 虽然时间序列扩散模型在许多最近的研究中受到了相当多的关注,但现有模型的性能仍然非常不稳定。限制时间序列扩散模型的因素包括时间序列数据集不足和缺乏指导。为了解决这些限制,我们提出了一种检索增强时间序列扩散模型(RATD)。RATD的框架由两部分组成:基于嵌入的检索过程和参考引导的扩散模型。在第一部分中,RATD从数据库中检索与历史时间序列最相关的时间序列作为参考。这些参考被用来指导第二部分中的去噪过程。我们的方法允许利用数据库中有意义的样本来帮助采样,从而最大限度地利用数据集。同时,这种参考引导机制也弥补了现有时间序列扩散模型在指导方面的不足。对多个数据集进行的实验和可视化展示了我们的方法的有效性,特别是在复杂的预测任务中。

更新时间: 2024-10-24 13:14:39

领域: cs.LG

下载: http://arxiv.org/abs/2410.18712v1
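
The first stage of RATD, embedding-based retrieval, can be sketched in a few lines: given an embedding of the historical window, return the k most similar series in the database under cosine similarity. How the retrieved references then guide the denoising steps is model-specific and omitted here.

import numpy as np

def retrieve_references(query_emb, db_embs, db_series, k=3):
    """Return the k stored series whose embeddings are closest to the query
    history embedding under cosine similarity (the diffusion references)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                            # cosine similarity to every stored series
    top = np.argsort(-sims)[:k]
    return [db_series[i] for i in top], sims[top]

rng = np.random.default_rng(0)
db_embs = rng.normal(size=(100, 16))         # embeddings of historical series
db_series = [rng.normal(size=24) for _ in range(100)]
refs, scores = retrieve_references(rng.normal(size=16), db_embs, db_series)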

Exploiting Interpretable Capabilities with Concept-Enhanced Diffusion and Prototype Networks

Concept-based machine learning methods have increasingly gained importance due to the growing interest in making neural networks interpretable. However, concept annotations are generally challenging to obtain, making it crucial to leverage all their prior knowledge. By creating concept-enriched models that incorporate concept information into existing architectures, we exploit their interpretable capabilities to the fullest extent. In particular, we propose Concept-Guided Conditional Diffusion, which can generate visual representations of concepts, and Concept-Guided Prototype Networks, which can create a concept prototype dataset and leverage it to perform interpretable concept prediction. These results open up new lines of research by exploiting pre-existing information in the quest for rendering machine learning more human-understandable.

Updated: 2024-10-24 13:07:56

标题: 利用概念增强扩散与原型网络发挥可解释能力

摘要: 基于概念的机器学习方法因日益增长的对神经网络可解释性兴趣而变得越来越重要。然而,概念注释通常难以获得,因此利用所有先前的知识至关重要。通过创建将概念信息融入现有架构的概念丰富模型,我们最大程度地利用其可解释能力。特别地,我们提出了概念引导条件扩散和概念引导原型网络,前者可以生成概念的视觉表示,后者可以创建概念原型数据集并利用它进行可解释的概念预测。这些结果通过利用现有信息来探索使机器学习更易于人理解的新研究方向。

更新时间: 2024-10-24 13:07:56

领域: cs.LG

下载: http://arxiv.org/abs/2410.18705v1

BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching

The advanced capabilities of Large Language Models (LLMs) have inspired the development of various interactive web services or applications, such as ChatGPT, which offer query inference services for users. Unlike traditional DNN model, the inference of LLM entails different iterations of forward computation for different queries, which result in efficiency challenges for existing run-to-completion batch-wise inference. Hence, some methods refine batch-wise inference to iteration-level by duplicating all nonlinear layers of LLM. However, this approach not only increases resource usage but also introduces idle computations to the batch due to the prefilling of newly added queries. Therefore, we propose BATON, an efficient batch-wise LLM inference scheme by dynamically adjusting processing batch, which can achieve near-zero idle computations without incurring additional resource consumption. To do so, BATON 1) shapes the vectors involved in the inference of the newly inserted query and processing batch to align dimensions and generates a new attention mask based on vector shaping to ensure inference correctness, which enables query inserting without consuming additional resource; 2) embeds prefilled Keys and Values of the new query into the KV_Cache of the processing batch by leveraging the prefilling and decoding separation mechanism, eliminating idle computations to the batch introduced by the prefilling process of the new query. Experimental results show that compared to the state-of-the-art solution Orca, BATON improves query processing by up to 1.75 times.

Updated: 2024-10-24 12:53:39

标题: BATON:通过动态重新分批提高大型语言模型的批处理推理效率

摘要: 大型语言模型(LLM)的先进功能已经激发了各种交互式网络服务或应用程序的发展,例如ChatGPT,为用户提供查询推理服务。与传统的深度神经网络模型不同,LLM的推理涉及对不同查询的前向计算的不同迭代,这导致了现有的按批次运行完成的推理的效率挑战。因此,一些方法通过复制LLM的所有非线性层,将批次推理细化到迭代级别。然而,这种方法不仅增加了资源使用,还因新增查询的预填充而向批次引入了空闲计算。因此,我们提出了BATON,一种通过动态调整处理批次的高效批次LLM推理方案,可以实现几乎零空闲计算而不增加额外的资源消耗。为此,BATON 1)塑造参与新插入查询和处理批次推理的向量,以对齐维度,并基于向量塑造生成新的注意力掩码,以确保推理的正确性,从而实现查询插入而不消耗额外资源;2)利用预填充和解码分离机制,将新查询的预填充键和值嵌入到处理批次的KV_Cache中,消除了新查询的预填充过程引入批次的空闲计算。实验结果显示,与最先进的解决方案Orca相比,BATON将查询处理性能提高了最多1.75倍。

更新时间: 2024-10-24 12:53:39

领域: cs.LG

下载: http://arxiv.org/abs/2410.18701v1
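
A toy scheduling loop illustrating iteration-level re-batching in general (not BATON itself, which additionally performs vector shaping, attention-mask generation, and KV-cache embedding when a query joins a running batch): finished sequences leave the batch between decoding steps and waiting queries take their slots, so batch slots rarely sit idle.

from collections import deque
from dataclasses import dataclass

@dataclass
class Query:
    remaining: int              # decode steps left until this sequence finishes
    finished: bool = False

def step(q: Query) -> Query:
    """Stand-in for one decoding iteration applied to one batch member."""
    q.remaining -= 1
    q.finished = q.remaining <= 0
    return q

def serve(queries, step_fn=step, max_batch=4):
    waiting, active, done = deque(queries), [], []
    while waiting or active:
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())     # dynamic re-batching: fill free slots
        active = [step_fn(q) for q in active]    # one iteration over the batch
        done += [q for q in active if q.finished]
        active = [q for q in active if not q.finished]
    return done

finished = serve([Query(remaining=n) for n in (3, 1, 5, 2, 4)])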

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs

Recent research has focused on literary machine translation (MT) as a new challenge in MT. However, the evaluation of literary MT remains an open problem. We contribute to this ongoing discussion by introducing LITEVAL-CORPUS, a paragraph-level parallel corpus comprising multiple verified human translations and outputs from 9 MT systems, which totals over 2k paragraphs and includes 13k annotated sentences across four language pairs, costing 4.5k Euro. This corpus enables us to (i) examine the consistency and adequacy of multiple annotation schemes, (ii) compare evaluations by students and professionals, and (iii) assess the effectiveness of LLM-based metrics. We find that Multidimensional Quality Metrics (MQM), as the de facto standard in non-literary human MT evaluation, is inadequate for literary translation: While Best-Worst Scaling (BWS) with students and Scalar Quality Metric (SQM) with professional translators prefer human translations at rates of ~82% and ~94%, respectively, MQM with student annotators prefers human professional translations over the translations of the best-performing LLMs in only ~42% of cases. While automatic metrics generally show a moderate correlation with human MQM and SQM, they struggle to accurately identify human translations, with rates of at most ~20%. Our overall evaluation indicates that human professional translations consistently outperform LLM translations, where even the most recent LLMs tend to produce more literal and less diverse translations compared to human translations. However, newer LLMs such as GPT-4o perform substantially better than older ones.

Updated: 2024-10-24 12:48:03

标题: LLM对文学翻译到底有多好? 人类和LLM对文学翻译的评估

摘要: 最近的研究将文学机器翻译(MT)作为MT中的一个新挑战。然而,文学MT的评估仍是一个悬而未决的问题。我们通过引入LITEVAL-CORPUS参与这一持续讨论:这是一个段落级平行语料库,包含多个经过验证的人工翻译以及9个MT系统的输出,总计超过2千个段落,涵盖四种语言对的1.3万条标注句子,耗资4.5千欧元。该语料库使我们能够(i)检验多种标注方案的一致性和充分性,(ii)比较学生与专业人士的评估,以及(iii)评估基于LLM的度量的有效性。我们发现,作为非文学人工MT评估事实标准的多维质量度量(MQM)并不适用于文学翻译:学生使用最佳-最差缩放(BWS)和专业译者使用标量质量度量(SQM)时,分别以约82%和约94%的比率偏好人工翻译,而使用学生标注者的MQM仅在约42%的情况下偏好人类专业翻译而非表现最佳的LLM的翻译。虽然自动度量通常与人工MQM和SQM呈中等相关性,但它们难以准确识别人工翻译,识别率至多约20%。我们的总体评估表明,人类专业翻译始终优于LLM翻译,即使是最新的LLM也倾向于产生比人工翻译更直译、多样性更低的译文。不过,GPT-4o等较新的LLM表现明显好于旧模型。

更新时间: 2024-10-24 12:48:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.18697v1

Versatile Motion Language Models for Multi-Turn Interactive Agents

Recent advancements in large language models (LLMs) have greatly enhanced their ability to generate natural and contextually relevant text, making AI interactions more human-like. However, generating and understanding interactive human-like motion, where two individuals engage in coordinated movements, remains a challenge due to the complexity of modeling these coordinated interactions. Furthermore, a versatile model is required to handle diverse interactive scenarios, such as chat systems that follow user instructions or adapt to their assigned role while adjusting interaction dynamics. To tackle this problem, we introduce VIM, short for the Versatile Interactive Motion language model, which integrates both language and motion modalities to effectively understand, generate, and control interactive motions in multi-turn conversational contexts. To address the scarcity of multi-turn interactive motion data, we introduce a synthetic dataset, INTER-MT2, where we utilize pre-trained models to create diverse instructional datasets with interactive motion. Our approach first trains a motion tokenizer that encodes interactive motions into residual discrete tokens. In the pretraining stage, the model learns to align motion and text representations with these discrete tokens. During the instruction fine-tuning stage, VIM adapts to multi-turn conversations using the INTER-MT2 dataset. We evaluate the versatility of our method across motion-related tasks, motion to text, text to motion, reaction generation, motion editing, and reasoning about motion sequences. The results highlight the versatility and effectiveness of the proposed method in handling complex interactive motion synthesis.

Updated: 2024-10-24 12:47:56

标题: 面向多轮交互智能体的通用动作语言模型

摘要: 最近大型语言模型(LLM)的进展极大地增强了它们生成自然和上下文相关文本的能力,使得人工智能交互更加类似人类。然而,生成和理解人类交互式运动,其中两个个体进行协调运动,仍然是一个挑战,因为模拟这些协调交互的复杂性。此外,需要一个多才多艺的模型来处理各种交互场景,如遵循用户指令或适应其分配角色并调整交互动态的聊天系统。为了解决这个问题,我们引入了VIM,即通用交互动作语言模型,它整合了语言和动作模态,以有效地理解、生成和控制多轮会话背景下的交互动作。为了解决多轮交互动作数据的稀缺性,我们引入了一个合成数据集INERT-MT2,我们利用预训练模型创建了包含交互动作的多样化指导数据集。我们的方法首先训练一个动作分词器,将交互动作编码为残差离散标记。在预训练阶段,模型学习使用这些离散标记对齐动作和文本表示。在指导微调阶段,VIM使用INTER-MT2数据集适应多轮对话。我们评估了我们的方法在与运动相关的任务、运动到文本、文本到运动、反应生成、运动编辑以及推理关于运动序列的任务中的多才多艺。结果突出了所提出方法在处理复杂交互式运动合成方面的多才多艺和有效性。

更新时间: 2024-10-24 12:47:56

领域: cs.AI

下载: http://arxiv.org/abs/2410.05628v3

Self-Adaptive Physics-Informed Quantum Machine Learning for Solving Differential Equations

Chebyshev polynomials have shown significant promise as an efficient tool for both classical and quantum neural networks to solve linear and nonlinear differential equations. In this work, we adapt and generalize this framework in a quantum machine learning setting for a variety of problems, including the 2D Poisson's equation, second-order differential equation, system of differential equations, and nonlinear Riccati equation. In particular, we propose in the quantum setting a modified Self-Adaptive Physics-Informed Neural Network (SAPINN) approach, where self-adaptive weights are applied to problems with multi-objective loss functions. We further explore capturing correlations in our loss function using a quantum-correlated measurement, resulting in improved accuracy for initial value problems. We analyse also the use of entangling layers and their impact on the solution accuracy for second-order differential equations. The results indicate a promising approach to the near-term evaluation of differential equations on quantum devices.

Updated: 2024-10-24 12:43:52

标题: 自适应物理启发的量子机器学习用于求解微分方程

摘要: 切比雪夫多项式已显示出巨大潜力,可作为经典和量子神经网络求解线性与非线性微分方程的有效工具。在这项工作中,我们在量子机器学习环境中调整并推广了这一框架,用于求解多类问题,包括二维泊松方程、二阶微分方程、微分方程组和非线性里卡蒂方程。特别地,我们在量子环境中提出了一种改进的自适应物理信息神经网络(SAPINN)方法,将自适应权重应用于具有多目标损失函数的问题。我们进一步探索了利用量子关联测量来捕捉损失函数中的相关性,从而提高了初值问题的准确性。我们还分析了纠缠层的使用及其对二阶微分方程求解准确性的影响。结果表明,这是在近期量子设备上求解微分方程的一种有前景的途径。

更新时间: 2024-10-24 12:43:52

领域: quant-ph,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2312.09215v2
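
The self-adaptive weighting mechanism can be illustrated with a classical PINN toy in PyTorch (the paper applies the analogous idea inside quantum models with Chebyshev feature maps, which is not reproduced here): each collocation point carries a trainable weight, the network descends on the weighted residual loss while the weights ascend on it, so hard-to-fit points are emphasized automatically.

import torch

# Toy ODE u' + u = 0 with u(0) = 1; self-adaptive per-point weights `lam`.
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x = torch.linspace(0, 1, 64, requires_grad=True).reshape(-1, 1)
lam = torch.nn.Parameter(torch.ones(64, 1))          # one trainable weight per point

opt_net = torch.optim.Adam(net.parameters(), lr=1e-3)
opt_lam = torch.optim.Adam([lam], lr=1e-2, maximize=True)   # gradient *ascent* on weights

for _ in range(1000):
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    residual = du + u
    loss = (lam * residual**2).mean() + (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()
    opt_net.zero_grad(); opt_lam.zero_grad()
    loss.backward()
    opt_net.step(); opt_lam.step()                   # descent on net, ascent on lam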

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

The availability of high-quality data is one of the most important factors in improving the reasoning capability of LLMs. Existing works have demonstrated the effectiveness of creating more instruction data from seed questions or knowledge bases. Recent research indicates that continually scaling up data synthesis from strong models (e.g., GPT-4) can further elicit reasoning performance. Though promising, the open-sourced community still lacks high-quality data at scale and scalable data synthesis methods with affordable costs. To address this, we introduce ScaleQuest, a scalable and novel data synthesis method that utilizes "small-size" (e.g., 7B) open-source models to generate questions from scratch without the need for seed data with complex augmentation constraints. With the efficient ScaleQuest, we automatically constructed a mathematical reasoning dataset consisting of 1 million problem-solution pairs, which are more effective than existing open-sourced datasets. It can universally increase the performance of mainstream open-source models (i.e., Mistral, Llama3, DeepSeekMath, and Qwen2-Math) by achieving 29.2% to 46.4% gains on MATH. Notably, simply fine-tuning the Qwen2-Math-7B-Base model with our dataset can even surpass Qwen2-Math-7B-Instruct, a strong and well-aligned model on closed-source data, and proprietary models such as GPT-4-Turbo and Claude-3.5 Sonnet.

Updated: 2024-10-24 12:42:04

标题: 从零开始通过可扩展的问题合成释放LLMs的推理能力

摘要: 高质量数据的可用性是提高LLMs推理能力的最重要因素之一。现有研究已经证明了从种子问题或知识库中创建更多指令数据的有效性。最近的研究表明,持续扩大从强模型(例如GPT-4)合成的数据规模可以进一步提升推理性能。尽管前景可观,开源社区仍然缺乏大规模的高质量数据,也缺乏成本可承受的可扩展数据合成方法。为了解决这个问题,我们介绍了ScaleQuest,这是一种可扩展且新颖的数据合成方法,利用“小型”(例如7B)开源模型从零开始生成问题,而无需带有复杂增强约束的种子数据。借助高效的ScaleQuest,我们自动构建了一个包含100万个问题-解答对的数学推理数据集,比现有的开源数据集更有效。它可以普遍提升主流开源模型(即Mistral、Llama3、DeepSeekMath和Qwen2-Math)的性能,在MATH基准上实现29.2%到46.4%的增益。值得注意的是,仅用我们的数据集微调Qwen2-Math-7B-Base模型,甚至可以超越Qwen2-Math-7B-Instruct(一个在闭源数据上训练的强大且对齐良好的模型),以及GPT-4-Turbo和Claude-3.5 Sonnet等专有模型。

更新时间: 2024-10-24 12:42:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.18693v1

Large Language Models for Financial Aid in Financial Time-series Forecasting

Considering the difficulty of financial time series forecasting in financial aid, much of the current research focuses on leveraging big data analytics in financial services. One modern approach is to utilize "predictive analysis", analogous to forecasting financial trends. However, many of these time series data in Financial Aid (FA) pose unique challenges due to limited historical datasets and high dimensional financial information, which hinder the development of effective predictive models that balance accuracy with efficient runtime and memory usage. Pre-trained foundation models are employed to address these challenging tasks. We use state-of-the-art time series models including pre-trained LLMs (GPT-2 as the backbone), transformers, and linear models to demonstrate their ability to outperform traditional approaches, even with minimal ("few-shot") or no fine-tuning ("zero-shot"). Our benchmark study, which includes financial aid with seven other time series tasks, shows the potential of using LLMs for scarce financial datasets.

Updated: 2024-10-24 12:41:47

标题: 用于金融援助的大型语言模型在金融时间序列预测中的应用

摘要: 鉴于金融援助中金融时间序列预测的困难,目前许多研究聚焦于在金融服务中利用大数据分析。一种现代方法是采用“预测分析”,类似于预测金融趋势。然而,金融援助(FA)中的许多时间序列数据面临独特挑战:历史数据集有限且金融信息维度高,这阻碍了开发在准确性与高效的运行时间和内存占用之间取得平衡的有效预测模型。我们采用预训练基础模型来应对这些具有挑战性的任务。我们使用包括预训练LLMs(以GPT-2为骨干)、Transformer和线性模型在内的最先进时间序列模型,证明它们即便在极少(“少样本”)或没有微调(“零样本”)的情况下也能超越传统方法。我们的基准研究涵盖金融援助及其他七个时间序列任务,展示了将LLMs用于稀缺金融数据集的潜力。

更新时间: 2024-10-24 12:41:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.19025v1

System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam

The processes underlying human cognition are often divided into System 1, which involves fast, intuitive thinking, and System 2, which involves slow, deliberate reasoning. Previously, large language models were criticized for lacking the deeper, more analytical capabilities of System 2. In September 2024, OpenAI introduced the o1 model series, designed to handle System 2-like reasoning. While OpenAI's benchmarks are promising, independent validation is still needed. In this study, we tested the o1-preview model twice on the Dutch 'Mathematics B' final exam. It scored a near-perfect 76 and 74 out of 76 points. For context, only 24 out of 16,414 students in the Netherlands achieved a perfect score. By comparison, the GPT-4o model scored 66 and 62 out of 76, well above the Dutch students' average of 40.63 points. Neither model had access to the exam figures. Since there was a risk of model contamination (i.e., the knowledge cutoff for o1-preview and GPT-4o was after the exam was published online), we repeated the procedure with a new Mathematics B exam that was published after the cutoff date. The results again indicated that o1-preview performed strongly (97.8th percentile), which suggests that contamination was not a factor. We also show that there is some variability in the output of o1-preview, which means that sometimes there is 'luck' (the answer is correct) or 'bad luck' (the output has diverged into something that is incorrect). We demonstrate that the self-consistency approach, where repeated prompts are given and the most common answer is selected, is a useful strategy for identifying the correct answer. It is concluded that while OpenAI's new model series holds great potential, certain risks must be considered.

Updated: 2024-10-24 12:39:19

标题: OpenAI的o1-preview模型中的系统2思维:在数学考试中表现接近完美

摘要: 人类认知背后的过程常被划分为系统1(涉及快速、直觉式思维)和系统2(涉及缓慢、审慎的推理)。此前,大型语言模型被批评缺乏系统2更深层、更具分析性的能力。2024年9月,OpenAI推出了旨在处理类系统2推理的o1模型系列。虽然OpenAI的基准测试结果令人鼓舞,但仍需独立验证。在这项研究中,我们将o1-preview模型两次应用于荷兰“数学B”期末考试。它得分接近满分,分别为76和74分(满分76分)。作为参照,荷兰16,414名学生中仅有24人获得满分。相比之下,GPT-4o模型得分为66和62分,远高于荷兰学生40.63分的平均分。两个模型均无法看到考试中的图表。由于存在模型污染的风险(即o1-preview和GPT-4o的知识截止日期晚于该考试在网上发布的时间),我们用截止日期之后发布的新“数学B”考试重复了该流程。结果再次表明o1-preview表现强劲(第97.8百分位),说明污染并非影响因素。我们还展示了o1-preview的输出存在一定的波动性,这意味着有时会出现“运气好”(答案正确)或“运气差”(输出偏离到不正确的内容)。我们证明了自洽(self-consistency)方法,即重复给出提示并选择最常见的答案,是识别正确答案的有用策略。结论是,尽管OpenAI的新模型系列潜力巨大,但必须考虑某些风险。

更新时间: 2024-10-24 12:39:19

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.07114v4
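
The self-consistency strategy mentioned in the abstract is simple enough to state in full: sample the model several times on the same prompt and keep the modal answer. `ask_model` is a hypothetical callable wrapping the chat API and returning a normalized final answer.

from collections import Counter

def self_consistent_answer(prompt, ask_model, n=5):
    """Query the model n times and keep the most common answer, separating
    systematic output from one-off 'luck' or 'bad luck'."""
    answers = [ask_model(prompt) for _ in range(n)]   # ask_model: hypothetical wrapper
    return Counter(answers).most_common(1)[0][0]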

Safe machine learning model release from Trusted Research Environments: The SACRO-ML package

We present SACRO-ML, an integrated suite of open source Python tools to facilitate the statistical disclosure control (SDC) of machine learning (ML) models trained on confidential data prior to public release. SACRO-ML combines (i) a SafeModel package that extends commonly used ML models to provide ante-hoc SDC by assessing the vulnerability of disclosure posed by the training regime; and (ii) an Attacks package that provides post-hoc SDC by rigorously assessing the empirical disclosure risk of a model through a variety of simulated attacks after training. The SACRO-ML code and documentation are available under an MIT license at https://github.com/AI-SDC/SACRO-ML

Updated: 2024-10-24 12:33:53

标题: 可信研究环境中安全发布机器学习模型:SACRO-ML软件包

摘要: 我们提出了SACRO-ML,这是一个集成的开源Python工具套件,用于在公开发布之前对基于机密数据训练的机器学习(ML)模型进行统计披露控制(SDC)。SACRO-ML结合了(i)一个SafeModel包,它扩展了常用的ML模型,通过评估训练方案带来的披露脆弱性提供事前SDC;以及(ii)一个Attacks包,通过在训练后进行各种模拟攻击来严格评估模型的经验披露风险,从而提供事后SDC。SACRO-ML的代码和文档以MIT许可证发布于https://github.com/AI-SDC/SACRO-ML

更新时间: 2024-10-24 12:33:53

领域: cs.LG,cs.CR,cs.IR

下载: http://arxiv.org/abs/2212.01233v3

Distributed and Secure Kernel-Based Quantum Machine Learning

Quantum computing promises to revolutionize machine learning, offering significant efficiency gains in tasks such as clustering and distance estimation. Additionally, it provides enhanced security through fundamental principles like the measurement postulate and the no-cloning theorem, enabling secure protocols such as quantum teleportation and quantum key distribution. While advancements in secure quantum machine learning are notable, the development of secure and distributed quantum analogues of kernel-based machine learning techniques remains underexplored. In this work, we present a novel approach for securely computing common kernels, including polynomial, radial basis function (RBF), and Laplacian kernels, when data is distributed, using quantum feature maps. Our methodology introduces a robust framework that leverages quantum teleportation to ensure secure and distributed kernel learning. The proposed architecture is validated using IBM's Qiskit Aer Simulator on various public datasets.

Updated: 2024-10-24 12:33:41

标题: 分布式和安全的基于内核的量子机器学习

摘要: 量子计算有望彻底改变机器学习,为聚类和距离估计等任务带来显著的效率提升。此外,它通过测量公设和不可克隆定理等基本原理提供了更强的安全性,使量子隐形传态和量子密钥分发等安全协议成为可能。尽管安全量子机器学习取得了显著进展,但基于核的机器学习技术的安全、分布式量子对应方法的发展仍未得到充分探索。在这项工作中,我们提出了一种新颖的方法,用于在数据分布式存储时利用量子特征映射安全地计算常见核函数,包括多项式核、径向基函数(RBF)核和拉普拉斯核。我们的方法引入了一个稳健的框架,利用量子隐形传态来确保安全的分布式核学习。所提出的架构在IBM的Qiskit Aer模拟器上使用多个公开数据集进行了验证。

更新时间: 2024-10-24 12:33:41

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2408.10265v2

Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification

Leveraging large language models (LLMs) has garnered increasing attention and introduced novel perspectives in time series classification. However, existing approaches often overlook the crucial dynamic temporal information inherent in time series data and face challenges in aligning this data with textual semantics. To address these limitations, we propose HiTime, a hierarchical multi-modal model that seamlessly integrates temporal information into LLMs for multivariate time series classification (MTSC). Our model employs a hierarchical feature encoder to capture diverse aspects of time series data through both data-specific and task-specific embeddings. To facilitate semantic space alignment between time series and text, we introduce a dual-view contrastive alignment module that bridges the gap between modalities. Additionally, we adopt a hybrid prompting strategy to fine-tune the pre-trained LLM in a parameter-efficient manner. By effectively incorporating dynamic temporal features and ensuring semantic alignment, HiTime enables LLMs to process continuous time series data and achieves state-of-the-art classification performance through text generation. Extensive experiments on benchmark datasets demonstrate that HiTime significantly enhances time series classification accuracy compared to most competitive baseline methods. Our findings highlight the potential of integrating temporal features into LLMs, paving the way for advanced time series analysis. Our code is publicly available for further research and validation.

Updated: 2024-10-24 12:32:19

标题: 多模态分层LLMs与语义空间对齐用于增强时间序列分类

摘要: 利用大型语言模型(LLMs)在时间序列分类中引起了越来越多的关注,并引入了新颖的视角。然而,现有方法往往忽视了时间序列数据中固有的关键动态时间信息,并面临将这些数据与文本语义对齐的挑战。为了解决这些限制,我们提出了HiTime,一种将时间信息无缝集成到LLMs中用于多变量时间序列分类(MTSC)的分层多模态模型。我们的模型采用分层特征编码器,通过数据特定和任务特定的嵌入捕获时间序列数据的多个方面。为了促进时间序列和文本之间的语义空间对齐,我们引入了一个双视图对比对齐模块,弥合了模态之间的差距。此外,我们采用混合提示策略,以参数高效的方式微调预训练的LLM。通过有效整合动态时间特征并确保语义对齐,HiTime使LLMs能够处理连续时间序列数据,并通过文本生成实现了最先进的分类性能。对基准数据集的广泛实验表明,与大多数有竞争力的基线方法相比,HiTime显著提高了时间序列分类准确性。我们的研究结果突显了将时间特征整合到LLMs中的潜力,为先进的时间序列分析铺平了道路。我们的代码已公开,可供进一步研究和验证。

更新时间: 2024-10-24 12:32:19

领域: cs.LG

下载: http://arxiv.org/abs/2410.18686v1

Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models

In this work, we introduce Pixelsmith, a zero-shot text-to-image generative framework to sample images at higher resolutions with a single GPU. We are the first to show that it is possible to scale the output of a pre-trained diffusion model by a factor of 1000, opening the road for gigapixel image generation at no additional cost. Our cascading method uses the image generated at the lowest resolution as a baseline to sample at higher resolutions. For the guidance, we introduce the Slider, a tunable mechanism that fuses the overall structure contained in the first-generated image with enhanced fine details. At each inference step, we denoise patches rather than the entire latent space, minimizing memory demands such that a single GPU can handle the process, regardless of the image's resolution. Our experimental results show that Pixelsmith not only achieves higher quality and diversity compared to existing techniques, but also reduces sampling time and artifacts. The code for our work is available at https://github.com/Thanos-DB/Pixelsmith.

Updated: 2024-10-24 12:31:09

标题: 一个GPU足够吗?使用基础模型推动更高分辨率的图像生成

摘要: 在这项工作中,我们介绍了Pixelsmith,一个零样本文本到图像生成框架,可以在单个GPU上以更高分辨率采样图像。我们首次证明可以将预训练扩散模型的输出放大1000倍,为在不增加额外成本的情况下生成千兆像素图像开辟了道路。我们的级联方法使用以最低分辨率生成的图像作为基线,在更高分辨率上进行采样。为了提供引导,我们引入了Slider,一种可调节的机制,将首次生成图像中包含的整体结构与增强的细节融合在一起。在每个推断步骤中,我们对图像块而非整个潜在空间进行去噪,从而将内存需求降到最低,使单个GPU无论图像分辨率如何都能处理该过程。我们的实验结果表明,与现有技术相比,Pixelsmith不仅实现了更高的质量和多样性,还减少了采样时间和伪影。我们的工作代码可在https://github.com/Thanos-DB/Pixelsmith 上找到。

更新时间: 2024-10-24 12:31:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07251v3
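
The memory argument in the abstract (denoise patches rather than the whole latent) can be sketched as a tiled pass with overlap averaging. This is an illustrative scheme only; it models neither Pixelsmith's cascade nor the Slider guidance, and `denoise_patch` stands in for one denoiser call on a tile.

import numpy as np

def denoise_in_patches(latent, denoise_patch, patch=64, overlap=16):
    """Process the latent in overlapping tiles so peak memory scales with the
    tile size, not the full resolution; overlaps are averaged on output."""
    H, W = latent.shape[:2]
    out = np.zeros_like(latent)
    weight = np.zeros(latent.shape[:2] + (1,) * (latent.ndim - 2))
    step = patch - overlap
    for i in range(0, max(H - overlap, 1), step):
        for j in range(0, max(W - overlap, 1), step):
            sl = (slice(i, min(i + patch, H)), slice(j, min(j + patch, W)))
            out[sl] += denoise_patch(latent[sl])
            weight[sl] += 1.0
    return out / weight

# Usage with an identity "denoiser" just to exercise the tiling:
result = denoise_in_patches(np.random.rand(256, 256, 4), lambda t: t)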

Rigid Single-Slice-in-Volume registration via rotation-equivariant 2D/3D feature matching

2D to 3D registration is essential in tasks such as diagnosis, surgical navigation, environmental understanding, navigation in robotics, autonomous systems, or augmented reality. In medical imaging, the aim is often to place a 2D image in a 3D volumetric observation. Current approaches for rigid single slice in volume registration are limited by requirements such as pose initialization, stacks of adjacent slices, or reliable anatomical landmarks. Here, we propose a self-supervised 2D/3D registration approach to match a single 2D slice to the corresponding 3D volume. The method works in data without anatomical priors such as images of tumors. It addresses the dimensionality disparity and establishes correspondences between 2D in-plane and 3D out-of-plane rotation-equivariant features by using group equivariant CNNs. These rotation-equivariant features are extracted from the 2D query slice and aligned with their 3D counterparts. Results demonstrate the robustness of the proposed slice-in-volume registration on the NSCLC-Radiomics CT and KIRBY21 MRI datasets, attaining an absolute median angle error of less than 2 degrees and a mean-matching feature accuracy of 89% at a tolerance of 3 pixels.

Updated: 2024-10-24 12:24:27

标题: 通过旋转等变的2D/3D特征匹配进行刚性单切片体积配准

摘要: 2D到3D的配准在诊断、手术导航、环境理解、机器人导航、自主系统或增强现实等任务中至关重要。在医学成像中,目标常常是将一幅2D图像定位到3D体积观测中。当前刚性的体内单切片配准方法受到姿态初始化、相邻切片堆叠或可靠解剖标志等要求的限制。在这里,我们提出了一种自监督的2D/3D配准方法,将单个2D切片与相应的3D体积匹配。该方法适用于没有解剖先验(例如肿瘤图像)的数据。它通过使用群等变CNN,在2D面内和3D面外的旋转等变特征之间建立对应关系,从而解决了维度不匹配的问题。这些旋转等变特征从2D查询切片中提取,并与其3D对应特征对齐。结果表明,所提出的切片-体积配准方法在NSCLC-Radiomics CT和KIRBY21 MRI数据集上具有良好的鲁棒性,绝对中值角误差小于2度,并在3像素容差下达到89%的平均特征匹配准确率。

更新时间: 2024-10-24 12:24:27

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18683v1

COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis

Program synthesis methods, whether formal or neural-based, lack fine-grained control and flexible modularity, which limits their adaptation to complex software development. These limitations stem from rigid Domain-Specific Language (DSL) frameworks and neural network incorrect predictions. To this end, we propose the Chain of Logic (CoL), which organizes synthesis stages into a chain and provides precise heuristic control to guide the synthesis process. Furthermore, by integrating neural networks with libraries and introducing a Neural Network Feedback Control (NNFC) mechanism, our approach modularizes synthesis and mitigates the impact of neural network mispredictions. Experiments on relational and symbolic synthesis tasks show that CoL significantly enhances the efficiency and reliability of DSL program synthesis across multiple metrics. Specifically, CoL improves accuracy by 70% while reducing tree operations by 91% and time by 95%. Additionally, NNFC further boosts accuracy by 6%, with a 64% reduction in tree operations under challenging conditions such as insufficient training data, increased difficulty, and multidomain synthesis. These improvements confirm COOL as a highly efficient and reliable program synthesis framework.

Updated: 2024-10-24 12:16:31

标题: COOL:面向程序合成的高效可靠链式目标逻辑与神经网络反馈控制

摘要: 程序综合方法,无论是形式化还是基于神经网络的,缺乏精细的控制和灵活的模块化,这限制了它们对复杂软件开发的适应性。这些限制源自于刚性的领域特定语言(DSL)框架和神经网络错误预测。为此,我们提出了逻辑链(CoL),将综合阶段组织成一条链,并提供精确的启发式控制来引导综合过程。此外,通过将神经网络与库集成,并引入神经网络反馈控制(NNFC)机制,我们的方法模块化综合,并减轻了神经网络错误预测的影响。在关系和符号综合任务上的实验表明,CoL显著提高了DSL程序综合在多个指标上的效率和可靠性。具体来说,CoL将准确率提高了70%,同时将树操作减少了91%,时间减少了95%。此外,NNFC在挑战性条件下,如训练数据不足、难度增加和多领域综合时,进一步提高了准确率6%,减少了64%的树操作。这些改进证实了COOL作为一个高效可靠的程序综合框架。

更新时间: 2024-10-24 12:16:31

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2410.13874v2

Neural Concept Binder

The challenge in object-based visual reasoning lies in generating concept representations that are both descriptive and distinct. Achieving this in an unsupervised manner requires human users to understand the model's learned concepts and, if necessary, revise incorrect ones. To address this challenge, we introduce the Neural Concept Binder (NCB), a novel framework for deriving both discrete and continuous concept representations, which we refer to as "concept-slot encodings". NCB employs two types of binding: "soft binding", which leverages the recent SysBinder mechanism to obtain object-factor encodings, and subsequent "hard binding", achieved through hierarchical clustering and retrieval-based inference. This enables obtaining expressive, discrete representations from unlabeled images. Moreover, the structured nature of NCB's concept representations allows for intuitive inspection and the straightforward integration of external knowledge, such as human input or insights from other AI models like GPT-4. Additionally, we demonstrate that incorporating the hard binding mechanism preserves model performance while enabling seamless integration into both neural and symbolic modules for complex reasoning tasks. We validate the effectiveness of NCB through evaluations on our newly introduced CLEVR-Sudoku dataset.

Updated: 2024-10-24 12:13:54

标题: 神经概念绑定器

摘要: 基于对象的视觉推理的挑战在于生成既具描述性又彼此有区分度的概念表示。要以无监督方式实现这一点,需要人类用户理解模型学到的概念,并在必要时修正错误的概念。为应对这一挑战,我们引入了神经概念绑定器(NCB),这是一个用于获取离散和连续概念表示的新框架,我们将这些表示称为“概念-槽编码”。NCB采用两类绑定:“软绑定”,利用最近的SysBinder机制获得对象-因子编码;以及随后的“硬绑定”,通过层次聚类和基于检索的推理实现。这使得可以从未标注图像中获得富有表现力的离散表示。此外,NCB概念表示的结构化特性便于直观检查,并可直接整合外部知识,例如人类输入或来自GPT-4等其他AI模型的见解。我们还证明,引入硬绑定机制在保持模型性能的同时,能够无缝接入神经与符号模块以完成复杂推理任务。我们通过在新引入的CLEVR-Sudoku数据集上的评估验证了NCB的有效性。

更新时间: 2024-10-24 12:13:54

领域: cs.AI,cs.LG,cs.SC

下载: http://arxiv.org/abs/2406.09949v2

Ali-AUG: Innovative Approaches to Labeled Data Augmentation using One-Step Diffusion Model

This paper introduces Ali-AUG, a novel single-step diffusion model for efficient labeled data augmentation in industrial applications. Our method addresses the challenge of limited labeled data by generating synthetic, labeled images with precise feature insertion. Ali-AUG utilizes a stable diffusion architecture enhanced with skip connections and LoRA modules to efficiently integrate masks and images, ensuring accurate feature placement without affecting unrelated image content. Experimental validation across various industrial datasets demonstrates Ali-AUG's superiority in generating high-quality, defect-enhanced images while maintaining rapid single-step inference. By offering precise control over feature insertion and minimizing required training steps, our technique significantly enhances data augmentation capabilities, providing a powerful tool for improving the performance of deep learning models in scenarios with limited labeled data. Ali-AUG is especially useful for use cases like defective product image generation to train AI-based models to improve their ability to detect defects in manufacturing processes. Using different data preparation strategies, including Classification Accuracy Score (CAS) and Naive Augmentation Score (NAS), we show that Ali-AUG improves model performance by 31% compared to other augmentation methods and by 45% compared to models without data augmentation. Notably, Ali-AUG reduces training time by 32% and supports both paired and unpaired datasets, enhancing flexibility in data preparation.

Updated: 2024-10-24 12:12:46

标题: Ali-AUG:使用一步扩散模型进行标记数据增强的创新方法

摘要: 这篇论文介绍了Ali-AUG,一种新颖的单步扩散模型,用于工业应用中高效的标记数据增强。我们的方法通过生成具有精确特征插入的合成标记图像,解决了有限标记数据的挑战。Ali-AUG利用稳定的扩散架构,增强了跳跃连接和LoRA模块,以高效地将掩模和图像整合在一起,确保准确的特征放置而不影响无关的图像内容。在各种工业数据集上进行的实验证明了Ali-AUG在生成高质量、缺陷增强图像方面的优越性,同时保持快速的单步推理。通过提供对特征插入的精确控制并最小化所需的训练步骤,我们的技术显著增强了数据增强能力,为改善深度学习模型在有限标记数据场景中的性能提供了强大的工具。Ali-AUG特别适用于缺陷产品图像生成等用例,以训练基于AI的模型,提高其检测制造过程中缺陷的能力。通过使用不同的数据准备策略,包括分类准确度评分(CAS)和简单增强评分(NAS),我们展示了与其他增强方法相比,Ali-AUG将模型性能提高了31%,与没有数据增强的模型相比提高了45%。值得注意的是,Ali-AUG将训练时间缩短了32%,并支持配对和非配对数据集,增强了数据准备的灵活性。

更新时间: 2024-10-24 12:12:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.18678v1

Enhancing pretraining efficiency for medical image segmentation via transferability metrics

In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge when training deep neural networks. When using U-Net-style architectures, it is common practice to address this problem by pretraining the encoder part on a large general-purpose dataset like ImageNet. However, these methods are resource-intensive and do not guarantee improved performance on the downstream task. In this paper we investigate a variety of training setups on medical image segmentation datasets, using ImageNet-pretrained models. By examining over 300 combinations of models, datasets, and training methods, we find that shorter pretraining often leads to better results on the downstream task, providing additional proof to the well-known fact that the accuracy of the model on ImageNet is a poor indicator for downstream performance. As our main contribution, we introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data. In contrast to other transferability scores, our method is applicable to the case of transferring from ImageNet classification to medical image segmentation. We apply our robustness score by measuring it throughout the pretraining phase to indicate when the model weights are optimal for downstream transfer. This reduces pretraining time and improves results on the target task.

Updated: 2024-10-24 12:11:52

标题: 通过可转移性度量增强医学图像分割的预训练效率

摘要: 在医学图像分割任务中,标记训练数据的稀缺性在训练深度神经网络时构成了一个重要挑战。在使用U-Net风格架构时,通常的做法是通过在类似ImageNet这样的大型通用数据集上对编码器部分进行预训练来解决这个问题。然而,这些方法需要大量资源,并不能保证在下游任务上表现更好。本文研究了在医学图像分割数据集上使用ImageNet预训练模型的各种训练设置。通过检查300多种模型、数据集和训练方法的组合,我们发现较短的预训练通常会在下游任务上获得更好的结果,进一步证实了一个众所周知的事实,即模型在ImageNet上的准确性并不是下游性能的良好指标。作为我们的主要贡献,我们引入了一种基于对比学习的新型可迁移度指标,用于衡量预训练模型能够如何稳健地表示目标数据。与其他可迁移性评分不同,我们的方法适用于从ImageNet分类到医学图像分割的情况。我们通过在整个预训练阶段测量我们的稳健性评分来应用它,以指示模型权重何时达到下游转移的最佳状态。这减少了预训练时间并改善了目标任务的结果。

更新时间: 2024-10-24 12:11:52

领域: cs.CV,cs.LG,eess.IV,I.4.6

下载: http://arxiv.org/abs/2410.18677v1

Homomorphism Counts as Structural Encodings for Graph Learning

Graph Transformers are popular neural networks that extend the well-known Transformer architecture to the graph domain. These architectures operate by applying self-attention on graph nodes and incorporating graph structure through the use of positional encodings (e.g., Laplacian positional encoding) or structural encodings (e.g., random-walk structural encoding). The quality of such encodings is critical, since they provide the necessary $\textit{graph inductive biases}$ to condition the model on graph structure. In this work, we propose $\textit{motif structural encoding}$ (MoSE) as a flexible and powerful structural encoding framework based on counting graph homomorphisms. Theoretically, we compare the expressive power of MoSE to random-walk structural encoding and relate both encodings to the expressive power of standard message passing neural networks. Empirically, we observe that MoSE outperforms other well-known positional and structural encodings across a range of architectures, and it achieves state-of-the-art performance on widely studied molecular property prediction datasets.

Updated: 2024-10-24 12:09:01

标题: 同态计数作为图学习的结构编码

摘要: 图Transformer是将著名的Transformer架构扩展到图领域的流行神经网络。这类架构在图节点上应用自注意力,并通过位置编码(例如拉普拉斯位置编码)或结构编码(例如随机游走结构编码)来纳入图结构。这些编码的质量至关重要,因为它们提供了必要的图归纳偏置,使模型能够以图结构为条件。在这项工作中,我们提出了基于图同态计数的灵活而强大的结构编码框架——模体结构编码(MoSE)。在理论上,我们比较了MoSE与随机游走结构编码的表达能力,并将两种编码与标准消息传递神经网络的表达能力联系起来。在实验上,我们观察到MoSE在一系列架构上优于其他知名的位置编码和结构编码,并在广泛研究的分子性质预测数据集上取得了最先进的性能。

更新时间: 2024-10-24 12:09:01

领域: cs.LG

下载: http://arxiv.org/abs/2410.18676v1
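
One family of homomorphism counts is cheap to compute and conveys the flavor of such encodings: the number of homomorphisms from a k-cycle into G rooted at node v equals the number of closed k-walks at v, i.e. the diagonal of $A^k$. The sketch below stacks these per-node counts into a feature matrix; MoSE itself counts homomorphisms of a chosen motif class, which is strictly more general than cycles.

import numpy as np

def cycle_hom_counts(A: np.ndarray, max_len: int = 6) -> np.ndarray:
    """Per-node homomorphism counts of cycles C_2..C_max_len into the graph:
    hom(C_k -> G) rooted at v is the v-th diagonal entry of A^k (closed k-walks).
    Returns a (num_nodes, max_len - 1) structural-encoding matrix."""
    n = A.shape[0]
    feats, Ak = [], np.eye(n)
    for k in range(1, max_len + 1):
        Ak = Ak @ A
        if k >= 2:
            feats.append(np.diag(Ak))
    return np.stack(feats, axis=1)

# Example: a triangle with a pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(cycle_hom_counts(A, 3))   # columns: degree (C_2), closed 3-walks (C_3)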

Rethinking Randomized Smoothing from the Perspective of Scalability

Machine learning models have demonstrated remarkable success across diverse domains but remain vulnerable to adversarial attacks. Empirical defense mechanisms often fail, as new attacks constantly emerge, rendering existing defenses obsolete, shifting the focus to certification-based defenses. Randomized smoothing has emerged as a promising technique among notable advancements. This study reviews the theoretical foundations and empirical effectiveness of randomized smoothing and its derivatives in verifying machine learning classifiers from a perspective of scalability. We provide an in-depth exploration of the fundamental concepts underlying randomized smoothing, highlighting its theoretical guarantees in certifying robustness against adversarial perturbations and discuss the challenges of existing methodologies.

Updated: 2024-10-24 12:03:15

标题: 从可扩展性的角度重新思考随机平滑

摘要: 机器学习模型在各个领域取得了显著的成功,但仍然容易受到对抗性攻击的影响。经验性的防御机制经常失败,因为新的攻击不断出现,使得现有的防御变得过时,将焦点转向基于认证的防御。随机平滑技术已经成为显著进展中的一种有希望的技术。本研究从可扩展性的角度审查了随机平滑及其衍生物在验证机器学习分类器方面的理论基础和经验有效性。我们深入探讨了随机平滑背后的基本概念,突出了其在认证对抗性扰动的鲁棒性方面的理论保证,并讨论了现有方法的挑战。

更新时间: 2024-10-24 12:03:15

领域: cs.LG,cs.CR,stat.ML

下载: http://arxiv.org/abs/2312.12608v2
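
For reference, the core certification procedure the survey builds on (in the style of Cohen et al., 2019) fits in a few lines: classify many Gaussian-noised copies of the input, take the majority class, and convert the estimated top-class probability into a certified L2 radius. A faithful implementation would use a lower confidence bound on that probability and an abstain rule; here `f` is a hypothetical base classifier returning an integer label.

import numpy as np
from scipy.stats import norm

def certify(f, x, sigma=0.25, n=1000, num_classes=10):
    """Simplified randomized-smoothing certification: majority vote under
    Gaussian noise, then radius sigma * Phi^{-1}(p_A) for the top class."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        counts[f(x + sigma * np.random.randn(*x.shape))] += 1
    top = int(counts.argmax())
    p_a = counts[top] / n                    # raw Monte Carlo estimate of p_A
    radius = sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0
    return top, radius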

Health Misinformation in Social Networks: A Survey of IT Approaches

In this paper, we present a comprehensive survey on the pervasive issue of medical misinformation in social networks from the perspective of information technology. The survey aims at providing a systematic review of related research and helping researchers and practitioners navigate through this fast-changing field. Specifically, we first present manual and automatic approaches for fact-checking. We then explore fake news detection methods, using content, propagation features, or source features, as well as mitigation approaches for countering the spread of misinformation. We also provide a detailed list of several datasets on health misinformation and of publicly available tools. We conclude the survey with a discussion on the open challenges and future research directions in the battle against health misinformation.

Updated: 2024-10-24 12:00:51

标题: 社交网络中的健康信息误传:信息技术方法调查

摘要: 在这篇论文中,我们从信息技术的角度全面调查了社交网络中医疗信息不准确的普遍问题。该调查旨在对相关研究进行系统性回顾,帮助研究人员和从业者在这个快速变化的领域中导航。具体来说,我们首先介绍了手动和自动事实核查的方法。然后我们探讨了检测假新闻的方法,使用内容、传播特征或来源特征,以及对抗错误信息传播的缓解方法。我们还提供了几个健康错误信息数据集和公开可用工具的详细清单。最后,我们通过讨论健康错误信息斗争中的挑战和未来研究方向来总结调查。

更新时间: 2024-10-24 12:00:51

领域: cs.SI,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.18670v1

3D Shape Completion with Test-Time Training

This work addresses the problem of \textit{shape completion}, i.e., the task of restoring incomplete shapes by predicting their missing parts. While previous works have often predicted the fractured and restored shape in one step, we approach the task by separately predicting the fractured and newly restored parts, but ensuring these predictions are interconnected. We use a decoder network motivated by related work on the prediction of signed distance functions (DeepSDF). In particular, our representation allows us to consider test-time-training, i.e., finetuning network parameters to match the given incomplete shape more accurately during inference. While previous works often have difficulties with artifacts around the fracture boundary, we demonstrate that our overfitting to the fractured parts leads to significant improvements in the restoration of eight different shape categories of the ShapeNet data set in terms of their chamfer distances.

Updated: 2024-10-24 11:59:32

标题: 基于测试时训练的3D形状补全

摘要: 这项工作研究“形状补全”问题,即通过预测缺失部分来修复不完整的形状。先前的研究通常一步同时预测破损与修复后的形状,而我们将破损部分与新修复部分分开预测,同时确保两者的预测相互关联。我们使用了受符号距离函数预测相关工作(DeepSDF)启发的解码器网络。特别地,我们的表示允许进行测试时训练,即在推断过程中微调网络参数,以更准确地匹配给定的不完整形状。先前的工作常在破损边界附近出现伪影,而我们证明,对破损部分的过拟合使ShapeNet数据集中八个形状类别的修复在Chamfer距离上取得了显著改进。

更新时间: 2024-10-24 11:59:32

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18668v1
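
The test-time-training step builds on the DeepSDF-style setup: with the decoder pretrained, inference optimizes a latent code (and, per the paper, network parameters may be finetuned as well) so that predicted signed distances match samples from the observed fragment. A minimal PyTorch sketch, assuming a decoder MLP that takes a concatenated [latent, xyz] input and returns an SDF value:

import torch

def test_time_fit(decoder, points, sdf_targets, latent_dim=256, steps=300):
    """Optimize a latent code so decoder([z, xyz]) matches the observed,
    incomplete shape; decoder weights could be finetuned here as well."""
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=5e-3)
    for _ in range(steps):
        inp = torch.cat([z.expand(points.shape[0], -1), points], dim=1)
        pred = decoder(inp)                      # (N, 1) predicted signed distances
        loss = torch.nn.functional.l1_loss(pred.squeeze(-1), sdf_targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()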

Fundamental computational limits of weak learnability in high-dimensional multi-index models

Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n\!=\!\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha\!>\!0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace where directions that can be learned only above a certain sample complexity $\alpha\!>\!\alpha_c$, where $\alpha_{c}$ marks a computational phase transition. In a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we show that interactions between different directions can result in an intricate hierarchical learning phenomenon, where directions can be learned sequentially when coupled to easier ones. We discuss in detail the grand staircase picture associated to these functions (and contrast it with the original staircase one). Our theory builds on the optimality of approximate message-passing among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent, which we discuss in this context.

Updated: 2024-10-24 11:58:59

标题: 高维多指数模型中弱可学习性的基本计算极限

摘要: 多指数模型——仅通过协变量在某个子空间上投影的非线性变换依赖于协变量的函数——是研究神经网络特征学习的有用基准。本文考察该假设类中高效可学习性的理论边界,重点关注在样本数$n=\alpha d$与协变量维度$d$成比例的高维区域中,一阶迭代算法弱恢复其低维结构所需的最小样本复杂度。我们的发现分为三部分:(i)我们确定了在何种条件下,对于任意$\alpha>0$,一阶算法单步即可学到平凡子空间;(ii)若平凡子空间为空,我们给出了“易学习”子空间存在的充要条件,其中的方向只有在样本复杂度超过某个阈值$\alpha>\alpha_c$时才能被学习,$\alpha_c$标志着一个计算相变。在一小类但颇有意味的真正困难方向上(类似于奇偶问题),$\alpha_c$会发散。最后,(iii)我们展示了不同方向之间的相互作用会产生精细的层级式学习现象:当与更容易的方向耦合时,一些方向可以被依次学习。我们详细讨论了与这些函数相关的“大楼梯”图景(并将其与原始的楼梯图景进行对比)。我们的理论建立在近似消息传递在一阶迭代方法中的最优性之上,刻画了横跨广泛算法谱系(包括用梯度下降训练的神经网络,我们在此背景下加以讨论)的基本可学习性极限。

更新时间: 2024-10-24 11:58:59

领域: cs.LG,cond-mat.dis-nn,cs.CC

下载: http://arxiv.org/abs/2405.15480v3
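
For concreteness, the hypothesis class and scaling regime discussed above can be written as follows (notation following the abstract: $g$ is the nonlinear link and $w_1, \dots, w_k$ span the low-dimensional subspace):

$$ f(x) = g\big(\langle w_1, x \rangle, \dots, \langle w_k, x \rangle\big), \qquad x \in \mathbb{R}^d,\; k \ll d, \qquad n = \alpha d . $$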

NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability

This paper presents neural networks for network intrusion detection systems (NIDS), that operate on flow data preprocessed with a time window. It requires only eleven features which do not rely on deep packet inspection and can be found in most NIDS datasets and easily obtained from conventional flow collectors. The time window aggregates information with respect to hosts facilitating the identification of flow signatures that are missed by other aggregation methods. Several network architectures are studied and the use of Kolmogorov-Arnold Network (KAN)-inspired trainable activation functions that help to achieve higher accuracy with simpler network structure is proposed. The reported training accuracy exceeds 99% for the proposed method with as little as twenty neural network input features. This work also studies the generalization capability of NIDS, a crucial aspect that has not been adequately addressed in the previous studies. The generalization experiments are conducted using CICIDS2017 dataset and a custom dataset collected as part of this study. It is shown that the performance metrics decline significantly when changing datasets, and the reduction in performance metrics can be attributed to the difference in signatures of the same type flows in different datasets, which in turn can be attributed to the differences between the underlying networks. It is shown that the generalization accuracy of some neural networks can be very unstable and sensitive to random initialization parameters, and neural networks with fewer parameters and well-tuned activations are more stable and achieve higher accuracy.

Updated: 2024-10-24 11:36:19

标题: NIDS神经网络使用滑动时间窗口数据处理与可训练激活函数及其泛化能力

摘要: 本文提出了用于网络入侵检测系统(NIDS)的神经网络,它们在经过时间窗口预处理的流数据上运行。它仅需要十一个特征,不依赖深度包检查,并且可以在大多数NIDS数据集中找到,并且可以轻松从传统流收集器中获得。时间窗口根据主机聚合信息,有助于识别其他聚合方法忽略的流特征。研究了几种网络架构,并提出了使用Kolmogorov-Arnold Network(KAN)启发的可训练激活函数,有助于实现更高的准确性与更简单的网络结构。所提出的方法的训练准确度超过了99%,仅使用二十个神经网络输入特征。本研究还研究了NIDS的泛化能力,这是以前研究中尚未充分解决的关键方面。使用CICIDS2017数据集和本研究的一部分收集的自定义数据集进行了泛化实验。结果显示,更改数据集时性能指标显着下降,性能指标的降低可以归因于不同数据集中相同类型流的特征差异,这又可以归因于底层网络之间的差异。结果显示,一些神经网络的泛化准确性可能非常不稳定,并对随机初始化参数敏感,具有更少参数和良好调整激活的神经网络更稳定且能实现更高的准确性。

更新时间: 2024-10-24 11:36:19

领域: cs.LG,cs.CR,G.2.2

下载: http://arxiv.org/abs/2410.18658v1

Learning dissipative Hamiltonian dynamics with reproducing kernel Hilbert spaces and random Fourier features

This paper presents a new method for learning dissipative Hamiltonian dynamics from a limited and noisy dataset. The method uses the Helmholtz decomposition to learn a vector field as the sum of a symplectic and a dissipative vector field. The two vector fields are learned using two reproducing kernel Hilbert spaces, defined by a symplectic and a curl-free kernel, where the kernels are specialized to enforce odd symmetry. Random Fourier features are used to approximate the kernels to reduce the dimension of the optimization problem. The performance of the method is validated in simulations for two dissipative Hamiltonian systems, and it is shown that the method improves predictive accuracy significantly compared to a method where a Gaussian separable kernel is used.
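
The random Fourier feature step is the generic Rahimi-Recht construction, sketched below in NumPy for a scalar Gaussian kernel; the paper's symplectic and curl-free kernels are matrix-valued, which this sketch omits.

```python
import numpy as np

def random_fourier_features(X, num_features=256, lengthscale=1.0, seed=0):
    """Approximate a Gaussian kernel k(x, y) = exp(-||x-y||^2 / (2 l^2))
    by z(x)^T z(y), turning kernel regression into a finite-dimensional
    least-squares problem of size num_features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).standard_normal((5, 4))  # 5 points in phase space
Z = random_fourier_features(X)
K_approx = Z @ Z.T   # approximates the 5x5 Gram matrix of the Gaussian kernel
```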

Updated: 2024-10-24 11:35:39

标题: 学习耗散哈密顿动力学的再生核希尔伯特空间和随机傅里叶特征

摘要: 本文提出了一种从有限且噪声数据集中学习耗散哈密顿动力学的新方法。该方法利用Helmholtz分解来学习一个矢量场,将其表示为一个辛矢量场和一个耗散矢量场的和。这两个矢量场分别使用两个再生核希尔伯特空间来学习,这两个空间由一个辛核和一个无旋核定义,其中核函数专门设计为强制实现奇对称性。随机傅里叶特征被用来近似核函数,以减少优化问题的维度。该方法在两个耗散哈密顿系统的仿真中经过验证,结果表明相比使用高斯可分核的方法,该方法显著提高了预测准确性。

更新时间: 2024-10-24 11:35:39

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.18656v1

Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework

Open-ended text generation has become a prominent task in natural language processing due to the rise of powerful (large) language models. However, evaluating the quality of these models and the employed decoding strategies remains challenging because of trade-offs among widely used metrics such as coherence, diversity, and perplexity. Decoding methods often excel in some metrics while underperforming in others, complicating the establishment of a clear ranking. In this paper, we present novel ranking strategies within this multicriteria framework. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric designed to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Furthermore, we discuss the alignment of these approaches with human judgments. Our experiments demonstrate that the proposed methods offer a robust way to compare decoding strategies, exhibit similarities with human preferences, and serve as valuable tools in guiding model selection for open-ended text generation tasks. Finally, we suggest future directions for improving evaluation methodologies in text generation. Our codebase, datasets, and models are publicly available.
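
A partial-ordering benchmark can be as simple as Pareto dominance over per-metric scores. The sketch below, with hypothetical scores, keeps exactly the decoding methods that no other method beats on all metrics simultaneously:

```python
def dominates(a, b):
    """a Pareto-dominates b if a is at least as good on every metric and
    strictly better on at least one (all metrics oriented higher-is-better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical (coherence, diversity, 1/perplexity) scores per decoding method
scores = {"greedy":  (0.82, 0.31, 0.45),
          "top-k":   (0.78, 0.55, 0.41),
          "nucleus": (0.80, 0.60, 0.43)}

pareto_front = [m for m, s in scores.items()
                if not any(dominates(t, s) for n, t in scores.items() if n != m)]
print(pareto_front)  # ['greedy', 'nucleus'] -- 'top-k' is dominated by 'nucleus'
```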

Updated: 2024-10-24 11:32:01

标题: 朝着更好的开放式文本生成方向:多标准评估框架

摘要: 开放式文本生成由于强大(大型)语言模型的兴起已成为自然语言处理中的一个重要任务。然而,评估这些模型和所采用的解码策略的质量仍然具有挑战性,因为广泛使用的指标之间存在协调性、多样性和困惑度等权衡。解码方法在某些指标上表现出色,而在其他方面表现不佳,使得建立明确的排名变得复杂。在本文中,我们提出了在多准则框架内的新型排名策略。具体来说,我们采用基于偏序的基准方法,并提出了一种旨在平衡现有自动指标的新摘要指标,从而提供对文本生成质量的更全面评估。此外,我们讨论了这些方法与人类判断的一致性。我们的实验证明,所提出的方法提供了一种强大的比较解码策略的方式,展示出与人类偏好的相似性,并作为在指导开放式文本生成任务的模型选择中的有价值的工具。最后,我们提出了改进文本生成评估方法的未来方向。我们的代码库、数据集和模型是公开可用的。

更新时间: 2024-10-24 11:32:01

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.18653v1

$C^2$: Scalable Auto-Feedback for LLM-based Chart Generation

Generating high-quality charts with Large Language Models presents significant challenges due to limited data and the high cost of scaling through human curation. Instruction, data, and code triplets are scarce and expensive to manually curate as their creation demands technical expertise. To address this scalability issue, we introduce a reference-free automatic feedback generator, which eliminates the need for costly human intervention. Our novel framework, $C^2$, consists of (1) an automatic feedback provider (ChartAF) and (2) a diverse, reference-free dataset (ChartUIE-8K). Quantitative results are compelling: in our first experiment, 74% of respondents strongly preferred, and 10% preferred, the results after feedback. The second post-feedback experiment demonstrates that ChartAF outperforms nine baselines. Moreover, ChartUIE-8K significantly improves data diversity by increasing queries, datasets, and chart types by 5982%, 1936%, and 91%, respectively, over benchmarks. Finally, an LLM user study revealed that 94% of participants preferred ChartUIE-8K's queries, with 93% deeming them aligned with real-world use cases. Core contributions are available as open-source at an anonymized project site, with ample qualitative examples.

Updated: 2024-10-24 11:32:00

标题: $C^2$: 可扩展的基于LLM的图表生成的自动反馈

摘要: 使用大型语言模型生成高质量图表面临着重大挑战,原因在于数据有限以及通过人工筛选进行扩展的高昂成本。指导、数据和代码三元组稀缺且昂贵,手动筛选的创建需要技术专业知识。为了解决这一可扩展性问题,我们引入了一个无需昂贵人工干预的无参考自动反馈生成器。我们的新框架$C^2$包括(1)一个自动反馈提供程序(ChartAF)和(2)一个多样化的、无参考的数据集(ChartUIE-8K)。定量结果令人信服:在我们的第一个实验中,74%的受访者强烈偏好,10%偏好反馈后的结果。第二次反馈后的实验表明ChartAF胜过了九个基线。此外,ChartUIE-8K通过在基准上分别增加5982%、1936%和91%的查询、数据集和图表类型,显著提高了数据多样性。最后,LLM用户研究显示,94%的参与者更喜欢ChartUIE-8K的查询,93%认为它们与真实世界用例一致。核心贡献可在一个匿名项目网站上以开源形式获得,附有丰富的定性示例。

更新时间: 2024-10-24 11:32:00

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.18652v1

Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts

LLMs show remarkable emergent abilities, such as inferring concepts from presumably out-of-distribution prompts, known as in-context learning. Though this success is often attributed to the Transformer architecture, our systematic understanding is limited. In complex real-world data sets, even defining what is out-of-distribution is not obvious. To better understand the OOD behaviour of autoregressive LLMs, we focus on formal languages, which are defined by the intersection of rules. We define a new scenario of OOD compositional generalization, termed rule extrapolation. Rule extrapolation describes OOD scenarios, where the prompt violates at least one rule. We evaluate rule extrapolation in formal languages with varying complexity in linear and recurrent architectures, the Transformer, and state space models to understand the architectures' influence on rule extrapolation. We also lay the first stones of a normative theory of rule extrapolation, inspired by the Solomonoff prior in algorithmic information theory.
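
A concrete toy instance of a language defined by intersecting rules, and of an OOD prompt that violates exactly one of them, may help fix ideas; the rules and strings below are invented for illustration, not the paper's benchmarks.

```python
import re

# A toy formal language defined as the intersection of two rules:
#   R1: the string matches a*b*  (all a's precede all b's)
#   R2: the string has equally many a's and b's
def r1(s): return re.fullmatch(r"a*b*", s) is not None
def r2(s): return s.count("a") == s.count("b")

in_distribution = "aaabbb"   # satisfies both R1 and R2
ood_prompt = "ba"            # violates R1 only

# Rule extrapolation asks: given an OOD prefix that breaks R1, does the
# model's completion still respect the remaining rule R2?
completion = "ab"            # keeps the overall a/b counts balanced
print(r1(ood_prompt + completion), r2(ood_prompt + completion))  # False True
```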

Updated: 2024-10-24 11:30:33

标题: 语言模型中的规则外推:关于OOD提示上的组合概括研究

摘要: LLMs表现出卓越的新兴能力,例如从可能属于分布之外的提示中推断概念,即所谓的上下文学习。尽管这种成功通常被归因于Transformer架构,但我们对其系统理解有限。在复杂的真实世界数据集中,甚至确定何为分布之外并不明显。为了更好地理解自回归LLMs的OOD行为,我们专注于形式语言,这些语言由规则的交集定义。我们定义了一种新的OOD组合泛化场景,称为规则外推。规则外推描述了在提示违反至少一个规则的OOD场景。我们在具有不同复杂性的形式语言中评估了规则外推,在线性和递归架构、Transformer和状态空间模型中了解架构对规则外推的影响。我们还借鉴了算法信息论中的Solomonoff先验,为规则外推提出了一种规范理论的第一步。

更新时间: 2024-10-24 11:30:33

领域: cs.CL,cs.LG,stat.ML

下载: http://arxiv.org/abs/2409.13728v2

DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning

In this work, we address the challenge of developing automatic evaluation metrics for image captioning, with a particular focus on robustness against hallucinations. Existing metrics are often inadequate for handling hallucinations, primarily due to their limited ability to compare candidate captions with multifaceted reference captions. To address this shortcoming, we propose DENEB, a novel supervised automatic evaluation metric specifically robust against hallucinations. DENEB incorporates the Sim-Vec Transformer, a mechanism that processes multiple references simultaneously, thereby efficiently capturing the similarity between an image, a candidate caption, and reference captions. To train DENEB, we construct the diverse and balanced Nebula dataset comprising 32,978 images, paired with human judgments provided by 805 annotators. We demonstrated that DENEB achieves state-of-the-art performance among existing LLM-free metrics on the FOIL, Composite, Flickr8K-Expert, Flickr8K-CF, Nebula, and PASCAL-50S datasets, validating its effectiveness and robustness against hallucinations.

Updated: 2024-10-24 11:29:41

标题: DENEB:一种针对图像描述的幻觉鲁棒自动评估度量

摘要: 在这项工作中,我们解决了为图像字幕开发自动评估指标的挑战,特别关注对幻觉的稳健性。现有的指标通常无法处理幻觉,主要是因为它们有限的能力无法将候选字幕与多层次的参考字幕进行比较。为了解决这个缺点,我们提出了DENEB,一种新颖的受监督自动评估指标,特别针对幻觉具有稳健性。DENEB融合了Sim-Vec Transformer,这是一种处理多个参考同时的机制,从而有效地捕捉图像、候选字幕和参考字幕之间的相似性。为了训练DENEB,我们构建了包含32,978张图像的多样化且平衡的Nebula数据集,与805名标注者提供的人类判断配对。我们证明DENEB在FOIL、Composite、Flickr8K-Expert、Flickr8K-CF、Nebula和PASCAL-50S数据集中取得了现有无LLM指标的最新性能,验证了其对幻觉的有效性和稳健性。

更新时间: 2024-10-24 11:29:41

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2409.19255v2

MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias

Text-to-image models are known to propagate social biases. For example, when prompted to generate images of people in certain professions, these models tend to systematically generate specific genders or ethnicities. In this paper, we show that this bias is already present in the text encoder of the model and introduce a Mixture-of-Experts approach by identifying text-encoded bias in the latent space and then creating a Bias-Identification Gate mechanism. More specifically, we propose MoESD (Mixture of Experts Stable Diffusion) with BiAs (Bias Adapters) to mitigate gender bias in text-to-image models. We also demonstrate that introducing an arbitrary special token to the prompt is essential during the mitigation process. With experiments focusing on gender bias, we show that our approach successfully mitigates gender bias while maintaining image quality.

Updated: 2024-10-24 11:28:27

标题: MoESD:专家混合稳定扩散以减少性别偏见

摘要: 文本到图像模型被认为存在社会偏见。例如,当提示生成某些职业的人的图像时,这些模型往往会系统性地生成特定性别或种族。本文表明,这种偏见已经存在于模型的文本编码器中,并引入了一种专家混合方法,通过识别潜在空间中的文本编码偏见,然后创建一个偏见识别门机制。更具体地,我们提出了带有Bias Adapters的MoESD(专家稳定扩散)来减轻文本到图像模型中的性别偏见。我们还证明,在减轻过程中引入一个任意的特殊标记是必不可少的。通过重点关注性别偏见的实验,我们展示了我们的方法成功地减轻了性别偏见同时保持图像质量。

更新时间: 2024-10-24 11:28:27

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.11002v2

GADT: Enhancing Transferable Adversarial Attacks through Gradient-guided Adversarial Data Transformation

Current Transferable Adversarial Examples (TAE) are primarily generated by adding Adversarial Noise (AN). Recent studies emphasize the importance of optimizing Data Augmentation (DA) parameters along with AN, which poses a greater threat to real-world AI applications. However, existing DA-based strategies often struggle to find optimal solutions due to the challenging DA search procedure without proper guidance. In this work, we propose a novel DA-based attack algorithm, GADT. GADT identifies suitable DA parameters through iterative antagonism and uses posterior estimates to update AN based on these parameters. We uniquely employ a differentiable DA operation library to identify adversarial DA parameters and introduce a new loss function as a metric during DA optimization. This loss term enhances adversarial effects while preserving the original image content, maintaining attack crypticity. Extensive experiments on public datasets with various networks demonstrate that GADT can be integrated with existing transferable attack methods, updating their DA parameters effectively while retaining their AN formulation strategies. Furthermore, GADT can be utilized in other black-box attack scenarios, e.g., query-based attacks, offering a new avenue to enhance attacks on real-world AI applications in both research and industrial contexts.

Updated: 2024-10-24 11:21:49

标题: GADT:通过梯度引导的对抗性数据转换增强可转移的对抗性攻击

摘要: 目前可转移的对抗样本(TAE)主要是通过添加对抗性噪声(AN)生成的。最近的研究强调了优化数据增强(DA)参数与AN的重要性,这对现实世界的人工智能应用构成了更大的威胁。然而,现有的基于DA的策略通常由于缺乏适当的指导而难以找到最佳解决方案。在本文中,我们提出了一种新颖的基于DA的攻击算法,GADT。GADT通过迭代对抗来识别适当的DA参数,并利用后验估计来根据这些参数更新AN。我们独特地使用可微分的DA操作库来识别对抗性DA参数,并在DA优化过程中引入一个新的损失函数作为度量标准。这个损失项增强了对抗效果,同时保留了原始图像内容,保持了攻击的隐蔽性。在公共数据集上进行的大量实验表明,GADT可以与现有的可转移攻击方法集成,有效地更新它们的DA参数,同时保留它们的AN制定策略。此外,GADT还可以在其他黑盒攻击场景中使用,例如基于查询的攻击,为增强对现实世界中研究和工业应用的攻击提供了一个新途径。

更新时间: 2024-10-24 11:21:49

领域: cs.AI

下载: http://arxiv.org/abs/2410.18648v1

ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment

In surgical skill assessment, the Objective Structured Assessments of Technical Skills (OSATS) and Global Rating Scale (GRS) are well-established tools for evaluating surgeons during training. These metrics, along with performance feedback, help surgeons improve and reach practice standards. Recent research on the open-source JIGSAWS dataset, which includes both GRS and OSATS labels, has focused on regressing GRS scores from kinematic data, video, or their combination. However, we argue that regressing GRS alone is limiting, as it aggregates OSATS scores and overlooks clinically meaningful variations during a surgical trial. To address this, we developed a recurrent transformer model that tracks a surgeon's performance throughout a session by mapping hidden states to six OSATS, derived from kinematic data, using a clinically motivated objective function. These OSATS scores are averaged to predict GRS, allowing us to compare our model's performance against state-of-the-art (SOTA) methods. We report Spearman's Correlation Coefficients (SCC) demonstrating that our model outperforms SOTA using kinematic data (SCC 0.83-0.88), and matches performance with video-based models. Our model also surpasses SOTA in most tasks for average OSATS predictions (SCC 0.46-0.70) and specific OSATS (SCC 0.56-0.95). The generation of pseudo-labels at the segment level translates quantitative predictions into qualitative feedback, vital for automated surgical skill assessment pipelines. A senior surgeon validated our model's outputs, agreeing with 77% of the weakly-supervised predictions (p=0.006).

Updated: 2024-10-24 11:18:24

标题: ReCAP:用于机器人手术技能评估中伪标签生成的递归交叉注意力网络

摘要: 在外科技能评估中,客观结构化技能评估(OSATS)和全局评分量表(GRS)是用于评估外科医生在培训期间的成熟工具。这些指标与绩效反馈一起帮助外科医生改进并达到实践标准。最近针对开源JIGSAWS数据集的研究,该数据集包括GRS和OSATS标签,专注于从运动数据、视频或二者的组合中回归GRS分数。然而,我们认为仅回归GRS是有限的,因为它聚合了OSATS分数,并忽视了在外科试验期间的临床意义变化。为了解决这个问题,我们开发了一种循环变换器模型,通过将隐藏状态映射到从运动数据中导出的六个OSATS,使用临床动机的目标函数,来跟踪外科医生在整个会话中的表现。这些OSATS分数被平均以预测GRS,使我们能够将我们的模型性能与最先进的方法进行比较。我们报告了斯皮尔曼相关系数(SCC),表明我们的模型在使用运动数据时优于最先进的方法(SCC为0.83-0.88),并与基于视频的模型性能相匹配。我们的模型在大多数任务中超越了最先进的方法,用于平均OSATS预测(SCC 0.46-0.70)和特定OSATS(SCC 0.56-0.95)。在段级别生成伪标签,将定量预测转化为定性反馈,这对于自动化外科技能评估管道至关重要。一名资深外科医生验证了我们模型的输出,同意77%的弱监督预测(p=0.006)。

更新时间: 2024-10-24 11:18:24

领域: cs.CV,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2407.05180v3

Forecasting trends in food security with real time data

Early warning systems are an essential tool for effective humanitarian action. Advance warnings on impending disasters facilitate timely and targeted response which help save lives and livelihoods. In this work we present a quantitative methodology to forecast levels of food consumption for 60 consecutive days, at the sub-national level, in four countries: Mali, Nigeria, Syria, and Yemen. The methodology is built on publicly available data from the World Food Programme's global hunger monitoring system which collects, processes, and displays daily updates on key food security metrics, conflict, weather events, and other drivers of food insecurity. In this study we assessed the performance of various models including Autoregressive Integrated Moving Average (ARIMA), Extreme Gradient Boosting (XGBoost), Long Short Term Memory (LSTM) Network, Convolutional Neural Network (CNN), and Reservoir Computing (RC), by comparing their Root Mean Squared Error (RMSE) metrics. Our findings highlight Reservoir Computing as a particularly well-suited model in the field of food security given both its notable resistance to over-fitting on limited data samples and its efficient training capabilities. The methodology we introduce establishes the groundwork for a global, data-driven early warning system designed to anticipate and detect food insecurity.
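
Reservoir computing, the best performer here, is compact enough to sketch. Below is a minimal echo state network forecaster in NumPy; the reservoir size, spectral radius, and plain least-squares readout are illustrative choices rather than the study's configuration.

```python
import numpy as np

def esn_forecast(series, horizon=60, n_res=200, rho=0.9, seed=0):
    """Minimal echo state network: a fixed random recurrent reservoir plus a
    trained linear readout, iterated to produce a multi-step forecast."""
    rng = np.random.default_rng(seed)
    Win = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= rho / max(abs(np.linalg.eigvals(W)))      # scale spectral radius
    x, states = np.zeros(n_res), []
    for u in series[:-1]:                          # drive with training data
        x = np.tanh(Win * u + W @ x)
        states.append(x)
    Wout = np.linalg.lstsq(np.array(states), np.asarray(series[1:]), rcond=None)[0]
    preds, u = [], series[-1]
    for _ in range(horizon):                       # free-run the trained model
        x = np.tanh(Win * u + W @ x)
        u = float(x @ Wout)
        preds.append(u)
    return np.array(preds)

rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 20, 300)) + 0.05 * rng.standard_normal(300)
print(esn_forecast(series, horizon=5))
```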

Updated: 2024-10-24 11:15:28

标题: 使用实时数据预测食品安全趋势

摘要: 提前预警系统是有效的人道主义行动工具。对即将发生的灾难提前警告有助于及时和有针对性地响应,从而拯救生命和生计。在这项工作中,我们提出了一种定量方法,用于预测马里、尼日利亚、叙利亚和也门四个国家60天连续食品消费水平,以次国家级别。该方法建立在世界粮食计划署全球饥饿监测系统的公开数据基础上,该系统收集、处理和展示关键食品安全指标、冲突、气象事件和其他导致食品不安全的驱动因素的每日更新。在这项研究中,我们评估了各种模型的表现,包括自回归综合移动平均(ARIMA)、极端梯度提升(XGBoost)、长短期记忆(LSTM)网络、卷积神经网络(CNN)和水库计算(RC),通过比较它们的均方根误差(RMSE)指标。我们的研究结果突出了水库计算作为食品安全领域特别适合的模型,因为它不仅对有限数据样本具有显着的抗过度拟合能力,而且具有高效的训练能力。我们介绍的方法奠定了全球、数据驱动的早期预警系统的基础,旨在预测和检测食品不安全。

更新时间: 2024-10-24 11:15:28

领域: cs.LG,physics.soc-ph,stat.ML

下载: http://arxiv.org/abs/2312.00626v3

Smart ETL and LLM-based contents classification: the European Smart Tourism Tools Observatory experience

Purpose: Our research project focuses on improving the content update of the online European Smart Tourism Tools (STTs) Observatory by incorporating and categorizing STTs. The categorization is based on their taxonomy, and it facilitates the end user's search process. The use of a Smart ETL (Extract, Transform, and Load) process, where \emph{Smart} indicates the use of Artificial Intelligence (AI), is central to this endeavor. Methods: The contents describing STTs are derived from PDF catalogs, where PDF-scraping techniques extract QR codes, images, links, and text information. Duplicate STTs between the catalogs are removed, and the remaining ones are classified based on their text information using Large Language Models (LLMs). Finally, the data is transformed to comply with the Dublin Core metadata structure (the observatory's metadata structure), chosen for its wide acceptance and flexibility. Results: The Smart ETL process to import STTs to the observatory combines PDF-scraping techniques with LLMs for text content-based classification. Our preliminary results have demonstrated the potential of LLMs for text content-based classification. Conclusion: The proposed approach's feasibility is a step towards efficient content-based classification, not only in Smart Tourism but also adaptable to other fields. Future work will mainly focus on refining this classification process.
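
A skeleton of such a Smart ETL pipeline might look as follows; pypdf is used here as one possible extractor, classify_with_llm is a hypothetical stub standing in for the LLM call, and only a minimal subset of Dublin Core fields is shown.

```python
from pypdf import PdfReader  # one possible extractor; QR/image scraping omitted

def extract(pdf_path: str) -> list[str]:
    """E: pull raw text, one block per catalog page."""
    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]

def classify_with_llm(text: str) -> str:
    """T: hypothetical stub for the LLM call assigning a taxonomy category;
    a real system would prompt a hosted or local model here."""
    return "smart-destination" if "destination" in text.lower() else "other"

def to_dublin_core(text: str, category: str) -> dict:
    """L: map one STT description onto a minimal Dublin Core record."""
    return {"dc:title": text.split("\n", 1)[0][:80],
            "dc:description": text,
            "dc:subject": category,
            "dc:type": "Software"}

# records = [to_dublin_core(t, classify_with_llm(t))
#            for t in extract("stt_catalog.pdf") if t.strip()]  # hypothetical file
```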

Updated: 2024-10-24 11:10:54

标题: 智能ETL和LLM基础内容分类:欧洲智慧旅游工具观测经验

摘要: 目的:我们的研究项目旨在通过整合和分类智能旅游工具(STTs),改进在线欧洲智能旅游工具(STTs)观察站的内容更新。这种分类基于它们的分类法,并且有助于最终用户的搜索过程。在这一努力中,使用智能ETL(提取、转换和加载)过程至关重要,其中“智能”表示使用人工智能(AI)。 方法:描述STTs的内容来自PDF目录,PDF抓取技术提取QR码、图片、链接和文本信息。目录之间的重复STTs被移除,剩余的STTs根据它们的文本信息使用大型语言模型(LLMs)进行分类。最后,数据被转换以符合Dublin Core元数据结构(观察站的元数据结构),选择该结构是因为它被广泛接受且灵活。 结果:将STTs导入观察站的智能ETL过程将PDF抓取技术与基于文本内容的LLMs进行了结合。我们的初步结果已经展示了LLMs在基于文本内容的分类中的潜力。 结论:所提出的方法的可行性是朝着高效的基于内容的分类迈出的一步,不仅适用于智能旅游,还可适应其他领域。未来的工作主要将集中在完善这一分类过程。

更新时间: 2024-10-24 11:10:54

领域: cs.IR,cs.AI,H.3.3; I.2.7; I.5.2

下载: http://arxiv.org/abs/2410.18641v1

Learning Hamiltonian Dynamics with Reproducing Kernel Hilbert Spaces and Random Features

A method for learning Hamiltonian dynamics from a limited and noisy dataset is proposed. The method learns a Hamiltonian vector field on a reproducing kernel Hilbert space (RKHS) of inherently Hamiltonian vector fields, and in particular, odd Hamiltonian vector fields. This is done with a symplectic kernel, and it is shown how the kernel can be modified to an odd symplectic kernel to impose the odd symmetry. A random feature approximation is developed for the proposed odd kernel to reduce the problem size. The performance of the method is validated in simulations for three Hamiltonian systems. It is demonstrated that the use of an odd symplectic kernel improves prediction accuracy and data efficiency, and that the learned vector fields are Hamiltonian and exhibit the imposed odd symmetry characteristics.

Updated: 2024-10-24 11:01:00

标题: 在使用再生核希尔伯特空间和随机特征学习哈密顿动力学

摘要: 提出了一种从有限且嘈杂的数据集中学习哈密顿动力学的方法。该方法在固有哈密顿矢量场的再生核希尔伯特空间(RKHS)上学习一个哈密顿矢量场,特别是奇哈密顿矢量场。这是通过一个辛核实现的,并展示了如何将核修改为奇辛核以施加奇对称性。为所提出的奇核开发了一个随机特征近似以减小问题规模。该方法在三个哈密顿系统的模拟中得到验证。结果表明,使用奇辛核可以提高预测准确性和数据效率,并且学习到的矢量场是哈密顿的,并展示了施加的奇对称特性。

更新时间: 2024-10-24 11:01:00

领域: cs.LG,cs.RO,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.07703v2

Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model

As diffusion models become increasingly popular, the misuse of copyrighted and private images has emerged as a major concern. One promising solution to mitigate this issue is identifying the contribution of specific training samples in generative models, a process known as data attribution. Existing data attribution methods for diffusion models typically quantify the contribution of a training sample by evaluating the change in diffusion loss when the sample is included or excluded from the training process. However, we argue that directly using the diffusion loss cannot represent such a contribution accurately, because of how the diffusion loss is calculated. Specifically, these approaches measure the divergence between predicted and ground truth distributions, which leads to an indirect comparison between the predicted distributions and cannot represent the variances between model behaviors. To address these issues, we aim to measure the direct comparison between predicted distributions with an attribution score to analyse the training sample importance, which is achieved by the Diffusion Attribution Score (DAS). Underpinned by rigorous theoretical analysis, we elucidate the effectiveness of DAS. Additionally, we explore strategies to accelerate DAS calculations, facilitating its application to large-scale diffusion models. Our extensive experiments across various datasets and diffusion models demonstrate that DAS significantly surpasses previous benchmarks in terms of the linear data-modelling score, establishing new state-of-the-art performance.

Updated: 2024-10-24 10:58:17

标题: 扩散归因分数:评估扩散模型中训练数据影响

摘要: 随着扩散模型变得越来越受欢迎,对受版权保护和私人图像的滥用已经成为一个主要关注点。缓解这一问题的一个有希望的解决方案是识别生成模型中特定训练样本的贡献,这个过程被称为数据归因。现有的针对扩散模型的数据归因方法通常通过评估在训练过程中包含或排除样本时扩散损失的变化来量化训练样本的贡献。然而,我们认为直接使用扩散损失无法准确表示这种贡献,这是因为扩散损失的计算。具体来说,这些方法衡量了预测和实际分布之间的差异,这导致了对预测分布的间接比较,不能表示模型行为之间的差异。为了解决这些问题,我们旨在通过一个归因分数来衡量预测分布之间的直接比较,以分析训练样本的重要性,这是通过扩散归因分数(DAS)实现的。在严谨的理论分析支持下,我们阐明了DAS的有效性。此外,我们探索了加速DAS计算的策略,促进其在大规模扩散模型中的应用。我们在各种数据集和扩散模型上进行的广泛实验表明,DAS在线性数据建模得分方面显著超越了先前的基准,确立了新的最先进表现。

更新时间: 2024-10-24 10:58:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.18639v1

Remote Detection of Applications for Improved Beam Tracking in mmWave/sub-THz 5G/6G Systems

Beam tracking is an essential functionality of millimeter wave (mmWave, 30-100 GHz) and sub-terahertz (sub-THz, 100-300 GHz) 5G/6G systems. It operates by performing antenna sweeping at both the base station (BS) and user equipment (UE) sides using the Synchronization Signal Blocks (SSB). The optimal frequency of beam tracking events is not specified by 3GPP standards and heavily depends on the micromobility properties of the applications currently utilized by the user. In the absence of explicit signalling for the type of application at the air interface, in this paper, we propose a way to remotely detect it at the BS side based on the received signal strength pattern. To this aim, we first perform a multi-stage measurement campaign at 156 GHz, belonging to the sub-THz band, to obtain the received signal strength traces of popular smartphone applications. Then, we proceed to apply conventional statistical Mann-Whitney tests and various machine learning (ML) based classification techniques to discriminate applications remotely. Our results show that the Mann-Whitney test can be used to differentiate between fast and slow application classes with a confidence of 0.95, inducing a class detection delay on the order of 1 s after application initialization. With the same time budget, random forest classifiers can differentiate between applications with fast and slow micromobility with 80% accuracy using the received signal strength metric only. The accuracy of detecting a specific application, however, is lower, reaching 60%. By utilizing the proposed technique, one can estimate the optimal values of the beam tracking intervals without adding additional signalling to the air interface.
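
A minimal sketch of the detection pipeline on synthetic received-signal-strength traces, using SciPy's Mann-Whitney U test and a scikit-learn random forest; the trace statistics and features below are invented stand-ins for the 156 GHz measurements.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Synthetic stand-ins for ~1 s of received-signal-strength samples (dB)
fast_app = -70 + 6 * rng.standard_normal(100)   # strong micromobility
slow_app = -70 + 1 * rng.standard_normal(100)   # nearly static user

# Mann-Whitney U test on step-to-step RSS fluctuations separates the classes
stat, p = mannwhitneyu(np.abs(np.diff(fast_app)), np.abs(np.diff(slow_app)))
print(f"p-value: {p:.3g}")  # small p -> reject 'same micromobility class'

# Feature-based classification of the application class
feats = lambda t: [t.mean(), t.std(), float(np.abs(np.diff(t)).mean())]
X = [feats(-70 + s * rng.standard_normal(100)) for s in (6, 1) for _ in range(50)]
y = [0] * 50 + [1] * 50
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict([feats(-70 + 6 * rng.standard_normal(100))]))  # -> [0] (fast)
```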

Updated: 2024-10-24 10:55:21

标题: 远程检测应用程序以提高毫米波/亚太特兹5G/6G系统中的波束跟踪

摘要: 波束跟踪是毫米波(mmWave,30-100 GHz)和次太赫兹(sub-THz,100-300 GHz)5G/6G系统的基本功能。它通过在基站(BS)和用户设备(UE)双方使用同步信号块(SSB)进行天线扫描来运行。波束跟踪事件的最佳频率未在3GPP标准中指定,并且在很大程度上取决于用户当前使用的应用程序的微移动特性。在空中接口没有明确信令应用程序类型的情况下,在本文中,我们提出了一种在BS端基于接收信号强度模式远程检测应用程序的方法。为此,我们首先在属于次太赫兹频段的156 GHz上进行多阶段测量活动,以获取流行智能手机应用程序的接收信号强度跟踪。然后,我们继续应用传统的统计Mann-Whitney测试和各种基于机器学习(ML)的分类技术来远程区分应用程序。我们的结果显示,Mann-Whitney测试可用于以0.95的置信度区分快速和慢速应用程序类别,在应用程序初始化后约1秒引入类别检测延迟。在相同的时间预算下,随机森林分类器可以仅使用接收信号强度度量将快速和慢速微移动的应用程序区分度达到80%的准确率。然而,检测特定应用程序的准确度较低,仅达到60%。通过利用提出的技术,可以估计波束跟踪间隔的最佳值,而无需向空中接口添加额外的信令。

更新时间: 2024-10-24 10:55:21

领域: eess.SP,cs.LG,physics.ins-det

下载: http://arxiv.org/abs/2410.18637v1

Multi-agent cooperation through learning-aware policy gradients

Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.

Updated: 2024-10-24 10:48:42

标题: 多智能体通过学习感知策略梯度进行合作

摘要: 自私的个体经常无法合作,这对多代理学习构成了根本挑战。我们如何在自私的、独立学习的代理之间实现合作?有前景的最近工作表明,在某些任务中,学习感知代理之间可以建立合作,他们可以模拟彼此的学习动态。在这里,我们提出了第一个无偏见的、无高阶导数的策略梯度算法,用于学习感知强化学习,考虑到其他代理通过多次嘈杂的试验和错误进行学习。然后,我们利用高效的序列模型来将行为条件化为包含其他代理的学习动态痕迹的长期观测历史。使用我们的算法训练长期上下文策略可以导致合作行为,并在标准社会困境中获得高回报,包括一个需要时间延长的行动协调的具有挑战性的环境。最后,我们从迭代囚徒困境中得出一个新颖的解释,解释自私的学习感知代理何时以及如何建立合作。

更新时间: 2024-10-24 10:48:42

领域: cs.AI

下载: http://arxiv.org/abs/2410.18636v1

Little Giants: Synthesizing High-Quality Embedding Data at Scale

Synthetic data generation has become an increasingly popular way of training models without the need for large, manually labeled datasets. For tasks like text embedding, synthetic data offers diverse and scalable training examples, significantly reducing the cost of human annotation. However, most current approaches rely heavily on proprietary models like GPT-4, which are expensive and inefficient for generating large-scale embedding data. In this paper, we introduce SPEED, a framework that aligns open-source small models (8B) to efficiently generate large-scale synthetic embedding data. Through supervised fine-tuning, preference optimization, and self-improvement, SPEED enables small open-source models to produce high-quality data. Remarkably, SPEED uses only less than 1/10 of the GPT API calls, outperforming the state-of-the-art embedding model E5_mistral when both are trained solely on their synthetic data. Using this efficient generator, we conduct a comprehensive study on how various factors within the alignment pipeline impact data quality and reveal the scaling law for synthetic embedding data.

Updated: 2024-10-24 10:47:30

标题: 小巨人:在规模上合成高质量的嵌入数据

摘要: 合成数据生成已经成为一种越来越受欢迎的训练模型的方式,无需大量手工标记的数据集。对于文本嵌入等任务,合成数据提供了多样化和可扩展的训练示例,显著降低了人工标注的成本。然而,大多数当前方法严重依赖于诸如GPT-4之类的专有模型,这些模型生成大规模嵌入数据的成本高且效率低下。在本文中,我们介绍了SPEED,一个框架,通过对开源小型模型(8B)进行对齐,有效地生成大规模合成嵌入数据。通过监督微调、偏好优化和自我改进,SPEED使得小型开源模型能够产生高质量的数据。值得注意的是,SPEED仅使用不到GPT API调用的1/10,当两者仅在其合成数据上训练时,优于最先进的嵌入模型E5_mistral。利用这种高效的生成器,我们对对齐流程中各种因素如何影响数据质量进行了全面研究,并揭示了合成嵌入数据的规模定律。

更新时间: 2024-10-24 10:47:30

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.18634v1

Supporting Assessment of Novelty of Design Problems Using Concept of Problem SAPPhIRE

This paper proposes a framework for assessing the novelty of design problems using the SAPPhIRE model of causality. The novelty of a problem is measured as its minimum distance from the problems in a reference problem database. The distance is calculated by comparing the current problem and each past reference problem at the various levels of abstraction in the SAPPhIRE ontology. The basis for comparison is textual similarity. To demonstrate the applicability of the proposed framework, the current set of problems associated with an artifact, as collected from its stakeholders, was compared with the past set of problems, as collected from patents and other web sources, to assess the novelty of the current set. This approach is aimed at providing a better understanding of the degree of novelty of any given set of current problems by comparing them to similar problems available from historical records. Since manual assessment, the current mode of such assessments as reported in the literature, is a tedious process, an automated assessment is proposed and used in this paper to reduce time complexity and afford better applicability to larger sets of problem statements.
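
At a single level of abstraction, the novelty score reduces to one minus the maximum textual similarity against the reference database; the TF-IDF sketch below, with made-up problem statements, shows this step (the SAPPhIRE framework repeats the comparison across its abstraction levels).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_problems = ["reduce vibration in the drill housing",
                 "improve battery life of the handheld scanner",
                 "make the drill grip comfortable for long use"]
current_problem = "dampen hand vibration when drilling overhead"

vec = TfidfVectorizer().fit(past_problems + [current_problem])
sims = cosine_similarity(vec.transform([current_problem]),
                         vec.transform(past_problems))[0]
novelty = 1.0 - sims.max()   # minimum distance to the reference database
print(f"novelty score: {novelty:.2f}")
```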

Updated: 2024-10-24 10:39:49

标题: 支持使用“问题SAPPhIRE”概念评估设计问题的新颖性

摘要: 本文提出了一个框架来评估设计问题的新颖性,使用SAPPhIRE模型的因果关系。问题的新颖性被测量为其与参考问题数据库中问题的最小距离。距离是通过比较SAPPhIRE本体论中不同抽象层次上的当前问题和每个参考过去问题来计算的。比较的基础是文本相似性。为了展示所提出的框架的适用性,从相关利益相关者收集的与一个工件相关的当前问题集与从专利和其他网络来源收集的过去问题集进行比较,以评估当前集合的新颖性。这种方法旨在通过将它们与历史记录中可用的类似问题进行比较,提供对任何给定当前问题集的新颖程度的更好理解。由于手动评估,即报道文献中的当前评估模式是一个繁琐的过程,为了减少时间复杂性并为更大的问题陈述集提供更好的适用性,本文提出并使用了自动评估。

更新时间: 2024-10-24 10:39:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.18629v1

Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label

Synthesizers are essential in modern music production. However, their complex timbre parameters, often filled with technical terms, require expertise. This research introduces a method of timbre control in wavetable synthesis that is intuitive and sensible and utilizes semantic labels. Using a conditional variational autoencoder (CVAE), users can select a wavetable and define the timbre with labels such as bright, warm, and rich. The CVAE model, featuring convolutional and upsampling layers, effectively captures the wavetable nuances, ensuring real-time performance owing to their processing in the time domain. Experiments demonstrate that this approach allows for real-time, effective control of the timbre of the wavetable using semantic inputs and aims for intuitive timbre control through data-based semantic control.

Updated: 2024-10-24 10:37:54

标题: 使用CVAE进行波表合成以实现基于语义标签的音色控制

摘要: 合成器在现代音乐制作中是必不可少的。然而,它们复杂的音色参数,通常充斥着技术术语,需要专业知识。本研究介绍了一种在波表合成中直观和合理地控制音色的方法,并利用语义标签。使用条件变分自动编码器(CVAE),用户可以选择一个波表并用明亮、温暖和丰富等标签定义音色。该CVAE模型采用卷积和上采样层,有效捕捉波表的细微差别,确保由于在时间域中处理而实现实时性能。实验表明,这种方法允许通过语义输入实时、有效地控制波表的音色,并旨在通过基于数据的语义控制实现直观的音色控制。

更新时间: 2024-10-24 10:37:54

领域: cs.SD,cs.AI,eess.AS,eess.SP

下载: http://arxiv.org/abs/2410.18628v1

SAMG: State-Action-Aware Offline-to-Online Reinforcement Learning with Offline Model Guidance

The offline-to-online (O2O) paradigm in reinforcement learning (RL) utilizes pre-trained models on offline datasets for subsequent online fine-tuning. However, conventional O2O RL algorithms typically require maintaining and retraining the large offline datasets to mitigate the effects of out-of-distribution (OOD) data, which limits their efficiency in exploiting online samples. To address this challenge, we introduce a new paradigm called SAMG: State-Action-Conditional Offline-to-Online Reinforcement Learning with Offline Model Guidance. In particular, rather than directly training on offline data, SAMG freezes the pre-trained offline critic to provide offline values for each state-action pair to deliver compact offline information. This framework eliminates the need for retraining with offline data by freezing and leveraging these values of the offline model. These are then incorporated with the online target critic using a Bellman equation weighted by a policy state-action-aware coefficient. This coefficient, derived from a conditional variational auto-encoder (C-VAE), aims to capture the reliability of the offline data on a state-action level. SAMG could be easily integrated with existing Q-function based O2O RL algorithms. Theoretical analysis shows good optimality and lower estimation error of SAMG. Empirical evaluations demonstrate that SAMG outperforms four state-of-the-art O2O RL algorithms in the D4RL benchmark.
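
One plausible reading of the weighted update, written with assumed notation (the exact placement of the coefficient is our guess from the abstract, not the paper's equation): with $Q_{\mathrm{off}}$ the frozen offline critic, $Q_{\mathrm{tar}}$ the online target critic, and $w(s,a) \in [0,1]$ the C-VAE-derived reliability of the offline data at $(s,a)$, the Bellman target could take the form

```latex
y(s,a) \;=\; r \;+\; \gamma \Big[\, w(s',a')\, Q_{\mathrm{off}}(s',a')
\;+\; \big(1 - w(s',a')\big)\, Q_{\mathrm{tar}}(s',a') \Big],
```

so that the frozen offline values dominate exactly where the offline dataset is deemed trustworthy.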

Updated: 2024-10-24 10:35:02

标题: SAMG:具有脱机模型引导的状态动作感知离线到在线强化学习

摘要: 离线到在线(O2O)强化学习(RL)范式利用离线数据集上预先训练的模型进行后续在线微调。然而,传统的O2O RL算法通常需要维护和重新训练大型离线数据集,以减轻分布外(OOD)数据的影响,这限制了它们在利用在线样本方面的效率。为了解决这一挑战,我们引入了一种称为SAMG的新范式:带离线模型引导的状态-动作条件离线到在线强化学习。具体而言,SAMG不直接在离线数据上进行训练,而是冻结预先训练的离线评论者以为每个状态-动作对提供离线值,以提供紧凑的离线信息。这一框架通过冻结并利用离线模型的这些值,消除了使用离线数据进行重新训练的需求。然后,这些值与在线目标评论者结合使用由策略状态-动作感知系数加权的贝尔曼方程。这个系数,从条件变分自动编码器(C-VAE)推导而来,旨在捕捉离线数据在状态-动作层面上的可靠性。SAMG可以轻松集成到现有基于Q函数的O2O RL算法中。理论分析显示SAMG的良好最优性和较低的估计误差。实证评估表明,SAMG在D4RL基准测试中优于四种最先进的O2O RL算法。

更新时间: 2024-10-24 10:35:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.18626v1

Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization

This paper explores the rapid development of a telephone call summarization system utilizing large language models (LLMs). Our approach involves initial experiments with prompting existing LLMs to generate summaries of telephone conversations, followed by the creation of a tailored synthetic training dataset utilizing stronger frontier models. We place special focus on the diversity of the generated data and on the ability to control the length of the generated summaries to meet various use-case specific requirements. The effectiveness of our method is evaluated using two state-of-the-art LLM-as-a-judge-based evaluation techniques to ensure the quality and relevance of the summaries. Our results show that fine-tuned Llama-2-7B-based summarization model performs on-par with GPT-4 in terms of factual accuracy, completeness and conciseness. Our findings demonstrate the potential for quickly bootstrapping a practical and efficient call summarization system.

Updated: 2024-10-24 10:32:10

标题: 电话呼叫摘要的长度可控小型LLM的提示和微调

摘要: 本文探讨了利用大型语言模型(LLMs)快速开发电话通话摘要系统的发展。我们的方法涉及通过提示现有的LLMs生成电话会话摘要的初步实验,随后创建一个定制的合成训练数据集,利用更强大的前沿模型。我们特别关注生成数据的多样性以及控制生成摘要长度以满足各种用例特定需求的能力。我们使用两种最先进的LLM作为评判依据的评估技术来评估我们的方法的有效性,以确保摘要的质量和相关性。我们的结果显示,经过调优的Llama-2-7B基础摘要模型在事实准确性、完整性和简洁性方面与GPT-4相媲美。我们的发现表明,快速启动一个实用和高效的电话摘要系统的潜力。

更新时间: 2024-10-24 10:32:10

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18624v1

Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their maximum training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. Our findings reveal that with a meticulously woven position, PE can indeed be extended beyond the effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution for extending the applicative reach of LLMs. Our code is available at \url{https://github.com/soacker/Mesa-Extrapolation}.

Updated: 2024-10-24 10:29:15

标题: 梅萨外推法:一种用于增强LLMs中外推的编织位置编码方法

摘要: 大型语言模型(LLMs)虽然在许多领域中进行了革命性的改变,但仍然面临挑战性的外推问题,即LLMs的推理能力在超过其最大训练长度后急剧下降。在这项工作中,我们进行了理论分析,以更好地理解为什么无位置编码(NoPE)在其有效范围之外失败,同时检验位置编码(PE)在这一背景下的作用。我们的研究发现,通过精心编织位置,PE确实可以扩展到有效范围之外。我们的定理证明了装备编织PE的LLMs可以在不增加额外成本的情况下实现改进的外推性能。此外,我们引入了一种新颖的编织PE方法Mesa-Extrapolation,该方法利用基于块的三角形注意力矩阵,并应用Stair PE来管理最后一个块。该方法不仅保持了竞争性能,还提供了诸多好处,如显著减少的内存需求和更快的推理速度。广泛的实验验证了Mesa-Extrapolation的有效性,展示了其作为增强LLMs应用范围可行解决方案的潜力。我们的代码可在\url{https://github.com/soacker/Mesa-Extrapolation}上找到。

更新时间: 2024-10-24 10:29:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.15859v3

Evolutionary Dispersal of Ecological Species via Multi-Agent Deep Reinforcement Learning

Understanding species dynamics in heterogeneous environments is essential for ecosystem studies. Traditional models assumed homogeneous habitats, but recent approaches include spatial and temporal variability, highlighting species migration. We adopt starvation-driven diffusion (SDD) models as nonlinear diffusion to describe species dispersal based on local resource conditions, showing advantages for species survival. However, accurate prediction remains challenging due to model simplifications. This study uses multi-agent reinforcement learning (MARL) with deep Q-networks (DQN) to simulate single species and predator-prey interactions, incorporating SDD-type rewards. Our simulations reveal evolutionary dispersal strategies, providing insights into species dispersal mechanisms and validating traditional mathematical models.

Updated: 2024-10-24 10:21:23

标题: 通过多智能体深度强化学习的生态物种进化性传播

摘要: 在异质环境中理解物种动态对于生态系统研究至关重要。传统模型假设生境是均质的,但最近的方法包括空间和时间的变化,突出物种迁移。我们采用以饥饿驱动的扩散(SDD)模型作为非线性扩散,描述基于当地资源条件的物种扩散,显示出对物种生存的优势。然而,由于模型简化,准确预测仍具挑战性。本研究使用深度Q网络(DQN)的多智能体强化学习(MARL)来模拟单一物种和捕食者-被捕食者相互作用,融合SDD类型的奖励。我们的模拟揭示了进化的扩散策略,提供了对物种扩散机制的洞察,并验证了传统的数学模型。

更新时间: 2024-10-24 10:21:23

领域: q-bio.PE,cs.LG,math.DS,35J60, 35K57, 92D25, 68T05, 93E35

下载: http://arxiv.org/abs/2410.18621v1

Learning of Hamiltonian Dynamics with Reproducing Kernel Hilbert Spaces

This paper presents a method for learning Hamiltonian dynamics from a limited set of data points. The Hamiltonian vector field is found by regularized optimization over a reproducing kernel Hilbert space of vector fields that are inherently Hamiltonian, and where the vector field is required to be odd or even. This is done with a symplectic kernel, and it is shown how this symplectic kernel can be modified to be odd or even. The performance of the method is validated in simulations for two Hamiltonian systems. The simulations show that the learned dynamics reflect the energy-preservation of the Hamiltonian dynamics, and that the restriction to symplectic and odd dynamics gives improved accuracy over a large domain of the phase space.

Updated: 2024-10-24 10:16:58

标题: 用再生核希尔伯特空间学习哈密顿动力学

摘要: 这篇论文介绍了一种从有限数据点学习哈密顿动力学的方法。通过在一个向量场的再生核希尔伯特空间上进行正则化优化,找到了哈密顿向量场,这些向量场本质上是哈密顿的,并且要求向量场是奇数或偶数的。这是通过一个辛核来实现的,并展示了如何修改这个辛核使其成为奇数或偶数。该方法在两个哈密顿系统的模拟中得到验证。模拟结果显示,学习到的动态反映了哈密顿动力学的能量保持性,并且限制为辛和奇数动态在相空间的大范围内提供了改进的精度。

更新时间: 2024-10-24 10:16:58

领域: cs.RO,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2312.09734v2

FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation

Recently, prompt learning has emerged as the state-of-the-art (SOTA) for fair text-to-image (T2I) generation. Specifically, this approach leverages readily available reference images to learn inclusive prompts for each target Sensitive Attribute (tSA), allowing for fair image generation. In this work, we first reveal that this prompt learning-based approach results in degraded sample quality. Our analysis shows that the approach's training objective -- which aims to align the embedding differences of learned prompts and reference images -- could be sub-optimal, resulting in distortion of the learned prompts and degraded generated images. To further substantiate this claim, as our major contribution, we deep dive into the denoising subnetwork of the T2I model to track down the effect of these learned prompts by analyzing the cross-attention maps. In our analysis, we propose a novel prompt switching analysis: I2H and H2I. Furthermore, we propose new quantitative characterization of cross-attention maps. Our analysis reveals abnormalities in the early denoising steps, perpetuating improper global structure that results in degradation in the generated samples. Building on insights from our analysis, we propose two ideas: (i) Prompt Queuing and (ii) Attention Amplification to address the quality issue. Extensive experimental results on a wide range of tSAs show that our proposed method outperforms SOTA approach's image generation quality, while achieving competitive fairness. More resources at FairQueue Project site: https://sutd-visual-computing-group.github.io/FairQueue

Updated: 2024-10-24 10:16:09

标题: FairQueue:重新思考公平文本到图像生成的提示学习

摘要: 最近,提示学习已经成为公平文本到图像(T2I)生成的最先进技术。具体来说,这种方法利用现成的参考图像来学习每个目标敏感属性(tSA)的包容性提示,从而实现公平的图像生成。在这项工作中,我们首先发现,这种基于提示学习的方法导致了样本质量下降。我们的分析显示,该方法的训练目标 - 旨在使学习提示和参考图像的嵌入差异对齐 - 可能是次优的,导致学习提示的扭曲和生成图像的质量下降。为了进一步证实这一观点,作为我们的主要贡献,我们深入研究了T2I模型中去噪子网络的效果,通过分析交叉注意力图来追踪这些学习提示的影响。在我们的分析中,我们提出了一种新的提示切换分析:I2H和H2I。此外,我们提出了交叉注意力图的新的定量特征化方法。我们的分析揭示了早期去噪步骤中的异常,导致生成样本中全局结构不当的问题。基于我们分析的见解,我们提出了两个想法:(i)提示排队和(ii)注意力放大,以解决质量问题。对一系列tSA的广泛实验结果表明,我们提出的方法在图像生成质量方面优于SOTA方法,同时实现了竞争力的公平性。FairQueue项目网站上有更多资源:https://sutd-visual-computing-group.github.io/FairQueue

更新时间: 2024-10-24 10:16:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.18615v1

Rethinking Softmax: Self-Attention with Polynomial Activations

This paper challenges the conventional belief that softmax attention in transformers is effective primarily because it generates a probability distribution for attention allocation. Instead, we theoretically show that its success lies in its ability to implicitly regularize the Frobenius norm of the attention matrix during training. We then explore alternative activations that regularize the Frobenius norm of the attention matrix, demonstrating that certain polynomial activations can achieve this effect, making them suitable for attention-based architectures. Empirical results indicate these activations perform comparably or better than softmax across various computer vision and language tasks, suggesting new possibilities for attention mechanisms beyond softmax.
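
A quick way to see the quantity under study: compare the Frobenius norm of attention matrices produced by softmax against a simple polynomial activation. The cubic form and its scaling below are illustrative members of the activation class, not the paper's tuned choices.

```python
import torch

def attention_weights(scores: torch.Tensor, mode: str = "softmax") -> torch.Tensor:
    """Row-normalized softmax attention vs. a plain polynomial activation
    (no row normalization), applied to the same score matrix."""
    if mode == "softmax":
        return torch.softmax(scores, dim=-1)
    return scores ** 3 / scores.shape[-1]   # illustrative cubic activation

torch.manual_seed(0)
q, k = torch.randn(8, 16), torch.randn(8, 16)
scores = q @ k.T / 16 ** 0.5
for mode in ("softmax", "poly"):
    A = attention_weights(scores, mode)
    print(mode, "Frobenius norm:", round(torch.linalg.norm(A).item(), 3))
```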

Updated: 2024-10-24 10:08:25

标题: 重新思考Softmax:使用多项式激活函数的自注意力机制

摘要: 本文挑战了传统观念,即transformers中的softmax注意力机制之所以有效主要是因为它生成了注意力分配的概率分布。相反,我们在理论上展示了它的成功在于其在训练过程中能够隐式地规范化注意力矩阵的Frobenius范数。然后,我们探索了能够规范化注意力矩阵的Frobenius范数的替代激活函数,证明了某些多项式激活函数可以实现这种效果,使它们适用于基于注意力的架构。实证结果表明,这些激活函数在各种计算机视觉和语言任务中表现出与softmax相当或更好的性能,为超越softmax的注意力机制提供了新的可能性。

更新时间: 2024-10-24 10:08:25

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2410.18613v1

TripCast: Pre-training of Masked 2D Transformers for Trip Time Series Forecasting

Deep learning and pre-trained models have shown great success in time series forecasting. However, in the tourism industry, time series data often exhibit a leading time property, presenting a 2D structure. This introduces unique challenges for forecasting in this sector. In this study, we propose a novel modelling paradigm, TripCast, which treats trip time series as 2D data and learns representations through masking and reconstruction processes. Pre-trained on large-scale real-world data, TripCast notably outperforms other state-of-the-art baselines in in-domain forecasting scenarios and demonstrates strong scalability and transferability in out-domain forecasting scenarios.

Updated: 2024-10-24 10:08:05

标题: TripCast:预训练的掩码2D变换器用于旅行时间序列预测

摘要: 深度学习和预训练模型在时间序列预测中取得了巨大成功。然而,在旅游业中,时间序列数据通常具有领先时间属性,呈现出2D结构。这为该行业的预测带来了独特的挑战。在本研究中,我们提出了一种新颖的建模范式TripCast,将旅行时间序列视为2D数据,并通过掩模和重建过程学习表示。在大规模实际数据上进行预训练后,TripCast在领域内预测场景中明显优于其他最先进的基线,并在领域外预测场景中展现出强大的可扩展性和可转移性。

更新时间: 2024-10-24 10:08:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.18612v1

Privacy-Preserving Logistic Regression Training on Large Datasets

Privacy-preserving machine learning is one class of cryptographic methods that aim to analyze private and sensitive data while preserving privacy, such as homomorphic logistic regression training over large encrypted data. In this paper, we propose an efficient algorithm for logistic regression training on large encrypted data using Homomorphic Encryption (HE), which is the mini-batch version of recent methods using a faster gradient variant called $\texttt{quadratic gradient}$. It is claimed that $\texttt{quadratic gradient}$ can integrate curve information (the Hessian matrix) into the gradient and therefore can effectively accelerate first-order gradient (descent) algorithms. We also implement the full-batch version of their method for the case where the encrypted dataset is so large that it has to be encrypted in the mini-batch manner. We compare our mini-batch algorithm with our full-batch implementation on real financial data consisting of 422,108 samples with 200 features. Given the inefficiency of HE, our results are inspiring and demonstrate that logistic regression training on large encrypted datasets is practically feasible, marking a significant milestone in our understanding.
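
A plaintext sketch of the quadratic-gradient idea for logistic regression (no encryption shown): each gradient coordinate is rescaled by a fixed diagonal bound on the Hessian, here the classical $X^{\top}X/4$ bound for the logistic loss. The exact construction used in the paper may differ; this is an assumption-laden illustration.

```python
import numpy as np

def quadratic_gradient(X, y, beta, eps=1e-8):
    """Rescale each coordinate of the log-likelihood gradient by a fixed
    diagonal Hessian bound, so a first-order update absorbs curvature
    information without computing the full Hessian."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
    g = X.T @ (y - p)                          # ascent gradient
    h_bound = 0.25 * (X * X).sum(axis=0)       # diag of X^T X / 4
    return g / (h_bound + eps)                 # coordinate-wise preconditioning

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = (X @ np.ones(5) + 0.1 * rng.standard_normal(100) > 0).astype(float)
beta = np.zeros(5)
for _ in range(10):                            # plain preconditioned ascent
    beta += quadratic_gradient(X, y, beta)
```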

Updated: 2024-10-24 10:08:02

标题: 大数据集上隐私保护的逻辑回归训练

摘要: 隐私保护机器学习是一类密码学方法,旨在分析私人和敏感数据的同时保护隐私,例如在大规模加密数据上进行同态逻辑回归训练。本文提出了一种在大规模加密数据上使用同态加密(HE)进行逻辑回归训练的高效算法,这是使用更快梯度变体$\texttt{quadratic gradient}$的最近方法的小批量版本。据称$\texttt{quadratic gradient}$可以将曲线信息(Hessian矩阵)整合到梯度中,因此可以有效加速第一阶梯度(下降)算法。当加密数据集太大以至必须按小批量方式加密时,我们还实现了其方法的全批量版本。我们将我们的小批量算法与我们在由422,108个样本和200个特征组成的实际金融数据上的全批量实现方法进行比较。我们的实验表明,鉴于HE的低效性,我们的结果令人鼓舞,并表明在大规模加密数据集上进行逻辑回归训练是可行的,这标志着我们对该领域的理解取得了重要进展。

更新时间: 2024-10-24 10:08:02

领域: cs.CR

下载: http://arxiv.org/abs/2406.13221v3

STTATTS: Unified Speech-To-Text And Text-To-Speech Model

Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks. We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task learning objective and shared parameters. Our evaluation demonstrates that the performance of our multi-task model is comparable to that of individually trained models while significantly saving computational and memory costs ($\sim$50\% reduction in the total number of parameters required for the two tasks combined). We experiment with English as a resource-rich language, and Arabic as a relatively low-resource language due to shortage of TTS data. Our models are trained with publicly available data, and both the training code and model checkpoints are openly available for further research.

Updated: 2024-10-24 10:04:24

标题: STTATTS: 统一的语音转文本和文本转语音模型

摘要: 语音识别和语音合成模型通常是分别训练的,每个模型都有自己的学习目标、训练数据和模型参数,从而导致两个独立的大型网络。我们提出了一种参数高效的方法,通过多任务学习目标和共享参数来联合学习自动语音识别(ASR)和文本到语音合成(TTS)。我们的评估表明,我们的多任务模型的性能与单独训练的模型相当,同时显著节省了计算和内存成本(两个任务所需参数总数减少约50%)。我们尝试了英语作为资源丰富的语言,以及由于TTS数据短缺而被认为是相对低资源的阿拉伯语。我们的模型使用公开可用数据进行训练,训练代码和模型检查点也可供进一步研究使用。

更新时间: 2024-10-24 10:04:24

领域: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.18607v1

Neural-Rendezvous: Provably Robust Guidance and Control to Encounter Interstellar Objects

Interstellar objects (ISOs) are likely representatives of primitive materials invaluable in understanding exoplanetary star systems. Due to their poorly constrained orbits with generally high inclinations and relative velocities, however, exploring ISOs with conventional human-in-the-loop approaches is significantly challenging. This paper presents Neural-Rendezvous -- a deep learning-based guidance and control framework for encountering fast-moving objects, including ISOs, robustly, accurately, and autonomously in real time. It uses pointwise minimum norm tracking control on top of a guidance policy modeled by a spectrally-normalized deep neural network, where its hyperparameters are tuned with a loss function directly penalizing the MPC state trajectory tracking error. We show that Neural-Rendezvous provides a high probability exponential bound on the expected spacecraft delivery error, the proof of which leverages stochastic incremental stability analysis. In particular, it is used to construct a non-negative function with a supermartingale property, explicitly accounting for the ISO state uncertainty and the local nature of nonlinear state estimation guarantees. In numerical simulations, Neural-Rendezvous is demonstrated to satisfy the expected error bound for 100 ISO candidates. This performance is also empirically validated using our spacecraft simulator and in high-conflict and distributed UAV swarm reconfiguration with up to 20 UAVs.

Updated: 2024-10-24 10:01:12

标题: 神经会面:证明具有稳健性的遇到星际物体的引导和控制

摘要: 星际物体(ISOs)可能是原始材料的代表,对于理解外行星星系至关重要。然而,由于它们的轨道不确定性大,通常具有较高的倾斜度和相对速度,因此用传统的人为干预方法探索ISOs是非常具有挑战性的。本文提出了神经会合(Neural-Rendezvous)- 一个基于深度学习的指导和控制框架,可以实时、稳健、准确地遇到ISOs等快速移动物体。它使用点对点最小规范跟踪控制,结合由谱归一化深度神经网络建模的指导策略,其中超参数通过直接惩罚MPC状态轨迹跟踪误差的损失函数进行调整。我们展示了神经会合提供了对预期飞行器传递误差的高概率指数界限,其证明利用了随机增量稳定性分析。特别地,它用于构建一个非负函数,具有超马丁贝尔属性,明确考虑了ISO状态不确定性和非线性状态估计保证的本地特性。在数值模拟中,神经会合被证明对100个ISO候选者满足了预期误差限制。该性能也通过我们的飞船模拟器以及高冲突和分布式无人机群体重新配置(最多20个无人机)进行了实证验证。

更新时间: 2024-10-24 10:01:12

领域: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2208.04883v3

Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study

This pilot study explores the application of language models (LMs) to model game event sequences, treating them as a customized natural language. We investigate a popular mobile game, transforming raw event data into textual sequences and pretraining a Longformer model on this data. Our approach captures the rich and nuanced interactions within game sessions, effectively identifying meaningful player segments. The results demonstrate the potential of self-supervised LMs in enhancing game design and personalization without relying on ground-truth labels.
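
Concretely, "treating events as a customized language" amounts to serializing event records into token sequences before LM pretraining; the event schema and vocabulary below are invented for illustration.

```python
# Sketch: serialize raw game events into a token sequence for LM pretraining.
events = [
    {"t": 3,  "action": "level_start",  "level": 12},
    {"t": 41, "action": "booster_used", "item": "bomb"},
    {"t": 97, "action": "level_end",    "result": "win"},
]

def to_tokens(evs):
    toks = []
    for e in evs:
        toks.append(f"<{e['action']}>")               # event-type token
        toks += [f"{k}={v}" for k, v in e.items()
                 if k not in ("t", "action")]         # attribute tokens
    return toks

print(" ".join(to_tokens(events)))
# <level_start> level=12 <booster_used> item=bomb <level_end> result=win
```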

Updated: 2024-10-24 09:59:10

标题: 理解玩家好像他们正在用一种定制的语言与游戏交流:一项初步研究

摘要: 这项初步研究探讨了语言模型(LMs)在对游戏事件序列建模时的应用,将其视为定制的自然语言。我们研究了一款流行的移动游戏,将原始事件数据转化为文本序列,并在这些数据上预训练了一个Longformer模型。我们的方法捕捉了游戏会话中丰富而微妙的交互,有效地识别了有意义的玩家分段。结果表明,自监督LMs在增强游戏设计和个性化方面的潜力,而无需依赖地面真实标签。

更新时间: 2024-10-24 09:59:10

领域: cs.LG

下载: http://arxiv.org/abs/2410.18605v1

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Digital agents capable of automating complex computer tasks have attracted considerable attention due to their immense potential to enhance human-computer interaction. However, existing agent methods exhibit deficiencies in their generalization and specialization capabilities, especially in handling open-ended computer tasks in real-world environments. Inspired by the rich functionality of the App store, we present AgentStore, a scalable platform designed to dynamically integrate heterogeneous agents for automating computer tasks. AgentStore empowers users to integrate third-party agents, allowing the system to continuously enrich its capabilities and adapt to rapidly evolving operating systems. Additionally, we propose a novel core \textbf{MetaAgent} with the \textbf{AgentToken} strategy to efficiently manage diverse agents and utilize their specialized and generalist abilities for both domain-specific and system-wide tasks. Extensive experiments on three challenging benchmarks demonstrate that AgentStore surpasses the limitations of previous systems with narrow capabilities, particularly achieving a significant improvement from 11.21\% to 23.85\% on the OSWorld benchmark, more than doubling the previous results. Comprehensive quantitative and qualitative results further demonstrate AgentStore's ability to enhance agent systems in both generalization and specialization, underscoring its potential for developing the specialized generalist computer assistant. All our codes will be made publicly available in https://chengyou-jia.github.io/AgentStore-Home.

Updated: 2024-10-24 09:58:40

标题: AgentStore:将异构代理集成为专业的通用计算助手的可扩展方法

摘要: 数字代理能够自动化复杂的计算机任务,由于其巨大的潜力来增强人机交互,已经引起了广泛关注。然而,现有的代理方法在泛化和专业化能力方面存在不足,特别是在处理真实环境中的开放式计算机任务时。受App商店丰富功能的启发,我们提出AgentStore,一个可扩展的平台,旨在动态集成异构代理来自动化计算机任务。AgentStore赋予用户集成第三方代理的能力,使系统能够持续丰富其功能,并适应快速发展的操作系统。此外,我们提出了一种新颖的核心MetaAgent,采用AgentToken策略来高效管理各种代理,并利用它们的专业化和泛化能力来执行领域特定和系统范围的任务。对三个具有挑战性的基准测试的广泛实验表明,AgentStore超越了先前系统的狭窄能力的限制,特别是在OSWorld基准测试上实现了从11.21%到23.85%的显著改进,超过了之前的结果。全面的定量和定性结果进一步表明AgentStore在泛化和专业化方面提高了代理系统的能力,突显了其发展专业泛化计算机助手的潜力。我们的所有代码将在https://chengyou-jia.github.io/AgentStore-Home上公开提供。

更新时间: 2024-10-24 09:58:40

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2410.18603v1

AutoPSV: Automated Process-Supervised Verifier

In this work, we propose a novel method named \textbf{Auto}mated \textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier (\textbf{\textsc{AutoPSV}}) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. \textsc{AutoPSV} begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification's confidence scores across reasoning steps to automatically annotate the reasoning process, enabling error detection even in scenarios where ground truth answers are unavailable. This alleviates the need for numerous manual annotations or the high computational costs associated with model-induced annotation approaches. We experimentally validate that the step-level confidence changes learned by the verification model trained on the final answer correctness can effectively identify errors in the reasoning steps. We demonstrate that the verification model, when trained on process annotations generated by \textsc{AutoPSV}, exhibits improved performance in selecting correct answers from multiple LLM-generated outputs. Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning. The source code of \textsc{AutoPSV} is available at \url{https://github.com/rookie-joe/AutoPSV}.
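
The annotation rule can be pictured as flagging the step at which the verifier's confidence drops sharply; the threshold and labels in this sketch are illustrative assumptions, not the paper's exact procedure.

```python
def annotate_steps(confidences, drop_threshold=0.2):
    """Label each transition between reasoning steps by the relative change
    in verifier confidence; a sharp drop marks the likely-erroneous step."""
    return ["error" if prev - cur > drop_threshold else "ok"
            for prev, cur in zip(confidences, confidences[1:])]

# Verifier confidence in reaching the correct final answer after each step
print(annotate_steps([0.91, 0.88, 0.85, 0.41, 0.39]))
# ['ok', 'ok', 'error', 'ok']  -> the fourth step likely introduced the error
```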

Updated: 2024-10-24 09:52:59

标题: AutoPSV:自动过程监督验证器

摘要: 在这项工作中,我们提出了一种名为\textbf{Auto}mated \textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier(\textbf{\textsc{AutoPSV}})的新方法,通过自动注释推理步骤来增强大型语言模型(LLMs)的推理能力。\textsc{AutoPSV}首先通过训练一个验证模型来评估最终答案的正确性,从而使其能够生成自动的过程注释。这个验证模型为每个推理步骤分配一个置信度分数,表示从那一点开始到达正确的最终答案的概率。我们检测验证的置信度分数在推理步骤中的相对变化,自动注释推理过程,即使在没有地面真实答案的情况下也能检测错误。这减轻了大量手动注释或与模型引导的注释方法相关的高计算成本的需求。我们通过实验证实,验证模型在最终答案正确性训练的基础上学习到的步骤级置信度变化能有效识别推理步骤中的错误。我们展示,当验证模型在\textsc{AutoPSV}生成的过程注释上进行训练时,可以在从多个LLM生成的输出中选择正确答案方面表现出更好的性能。值得注意的是,我们在数学和常识推理的五个数据集上取得了显著的改进。\textsc{AutoPSV}的源代码可在\url{https://github.com/rookie-joe/AutoPSV}上找到。

更新时间: 2024-10-24 09:52:59

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16802v4

Differential Informed Auto-Encoder

In this article, an encoder was trained to capture the inner structure of the original data by learning a differential equation. A decoder was trained to resample the original data domain, generating new data that obey the differential structure of the original data, using a physics-informed neural network.

Updated: 2024-10-24 09:42:52

标题: Differential Informed Auto-Encoder

摘要: 在这篇文章中,一个编码器被训练以通过获取微分方程来获得原始数据的内部结构。一个解码器被训练以重新取样原始数据域,通过使用物理启发的神经网络生成符合原始数据微分结构的新数据。

更新时间: 2024-10-24 09:42:52

领域: cs.LG

下载: http://arxiv.org/abs/2410.18593v1

A Historical Trajectory Assisted Optimization Method for Zeroth-Order Federated Learning

Federated learning heavily relies on distributed gradient descent techniques. In situations where gradient information is not available, gradients need to be estimated from zeroth-order information, which typically involves computing finite differences along isotropic random directions. This method suffers from high estimation errors, as the geometric features of the objective landscape may be overlooked during the isotropic sampling. In this work, we propose a non-isotropic sampling method to improve the gradient estimation procedure. Gradients in our method are estimated in a subspace spanned by historical trajectories of solutions, aiming to encourage the exploration of promising regions and hence improve the convergence. The proposed method uses a covariance matrix for sampling, which is a convex combination of two parts. The first part is a thin projection matrix containing the basis of the subspace, designed to improve the exploitation ability. The second part is built from the historical trajectories. We implement this method in zeroth-order federated settings, and show that the convergence rate aligns with existing ones while introducing no significant overheads in communication or local computation. The effectiveness of our proposal is verified in several numerical experiments in comparison with several commonly-used zeroth-order federated optimization algorithms.
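
A minimal sketch of a zeroth-order estimator with such non-isotropic sampling: directions are drawn from a covariance that mixes a projection onto the span of historical solution displacements with the displacements themselves. The exact mixture, scaling, and finite-difference scheme are assumptions, not the paper's construction.

```python
import numpy as np

def zo_gradient(f, x, H, alpha=0.5, k=16, mu=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate with sampling covariance
    C = alpha * P + (1 - alpha) * H H^T, where the columns of H are recent
    solution displacements and P projects onto their span."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(H)                        # orthonormal basis of span(H)
    C = alpha * (Q @ Q.T) + (1 - alpha) * (H @ H.T)
    L = np.linalg.cholesky(C + 1e-8 * np.eye(len(x)))  # jitter for rank deficit
    g = np.zeros_like(x, dtype=float)
    for _ in range(k):
        u = L @ rng.standard_normal(len(x))       # non-isotropic direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / k

f = lambda v: float(v @ v)                        # toy local objective
x = np.ones(4)
H = np.column_stack([0.1 * np.ones(4), [0.2, 0.1, 0.0, -0.1]])
print(zo_gradient(f, x, H))                       # estimate biased toward span(H)
```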

Updated: 2024-10-24 09:39:27

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15955v5

Characterizing Physician Referral Networks with Ricci Curvature

Identifying (a) systemic barriers to quality healthcare access and (b) key indicators of care efficacy in the United States remains a significant challenge. To improve our understanding of regional disparities in care delivery, we introduce a novel application of curvature, a geometrical-topological property of networks, to Physician Referral Networks. Our initial findings reveal that Forman-Ricci and Ollivier-Ricci curvature measures, which are known for their expressive power in characterizing network structure, offer promising indicators for detecting variations in healthcare efficacy while capturing a range of significant regional demographic features. We also present APPARENT, an open-source tool that leverages Ricci curvature and other network features to examine correlations between regional Physician Referral Networks structure, local census data, healthcare effectiveness, and patient outcomes.
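
For readers unfamiliar with Ricci curvature on graphs, the combinatorial Forman-Ricci variant is simple enough to compute directly; the sketch below uses networkx and the simplest unweighted form, which may differ from the (possibly weighted or augmented) variant used in the paper.

```python
import networkx as nx

def forman_ricci(G):
    """Combinatorial Forman-Ricci curvature for an unweighted graph:
    F(u, v) = 4 - deg(u) - deg(v) for each edge (u, v). This is the
    simplest variant; weighted or triangle-augmented forms also exist,
    and Ollivier-Ricci curvature (also used in the paper) additionally
    requires solving optimal-transport problems."""
    return {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}

# Toy referral network: a hub physician referring to five peers. Hub edges
# get strongly negative curvature, the kind of structural signal curvature
# measures expose in real referral networks.
G = nx.star_graph(5)
print(forman_ricci(G))   # every edge: 4 - 5 - 1 = -2
```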

Updated: 2024-10-24 09:39:20

Domains: cs.SI,cs.LG

Download: http://arxiv.org/abs/2408.16022v2

Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data

Leading open-source large language models (LLMs) such as Llama-3.1-Instruct-405B are extremely capable at generating text, answering questions, and solving a variety of natural language understanding tasks. However, they incur higher inference cost and latency compared to smaller LLMs. Knowledge distillation provides a way to use outputs from these large, capable teacher models to train smaller student models which can be used for inference at lower cost and latency, while retaining comparable accuracy. We investigate the efficacy of distillation using the Llama-3.1-405B-Instruct teacher and the smaller Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct student models. Contributions of this work include (a) We evaluate the generalizability of distillation with the above Llama-3.1 teacher-student pairs across different tasks and datasets (b) We show that using synthetic data during distillation significantly improves the accuracy of 8B and 70B models, and when used with reasoning chains, even matches or surpasses the zero-shot accuracy of 405B model on some datasets (c) We empirically show that distillation enables 8B and 70B models to internalize 405B's reasoning ability by using only standard fine-tuning (without customizing any loss function). This allows cost and latency-efficient student model inference. (d) We show pitfalls in evaluation of distillation, and present task-specific evaluation, including both human and LLM-grading, and ground-truth based traditional accuracy benchmarks. This methodical study brings out the fundamental importance of synthetic data quality in knowledge distillation, and of combining multiple, task-specific ways of accuracy and quality evaluation in assessing the effectiveness of distillation.

Updated: 2024-10-24 09:37:23

Domains: cs.LG

Download: http://arxiv.org/abs/2410.18588v1

Boosting Deductive Reasoning with Step Signals In RLHF

Logical reasoning is a crucial task for Large Language Models (LLMs), enabling them to tackle complex problems. Among reasoning tasks, multi-step reasoning poses a particular challenge. Grounded in the theory of formal logic, we have developed an automated method, Multi-step Deduction (MuseD), for generating deductive reasoning data. MuseD has allowed us to create training and testing datasets for multi-step reasoning. Our generation method enables control over the complexity of the generated instructions, facilitating training and evaluation of models across different difficulty levels. Through RLHF training, our training data yields significant improvements in logical capabilities on both in-domain and out-of-domain reasoning tasks. Additionally, we have conducted tests to assess the multi-step reasoning abilities of various models.

Updated: 2024-10-24 09:36:53

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.09528v2

Aligning CodeLLMs with Direct Preference Optimization

The last year has witnessed the rapid progress of large language models (LLMs) across diverse domains. Among them, CodeLLMs have garnered particular attention because they can not only assist in completing various programming tasks but also represent the decision-making and logical reasoning capabilities of LLMs. However, current CodeLLMs mainly focus on pre-training and supervised fine-tuning scenarios, leaving the alignment stage, which is important for post-training LLMs, under-explored. This work first identifies that the commonly used PPO algorithm may be suboptimal for the alignment of CodeLLMs because the involved reward rules are routinely coarse-grained and potentially flawed. We then advocate addressing this with the DPO algorithm. Based only on preference data pairs, DPO teaches the model to rank data automatically, giving rise to a fine-grained rewarding pattern that is more robust than human intervention. We also contribute a pipeline for collecting preference pairs for DPO on CodeLLMs. Studies show that our method significantly improves the performance of existing CodeLLMs on benchmarks such as MBPP and HumanEval.
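
For reference, the standard DPO objective the abstract builds on can be written in a few lines of PyTorch; the beta value and tensor contents below are illustrative, and this is the generic loss rather than the authors' full pipeline.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    """Standard DPO objective on sequence log-probabilities of the preferred
    (chosen) and dispreferred (rejected) completions under the policy and a
    frozen reference model; beta=0.1 is a common default, not a value from
    this paper."""
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# One preference pair per row; for a CodeLLM, "chosen" might be a completion
# that passes unit tests and "rejected" one that fails them.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-12.9]), torch.tensor([-14.8]))
```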

Updated: 2024-10-24 09:36:13

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.18585v1

Benchmarking Graph Learning for Drug-Drug Interaction Prediction

Predicting drug-drug interaction (DDI) plays an important role in pharmacology and healthcare for identifying potential adverse interactions and beneficial combination therapies between drug pairs. Recently, a flurry of graph learning methods have been introduced to predict drug-drug interactions. However, evaluating existing methods has several limitations, such as the absence of a unified comparison framework for DDI prediction methods, lack of assessments in meaningful real-world scenarios, and insufficient exploration of side information usage. In order to address these unresolved limitations in the literature, we propose a DDI prediction benchmark on graph learning. We first conduct unified evaluation comparison among existing methods. To meet realistic scenarios, we further evaluate the performance of different methods in settings with new drugs involved and examine the performance across different DDI types. Component analysis is conducted on the biomedical network to better utilize side information. Through this work, we hope to provide more insights for the problem of DDI prediction. Our implementation and data is open-sourced at https://anonymous.4open.science/r/DDI-Benchmark-ACD9/.

Updated: 2024-10-24 09:35:34

Domains: cs.LG

Download: http://arxiv.org/abs/2410.18583v1

ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning

In this paper, we introduce ARBEx, a novel attentive feature extraction framework driven by a Vision Transformer with reliability balancing, designed to cope with poor class distributions, bias, and uncertainty in the facial expression learning (FEL) task. We reinforce several data pre-processing and refinement methods along with a window-based cross-attention ViT to squeeze the most out of the data. We also employ learnable anchor points in the embedding space with label distributions and a multi-head self-attention mechanism to optimize performance against weak predictions with reliability balancing, a strategy that leverages anchor points, attention scores, and confidence values to enhance the resilience of label predictions. To ensure correct label classification and improve the model's discriminative power, we introduce anchor loss, which encourages large margins between anchor points. Additionally, the multi-head self-attention mechanism, which is also trainable, plays an integral role in identifying accurate labels. This approach provides critical elements for improving the reliability of predictions and has a substantial positive effect on final prediction capabilities. Our adaptive model can be integrated with any deep neural network to forestall challenges in various recognition tasks. Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.
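
The anchor loss is only named in the abstract, so the PyTorch sketch below shows one plausible margin-based formulation of "large margins between anchor points"; the exact loss in the paper may differ.

```python
import torch

def anchor_margin_loss(anchors, margin=2.0):
    """One plausible margin-based anchor loss: a hinge penalty pushing every
    pair of learnable anchor points at least `margin` apart in the embedding
    space. This only illustrates the idea and is not the paper's exact
    formulation."""
    dists = torch.cdist(anchors, anchors)                  # pairwise L2 distances
    off_diag = ~torch.eye(len(anchors), dtype=torch.bool)  # ignore self-distances
    return torch.clamp(margin - dists[off_diag], min=0).mean()

anchors = torch.randn(7, 64, requires_grad=True)  # e.g. one anchor per expression class
loss = anchor_margin_loss(anchors)
loss.backward()                                   # gradients spread the anchors apart
```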

Updated: 2024-10-24 09:32:17

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2305.01486v5

SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning

Large Language Models (LLMs) can transfer their reasoning skills to smaller models by teaching them to generate the intermediate reasoning process required to solve multistep reasoning tasks. While LLMs can accurately solve reasoning tasks through a variety of strategies, even without fine-tuning, smaller models are not expressive enough to fit the LLM's distribution over all strategies when distilled, and tend to prioritize one strategy over the others. This reliance on one strategy poses a challenge for smaller models when attempting to solve reasoning tasks that may be difficult with their preferred strategy. To address this, we propose a distillation method SIKeD (Self-guided Iterative Knowledge Distillation for mathematical reasoning), where the LLM teaches the smaller model to approach a task using different strategies and the smaller model uses its self-generated on-policy outputs to choose the most suitable strategy for the given task. The training continues in a self-guided iterative manner, where for each training iteration, a decision is made on how to combine the LLM data with the self-generated outputs. Unlike traditional distillation methods, SIKeD allows the smaller model to learn which strategy is suitable for a given task while continuously learning to solve a task using different strategies. Our experiments on various mathematical reasoning datasets show that SIKeD significantly outperforms traditional distillation techniques across smaller models of different sizes. Our code is available at: https://github.com/kumar-shridhar/SIKeD

Updated: 2024-10-24 09:29:18

Domains: cs.AI

Download: http://arxiv.org/abs/2410.18574v1

Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate predictions to context lengths of up to 1 million tokens while preserving computational efficiency. Our experiments demonstrate Taipan's superior performance across various scales and tasks, offering a promising solution for efficient long-context language modeling.

Updated: 2024-10-24 09:25:37

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.18572v1

Zero-shot Object Navigation with Vision-Language Models Reasoning

Object navigation is crucial for robots, but traditional methods require substantial training data and cannot be generalized to unknown environments. Zero-shot object navigation (ZSON) aims to address this challenge, allowing robots to interact with unknown objects without specific training data. Language-driven zero-shot object navigation (L-ZSON) is an extension of ZSON that incorporates natural language instructions to guide robot navigation and interaction with objects. In this paper, we propose a novel Vision Language model with a Tree-of-thought Network (VLTNet) for L-ZSON. VLTNet comprises four main modules: vision language model understanding, semantic mapping, tree-of-thought reasoning and exploration, and goal identification. Among these modules, the Tree-of-Thought (ToT) reasoning and exploration module serves as a core component, innovatively using the ToT reasoning framework for navigation frontier selection during robot exploration. Compared to conventional frontier selection without reasoning, navigation using ToT reasoning involves multi-path reasoning processes and backtracking when necessary, enabling globally informed decision-making with higher accuracy. Experimental results on PASTURE and RoboTHOR benchmarks demonstrate the outstanding performance of our model in L-ZSON, particularly in scenarios involving complex natural language as target instructions.

Updated: 2024-10-24 09:24:07

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.18570v1

Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries

The recent development of Large Language Models (LLMs) has changed our role in interacting with them. Instead of primarily testing these models with questions we already know the answers to, we now use them to explore questions where the answers are unknown to us. This shift, which hasn't been fully addressed in existing datasets, highlights the growing need to understand naturally occurring human questions - that are more complex, open-ended, and reflective of real-world needs. To this end, we present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources: human-to-search-engine queries, human-to-human interactions, and human-to-LLM conversations. Our comprehensive collection enables a rich understanding of human curiosity across various domains and contexts. Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset, for which we develop an iterative prompt improvement framework to identify all causal queries, and examine their unique linguistic properties, cognitive complexity, and source distribution. We also lay the groundwork to explore LLM performance on these questions and provide six efficient classification models to identify causal questions at scale for future work.

Updated: 2024-10-24 09:21:38

Domains: cs.CL,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.20318v2

How To Save Fees in Bitcoin Smart Contracts: a Simple Optimistic Off-chain Protocol

We consider the execution of smart contracts on Bitcoin. There, every contract step corresponds to appending to the blockchain a new transaction that spends the output representing the old contract state, creating a new one for the updated state. This standard procedure requires the contract participants to pay transaction fees for every execution step. In this paper, we introduce a protocol that moves most of the execution of a Bitcoin contract off-chain. When all participants follow this protocol, they are able to save on transaction fees, drastically reducing them. By contrast, whenever adversaries try to disrupt the off-chain execution, any honest participant is still able to enforce the correct contract behaviour, by continuing its execution on-chain.

Updated: 2024-10-24 09:21:09

Domains: cs.CR

Download: http://arxiv.org/abs/2403.09880v3

Heterogeneous Random Forest

Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we introduce a novel approach called heterogeneous RF (HRF), designed to enhance tree diversity in a meaningful way. This diversification is achieved by deliberately introducing heterogeneity during the tree construction. Specifically, features used for splitting near the root node of previous trees are assigned lower weights when constructing the feature sub-space of the subsequent trees. As a result, dominant features in the prior trees are less likely to be employed in the next iteration, leading to a more diverse set of splitting features at the nodes. Through simulation studies, it was confirmed that the HRF method effectively mitigates the selection bias of trees within the ensemble, increases the diversity of the ensemble, and demonstrates superior performance on datasets with fewer noise features. To assess the comparative performance of HRF against other widely adopted ensemble methods, we conducted tests on 52 datasets, comprising both real-world and synthetic data. HRF consistently outperformed other ensemble methods in terms of accuracy across the majority of datasets.
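
The core mechanism is easy to prototype. Below is a rough Python sketch of the down-weighting scheme: features chosen at the root of earlier trees are sampled with lower probability when forming later trees' feature sub-spaces. The decay factor, sub-space size, and use of scikit-learn trees are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_hrf(X, y, n_trees=100, subspace_frac=0.5, decay=0.5, seed=0):
    """Sketch of the heterogeneous RF idea: features splitting at the root
    of earlier trees get lower sampling weight for later trees."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.ones(d)
    k = max(1, int(subspace_frac * d))
    trees, subspaces = [], []
    for _ in range(n_trees):
        feats = rng.choice(d, size=k, replace=False, p=weights / weights.sum())
        boot = rng.integers(0, n, size=n)            # bootstrap sample
        tree = DecisionTreeClassifier().fit(X[boot][:, feats], y[boot])
        trees.append(tree)
        subspaces.append(feats)
        root = tree.tree_.feature[0]                 # feature index at the root
        if root >= 0:                                # negative marks a leaf-only tree
            weights[feats[root]] *= decay            # dominant feature -> lower weight
    return trees, subspaces

X = np.random.default_rng(1).random((200, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
trees, subspaces = fit_hrf(X, y, n_trees=20)
```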

Updated: 2024-10-24 09:18:55

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.19022v1

Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation

We introduce Bielik 7B v0.1, a 7-billion-parameter generative text model for Polish language processing. Trained on curated Polish corpora, this model addresses key challenges in language model development through innovative techniques. These include Weighted Instruction Cross-Entropy Loss, which balances the learning of different instruction types, and Adaptive Learning Rate, which dynamically adjusts the learning rate based on training progress. To evaluate performance, we created the Open PL LLM Leaderboard and Polish MT-Bench, novel frameworks assessing various NLP tasks and conversational abilities. Bielik 7B v0.1 demonstrates significant improvements, achieving a 9 percentage point increase in average score compared to Mistral-7B-v0.1 on the RAG Reader task. It also excels in the Polish MT-Bench, particularly in Reasoning (6.15/10) and Role-playing (7.83/10) categories. This model represents a substantial advancement in Polish language AI, offering a powerful tool for diverse linguistic applications and setting new benchmarks in the field.
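
As an illustration of the first technique, here is one way a weighted instruction cross-entropy loss could look in PyTorch; the weighting scheme, shapes, and values are assumptions, since the abstract does not give the exact formulation.

```python
import torch
import torch.nn.functional as F

def weighted_instruction_ce(logits, targets, type_weights, types):
    """Per-example cross-entropy scaled by a weight for the example's
    instruction type, so selected instruction types contribute more to
    the gradient. Shapes and weight values are illustrative assumptions."""
    # logits: (batch, seq, vocab), targets: (batch, seq), types: (batch,)
    ce = F.cross_entropy(logits.transpose(1, 2), targets,
                         reduction="none").mean(dim=1)   # per-example CE
    w = type_weights[types]
    return (w * ce).sum() / w.sum()

logits = torch.randn(4, 16, 32000)
targets = torch.randint(0, 32000, (4, 16))
type_weights = torch.tensor([1.0, 2.5])   # e.g. chat vs. reasoning instructions
loss = weighted_instruction_ce(logits, targets, type_weights,
                               torch.tensor([0, 1, 0, 1]))
```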

Updated: 2024-10-24 09:16:09

Domains: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2410.18565v1

Explainable News Summarization -- Analysis and mitigation of Disagreement Problem

Explainable AI (XAI) techniques for text summarization provide valuable understanding of how the summaries are generated. Recent studies have highlighted a major challenge in this area, known as the disagreement problem. This problem occurs when different XAI methods offer contradictory explanations for the summary generated from the same input article. This inconsistency across XAI methods has been evaluated using predefined metrics designed to quantify agreement levels between them, revealing significant disagreement. This impedes the reliability and interpretability of XAI in this area. To address this challenge, we propose a novel approach that utilizes sentence transformers and the k-means clustering algorithm to first segment the input article and then generate an explanation of the summary for each segment. By producing regional or segmented explanations rather than comprehensive ones, a decrease in the observed disagreement between XAI methods is hypothesized. This segmentation-based approach was used on two news summarization datasets, namely Extreme Summarization (XSum) and CNN-DailyMail, and the experiment was conducted using multiple disagreement metrics. Our experiments validate the hypothesis by showing a significant reduction in disagreement among different XAI methods. Additionally, a JavaScript visualization tool is developed that is easy to use and allows users to interactively explore a color-coded visualization of the input article and the machine-generated summary based on the attribution scores of each sentence.
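
The segmentation stage can be sketched directly from the description: embed sentences with a sentence transformer, group the embeddings with k-means, and explain each group separately. The model choice and cluster count below are assumptions.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def segment_article(sentences, n_segments=4):
    """Embed sentences and group them with k-means; each group is then
    explained separately by the chosen XAI method. The model name and
    segment count are assumptions for this sketch."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(sentences)
    labels = KMeans(n_clusters=n_segments, n_init=10).fit_predict(embeddings)
    segments = {}
    for sentence, label in zip(sentences, labels):
        segments.setdefault(label, []).append(sentence)
    return list(segments.values())   # each element feeds one regional explanation
```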

Updated: 2024-10-24 09:07:44

Domains: cs.AI

Download: http://arxiv.org/abs/2410.18560v1

Volley Revolver: A Novel Matrix-Encoding Method for Privacy-Preserving Neural Networks (Inference)

In this work, we present a novel matrix-encoding method that is particularly convenient for neural networks to make predictions in a privacy-preserving manner using homomorphic encryption. Based on this encoding method, we implement a convolutional neural network for handwritten image classification over encryption. To perform homomorphic multiplication of two matrices $A$ and $B$, the main idea, in its simplest version, is to encrypt matrix $A$ and the transpose of matrix $B$ into two ciphertexts, respectively. With additional operations, the homomorphic matrix multiplication can be calculated over encrypted matrices efficiently. For the convolution operation, we first expand each convolution kernel into a matrix of the same size as the input image so as to generate several ciphertexts, each of which is later used together with the ciphertext encrypting the input images for calculating some of the final convolution results. We accumulate all these intermediate results and thus complete the convolution operation. In a public cloud with 40 vCPUs, our convolutional neural network implementation on the MNIST testing dataset takes $\sim$ 287 seconds to compute ten likelihoods of 32 encrypted images of size $28 \times 28$ simultaneously. The data owner only needs to upload one ciphertext ($\sim 19.8$ MB) encrypting these 32 images to the public cloud.
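
A plaintext mock-up helps convey the encoding: with $A$ stored row-major and $B$ stored as its transpose, every entry of $AB$ becomes an inner product of two identically laid-out rows, which is what maps well onto slot-wise ciphertext multiplication followed by rotation-and-sum. The snippet below works on plain numpy arrays purely for illustration; the actual method operates on homomorphically encrypted data.

```python
import numpy as np

# Plaintext mock-up (real data would be packed into homomorphic ciphertexts):
# A is encoded row-major and B via its transpose, so C[i, j] is the inner
# product of row i of A with row j of B^T, i.e. two identically laid-out
# vectors.
A = np.arange(6, dtype=float).reshape(2, 3)    # "ciphertext" holding rows of A
Bt = np.arange(12, dtype=float).reshape(4, 3)  # "ciphertext" holding rows of B^T
C = np.array([[np.sum(a * b) for b in Bt] for a in A])
assert np.allclose(C, A @ Bt.T)                # agrees with ordinary A @ B
```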

Updated: 2024-10-24 09:05:36

Domains: cs.CR,cs.CV

Download: http://arxiv.org/abs/2201.12577v6

Reinforcement Learning with Model Predictive Control for Highway Ramp Metering

Against the backdrop of an increasingly pressing need for effective urban and highway transportation systems, this work explores the synergy between model-based and learning-based strategies to enhance traffic flow management, using an innovative approach to the problem of ramp metering control that embeds Reinforcement Learning (RL) techniques within the Model Predictive Control (MPC) framework. The control problem is formulated as an RL task by crafting a suitable stage cost function that is representative of the traffic conditions, variability in the control action, and violations of the constraint on the maximum number of vehicles in queue. An MPC-based RL approach, which leverages the MPC optimal problem as a function approximation for the RL algorithm, is proposed to learn to efficiently control an on-ramp and satisfy its constraints despite uncertainties in the system model and variable demands. Simulations are performed on a benchmark small-scale highway network to compare the proposed methodology against other state-of-the-art control approaches. Results show that, starting from an MPC controller that has an imprecise model and is poorly tuned, the proposed methodology is able to effectively learn to improve the control policy such that congestion in the network is reduced and constraints are satisfied, yielding an improved performance that is superior to the other controllers.
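
A stage cost of the kind described might look as follows; the three terms mirror the abstract (traffic conditions, control variation, queue-constraint violation), while the specific weights and the congestion proxy are assumptions.

```python
def stage_cost(densities, queue, action, prev_action,
               max_queue=50.0, w_traffic=1.0, w_var=0.4, w_viol=10.0):
    """Sketch of an RL stage cost: a traffic-condition term (total segment
    density as a congestion proxy), a penalty on control variation, and a
    penalty on violating the maximum-queue constraint. All weights and the
    proxy choice are assumptions."""
    traffic_term = sum(densities)                  # congestion proxy
    variation_term = (action - prev_action) ** 2   # keep metering rate smooth
    violation_term = max(0.0, queue - max_queue)   # soft constraint violation
    return (w_traffic * traffic_term + w_var * variation_term
            + w_viol * violation_term)

# One step: three mainline segment densities, a 63-vehicle on-ramp queue,
# and a metering rate that just dropped from 0.9 to 0.6.
cost = stage_cost([32.0, 45.5, 28.1], queue=63.0, action=0.6, prev_action=0.9)
```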

Updated: 2024-10-24 09:01:51

Domains: eess.SY,cs.AI,cs.SY

Download: http://arxiv.org/abs/2311.08820v3

Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness

Quantifying robustness in a single measure for the purposes of model selection, development of adversarial training methods, and anticipating trends has so far been elusive. The simplest metric to consider is the number of trainable parameters in a model but this has previously been shown to be insufficient at explaining robustness properties. A variety of other metrics, such as ones based on boundary thickness and gradient flatness have been proposed but have been shown to be inadequate proxies for robustness. In this work, we investigate the relationship between a model's effective dimensionality, which can be thought of as model complexity, and its robustness properties. We run experiments on commercial-scale models that are often used in real-world environments such as YOLO and ResNet. We reveal a near-linear inverse relationship between effective dimensionality and adversarial robustness, that is models with a lower dimensionality exhibit better robustness. We investigate the effect of a variety of adversarial training methods on effective dimensionality and find the same inverse linear relationship present, suggesting that effective dimensionality can serve as a useful criterion for model selection and robustness evaluation, providing a more nuanced and effective metric than parameter count or previously-tested measures.
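
Effective dimensionality is commonly computed from the eigenspectrum of the loss Hessian as $N_{\mathrm{eff}} = \sum_i \lambda_i / (\lambda_i + z)$; the abstract does not specify the paper's exact estimator, so the snippet below shows this generic form.

```python
import numpy as np

def effective_dimensionality(eigenvalues, z=1.0):
    """N_eff = sum_i l_i / (l_i + z) over the eigenvalues l_i of the loss
    Hessian. Flat directions (l_i close to 0) contribute almost nothing,
    so a lower value indicates a functionally simpler model."""
    lam = np.maximum(np.asarray(eigenvalues, dtype=float), 0.0)
    return float(np.sum(lam / (lam + z)))

# Same parameter count, different curvature spectra:
sharp = [100.0, 80.0, 60.0, 40.0, 20.0]
flat = [100.0, 1.0, 0.1, 0.01, 0.001]
print(effective_dimensionality(sharp))  # ~4.9 effective directions
print(effective_dimensionality(flat))   # ~1.6 effective directions
```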

Updated: 2024-10-24 09:01:34

Domains: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2410.18556v1

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Effective research ideation is a critical step for scientific research. However, the exponential increase in scientific literature makes it challenging for researchers to stay current with recent advances and identify meaningful research directions. Recent developments in large language models~(LLMs) suggest a promising avenue for automating the generation of novel research ideas. However, existing methods for idea generation either trivially prompt LLMs or directly expose LLMs to extensive literature without indicating useful information. Inspired by the research process of human researchers, we propose a Chain-of-Ideas~(CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure to effectively mirror the progressive development in a research domain. This organization facilitates LLMs to capture the current advancements in research, thereby enhancing their ideation capabilities. Furthermore, we propose Idea Arena, an evaluation protocol that can comprehensively evaluate idea generation methods from different perspectives, aligning closely with the preferences of human researchers. Experimental results indicate that the CoI agent consistently outperforms other methods and shows quality comparable to that of humans in research idea generation. Moreover, our CoI agent is budget-friendly, with a minimum cost of \$0.50 to generate a candidate idea and its corresponding experimental design.

Updated: 2024-10-24 08:59:53

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.13185v3

Local and Global Graph Modeling with Edge-weighted Graph Attention Network for Handwritten Mathematical Expression Recognition

In this paper, we present a novel approach to Handwritten Mathematical Expression Recognition (HMER) by leveraging graph-based modeling techniques. We introduce an End-to-end model with an Edge-weighted Graph Attention Mechanism (EGAT), designed to perform simultaneous node and edge classification. This model effectively integrates node and edge features, facilitating the prediction of symbol classes and their relationships within mathematical expressions. Additionally, we propose a stroke-level Graph Modeling method for both local (LGM) and global (GGM) information, which applies an end-to-end model to Online HMER tasks, transforming the recognition problem into node and edge classification tasks in graph structure. By capturing both local and global graph features, our method ensures comprehensive understanding of the expression structure. Through the combination of these components, our system demonstrates superior performance in symbol detection, relation classification, and expression-level recognition.

Updated: 2024-10-24 08:59:27

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18555v1

Optimal Equivariant Architectures from the Symmetries of Matrix-Element Likelihoods

The Matrix-Element Method (MEM) has long been a cornerstone of data analysis in high-energy physics. It leverages theoretical knowledge of parton-level processes and symmetries to evaluate the likelihood of observed events. In parallel, the advent of geometric deep learning has enabled neural network architectures that incorporate known symmetries directly into their design, leading to more efficient learning. This paper presents a novel approach that combines MEM-inspired symmetry considerations with equivariant neural network design for particle physics analysis. Even though Lorentz invariance and permutation invariance over all reconstructed objects are the largest and most natural symmetries in the input domain, we find that they are sub-optimal in most practical search scenarios. We propose a longitudinal boost-equivariant message-passing neural network architecture that preserves relevant discrete symmetries. We present numerical studies demonstrating that MEM-inspired architectures achieve new state-of-the-art performance in distinguishing di-Higgs decays to four bottom quarks from the QCD background, with enhanced sample and parameter efficiencies. This synergy between MEM and equivariant deep learning opens new directions for physics-informed architecture design, promising more powerful tools for probing physics beyond the Standard Model.

Updated: 2024-10-24 08:56:37

Domains: hep-ph,cs.LG,hep-ex,physics.data-an

Download: http://arxiv.org/abs/2410.18553v1

IMAN: An Adaptive Network for Robust NPC Mortality Prediction with Missing Modalities

Accurate prediction of mortality in nasopharyngeal carcinoma (NPC), a complex malignancy particularly challenging in advanced stages, is crucial for optimizing treatment strategies and improving patient outcomes. However, this predictive process is often compromised by the high-dimensional and heterogeneous nature of NPC-related data, coupled with the pervasive issue of incomplete multi-modal data, manifesting as missing radiological images or incomplete diagnostic reports. Traditional machine learning approaches suffer significant performance degradation when faced with such incomplete data, as they fail to effectively handle the high-dimensionality and intricate correlations across modalities. Even advanced multi-modal learning techniques like Transformers struggle to maintain robust performance in the presence of missing modalities, as they lack specialized mechanisms to adaptively integrate and align the diverse data types, while also capturing nuanced patterns and contextual relationships within the complex NPC data. To address these problems, we introduce IMAN: an adaptive network for robust NPC mortality prediction with missing modalities.

Updated: 2024-10-24 08:54:08

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.18551v1

On Explaining with Attention Matrices

This paper explores the much discussed, possible explanatory link between attention weights (AW) in transformer models and predicted output. Contrary to intuition and early research on attention, more recent prior research has provided formal arguments and empirical evidence that AW are not explanatorily relevant. We show that the formal arguments are incorrect. We introduce and effectively compute efficient attention, which isolates the effective components of attention matrices in tasks and models in which AW play an explanatory role. We show that efficient attention has a causal role (provides minimally necessary and sufficient conditions) for predicting model output in NLP tasks requiring contextual information, and we show, contrary to [7], that efficient attention matrices are probability distributions and are effectively calculable. Thus, they should play an important part in the explanation of attention based model behavior. We offer empirical experiments in support of our method illustrating various properties of efficient attention with various metrics on four datasets.

Updated: 2024-10-24 08:43:33

Domains: cs.CL,cs.AI,46-04,I.2.7; I.7.0

Download: http://arxiv.org/abs/2410.18541v1

Towards Aligning Language Models with Textual Feedback

We present ALT (ALignment with Textual feedback), an approach that aligns language models with user preferences expressed in text. We argue that text offers greater expressiveness, enabling users to provide richer feedback than simple comparative preferences, and that this richer feedback can lead to more efficient and effective alignment. ALT aligns the model by conditioning its generation on the textual feedback. Our method relies solely on language modeling techniques and requires minimal hyper-parameter tuning, though it still presents the main benefits of RL-based alignment algorithms and can effectively learn from textual feedback. We explore the efficacy and efficiency of textual feedback across different tasks such as toxicity reduction, summarization, and dialog response generation. We find that ALT outperforms PPO for the task of toxicity reduction while being able to match its performance on summarization with only 20% of the samples. We also explore how ALT can be used with feedback provided by an existing LLM, considering both constrained and unconstrained textual feedback. We also outline future directions to align models with natural language feedback.

Updated: 2024-10-24 08:43:21

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.16970v2

Interpretable Representation Learning from Videos using Nonlinear Priors

Learning interpretable representations of visual data is an important challenge, to make machines' decisions understandable to humans and to improve generalisation outside of the training distribution. To this end, we propose a deep learning framework where one can specify nonlinear priors for videos (e.g. of Newtonian physics) that allow the model to learn interpretable latent variables and use these to generate videos of hypothetical scenarios not observed at training time. We do this by extending the Variational Auto-Encoder (VAE) prior from a simple isotropic Gaussian to an arbitrary nonlinear temporal Additive Noise Model (ANM), which can describe a large number of processes (e.g. Newtonian physics). We propose a novel linearization method that constructs a Gaussian Mixture Model (GMM) approximating the prior, and derive a numerically stable Monte Carlo estimate of the KL divergence between the posterior and prior GMMs. We validate the method on different real-world physics videos including a pendulum, a mass on a spring, a falling object and a pulsar (rotating neutron star). We specify a physical prior for each experiment and show that the correct variables are learned. Once a model is trained, we intervene on it to change different physical variables (such as oscillation amplitude or adding air drag) to generate physically correct videos of hypothetical scenarios that were not observed previously.
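
The KL term can be estimated generically as follows: sample from the posterior GMM and average the log-density ratio. This is a plain Monte Carlo sketch of the quantity being estimated, not the paper's numerically-stabilized estimator.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_logpdf(x, weights, means, covs):
    """Stable log-density of a Gaussian mixture at points x of shape (n, d)."""
    comp = [np.log(w) + multivariate_normal.logpdf(x, m, c)
            for w, m, c in zip(weights, means, covs)]
    return logsumexp(np.stack(comp, axis=0), axis=0)

def mc_kl_gmm(q, p, n_samples=10_000, seed=0):
    """Monte Carlo estimate of KL(q || p) for GMMs q, p given as
    (weights, means, covs): sample from q, average log q(x) - log p(x)."""
    rng = np.random.default_rng(seed)
    weights, means, covs = q
    ks = rng.choice(len(weights), size=n_samples, p=weights)
    xs = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in ks])
    return float(np.mean(gmm_logpdf(xs, *q) - gmm_logpdf(xs, *p)))

q = ([0.5, 0.5], [np.zeros(2), 3 * np.ones(2)], [np.eye(2), np.eye(2)])
p = ([1.0], [np.zeros(2)], [2.0 * np.eye(2)])
print(mc_kl_gmm(q, p))   # positive, since q is bimodal and p is not
```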

Updated: 2024-10-24 08:39:24

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18539v1

Estimating Reaction Barriers with Deep Reinforcement Learning

Stable states in complex systems correspond to local minima on the associated potential energy surface. Transitions between these local minima govern the dynamics of such systems. Precisely determining the transition pathways in complex and high-dimensional systems is challenging because these transitions are rare events, and isolating the relevant species in experiments is difficult. Most of the time, the system remains near a local minimum, with rare, large fluctuations leading to transitions between minima. The probability of such transitions decreases exponentially with the height of the energy barrier, making the system's dynamics highly sensitive to the calculated energy barriers. This work aims to formulate the problem of finding the minimum energy barrier between two stable states in the system's state space as a cost-minimization problem. We propose solving this problem using reinforcement learning algorithms. The exploratory nature of reinforcement learning agents enables efficient sampling and determination of the minimum energy barrier for transitions.

Updated: 2024-10-24 08:37:08

Domains: cs.LG,physics.comp-ph,J.2

Download: http://arxiv.org/abs/2407.12453v2

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Despite the progress made by multimodal large language models (MLLMs) in computational pathology, they remain limited by a predominant focus on patch-level analysis, missing essential contextual information at the whole-slide level. The lack of large-scale instruction datasets and the gigapixel scale of whole slide images (WSIs) pose significant developmental challenges. In this paper, we present SlideChat, the first vision-language assistant capable of understanding gigapixel whole-slide images, exhibiting excellent multimodal conversational capability and responding to complex instructions across diverse pathology scenarios. To support its development, we created SlideInstruction, the largest instruction-following dataset for WSIs, consisting of 4.2K WSI captions and 176K VQA pairs with multiple categories. Furthermore, we propose SlideBench, a multimodal benchmark that incorporates captioning and VQA tasks to assess SlideChat's capabilities in varied clinical settings such as microscopy and diagnosis. Compared to both general and specialized MLLMs, SlideChat exhibits exceptional capabilities, achieving state-of-the-art performance on 18 of 22 tasks. For example, it achieved an overall accuracy of 81.17% on SlideBench-VQA (TCGA), and 54.15% on SlideBench-VQA (BCNB). We will fully release SlideChat, SlideInstruction and SlideBench as open-source resources to facilitate research and development in computational pathology.

Updated: 2024-10-24 08:35:28

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.11761v2

MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning

In this work, we propose a simple transformer-based baseline for multimodal molecular representation learning, integrating three distinct modalities: SMILES strings, 2D graph representations, and 3D conformers of molecules. A key aspect of our approach is the aggregation of 3D conformers, allowing the model to account for the fact that molecules can adopt multiple conformations-an important factor for accurate molecular representation. The tokens for each modality are extracted using modality-specific encoders: a transformer for SMILES strings, a message-passing neural network for 2D graphs, and an equivariant neural network for 3D conformers. The flexibility and modularity of this framework enable easy adaptation and replacement of these encoders, making the model highly versatile for different molecular tasks. The extracted tokens are then combined into a unified multimodal sequence, which is processed by a downstream transformer for prediction tasks. To efficiently scale our model for large multimodal datasets, we utilize Flash Attention 2 and bfloat16 precision. Despite its simplicity, our approach achieves state-of-the-art results across multiple datasets, demonstrating its effectiveness as a strong baseline for multimodal molecular representation learning.
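
The fusion step reduces to concatenating the per-modality token sequences (plus modality-type embeddings) and running a downstream transformer; the PyTorch sketch below illustrates this with made-up dimensions and a mean-pooled readout, both of which are assumptions rather than the paper's configuration.

```python
import torch

d_model = 256                                   # assumed hidden size
smiles_tokens = torch.randn(1, 40, d_model)     # from a SMILES transformer
graph_tokens = torch.randn(1, 25, d_model)      # from a 2D message-passing net
conf_tokens = torch.randn(1, 5, d_model)        # aggregated 3D conformer tokens

# Tag each token with its modality, concatenate into one multimodal sequence,
# and process it with a downstream transformer.
modality_emb = torch.nn.Embedding(3, d_model)
parts = [smiles_tokens, graph_tokens, conf_tokens]
ids = torch.cat([torch.full((1, p.shape[1]), i) for i, p in enumerate(parts)],
                dim=1)
sequence = torch.cat(parts, dim=1) + modality_emb(ids)   # (1, 70, d_model)

layer = torch.nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
fused = torch.nn.TransformerEncoder(layer, num_layers=2)(sequence)
prediction_input = fused.mean(dim=1)            # pooled readout for a task head
```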

Updated: 2024-10-24 08:34:50

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.07981v2

Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets

Current publicly available knowledge work data collections lack diversity, extensive annotations, and contextual information about the users and their documents. These issues hinder objective and comparable data-driven evaluations and optimizations of knowledge work assistance systems. Due to the considerable resources needed to collect such data in real-life settings and the necessity of data censorship, collecting such a dataset appears nearly impossible. For this reason, we propose a configurable, multi-agent knowledge work dataset generator. This system simulates collaborative knowledge work among agents producing Large Language Model-generated documents and accompanying data traces. Additionally, the generator captures all background information, given in its configuration or created during the simulation process, in a knowledge graph. Finally, the resulting dataset can be utilized and shared without privacy or confidentiality concerns. This paper introduces our approach's design and vision and focuses on generating authentic knowledge work documents using Large Language Models. Our study involving human raters who assessed 53% of the generated and 74% of the real documents as realistic demonstrates the potential of our approach. Furthermore, we analyze the authenticity criteria mentioned in the participants' comments and elaborate on potential improvements for identified common issues.

Updated: 2024-10-24 08:32:54

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.04286v2

LOGO -- Long cOntext aliGnment via efficient preference Optimization

Long-context models (LCMs) have shown great potential in processing long input sequences (even more than 100M tokens) conveniently and effectively. With significant progress, recent research has pointed out that LCMs can accurately locate token-level salient information within the context. Yet, the generation performance of these LCMs is far from satisfactory and might result in misaligned responses, such as hallucinations. To enhance the generation capability of LCMs, existing works have investigated the effects of data size and quality for both pre-training and instruction tuning. Though achieving meaningful improvement, previous methods fall short in either effectiveness or efficiency. In this paper, we introduce LOGO (Long cOntext aliGnment via efficient preference Optimization), a training strategy that first introduces preference optimization for long-context alignment. To overcome the GPU memory-bound issue caused by long sequences, LOGO employs a reference-free preference optimization strategy and adopts a position synthesis method to construct the training data. By training with only 0.3B data on a single 8$\times$A800 GPU machine for 16 hours, LOGO allows the Llama-3-8B-Instruct-80K model to achieve comparable performance with GPT-4 in real-world long-context tasks while preserving the model's original capabilities on other tasks, e.g., language modeling and MMLU. Moreover, LOGO can extend the model's context window size while enhancing its generation performance.

Updated: 2024-10-24 08:27:26

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.18533v1

PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We develop the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, two RPO methods, RPO-Traj and RPO-Batch, are introduced to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, effectively learns and applies action principles to enhance performance.

Updated: 2024-10-24 08:21:51

Domains: cs.AI

Download: http://arxiv.org/abs/2410.18528v1

Probing Ranking LLMs: Mechanistic Interpretability in Information Retrieval

Transformer networks, especially those with performance on par with GPT models, are renowned for their powerful feature extraction capabilities. However, the nature and correlation of these features with human-engineered ones remain unclear. In this study, we delve into the mechanistic workings of state-of-the-art, fine-tuning-based passage-reranking transformer networks. Our approach involves a probing-based, layer-by-layer analysis of neurons within ranking LLMs to identify individual or groups of known human-engineered and semantic features within the network's activations. We explore a wide range of features, including lexical, document structure, query-document interaction, advanced semantic, interaction-based, and LLM-specific features, to gain a deeper understanding of the underlying mechanisms that drive ranking decisions in LLMs. Our results reveal a set of features that are prominently represented in LLM activations, as well as others that are notably absent. Additionally, we observe distinct behaviors of LLMs when processing low versus high relevance queries and when encountering out-of-distribution query and document sets. By examining these features within activations, we aim to enhance the interpretability and performance of LLMs in ranking tasks. Our findings provide valuable insights for the development of more effective and transparent ranking models, with significant implications for the broader information retrieval community. All scripts and code necessary to replicate our findings are made available.
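
A minimal layer-wise probing sketch of the kind the abstract describes: fit a linear probe per layer and take held-out accuracy as evidence that a feature is decodable there. The data layout (`activations`, `feature`) is assumed for illustration.

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def probe_layers(activations, feature, cv=5):
        # activations: dict layer -> (n_samples, hidden_dim) array extracted
        # from the ranking LLM; feature: per-sample target, e.g. a binned
        # lexical-match score (a hypothetical human-engineered feature).
        scores = {}
        for layer, acts in activations.items():
            probe = LogisticRegression(max_iter=1000)
            scores[layer] = cross_val_score(probe, acts, feature, cv=cv).mean()
        return scores  # high accuracy => feature linearly decodable at layer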

Updated: 2024-10-24 08:20:10

标题: 探究排名LLMs:信息检索中的机制可解释性

摘要: Transformer网络,特别是那些性能与GPT模型相当的网络,以其强大的特征提取能力而闻名。然而,这些特征的性质及其与人工设计特征之间的相关性仍不清楚。在这项研究中,我们深入研究了最先进的、基于微调的段落重排序Transformer网络的内部工作机制。我们的方法涉及基于探测的逐层分析排名LLM中的神经元,以识别网络激活中的已知人工设计和语义特征的个体或群组。我们探索了一系列特征,包括词汇、文档结构、查询-文档交互、高级语义、基于交互的和LLM特定的特征,以深入了解驱动LLM中排名决策的潜在机制。我们的结果揭示了在LLM激活中明显表示的一组特征,以及其他明显缺失的特征。此外,我们观察到LLM在处理低相关性与高相关性查询时,以及在遇到超出分布范围的查询和文档集时,表现出不同的行为。通过检查激活中的这些特征,我们旨在增强LLM在排名任务中的可解释性和性能。我们的发现为开发更有效和透明的排名模型提供了宝贵的见解,对更广泛的信息检索社区有重要影响。我们提供了复制我们发现所需的所有脚本和代码。

更新时间: 2024-10-24 08:20:10

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2410.18527v1

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5\% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in https://flashspeech.github.io/.

Updated: 2024-10-24 08:19:04

标题: FlashSpeech:高效零样本语音合成

摘要: 最近在大规模零样本语音合成方面取得了显著进展,主要得益于语言模型和扩散模型。然而,这两种方法的生成过程较慢且计算密集。利用较低的计算预算实现与先前工作相当的质量的高效语音合成仍然是一个重大挑战。本文介绍了FlashSpeech,一个大规模零样本语音合成系统,其推理时间仅约为先前工作的5%。FlashSpeech建立在潜在一致性模型之上,并应用了一种新颖的对抗一致性训练方法,可以从头开始训练,无需预先训练的扩散模型作为教师。此外,一个新的韵律生成器模块增强了韵律的多样性,使语音的节奏听起来更加自然。FlashSpeech的生成过程可以在一两个采样步骤内高效实现,同时保持高音频质量和与语音提示的高相似性,用于零样本语音生成。我们的实验结果表明FlashSpeech的卓越性能。值得注意的是,FlashSpeech在声音质量和相似性方面保持可比性的同时,比其他零样本语音合成系统快约20倍。此外,FlashSpeech通过有效地执行诸如声音转换、语音编辑和多样化语音采样等任务展示了其多功能性。音频样本可在https://flashspeech.github.io/中找到。

更新时间: 2024-10-24 08:19:04

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2404.14700v4

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scaling up transformer-based models trained on massive datasets, while reinforcement learning (RL) agents still suffer from poor generalization capacity under such paradigms. To tackle this challenge, we propose Meta Decision Transformer (Meta-DT), which leverages the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL. We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to the causal transformer to guide task-oriented sequence generation. Then, we subtly utilize history trajectories generated by the meta-policy as a self-guided prompt to exploit the architectural inductive bias. We select the trajectory segment that yields the largest prediction error on the pretrained world model to construct the prompt, aiming to encode task-specific information complementary to the world model maximally. Notably, the proposed framework eliminates the requirement of any expert demonstration or domain knowledge at test time. Experimental results on MuJoCo and Meta-World benchmarks across various dataset types show that Meta-DT exhibits superior few and zero-shot generalization capacity compared to strong baselines while being more practical with fewer prerequisites. Our code is available at https://github.com/NJU-RL/Meta-DT.
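
The prompt-construction step lends itself to a short sketch: score each candidate history segment by the pretrained world model's prediction error and keep the worst-predicted one. The `world_model.predict` interface and array layout are assumptions for illustration.

    import numpy as np

    def select_prompt_segment(states, actions, next_states,
                              world_model, seg_len=20):
        # Larger prediction error => the segment carries task-specific
        # information the world model does not capture, which is exactly
        # what the self-guided prompt should encode.
        errors = []
        for start in range(len(states) - seg_len + 1):
            sl = slice(start, start + seg_len)
            pred = world_model.predict(states[sl], actions[sl])
            errors.append(np.mean((pred - next_states[sl]) ** 2))
        best = int(np.argmax(errors))
        return slice(best, best + seg_len)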

Updated: 2024-10-24 08:17:51

标题: Meta-DT:离线元RL作为具有世界模型解缠的条件序列建模

摘要: 人工通用智能的一个长期目标是高度能力强大的通才,能够从各种经验中学习并推广到未见过的任务。语言和视觉社区通过扩大在大规模数据集上训练的基于Transformer的模型取得了显著进展,而强化学习(RL)代理在这种范式下仍然存在泛化能力不足的问题。为了应对这一挑战,我们提出了Meta Decision Transformer(Meta-DT),利用Transformer架构的序列建模能力和通过世界模型解缠获得的强大任务表示学习,实现了离线元RL的高效泛化。我们预先训练一个上下文感知的世界模型来学习一个紧凑的任务表示,并将其作为上下文条件注入到因果Transformer中,以指导面向任务的序列生成。然后,我们巧妙地利用元策略生成的历史轨迹作为自引导提示,利用架构归纳偏差。我们选择在预训练的世界模型上产生最大预测误差的轨迹段来构建提示,旨在最大程度地编码与世界模型互补的任务特定信息。值得注意的是,提出的框架在测试时不需要任何专家示范或领域知识。在MuJoCo和Meta-World基准测试上的实验结果表明,Meta-DT相对于强基线表现出更优秀的少样本和零样本泛化能力,同时在先决条件较少的情况下更加实用。我们的代码可在https://github.com/NJU-RL/Meta-DT上找到。

更新时间: 2024-10-24 08:17:51

领域: cs.LG

下载: http://arxiv.org/abs/2410.11448v2

Approximation Rate of the Transformer Architecture for Sequence Modeling

The Transformer architecture is widely applied in sequence modeling applications, yet the theoretical understanding of its working principles remains limited. In this work, we investigate the approximation rate for single-layer Transformers with one head. We consider a class of non-linear relationships and identify a novel notion of complexity measures to establish an explicit Jackson-type approximation rate estimate for the Transformer. This rate reveals the structural properties of the Transformer and suggests the types of sequential relationships it is best suited for approximating. In particular, the results on approximation rates enable us to concretely analyze the differences between the Transformer and classical sequence modeling methods, such as recurrent neural networks.

Updated: 2024-10-24 08:13:01

标题: 序列建模中Transformer架构的近似速率

摘要: 变压器架构广泛应用于序列建模应用程序中,但其工作原理的理论理解仍然有限。在这项工作中,我们研究了具有一个头的单层变压器的逼近率。我们考虑了一类非线性关系,并确定了一种新颖的复杂度度量概念,以建立变压器的显式杰克逊类型逼近率估计。这种速率揭示了变压器的结构特性,并暗示了它最适合逼近的顺序关系类型。特别是,逼近率的结果使我们能够具体分析变压器与传统序列建模方法(如循环神经网络)之间的差异。

更新时间: 2024-10-24 08:13:01

领域: cs.LG

下载: http://arxiv.org/abs/2305.18475v3

Unsupervised Object Detection with Theoretical Guarantees

Unsupervised object detection using deep neural networks is typically a difficult problem with few to no guarantees about the learned representation. In this work we present the first unsupervised object detection method that is theoretically guaranteed to recover the true object positions up to quantifiable small shifts. We develop an unsupervised object detection architecture and prove that the learned variables correspond to the true object positions up to small shifts related to the encoder and decoder receptive field sizes, the object sizes, and the widths of the Gaussians used in the rendering process. We perform detailed analysis of how the error depends on each of these variables and perform synthetic experiments validating our theoretical predictions up to a precision of individual pixels. We also perform experiments on CLEVR-based data and show that, unlike current SOTA object detection methods (SAM, CutLER), our method's prediction errors always lie within our theoretical bounds. We hope that this work helps open up an avenue of research into object detection methods with theoretical guarantees.

Updated: 2024-10-24 08:09:47

标题: 具有理论保证的无监督目标检测

摘要: 使用深度神经网络进行无监督目标检测通常是一个困难的问题,几乎没有关于学习表示的保证。在这项工作中,我们提出了第一个在理论上保证能够在可量化的小偏移范围内恢复真实目标位置的无监督目标检测方法。我们开发了一个无监督目标检测架构,并证明学习的变量与真实目标位置相对应,偏差仅限于与编码器和解码器感受野大小、目标大小以及渲染过程中使用的高斯函数宽度相关的小偏移。我们详细分析了误差如何依赖于每个变量,并进行了合成实验,以单个像素级的精度验证了我们的理论预测。我们还在基于CLEVR的数据上进行了实验,并展示,与当前最先进的目标检测方法(如SAM、CutLER)不同,我们的方法的预测误差始终在我们的理论界限内。我们希望这项工作能够开辟一个具有理论保证的目标检测方法的研究途径。

更新时间: 2024-10-24 08:09:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07284v2

KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing

The development of large language models (LLMs) has significantly expanded model sizes, resulting in substantial GPU memory requirements during inference. The key and value storage of the attention map in the KV (key-value) cache accounts for more than 80\% of this memory consumption. Nowadays, most existing KV cache compression methods focus on intra-layer compression within a single Transformer layer but few works consider layer-wise compression. In this paper, we propose a plug-and-play method called \textit{KVSharer}, which shares the KV cache between layers to achieve layer-wise compression. Rather than intuitively sharing based on higher similarity, we discover a counterintuitive phenomenon: sharing dissimilar KV caches better preserves the model performance. Experiments show that \textit{KVSharer} can reduce KV cache computation by 30\%, thereby lowering memory consumption without significantly impacting model performance and it can also achieve at least 1.3 times generation acceleration. Additionally, we verify that \textit{KVSharer} is compatible with existing intra-layer KV cache compression methods, and combining both can further save memory.
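
A rough sketch of the counterintuitive layer-pairing idea: rank candidate layer pairs by *dissimilarity* of their KV caches and share the most dissimilar ones. The calibration data, the cosine measure, and the sharing mechanics below are assumptions, not the paper's exact procedure.

    import torch.nn.functional as F

    def rank_pairs_by_dissimilarity(kv_caches):
        # kv_caches: one tensor per layer, collected on a small
        # calibration batch (an assumed setup for illustration).
        flat = [kv.flatten().float() for kv in kv_caches]
        pairs = []
        for i in range(len(flat)):
            for j in range(i + 1, len(flat)):
                sim = F.cosine_similarity(flat[i], flat[j], dim=0).item()
                pairs.append((sim, i, j))
        pairs.sort()  # lowest cosine similarity first = most dissimilar
        return [(i, j) for _, i, j in pairs]

    # Sharing step (schematic): for the top-k most dissimilar pairs (i, j),
    # let layer j reuse layer i's KV cache instead of storing its own.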

Updated: 2024-10-24 08:06:41

标题: KVSharer:通过逐层不同的KV缓存共享实现高效推断

摘要: 大型语言模型(LLMs)的发展显著扩大了模型规模,在推断过程中需要大量GPU内存。在KV(键-值)缓存中的注意力图的关键和值存储占据了超过80%的内存消耗。如今,大多数现有的KV缓存压缩方法侧重于单个Transformer层内的层内压缩,但很少有作品考虑逐层压缩。在本文中,我们提出了一种名为KVSharer的即插即用方法,该方法在层之间共享KV缓存以实现逐层压缩。我们发现,与直觉相反,共享不相似的KV缓存能更好地保持模型性能。实验表明,KVSharer可以将KV缓存计算减少30%,从而降低内存消耗而不会显著影响模型性能,还可以实现至少1.3倍的生成加速。此外,我们验证了KVSharer与现有的层内KV缓存压缩方法兼容,并且结合两者可以进一步节省内存。

更新时间: 2024-10-24 08:06:41

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.18517v1

Scaling up Masked Diffusion Models on Text

Masked diffusion models (MDMs) have shown promise in language modeling, yet their scalability and effectiveness in core language tasks, such as text generation and language understanding, remain underexplored. This paper establishes the first scaling law for MDMs, demonstrating a scaling rate comparable to autoregressive models (ARMs) and a relatively small compute gap. Motivated by their scalability, we train a family of MDMs with up to 1.1 billion (B) parameters to systematically evaluate their performance against ARMs of comparable or larger sizes. Fully leveraging the probabilistic formulation of MDMs, we propose a simple yet effective \emph{unsupervised classifier-free guidance} that effectively exploits large-scale unpaired data, boosting performance for conditional inference. In language understanding, a 1.1B MDM shows competitive results, outperforming the larger 1.5B GPT-2 model on four out of eight zero-shot benchmarks. In text generation, MDMs provide a flexible trade-off compared to ARMs utilizing KV-cache: MDMs match the performance of ARMs while being 1.4 times faster, or achieve higher quality than ARMs at a higher computational cost. Moreover, MDMs address challenging tasks for ARMs by effectively handling bidirectional reasoning and adapting to temporal shifts in data. Notably, a 1.1B MDM breaks the \emph{reverse curse} encountered by much larger ARMs with significantly more data and computation, such as Llama-2 (13B) and GPT-3 (175B). Our code is available at \url{https://github.com/ML-GSAI/SMDM}.
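
For reference, classifier-free guidance in its generic logit form is a one-liner; how the paper trains the unconditional branch from large-scale unpaired data (the "unsupervised" part) is its contribution and is not reproduced here. `w` is the guidance strength.

    def cfg_logits(cond_logits, uncond_logits, w=1.5):
        # Extrapolate away from the unconditional prediction toward the
        # conditional one; larger w sharpens adherence to the condition.
        return (1.0 + w) * cond_logits - w * uncond_logits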

Updated: 2024-10-24 08:01:22

标题: 在文本上扩展遮蔽扩散模型

摘要: 掩码扩散模型(MDMs)在语言建模中显示出潜力,然而它们在核心语言任务(如文本生成和语言理解)中的可扩展性和有效性仍未得到充分探讨。本文建立了MDMs的第一个扩展定律,展示了与自回归模型(ARMs)相当的扩展速率和相对较小的计算差距。受到其可扩展性的启发,我们训练了一个包含高达11亿(B)参数的MDMs家族,以系统评估它们与相当或更大规模的ARMs的性能。充分利用MDMs的概率形式,我们提出了一种简单而有效的"无监督无分类器引导",有效利用大规模非配对数据,提升了条件推理的性能。在语言理解方面,一个11亿参数的MDM展示了竞争性结果,在八个零样本基准测试中有四个超越了更大的15亿GPT-2模型。在文本生成方面,与利用KV-cache的ARMs相比,MDMs提供了灵活的权衡:MDMs在性能与ARMs相当的同时速度快1.4倍,或者以更高的计算成本实现比ARMs更高的质量。此外,MDMs通过有效处理双向推理和适应数据的时间性变化,解决了对ARMs具有挑战性的任务。值得注意的是,一个11亿参数的MDM打破了拥有远更多数据和计算的更大规模ARMs(如Llama-2(13B)和GPT-3(175B))所遇到的"逆转诅咒"。我们的代码可在\url{https://github.com/ML-GSAI/SMDM}获得。

更新时间: 2024-10-24 08:01:22

领域: cs.AI

下载: http://arxiv.org/abs/2410.18514v1

Exclusively Penalized Q-learning for Offline Reinforcement Learning

Constraint-based offline reinforcement learning (RL) involves policy constraints or imposing penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods with penalized value functions: unnecessary bias introduced into the value function can in turn create underestimation bias. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.
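
An illustrative sketch of "exclusive" penalization: apply the conservative penalty only where it is likely needed, instead of uniformly. The penalty-weight heuristic below (a behavior-policy density threshold) is an assumption for illustration, not EPQ's actual criterion.

    import torch

    def exclusively_penalized_target(q_next, behavior_logprob,
                                     alpha=1.0, threshold=-4.0):
        # Hypothetical criterion: transitions the behavior policy rarely
        # produced are prone to overestimation, so only they are penalized;
        # well-covered transitions keep their unpenalized target, avoiding
        # the blanket pessimism that causes underestimation bias.
        prone = (behavior_logprob < threshold).float()
        return q_next - alpha * prone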

Updated: 2024-10-24 07:56:23

标题: 独占性惩罚Q学习用于离线强化学习

摘要: 基于约束的离线强化学习(RL)涉及对政策约束或对值函数施加惩罚,以减轻由分布偏移引起的过度估计误差。本文关注现有离线RL方法中带惩罚值函数的局限性,表明由值函数引入的不必要偏见可能导致低估误差。为解决这一问题,我们提出了独占性惩罚Q学习(EPQ),通过选择性地对易引发估计错误的状态进行惩罚,从而减少值函数中的估计偏差。数值结果显示,与其他离线RL方法相比,我们的方法显著减少了低估误差,并在各种离线控制任务中提高了性能。

更新时间: 2024-10-24 07:56:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.14082v2

A framework for GNSS-based solutions performance analysis in an ERTMS context

Context: progress in the introduction of GNSS-based solutions in rail applications. GNSS (Global Navigation Satellite System) is now used in most of our travels and each of our smartphone apps. Most of these usages are not safety-critical. But Europe has identified GNSS for more applications and for integration in rail in general, as part of the toolset to help railways contribute to reducing the transport carbon footprint. To increase the use of trains in European transport, railways must improve their attractiveness for passengers and freight, but also increase reliability, availability and efficiency by reducing capital expenditure and operational costs. GNSS is part of the global digitalization scheme of freight that aims to offer added value to clients: knowledge of accurate time of arrival, continuous monitoring of transport conditions (temperature, humidity...). But a major challenge will be to reach stringent applications, and in particular, GNSS is today seen as a realistic and serious game changer for the future of the ERTMS (European Rail Traffic Management System). The localisation function is today performed with both odometry and balises. The odometer provides a continuous train position in time from a reference point. But as the distance delivered by the odometer shows a growing bias with distance, due to wear and wheel sliding, the use of on-track balises allows this error to be reduced. Future systems will be based on on-board localisation solutions with GNSS receivers. This will allow the development of new concepts for moving blocks, virtual coupling and automation. Its use for train integrity is also investigated. But the environmental conditions of the track and surrounding configuration, i.e., tunnels, dense urban areas or vegetation, often degrade positioning performance and thus its efficiency and safety. Indeed, GNSS satellites are moving and their visibility (availability and relative position from the receiver) varies with time. Moreover, for optimal performance, the system requires open-sky environments, which is the case for most aeronautical uses but not for train uses. Trains often circulate in areas where signal reception can be disturbed (multipath, intentional or unintentional interference) and thus performance degraded. While much progress has been made in recent years to develop more robust receivers [Puccitelli, 2022], multi-sensor solutions [CLUG website] or missing tools such as Digital Maps [Crespillo, 2023], in projects such as the Shift2Rail Project X2Rail-5 or CLUG, some questions remain, in particular related to performance evaluation. How can we evaluate performance in a dynamic environment (train, satellite, obstacles)? How can we be sure that every configuration has been tested? What is the impact of a failure (inaccuracy, missed detection) on operation? Some of these issues are addressed in the ongoing R2DATO project funded by Europe's Rail.

Updated: 2024-10-24 07:53:47

标题: 在ERTMS环境中进行基于GNSS解决方案性能分析的框架

摘要: 在铁路应用中基于GNSS的解决方案引入取得了进展。GNSS(全球导航卫星系统)现在被广泛应用于我们的出行和每一个智能手机应用程序中。大多数用途并非关乎安全。但欧洲确定了更多应用领域,并将GNSS整合到铁路系统中,作为帮助铁路减少运输碳足迹的工具。为增加火车在欧洲交通中的使用率,铁路必须提高对乘客和货物的吸引力,同时通过降低资本支出和运营成本来提高可靠性、可用性和效率。GNSS是货运全球数字化计划的一部分,旨在为客户提供到达时间准确性、运输条件持续监控(温度、湿度等)的附加价值。但一个主要挑战将是达到严格的应用,特别是,今天GNSS被视为欧洲铁路交通管理系统(ERTMS)未来的一个现实和重要变革者。定位功能目前通过里程表和轨道标志实现。里程表提供从参考点开始的连续火车位置。但由于磨损和车轮滑动,里程表提供的距离随距离增加而出现偏差,因此使用轨道标志可以减少这种误差。未来系统将基于带有GNSS接收器的车载定位解决方案。这将促使新概念的发展,如移动区块、虚拟耦合和自动化。同时,还在研究其用于列车完整性的可能性。但轨道和周围环境的环境条件,例如隧道、密集城市区域或植被等往往会降低定位性能,进而影响其效率和安全性。实际上,GNSS卫星在运动,其可见性(接收器的可用性和相对位置)随时间变化。此外,为了实现最佳性能,系统需要开放空间环境,这在大多数航空用途中是常见的,但在火车使用中并非如此。火车往往在信号接收可能受到干扰的区域行驶(多径、有意或无意干扰),从而降低性能。尽管在过去几年中已经取得了许多进展,包括开发更强大的接收器、多传感器解决方案或缺失的工具如数字地图,在Shift2Rail项目X2Rail-5或CLUG等项目中,仍然存在一些问题,特别是与性能评估相关的问题。在动态环境(火车、卫星、障碍物)中如何评估性能?如何确保每种配置都经过测试?故障(不准确、未检测到)对操作的影响是什么?一些这些问题正在欧洲铁路资助的正在进行的R2DATO项目中得到解决。

更新时间: 2024-10-24 07:53:47

领域: cs.AI,eess.SP

下载: http://arxiv.org/abs/2410.18510v1

Enhancing Graph Attention Neural Network Performance for Marijuana Consumption Classification through Large-scale Augmented Granger Causality (lsAGC) Analysis of Functional MR Images

In the present research, the effectiveness of large-scale Augmented Granger Causality (lsAGC) as a tool for gauging brain network connectivity was examined to differentiate between marijuana users and typical controls by utilizing resting-state functional Magnetic Resonance Imaging (fMRI). The relationship between marijuana consumption and alterations in brain network connectivity is a recognized fact in scientific literature. This study probes how lsAGC can accurately discern these changes. The technique used integrates dimension reduction with the augmentation of source time-series in a model that predicts time-series, which helps in estimating the directed causal relationships among fMRI time-series. As a multivariate approach, lsAGC uncovers the connection of the inherent dynamic system while considering all other time-series. A dataset of 60 adults with an ADHD diagnosis during childhood, drawn from the Addiction Connectome Preprocessed Initiative (ACPI), was used in the study. The brain connections assessed by lsAGC were utilized as classification attributes. A Graph Attention Neural Network (GAT) was chosen to carry out the classification task, particularly for its ability to harness graph-based data and recognize intricate interactions between brain regions, making it appropriate for fMRI-based brain connectivity data. The performance was analyzed using a five-fold cross-validation system. The average accuracy achieved by the correlation coefficient method was roughly 52.98%, with a 1.65 standard deviation, whereas the lsAGC approach yielded an average accuracy of 61.47%, with a standard deviation of 1.44. The suggested method enhances the body of knowledge in the field of neuroimaging-based classification and emphasizes the necessity to consider directed causal connections in brain network connectivity analysis when studying marijuana's effects on the brain.

Updated: 2024-10-24 07:50:10

标题: 通过大规模增强格兰杰因果关系(lsAGC)分析功能性磁共振图像,提升图注意神经网络在大麻消费分类中的性能

摘要: 在当前研究中,通过利用静息态功能磁共振成像(fMRI)来检验大规模增强格兰杰因果关系(lsAGC)作为衡量大脑网络连接性的工具的有效性,以区分大麻用户和典型对照组。大麻消耗与大脑网络连接性变化之间的关系是科学文献中公认的事实。本研究探讨了lsAGC如何准确地辨别这些变化。所使用的技术将维度缩减与增强源时间序列集成在一起,构建了一个预测时间序列的模型,有助于估计fMRI时间序列之间的有向因果关系。作为一种多变量方法,lsAGC揭示了固有动态系统的连接,同时考虑了所有其他时间序列。研究使用来自成瘾连通组预处理计划(ACPI)的60名在童年时期被诊断为ADHD的成年人的数据集。lsAGC评估的大脑连接被用作分类属性。选择了图注意神经网络(GAT)来执行分类任务,特别是因为它能够利用基于图的数据并识别脑区之间的复杂相互作用,使其适用于基于fMRI的脑连接数据。性能使用五折交叉验证系统进行分析。通过相关系数方法获得的平均准确率约为52.98%,标准偏差为1.65,而lsAGC方法的平均准确率为61.47%,标准偏差为1.44。建议的方法增强了神经影像学分类领域的知识体系,并强调了在研究大麻对大脑影响时考虑有向因果连接在大脑网络连接性分析中的必要性。

更新时间: 2024-10-24 07:50:10

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2410.18506v1

SFB-net for cardiac segmentation: Bridging the semantic gap with attention

In the past few years, deep learning algorithms have been widely used for cardiac image segmentation. However, most of these architectures rely on convolutions that hardly model long-range dependencies, limiting their ability to extract contextual information. In order to tackle this issue, this article introduces the Swin Filtering Block network (SFB-net) which takes advantage of both conventional and swin transformer layers. The former are used to introduce spatial attention at the bottom of the network, while the latter are applied to focus on high level semantically rich features between the encoder and decoder. An average Dice score of 92.4 was achieved on the ACDC dataset. To the best of our knowledge, this result outperforms any other work on this dataset. The average Dice score of 87.99 obtained on the M&Ms dataset demonstrates that the proposed method generalizes well to data from different vendors and centres.

Updated: 2024-10-24 07:48:13

标题: SFB-net用于心脏分割:通过关注机制弥合语义鸿沟

摘要: 在过去几年中,深度学习算法已广泛用于心脏图像分割。然而,大多数这些架构依赖于几乎不模拟长距离依赖关系的卷积,限制了它们提取上下文信息的能力。为了解决这个问题,本文介绍了Swin Filtering Block网络(SFB-net),它充分利用了传统和Swin变换器层。前者用于在网络底部引入空间注意力,而后者用于集中在编码器和解码器之间的高层语义丰富特征。在ACDC数据集上实现了92.4的平均Dice分数。据我们所知,这一结果优于该数据集上的任何其他工作。在M&M's数据集上获得的87.99的平均Dice分数表明,所提出的方法很好地泛化到来自不同供应商和中心的数据。

更新时间: 2024-10-24 07:48:13

领域: cs.AI

下载: http://arxiv.org/abs/2410.18503v1

Assured Automatic Programming via Large Language Models

With the advent of AI-based coding engines, it is possible to convert natural language requirements to executable code in standard programming languages. However, AI-generated code can be unreliable, and the natural language requirements driving this code may be ambiguous. In other words, the intent may not be accurately captured in the code generated from AI-coding engines like Copilot. The goal of our work is to discover the programmer intent, while generating code which conforms to the intent and a proof of this conformance. Our approach to intent discovery is powered by a novel repair engine called program-proof co-evolution, where the object of repair is a tuple (code, logical specification, test) generated by an LLM from the same natural language description. The program and the specification capture the initial operational and declarative description of intent, while the test represents a concrete, albeit partial, understanding of the intent. Our objective is to achieve consistency between the program, the specification, and the test by incrementally refining our understanding of the user intent. Reaching consistency through this repair process provides us with a formal, logical description of the intent, which is then translated back into natural language for the developer's inspection. The resultant intent description is now unambiguous, though expressed in natural language. We demonstrate how the unambiguous intent discovered through our approach increases the percentage of verifiable auto-generated programs on a recently proposed dataset in the Dafny programming language.

Updated: 2024-10-24 07:29:15

标题: 通过大型语言模型确保自动编程

摘要: 随着基于人工智能的编码引擎的出现,将自然语言要求转换为标准编程语言中的可执行代码成为可能。然而,AI生成的代码可能不可靠,驱动此代码的自然语言要求可能存在歧义。换句话说,从Copilot等AI编码引擎生成的代码可能无法准确捕捉意图。我们的工作目标是发现程序员意图,同时生成符合意图的代码以及对其符合性的证明。我们的意图发现方法由一个名为程序证明协同进化的新型修复引擎驱动,修复对象是由LLM从相同自然语言描述生成的一个元组(代码、逻辑规范、测试)。程序和规范捕捉了意图的初始操作性和声明性描述,而测试代表了对意图的具体理解,尽管是部分的。我们的目标是通过逐步完善对用户意图的理解,在程序、规范和测试之间实现一致性。通过这种修复过程达到一致性,为我们提供了意图的形式化、逻辑描述,然后将其翻译回自然语言供开发人员检查。由此得到的意图描述虽然以自然语言表达,但已不再含糊。我们展示了通过我们的方法发现的无歧义意图如何提高在Dafny编程语言中最近提出的数据集上可验证自动生成程序的比例。

更新时间: 2024-10-24 07:29:15

领域: cs.SE,cs.LG,cs.PL

下载: http://arxiv.org/abs/2410.18494v1

Large Language Model for Table Processing: A Survey

Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet manipulations, web table question answering, and image table information extraction. Automating these table-centric tasks with Large Language Models (LLMs) or Visual Language Models (VLMs) offers significant public benefits, garnering interest from academia and industry. This survey provides a comprehensive overview of table-related tasks, examining both user scenarios and technical aspects. It covers traditional tasks like table question answering as well as emerging fields such as spreadsheet manipulation and table data analysis. We summarize the training techniques for LLMs and VLMs tailored for table processing. Additionally, we discuss prompt engineering, particularly the use of LLM-powered agents, for various table-related tasks. Finally, we highlight several challenges, including diverse user input when serving and slow thinking using chain-of-thought.

Updated: 2024-10-24 07:26:36

标题: 大型语言模型用于表格处理:一项调查

摘要: 表格通常是二维的结构化数据存储形式,用于存储大量数据,在日常活动中起着重要作用,如数据库查询、电子表格操作、网络表格问答和图像表格信息提取。利用大型语言模型(LLMs)或视觉语言模型(VLMs)自动化这些以表格为中心的任务,具有重要的公共利益,受到学术界和行业的关注。本调查提供了关于与表格相关任务的全面概述,考察了用户场景和技术方面。它涵盖了传统任务,如表格问答,以及新兴领域,如电子表格操作和表格数据分析。我们总结了为表格处理量身定制的LLMs和VLMs的训练技术。此外,我们讨论了提示工程,特别是LLM驱动的代理在各种与表格相关的任务中的应用。最后,我们强调了几个挑战,包括为用户提供服务时的多样化输入和使用思维链进行缓慢思考。

更新时间: 2024-10-24 07:26:36

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.05121v3

LLM as a code generator in Agile Model Driven Development

Leveraging Large Language Models (LLM) like GPT4 in the auto generation of code represents a significant advancement, yet it is not without its challenges. The ambiguity inherent in natural language descriptions of software poses substantial obstacles to generating deployable, structured artifacts. This research champions Model Driven Development (MDD) as a viable strategy to overcome these challenges, proposing an Agile Model Driven Development (AMDD) approach that employs GPT4 as a code generator. This approach enhances the flexibility and scalability of the code auto generation process and offers agility that allows seamless adaptation to changes in models or deployment environments. We illustrate this by modeling a multi agent Unmanned Vehicle Fleet (UVF) system using the Unified Modeling Language (UML), significantly reducing model ambiguity by integrating the Object Constraint Language (OCL) for code structure meta modeling, and the FIPA ontology language for communication semantics meta modeling. Applying GPT4 auto generation capabilities yields Java and Python code that is compatible with the JADE and PADE frameworks, respectively. Our thorough evaluation of the auto generated code verifies its alignment with expected behaviors and identifies enhancements in agent interactions. Structurally, we assessed the complexity of code derived from a model constrained solely by OCL meta models, against that influenced by both OCL and FIPA ontology meta models. The results indicate that the ontology constrained meta model produces inherently more complex code, yet its cyclomatic complexity remains within manageable levels, suggesting that additional meta model constraints can be incorporated without exceeding the high risk threshold for complexity.

Updated: 2024-10-24 07:24:11

标题: LLM作为敏捷模型驱动开发中的代码生成器

摘要: 利用类似GPT4的大型语言模型(LLM)在自动生成代码方面取得了重大进展,但也并非没有挑战。自然语言描述软件中固有的歧义给生成可部署的结构化工件带来了重大障碍。本研究倡导采用模型驱动开发(MDD)作为一种应对这些挑战的可行策略,提出了一种采用GPT4作为代码生成器的敏捷模型驱动开发(AMDD)方法。该方法增强了代码自动生成过程的灵活性和可扩展性,并提供了灵活性,使其能够无缝适应模型或部署环境的变化。我们通过使用统一建模语言(UML)对多智能体无人车队(UVF)系统进行建模来说明这一点,通过集成对象约束语言(OCL)进行代码结构元建模,以及使用FIPA本体语言进行通信语义元建模,显著减少了模型歧义。应用GPT4的自动生成能力产生了与JADE和PADE框架兼容的Java和Python代码。我们对自动生成的代码进行了彻底评估,验证了其与预期行为的一致性,并识别了在智能体交互方面的增强。在结构上,我们评估了仅受OCL元模型约束的模型衍生代码的复杂性,与同时受OCL和FIPA本体元模型影响的代码的复杂性。结果表明,受本体约束的元模型产生了本质上更复杂的代码,但其圈复杂度仍保持在可管理水平,这表明可以在不超出复杂性高风险阈值的情况下增加额外的元模型约束。

更新时间: 2024-10-24 07:24:11

领域: cs.AI,cs.ET,cs.RO,cs.SE

下载: http://arxiv.org/abs/2410.18489v1

Graph Pre-Training Models Are Strong Anomaly Detectors

Graph Anomaly Detection (GAD) is a challenging and practical research topic where Graph Neural Networks (GNNs) have recently shown promising results. The effectiveness of existing GNNs in GAD has been mainly attributed to the simultaneous learning of node representations and the classifier in an end-to-end manner. Meanwhile, graph pre-training, the two-stage learning paradigm such as DGI and GraphMAE, has shown potential in leveraging unlabeled graph data to enhance downstream tasks, yet its impact on GAD remains under-explored. In this work, we show that graph pre-training models are strong graph anomaly detectors. Specifically, we demonstrate that pre-training is highly competitive, markedly outperforming the state-of-the-art end-to-end training models when faced with limited supervision. To understand this phenomenon, we further uncover pre-training enhances the detection of distant, under-represented, unlabeled anomalies that go beyond 2-hop neighborhoods of known anomalies, shedding light on its superior performance against end-to-end models. Moreover, we extend our examination to the potential of pre-training in graph-level anomaly detection. We envision this work to stimulate a re-evaluation of pre-training's role in GAD and offer valuable insights for future research.
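
A two-stage sketch of the paradigm the abstract compares against end-to-end training: pretrain node embeddings on unlabeled graph data (e.g., with a DGI/GraphMAE-style objective, not shown here), then fit a light anomaly classifier on the frozen embeddings with the few available labels. The data layout is assumed for illustration.

    from sklearn.linear_model import LogisticRegression

    def detect_anomalies(pretrained_embeddings, labeled_idx, labels,
                         threshold=0.5):
        # pretrained_embeddings: (n_nodes, d) array from the frozen,
        # pretrained graph encoder; labels: binary anomaly labels for the
        # small labeled subset (limited supervision).
        clf = LogisticRegression(max_iter=1000, class_weight="balanced")
        clf.fit(pretrained_embeddings[labeled_idx], labels)
        scores = clf.predict_proba(pretrained_embeddings)[:, 1]
        return scores > threshold  # flagged anomalies over all nodes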

Updated: 2024-10-24 07:22:18

标题: 图形预训练模型是强大的异常检测器

摘要: 图异常检测(GAD)是一个具有挑战性和实践意义的研究课题,最近图神经网络(GNNs)展现出了令人期待的结果。现有GNNs在GAD中的有效性主要归因于以端到端方式同时学习节点表示和分类器。与此同时,图预训练,如DGI和GraphMAE这种两阶段学习范式,已显示出利用未标记的图数据增强下游任务的潜力,然而其对GAD的影响尚未得到充分探讨。在这项工作中,我们展示了图预训练模型是强大的图异常检测器。具体地,我们证明了预训练在面对有限监督时具有很高的竞争力,明显优于最先进的端到端训练模型。为了理解这一现象,我们进一步发现预训练增强了对超出已知异常2跳邻域、距离较远、代表性不足且未标注异常的检测能力,从而解释了其相对于端到端模型的优越性能。此外,我们进一步考察了预训练在图级别异常检测中的潜力。我们希望这项工作能激发对预训练在GAD中作用的重新评估,并为未来研究提供有价值的见解。

更新时间: 2024-10-24 07:22:18

领域: cs.LG

下载: http://arxiv.org/abs/2410.18487v1

Evolving Voices Based on Temporal Poisson Factorisation

The world is evolving and so is the vocabulary used to discuss topics in speech. Analysing political speech data from more than 30 years requires the use of flexible topic models to uncover the latent topics and their change in prevalence over time as well as the change in the vocabulary of the topics. We propose the temporal Poisson factorisation (TPF) model as an extension to the Poisson factorisation model to model sparse count data matrices obtained based on the bag-of-words assumption from text documents with time stamps. We discuss and empirically compare different model specifications for the time-varying latent variables consisting either of a flexible auto-regressive structure of order one or a random walk. Estimation is based on variational inference where we consider a combination of coordinate ascent updates with automatic differentiation using batching of documents. Suitable variational families are proposed to ease inference. We compare results obtained using independent univariate variational distributions for the time-varying latent variables to those obtained with a multivariate variant. We discuss in detail the results of the TPF model when analysing speeches from 18 sessions in the U.S. Senate (1981-2016).
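
Schematically, and with all notation assumed here rather than taken from the paper, a temporal extension of Poisson factorisation keeps the usual count likelihood and lets the topic-vocabulary factors evolve over time:

    $$ y_{dv} \sim \operatorname{Poisson}\Big(\sum_{k=1}^{K} \theta_{dk}\, \beta_{t(d),kv}\Big), \qquad \beta_{t,kv} \mid \beta_{t-1,kv} \sim \mathcal{N}\big(\phi\, \beta_{t-1,kv},\ \sigma^2\big), $$

where $y_{dv}$ is the count of term $v$ in document $d$ and $t(d)$ its time stamp; $\phi = 1$ gives the random-walk specification and a flexible $|\phi| < 1$ the order-one auto-regressive one. In practice the Gaussian dynamics would sit on a transformed scale so the Poisson rate stays nonnegative; the actual priors and link functions are those of the paper, not this sketch.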

Updated: 2024-10-24 07:21:33

标题: 根据时间泊松因子分解演化的声音

摘要: 这个世界在不断发展,用来讨论演讲话题的词汇也在不断演变。分析超过30年的政治演讲数据需要使用灵活的主题模型来揭示潜在主题及其随时间变化的流行度以及主题词汇的变化。我们提出了时间泊松因子分解(TPF)模型作为泊松因子分解模型的扩展,用于对带有时间戳的文本文档在词袋假设下获得的稀疏计数数据矩阵进行建模。我们讨论并在经验上比较不同的模型规范:时间变化的潜在变量或者采用灵活的一阶自回归结构,或者采用随机游走。估计基于变分推断,我们考虑了结合文档批处理的坐标上升更新和自动微分。提出了适合的变分族以简化推断。我们将使用独立单变量变分分布获得的结果与使用多变量变体获得的结果进行比较。我们详细讨论了在分析美国参议院18个会议期间(1981-2016)的演讲时TPF模型的结果。

更新时间: 2024-10-24 07:21:33

领域: stat.ME,cs.LG,62F15 (Primary) 62H99, 68U15 (Secondary),G.3; I.7.m

下载: http://arxiv.org/abs/2410.18486v1

FirmRCA: Towards Post-Fuzzing Analysis on ARM Embedded Firmware with Efficient Event-based Fault Localization

While fuzzing has demonstrated its effectiveness in exposing vulnerabilities within embedded firmware, the discovery of crashing test cases is only the first step in improving the security of these critical systems. The subsequent fault localization process, which aims to precisely identify the root causes of observed crashes, is a crucial yet time-consuming post-fuzzing work. Unfortunately, the automated root cause analysis on embedded firmware crashes remains an underexplored area, which is challenging from several perspectives: (1) the fuzzing campaign towards the embedded firmware lacks adequate debugging mechanisms, making it hard to automatically extract essential runtime information for analysis; (2) the inherent raw binary nature of embedded firmware often leads to over-tainted and noisy suspicious instructions, which provides limited guidance for analysts in manually investigating the root cause and remediating the underlying vulnerability. To address these challenges, we design and implement FirmRCA, a practical fault localization framework tailored specifically for embedded firmware. FirmRCA introduces an event-based footprint collection approach to aid and significantly expedite reverse execution. Next, to solve the complicated memory alias problem, FirmRCA proposes a history-driven method by tracking data propagation through the execution trace, enabling precise identification of deep crash origins. Finally, FirmRCA proposes a novel strategy to highlight key instructions related to the root cause, providing practical guidance in the final investigation. We evaluate FirmRCA with both synthetic and real-world targets, including 41 crashing test cases across 17 firmware images. The results show that FirmRCA can effectively (92.7% success rate) identify the root cause of crashing test cases within the top 10 instructions.

Updated: 2024-10-24 07:12:08

标题: FirmRCA:针对ARM嵌入式固件的后模糊分析,具有高效的基于事件的故障定位

摘要: 虽然模糊测试已经证明了在嵌入式固件中暴露漏洞的有效性,但发现导致崩溃测试用例的情况只是改善这些关键系统安全性的第一步。随后的故障定位过程旨在准确识别观察到的崩溃根本原因,这是一个至关重要但耗时的后模糊化工作。不幸的是,嵌入式固件崩溃的自动根本原因分析仍然是一个未经充分探讨的领域,从几个角度来看都具有挑战性:(1)针对嵌入式固件的模糊化活动缺乏足够的调试机制,使得难以自动提取用于分析的基本运行时信息;(2)嵌入式固件固有的原始二进制特性通常会导致过度污染和嘈杂的可疑指令,这为分析人员提供了有限的指导,难以手动调查根本原因并修复潜在漏洞。为了解决这些挑战,我们设计并实现了FirmRCA,这是一个专门针对嵌入式固件的实用故障定位框架。FirmRCA引入了一种基于事件的足迹收集方法,以帮助并显著加快反向执行。接下来,为了解决复杂的内存别名问题,FirmRCA提出了一种历史驱动的方法,通过跟踪数据在执行跟踪中的传播,实现对深层崩溃起源的精确识别。最后,FirmRCA提出了一种新颖的策略,突出与根本原因相关的关键指令,为最终调查提供实用指导。我们用合成和真实目标对FirmRCA进行评估,包括17个固件镜像中的41个崩溃测试用例。结果显示,FirmRCA能够有效地(92.7%的成功率)在前10个指令中识别崩溃测试用例的根本原因。

更新时间: 2024-10-24 07:12:08

领域: cs.CR

下载: http://arxiv.org/abs/2410.18483v1

Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction

Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could significantly accelerate the manual design of workflows in new domains and enable the grounding of large language models in domain-specific flowcharts, enhancing transparency and controllability. In this paper, we introduce Dialog2Flow (D2F) embeddings, which differ from conventional sentence embeddings by mapping utterances to a latent space where they are grouped according to their communicative and informative functions (i.e., the actions they represent). D2F allows for modeling dialogs as continuous trajectories in a latent space with distinct action-related regions. By clustering D2F embeddings, the latent space is quantized, and dialogs can be converted into sequences of region/action IDs, facilitating the extraction of the underlying workflow. To pre-train D2F, we build a comprehensive dataset by unifying twenty task-oriented dialog datasets with normalized per-turn action annotations. We also introduce a novel soft contrastive loss that leverages the semantic information of these actions to guide the representation learning process, showing superior performance compared to standard supervised contrastive loss. Evaluation against various sentence embeddings, including dialog-specific ones, demonstrates that D2F yields superior qualitative and quantitative results across diverse domains.
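
A sketch of what a soft contrastive loss can look like: instead of hard 0/1 targets, the targets are similarities between action labels, so utterances with semantically close actions are pulled less far apart. The temperature and the use of label embeddings below are assumptions, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def soft_contrastive_loss(utt_emb, action_emb, tau=0.1):
        # utt_emb: (B, d) utterance embeddings; action_emb: (B, d)
        # embeddings of each utterance's action label (labels may repeat).
        logits = utt_emb @ utt_emb.t() / tau  # pairwise utterance similarity
        with torch.no_grad():
            a = F.normalize(action_emb, dim=-1)
            target = F.softmax((a @ a.t()) / tau, dim=-1)  # soft targets
        # Cross entropy against soft (probability) targets: similar actions
        # contribute graded, not binary, supervision.
        return F.cross_entropy(logits, target)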

Updated: 2024-10-24 07:10:18

标题: Dialog2Flow:为自动对话流提取预训练软对比动作驱动的句子嵌入

摘要: 从未经注释的对话中有效地推导结构化工作流程仍然是计算语言学中一个未被充分探讨且艰巨的挑战。自动化这一过程可以显著加速在新领域中手动设计工作流程,并使大型语言模型与特定领域的流程图相结合,增强透明度和可控性。在本文中,我们介绍了一种称为Dialog2Flow(D2F)嵌入的方法,与传统的句子嵌入不同,D2F将话语映射到一个潜在空间,根据它们的交际和信息功能(即,它们代表的动作)对其进行分组。D2F允许将对话建模为在潜在空间中的连续轨迹,其中包含不同的与动作相关的区域。通过对D2F嵌入进行聚类,潜在空间被量化,对话可以转换为区域/动作ID的序列,便于提取潜在的工作流程。为了预训练D2F,我们通过统一二十个面向任务的对话数据集并标准化每轮动作注释构建了一个全面的数据集。我们还引入了一种利用这些动作的语义信息来指导表示学习过程的新型软对比损失,表现出比标准监督对比损失更优越的性能。与包括对话特定嵌入在内的各种句子嵌入进行评估表明,D2F在各个领域中产生了优越的定性和定量结果。

更新时间: 2024-10-24 07:10:18

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18481v1

Uncovering Biases with Reflective Large Language Models

Biases and errors in human-labeled data present significant challenges for machine learning, especially in supervised learning reliant on potentially flawed ground truth data. These flaws, including diagnostic errors and societal biases, risk being propagated and amplified through models trained using maximum likelihood estimation. We present the Reflective LLM Dialogue Framework RLDF, which leverages structured adversarial dialogues between multiple instances of a single LLM or different LLMs to uncover diverse perspectives and correct inconsistencies. By conditioning LLMs to adopt opposing stances, RLDF enables systematic bias detection through conditional statistics, information theory, and divergence metrics. Experiments show RLDF successfully identifies potential biases in public content while exposing limitations in human-labeled data. Our framework supports measurable progress tracking and explainable remediation actions, offering a scalable approach for improving content neutrality through transparent, multi-perspective analysis.

Updated: 2024-10-24 07:09:43

标题: 用反思性大型语言模型揭示偏见

摘要: 人类标记的数据中存在的偏见和错误对机器学习提出了重大挑战,尤其是在依赖潜在有缺陷的基本事实数据的监督学习中。这些缺陷包括诊断错误和社会偏见,有可能通过使用最大似然估计训练的模型进行传播和放大。我们提出了反思LLM对话框架(RLDF),利用单个LLM的多个实例之间或不同LLM之间的结构化对抗性对话,来揭示多样化视角并纠正不一致性。通过让LLM采取相反的立场,RLDF通过条件统计学、信息论和分歧度量实现了系统性偏见检测。实验证明,RLDF成功地识别了公共内容中的潜在偏见,同时揭示了人类标记数据的局限性。我们的框架支持可衡量的进展跟踪和可解释的纠正措施,提供了一种通过透明、多角度分析来提高内容中立性的可扩展方法。

更新时间: 2024-10-24 07:09:43

领域: cs.AI,cs.CL,cs.LG,I.2.7

下载: http://arxiv.org/abs/2408.13464v2

A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning

Tool use, planning, and feedback learning are currently three prominent paradigms for developing Large Language Model (LLM)-based agents across various tasks. Although numerous frameworks have been devised for each paradigm, their intricate workflows and inconsistent taxonomy create challenges in understanding and reviewing the frameworks across different paradigms. This survey introduces a unified taxonomy to systematically review and discuss these frameworks. Specifically, 1) the taxonomy defines environments/tasks, common LLM-profiled roles or LMPRs (policy models, evaluators, and dynamic models), and universally applicable workflows found in prior work, and 2) it enables a comparison of key perspectives on the implementations of LMPRs and workflow designs across different agent paradigms and frameworks. 3) Finally, we identify three limitations in existing workflow designs and systematically discuss the future work. Resources have been made publicly available at in our GitHub repository https://github.com/xinzhel/LLM-Agent-Survey.

Updated: 2024-10-24 07:07:43

标题: 一种基于LLM的代理人突出范式的综述:工具使用(包括RAG)、规划和反馈学习

摘要: 工具使用、规划和反馈学习目前是开发基于大型语言模型(LLM)的代理人的三种显著范式。尽管针对每种范式已经设计了许多框架,但它们复杂的工作流程和不一致的分类法在不同范式下理解和审查这些框架时存在挑战。本调查引入了一个统一的分类法,以系统地审查和讨论这些框架。具体来说,1) 该分类法定义了环境/任务、常见的LLM-配置角色或LMPRs(策略模型、评估者和动态模型),以及先前工作中发现的普遍适用的工作流程;2) 它使得可以比较不同代理人范式和框架中LMPRs的实施和工作流程设计的关键观点;3) 最后,我们确定了现有工作流程设计中的三个局限性,并系统地讨论了未来的工作。资源已经在我们的GitHub存储库https://github.com/xinzhel/LLM-Agent-Survey 上公开提供。

更新时间: 2024-10-24 07:07:43

领域: cs.AI,cs.CL,cs.SE

下载: http://arxiv.org/abs/2406.05804v5

Classifier Clustering and Feature Alignment for Federated Learning under Distributed Concept Drift

Data heterogeneity is one of the key challenges in federated learning, and many efforts have been devoted to tackling this problem. However, distributed concept drift with data heterogeneity, where clients may additionally experience different concept drifts, is a largely unexplored area. In this work, we focus on real drift, where the conditional distribution $P(Y|X)$ changes. We first study how distributed concept drift affects the model training and find that local classifier plays a critical role in drift adaptation. Moreover, to address data heterogeneity, we study the feature alignment under distributed concept drift, and find two factors that are crucial for feature alignment: the conditional distribution $P(Y|X)$ and the degree of data heterogeneity. Motivated by the above findings, we propose FedCCFA, a federated learning framework with classifier clustering and feature alignment. To enhance collaboration under distributed concept drift, FedCCFA clusters local classifiers at class-level and generates clustered feature anchors according to the clustering results. Assisted by these anchors, FedCCFA adaptively aligns clients' feature spaces based on the entropy of label distribution $P(Y)$, alleviating the inconsistency in feature space. Our results demonstrate that FedCCFA significantly outperforms existing methods under various concept drift settings. Code is available at https://github.com/Chen-Junbao/FedCCFA.
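
The entropy-based weighting lends itself to a small sketch: clients whose label distribution $P(Y)$ has low entropy (highly skewed local data) get a weaker feature-alignment term. The exact weighting function below is an assumption for illustration.

    import torch

    def label_entropy(label_counts):
        # label_counts: (num_classes,) tensor of local class counts.
        p = label_counts.float() / label_counts.sum()
        p = p[p > 0]
        return -(p * p.log()).sum()

    def alignment_weight(label_counts, num_classes):
        max_ent = torch.log(torch.tensor(float(num_classes)))
        return (label_entropy(label_counts) / max_ent).item()  # in [0, 1]

    # total_loss = task_loss + alignment_weight(counts, C) * anchor_alignment_loss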

Updated: 2024-10-24 07:04:52

标题: 分类器聚类和特征对齐在分布式概念漂移下的联邦学习中的应用

摘要: 数据异质性是联邦学习中的一个关键挑战,许多工作致力于解决这一问题。然而,分布式数据异质性的概念漂移,即客户端可能经历不同的概念漂移,是一个较少探索的领域。在这项工作中,我们关注真实漂移,即条件分布$P(Y|X)$发生变化。我们首先研究分布式概念漂移如何影响模型训练,并发现局部分类器在漂移适应中起着关键作用。此外,为了解决数据异质性问题,我们研究了在分布式概念漂移下的特征对齐,并发现两个对特征对齐至关重要的因素:条件分布$P(Y|X)$和数据异质性程度。在上述发现的基础上,我们提出了FedCCFA,这是一个具有分类器聚类和特征对齐的联邦学习框架。为了增强在分布式概念漂移下的合作,FedCCFA将本地分类器在类级别进行聚类,并根据聚类结果生成聚类特征锚点。在这些锚点的帮助下,FedCCFA根据标签分布$P(Y)$的熵自适应地对齐客户端的特征空间,减轻特征空间中的不一致性。我们的结果表明,在各种概念漂移设置下,FedCCFA明显优于现有方法。代码可在https://github.com/Chen-Junbao/FedCCFA找到。

更新时间: 2024-10-24 07:04:52

领域: cs.LG

下载: http://arxiv.org/abs/2410.18478v1

Improving Adversarial Robust Fairness via Anti-Bias Soft Label Distillation

Adversarial Training (AT) has been widely proved to be an effective method to improve the adversarial robustness against adversarial examples for Deep Neural Networks (DNNs). As a variant of AT, Adversarial Robustness Distillation (ARD) has demonstrated its superior performance in improving the robustness of small student models with the guidance of large teacher models. However, both AT and ARD encounter the robust fairness problem: these models exhibit strong robustness when facing part of classes (easy class), but weak robustness when facing others (hard class). In this paper, we give an in-depth analysis of the potential factors and argue that the smoothness degree of samples' soft labels for different classes (i.e., hard class or easy class) will affect the robust fairness of DNNs from both empirical observation and theoretical analysis. Based on the above finding, we propose an Anti-Bias Soft Label Distillation (ABSLD) method to mitigate the adversarial robust fairness problem within the framework of Knowledge Distillation (KD). Specifically, ABSLD adaptively reduces the student's error risk gap between different classes to achieve fairness by adjusting the class-wise smoothness degree of samples' soft labels during the training process, and the smoothness degree of soft labels is controlled by assigning different temperatures in KD to different classes. Extensive experiments demonstrate that ABSLD outperforms state-of-the-art AT, ARD, and robust fairness methods in the comprehensive metric (Normalized Standard Deviation) of robustness and fairness.
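
A sketch of class-wise temperatures in distillation: harder classes get a smoother (higher-temperature) teacher distribution, easier ones a sharper one, narrowing the robust-error gap between classes. How the temperatures are adapted during training is the paper's contribution and is not reproduced here.

    import torch
    import torch.nn.functional as F

    def classwise_soft_labels(teacher_logits, labels, class_temps):
        # class_temps: (num_classes,) tensor; labels: (B,) ground-truth ids.
        t = class_temps[labels].unsqueeze(1)  # per-sample temperature
        return F.softmax(teacher_logits / t, dim=-1)

    def absld_style_kd_loss(student_logits, teacher_logits, labels, class_temps):
        soft = classwise_soft_labels(teacher_logits, labels, class_temps)
        logp = F.log_softmax(student_logits, dim=-1)
        return F.kl_div(logp, soft, reduction="batchmean")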

Updated: 2024-10-24 06:58:05

标题: 通过反偏差软标签蒸馏提高对抗性鲁棒公平性

摘要: 对抗训练(AT)已被广泛证明是一种有效的方法,可以提高深度神经网络(DNNs)对对抗样本的对抗鲁棒性。作为AT的一种变体,对抗鲁棒性蒸馏(ARD)已经证明在以大型教师模型为指导的情况下,能够显著提升小型学生模型的鲁棒性能。然而,AT和ARD都面临着鲁棒公平性问题:这些模型在面对部分类别(易类别)时表现出强大的鲁棒性,但在面对其他类别(难类别)时则表现出较弱的鲁棒性。本文对潜在因素进行了深入分析,并从经验观察和理论分析的角度论证,不同类别(即难类别或易类别)样本软标签的平滑程度会影响DNNs的鲁棒公平性。基于上述发现,我们提出了一种反偏差软标签蒸馏(ABSLD)方法,以在知识蒸馏(KD)框架内缓解对抗鲁棒公平性问题。具体来说,ABSLD通过在训练过程中调整样本软标签的类别平滑度,自适应地减少学生在不同类别之间的错误风险差距,以实现公平性;软标签的平滑度通过在KD中为不同类别分配不同的温度来控制。大量实验证明,ABSLD在鲁棒性和公平性的综合指标(归一化标准差)方面优于最先进的AT、ARD和鲁棒公平性方法。

更新时间: 2024-10-24 06:58:05

领域: cs.LG,cs.CV,cs.CY

下载: http://arxiv.org/abs/2312.05508v2

Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production

In the rapidly evolving field of metabolic engineering, the quest for efficient and precise gene target identification for metabolite production enhancement presents significant challenges. Traditional approaches, whether knowledge-based or model-based, are notably time-consuming and labor-intensive, due to the vast scale of research literature and the approximation nature of genome-scale metabolic model (GEM) simulations. Therefore, we propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs, to automate the process of candidate gene discovery for a given pair of metabolite and candidate-associated genes, as well as presenting the first benchmark containing 2474 metabolites and 1947 genes of two commonly used microorganisms Saccharomyces cerevisiae (SC) and Issatchenkia orientalis (IO). This task is challenging due to the incompleteness of the metabolic graphs and the heterogeneity among distinct metabolisms. To overcome these limitations, we propose an Interactive Knowledge Transfer mechanism based on Metabolism Graph (IKT4Meta), which improves the association prediction accuracy by integrating the knowledge from different metabolism graphs. First, to build a bridge between two graphs for knowledge transfer, we utilize Pretrained Language Models (PLMs) with external knowledge of genes and metabolites to help generate inter-graph links, significantly alleviating the impact of heterogeneity. Second, we propagate intra-graph links from different metabolic graphs using inter-graph links as anchors. Finally, we conduct the gene-metabolite association prediction based on the enriched metabolism graphs, which integrate the knowledge from multiple microorganisms. Experiments on both types of organisms demonstrate that our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.

Updated: 2024-10-24 06:54:27

标题: 基因-代谢物关联预测,利用增强图交互知识传递的方法促进代谢产物生产

摘要: 在快速发展的代谢工程领域,为了增强代谢产物生产效率和精确性,对基因靶标的有效和准确识别面临着重大挑战。传统方法,无论是基于知识还是基于模型,由于研究文献的广泛规模和基因组规模代谢模型(GEM)模拟的近似性质,通常耗时且劳动密集。因此,我们提出了一项新任务,基于代谢图的基因-代谢物关联预测,以自动化候选基因发现的过程,针对给定的代谢物和候选关联基因对,同时呈现了包含两种常用微生物Saccharomyces cerevisiae(SC)和Issatchenkia orientalis(IO)的2474种代谢物和1947个基因的第一个基准。由于代谢图的不完整性和不同代谢之间的异质性,这个任务具有挑战性。为了克服这些限制,我们提出了一种基于代谢图的交互式知识转移机制(IKT4Meta),通过整合来自不同代谢图的知识,提高了关联预测的准确性。首先,为了在两个图之间建立知识转移的桥梁,我们利用带有基因和代谢物外部知识的预训练语言模型(PLM)来帮助生成图间链接,显著减轻了异质性的影响。其次,我们使用图间链接作为锚点,从不同代谢图中传播图内链接。最后,我们基于丰富的代谢图进行基因-代谢物关联预测,整合了多种微生物的知识。对两种类型的生物进行的实验表明,我们提出的方法在不同的链接预测框架中优于基准最多达到12.3%。

更新时间: 2024-10-24 06:54:27

领域: cs.AI,IEEEtran

下载: http://arxiv.org/abs/2410.18475v1

What If the Input is Expanded in OOD Detection?

Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes, which is important for the reliable deployment of machine learning models in the open world. Various scoring functions have been proposed to distinguish OOD inputs from in-distribution (ID) data. However, existing methods generally focus on excavating the discriminative information from a single input, which implicitly limits its representation dimension. In this work, we introduce a novel perspective, i.e., employing different common corruptions on the input space, to expand that dimension. We reveal an interesting phenomenon termed confidence mutation, where the confidence of OOD data can decrease significantly under the corruptions, while the ID data shows a higher confidence expectation considering the resistance of semantic features. Based on that, we formalize a new scoring method, namely, Confidence aVerage (CoVer), which can capture the dynamic differences by simply averaging the scores obtained from different corrupted inputs and the original ones, making the OOD and ID distributions more separable in detection tasks. Extensive experiments and analyses have been conducted to understand and verify the effectiveness of CoVer. The code is publicly available at: https://github.com/tmlr-group/CoVer.
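
A minimal sketch of the scoring rule as described: average a base confidence score (here maximum softmax probability) over the original input and several corrupted copies. The particular corruption set and base score are assumptions; the paper may use different choices.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def cover_score(model, x, corruptions):
        # corruptions: list of callables, each mapping a batch x to a
        # corrupted copy (e.g., blur, noise); x itself is also scored.
        views = [x] + [c(x) for c in corruptions]
        scores = [F.softmax(model(v), dim=-1).max(dim=-1).values for v in views]
        # ID inputs keep high confidence across views; OOD confidence
        # mutates downward, so the average separates the two better.
        return torch.stack(scores, dim=0).mean(dim=0)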

Updated: 2024-10-24 06:47:28

标题: 如果输入在OOD检测中扩展了会怎样?

摘要: 分布外(OOD)检测旨在识别来自未知类别的OOD输入,这对于机器学习模型在开放世界中的可靠部署非常重要。已有多种评分函数被提出,用于将OOD输入与分布内(ID)数据区分开。然而,现有方法通常专注于从单个输入中挖掘有区别的信息,这隐含地限制了其表示维度。在这项工作中,我们引入了一个新颖的视角,即在输入空间上应用不同的常见损坏,以扩展表示维度。我们揭示了一个有趣的现象,称为置信度突变:考虑到语义特征的鲁棒性,OOD数据的置信度在损坏下会显著降低,而ID数据则表现出更高的置信度期望。基于此,我们提出了一种新的评分方法,即置信度平均(CoVer),它通过简单地对来自不同损坏输入和原始输入的分数进行平均来捕捉动态差异,使OOD和ID分布在检测任务中更容易分离。我们进行了大量实验和分析,以理解并验证CoVer的有效性。代码公开于:https://github.com/tmlr-group/CoVer。

更新时间: 2024-10-24 06:47:28

领域: cs.LG

下载: http://arxiv.org/abs/2410.18472v1

Inverting Cryptographic Hash Functions via Cube-and-Conquer

MD4 and MD5 are fundamental cryptographic hash functions proposed in the early 1990s. MD4 consists of 48 steps and produces a 128-bit hash given a message of arbitrary finite size. MD5 is a more secure 64-step extension of MD4. Both MD4 and MD5 are vulnerable to practical collision attacks, yet it is still not realistic to invert them, i.e., to find a message given a hash. In 2007, the 39-step version of MD4 was inverted by reducing to SAT and applying a CDCL solver along with the so-called Dobbertin's constraints. As for MD5, in 2012 its 28-step version was inverted via a CDCL solver for one specified hash without adding any extra constraints. In this study, Cube-and-Conquer (a combination of CDCL and lookahead) is applied to invert step-reduced versions of MD4 and MD5. For this purpose, two algorithms are proposed. The first one generates inverse problems for MD4 by gradually modifying the Dobbertin's constraints. The second algorithm tries the cubing phase of Cube-and-Conquer with different cutoff thresholds to find the one with the minimum runtime estimate of the conquer phase. This algorithm operates in two modes: (i) estimating the hardness of a given propositional Boolean formula; (ii) incomplete SAT solving of a given satisfiable propositional Boolean formula. While the first algorithm is focused on inverting step-reduced MD4, the second one is not area-specific and is therefore applicable to a variety of classes of hard SAT instances. In this study, 40-, 41-, 42-, and 43-step MD4 are inverted for the first time via the first algorithm and the estimating mode of the second algorithm. Also, 28-step MD5 is inverted for four hashes via the incomplete SAT solving mode of the second algorithm. For three hashes out of them, it is done for the first time.

Updated: 2024-10-24 06:43:53

标题: 通过Cube-and-Conquer方法反转密码哈希函数

摘要: MD4和MD5是在1990年代初提出的基本密码散列函数。MD4包括48个步骤,给定任意有限大小的消息生成一个128位的散列。MD5是MD4的更安全的64步扩展。虽然MD4和MD5都容易受到实际碰撞攻击的影响,但逆转它们,即找到一个给定散列的消息,仍然不现实。2007年,通过将问题归约为SAT并应用CDCL求解器以及所谓的Dobbertin约束,39步版本的MD4被逆转。至于MD5,在2012年,针对一个指定的散列,通过CDCL求解器反转了其28步版本,而无需添加任何额外约束。在本研究中,Cube-and-Conquer(CDCL和前瞻的结合)被应用于逆转简化步骤版本的MD4和MD5。为此,提出了两种算法。第一种算法通过逐渐修改Dobbertin约束生成MD4的逆转问题。第二种算法以不同的截止阈值尝试Cube-and-Conquer的立方(cubing)阶段,以找到使征服(conquer)阶段运行时间估计最小的阈值。这个算法有两种模式:(i)估计给定命题布尔公式的难度;(ii)对给定可满足的命题布尔公式进行不完全的SAT求解。虽然第一个算法专注于逆转简化步骤的MD4,但第二个算法不限于特定领域,因此适用于各种类别的难SAT实例。在本研究中,首次通过第一个算法和第二个算法的估计模式逆转了40、41、42和43步的MD4。此外,通过第二个算法的不完全SAT求解模式,对四个散列反转了28步的MD5。其中有三个散列是首次反转的。

更新时间: 2024-10-24 06:43:53

领域: cs.CR,cs.AI,I.2.6; I.2.8

下载: http://arxiv.org/abs/2212.02405v3

Interpretable A-posteriori Error Indication for Graph Neural Network Surrogate Models

Data-driven surrogate modeling has surged in capability in recent years with the emergence of graph neural networks (GNNs), which can operate directly on mesh-based representations of data. The goal of this work is to introduce an interpretability enhancement procedure for GNNs, with application to unstructured mesh-based fluid dynamics modeling. Given a black-box baseline GNN model, the end result is an interpretable GNN model that isolates regions in physical space, corresponding to sub-graphs, that are intrinsically linked to the forecasting task while retaining the predictive capability of the baseline. These structures identified by the interpretable GNNs are adaptively produced in the forward pass and serve as explainable links between the baseline model architecture, the optimization goal, and known problem-specific physics. Additionally, through a regularization procedure, the interpretable GNNs can also be used to identify, during inference, graph nodes that correspond to a majority of the anticipated forecasting error, adding a novel interpretable error-tagging capability to baseline models. Demonstrations are performed using unstructured flow field data sourced from flow over a backward-facing step at high Reynolds numbers, with geometry extrapolations demonstrated for ramp and wall-mounted cube configurations.

Updated: 2024-10-24 06:43:02

标题: 用于图神经网络代理模型的可解释后验误差指示

摘要: 数据驱动的代理建模近年来在图神经网络(GNNs)的出现下取得了巨大进展,这些网络可以直接在基于网格的数据表示上操作。本文的目标是为GNNs引入一种可解释性增强程序,并将其应用于基于非结构化网格的流体动力学建模。给定一个黑盒基线GNN模型,最终结果是一个可解释的GNN模型,它能够隔离物理空间中与预测任务密切相关的子图,同时保留基线模型的预测能力。这些可解释的GNNs在前向传递中自适应地产生的结构作为可解释的链接,连接了基线模型的架构、优化目标和已知的问题特定物理。此外,通过正则化程序,可解释的GNNs还可以在推理过程中识别对应于大多数预期预测误差的图节点,为基线模型增加了一种新颖的可解释的错误标记能力。通过使用在高雷诺数下流过背向阶跃的非结构化流场数据进行演示,展示了对坡道和壁挂方块配置的几何外推。

更新时间: 2024-10-24 06:43:02

领域: cs.LG,physics.comp-ph,physics.flu-dyn

下载: http://arxiv.org/abs/2311.07548v4

Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where adversarial suffixes crafted by algorithms appended to harmful queries bypass safety alignment and trigger unintended responses. Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100\% ASR on various open-source LLMs. Moreover, it exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3. Beyond improving jailbreak ability, ADV-LLM provides valuable insights for future safety alignment research through its ability to generate large datasets for studying LLM safety. Our code is available at: https://github.com/SunChungEn/ADV-LLM

Updated: 2024-10-24 06:36:12

标题: 迭代式自调节LLMs以增强越狱能力

摘要: 最近的研究表明,大型语言模型(LLMs)容易受到自动越狱攻击的影响,即由算法制作的对抗性后缀附加到有害查询中,绕过安全对齐并触发意外响应。当前生成这些后缀的方法计算成本高,攻击成功率(ASR)低,特别是针对像Llama2和Llama3这样对齐良好的模型。为了克服这些限制,我们引入了ADV-LLM,这是一个迭代自调整过程,用于制作具有增强越狱能力的对抗性LLMs。我们的框架显著降低了生成对抗性后缀的计算成本,同时在各种开源LLMs上实现了近100\%的ASR。此外,它对闭源模型具有强大的攻击可移植性,在GPT-3.5上实现了99%的ASR,在GPT-4上实现了49%的ASR,尽管仅在Llama3上进行了优化。除了提高越狱能力,ADV-LLM还通过生成大型数据集来研究LLM安全性,为未来的安全对齐研究提供了有价值的见解。我们的代码可在以下网址获得:https://github.com/SunChungEn/ADV-LLM

更新时间: 2024-10-24 06:36:12

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.18469v1

Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?

Code completion, a key downstream task in code generation, is one of the most frequent and impactful methods for enhancing developer productivity in software development. As intelligent completion tools evolve, we need a robust evaluation benchmark that enables meaningful comparisons between products and guides future advancements. However, existing benchmarks focus more on coarse-grained tasks without industrial analysis resembling general code generation rather than the real-world scenarios developers encounter. Moreover, these benchmarks often rely on costly and time-consuming human annotation, and the standalone test cases fail to leverage minimal tests for maximum repository-level understanding and code coverage. To address these limitations, we first analyze business data from an industrial code completion tool and redefine the evaluation criteria to better align with the developer's intent and desired completion behavior throughout the coding process. Based on these insights, we introduce Codev-Agent, an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage, ensuring fair and effective comparisons. Using Codev-Agent, we present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Bench assesses whether a code completion tool can capture a developer's immediate intent and suggest appropriate code across diverse contexts, providing a more realistic benchmark for code completion in modern software development.

Updated: 2024-10-24 06:24:56

标题: Codev-Bench:LLMs如何理解以开发者为中心的代码补全?

摘要: 代码补全是代码生成中的一个关键下游任务,是增强开发人员在软件开发中生产力的最频繁和最有影响力的方法之一。随着智能补全工具的发展,我们需要一个强大的评估基准,使产品之间能够进行有意义的比较,并指导未来的进步。然而,现有的基准更多地关注类似通用代码生成的粗粒度任务,而非开发者遇到的真实场景,也缺乏工业实践层面的分析。此外,这些基准通常依赖昂贵且耗时的人工注释,而独立的测试用例未能充分利用最少的测试来实现仓库级的理解和代码覆盖。为了解决这些限制,我们首先分析了工业代码补全工具的业务数据,并重新定义评估标准,以更好地与开发者的意图和期望的补全行为在整个编码过程中保持一致。基于这些见解,我们引入了Codev-Agent,这是一个基于代理的系统,自动化仓库爬行,构建执行环境,从现有单元测试中提取动态调用链,并生成新的测试样本以避免数据泄漏,确保公平有效的比较。使用Codev-Agent,我们提出了Code-Development Benchmark(Codev-Bench),这是一个细粒度、真实世界、仓库级和以开发者为中心的评估框架。Codev-Bench评估一个代码补全工具是否能够捕捉开发者的即时意图,并在不同上下文中提供适当的代码建议,为现代软件开发中的代码补全提供更加现实的基准。

更新时间: 2024-10-24 06:24:56

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2410.01353v3

IBAC Mathematics and Mechanics: The Case for 'Integer Based Access Control' of Data Security in the Age of AI and AI Automation

Current methods for data access control, especially regarding AI and AI automation, face unique challenges in ensuring appropriate data access. We introduce Integer-Based Access Control (IBAC), addressing the limitations of Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC). IBAC's mathematical foundations enable its application to relational and NoSQL databases, as well as document authorization. We demonstrate IBAC's suitability for filtering relational database row-level information and AI and NLP access based on separation of duty, supporting both "need to know" and "need to share" data restrictions. IBAC uses security tokens, which are integers representing aggregated security attributes. These tokens maintain orthogonality across encoded attributes but are stored as integers for fast real-time vector comparison and efficient dominance testing. This mechanism allows high-speed row-level result filtering, ensuring unauthorized records are excluded before results reach the requester. We extend the Bell-LaPadula model by incorporating a "process constraint," overcoming RBAC and ABAC limitations with reduced complexity, increased flexibility, and enhanced performance in data filtering. Our theorems demonstrate the extended Dominance relationship, facilitating rapid federated authorization across diverse databases and file systems. This work reaffirms the practical strength of the Bell-LaPadula model in data security through (1) our mathematical extension, (2) a novel IBAC security attribute encoding scheme, and (3) a simplified dominance testing mechanism for security tokens without decoding.
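
As a rough illustration of the integer-token idea described above, the sketch below encodes attributes as orthogonal bit fields and tests dominance with a single AND-and-compare; the attribute names and the encoding are illustrative assumptions, not the paper's actual scheme.

```python
# Each security attribute occupies its own orthogonal bit position (assumed encoding).
ATTRIBUTES = {"finance": 1 << 0, "hr": 1 << 1, "pii": 1 << 2, "export": 1 << 3}

def encode(attrs):
    """Aggregate a set of attribute names into a single integer security token."""
    token = 0
    for name in attrs:
        token |= ATTRIBUTES[name]
    return token

def dominates(subject_token: int, object_token: int) -> bool:
    """Subject dominates object iff it holds every attribute the object requires;
    one AND plus one compare keeps row-level result filtering fast."""
    return subject_token & object_token == object_token

rows = [
    {"id": 1, "token": encode({"finance"})},
    {"id": 2, "token": encode({"finance", "pii"})},
]
subject = encode({"finance", "export"})
visible = [r for r in rows if dominates(subject, r["token"])]
print([r["id"] for r in visible])  # -> [1]; row 2 additionally requires "pii"
```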

Updated: 2024-10-24 06:19:57

标题: IBAC数学和力学:在人工智能和人工智能自动化时代对数据安全进行“基于整数的访问控制”的案例

摘要: 目前关于数据访问控制的方法,特别是涉及人工智能和人工智能自动化的方法,面临确保适当数据访问的独特挑战。我们引入了基于整数的访问控制(IBAC),解决了基于角色的访问控制(RBAC)和基于属性的访问控制(ABAC)的限制。IBAC的数学基础使其能够应用于关系型和NoSQL数据库,以及文档授权。我们展示了IBAC在基于职责分离的关系数据库行级信息和人工智能以及自然语言处理访问过滤方面的适用性,支持“需要知道”和“需要共享”数据限制。 IBAC使用安全令牌,这些令牌是表示聚合安全属性的整数。这些令牌在编码属性之间保持正交性,但以整数形式存储以进行快速实时向量比较和高效支配测试。这种机制允许高速行级结果过滤,确保未经授权的记录在结果到达请求者之前被排除。 我们通过引入“过程约束”扩展了贝尔-拉帕杜拉模型,克服了RBAC和ABAC的限制,降低了复杂性,增加了灵活性,并提高了数据过滤性能。我们的定理展示了扩展的支配关系,促进了跨多样数据库和文件系统的快速联合授权。 这项工作通过(1)我们的数学扩展,(2)一种新颖的IBAC安全属性编码方案,以及(3)一种简化的安全令牌支配测试机制,而无需解码,重新确认了贝尔-拉帕杜拉模型在数据安全方面的实际强大性。

更新时间: 2024-10-24 06:19:57

领域: cs.CR

下载: http://arxiv.org/abs/2410.19021v1

Learn 2 Rage: Experiencing The Emotional Roller Coaster That Is Reinforcement Learning

This work presents the experiments and solution outline for our team's winning submission in the Learn To Race Autonomous Racing Virtual Challenge 2022 hosted by AIcrowd. The objective of the Learn-to-Race competition is to push the boundary of autonomous technology, with a focus on achieving the safety benefits of autonomous driving. In its description, the competition is framed as a reinforcement learning (RL) challenge. We focused our initial efforts on implementation of Soft Actor Critic (SAC) variants. Our goal was to learn non-trivial control of the race car exclusively from visual and geometric features, directly mapping pixels to control actions. We made suitable modifications to the default reward policy aiming to promote smooth steering and acceleration control. The framework for the competition provided real time simulation, meaning a single episode (learning experience) is measured in minutes. Instead of pursuing parallelisation of episodes we opted to explore a more traditional approach in which the visual perception was processed (via learned operators) and fed into rule-based controllers. Such a system, while not as academically "attractive" as a pixels-to-actions approach, results in a system that requires less training, is more explainable, generalises better, and is easily tuned; it ultimately out-performed all other agents in the competition by a large margin.
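
The reward modification mentioned above can be illustrated with a small sketch: penalise abrupt control changes so that smooth steering and throttle are preferred. The coefficients and the exact functional form are assumptions for illustration, not the team's tuned reward.

```python
def shaped_reward(progress: float, steer: float, throttle: float,
                  prev_steer: float, prev_throttle: float,
                  k_steer: float = 0.5, k_throttle: float = 0.2) -> float:
    """Track progress minus a penalty on abrupt control changes (illustrative)."""
    smoothness_penalty = (k_steer * (steer - prev_steer) ** 2
                          + k_throttle * (throttle - prev_throttle) ** 2)
    return progress - smoothness_penalty

print(shaped_reward(progress=1.0, steer=0.3, throttle=0.8,
                    prev_steer=-0.2, prev_throttle=0.7))  # -> 0.873
```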

Updated: 2024-10-24 06:16:52

标题: 学习愤怒:体验强化学习中的情绪过山车

摘要: 这项工作介绍了我们团队在由AIcrowd主办的2022年“学习赛车自主竞赛虚拟挑战”中获胜提交的实验和解决方案概述。学习赛车比赛的目标是推动自主技术的边界,重点是实现自动驾驶的安全益处。在比赛描述中,将比赛框架为强化学习(RL)挑战。我们的初步努力集中在实现Soft Actor Critic(SAC)变体上。我们的目标是仅通过视觉和几何特征学习赛车的非平凡控制,直接将像素映射到控制动作。我们对默认奖励政策进行了适当修改,旨在促进平稳的转向和加速控制。比赛框架提供了实时模拟,这意味着单个周期(学习体验)的测量以分钟计。我们选择探索更传统的方法,即通过处理(通过学习操作符)视觉感知并将其输入到基于规则的控制器中,而不是追求周期的并行化。这样的系统,虽然不像像素到动作方法那样在学术上“吸引人”,但结果是,该系统需要更少的训练,更易解释,泛化能力更好,并且易于调整,最终在比赛中以较大差距超过了所有其他代理。

更新时间: 2024-10-24 06:16:52

领域: eess.SY,cs.CV,cs.LG,cs.RO,cs.SY

下载: http://arxiv.org/abs/2410.18462v1

Uncertainty-Error correlations in Evidential Deep Learning models for biomedical segmentation

In this work, we examine the effectiveness of an uncertainty quantification framework known as Evidential Deep Learning applied in the context of biomedical image segmentation. This class of models involves assigning Dirichlet distributions as priors for segmentation labels, and enables a few distinct definitions of model uncertainties. Using the cardiac and prostate MRI images available in the Medical Segmentation Decathlon for validation, we found that Evidential Deep Learning models with U-Net backbones generally yielded superior correlations between prediction errors and uncertainties relative to the conventional baseline equipped with Shannon entropy measure, Monte-Carlo Dropout and Deep Ensemble methods. We also examined these models' effectiveness in active learning, finding that relative to the standard Shannon entropy-based sampling, they yielded higher point-biserial uncertainty-error correlations while attaining similar performances in Dice-Sorensen coefficients. These superior features of EDL models render them well-suited for segmentation tasks that warrant a critical sensitivity in detecting large model errors.
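
For readers unfamiliar with the Dirichlet-prior setup, a minimal sketch of the usual evidential head follows (the common EDL formulation applied per prediction; the paper's per-voxel U-Net wiring and its specific uncertainty definitions are omitted).

```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits: torch.Tensor):
    """Evidential head: non-negative evidence parameterises a Dirichlet prior
    over class probabilities; low total evidence means high uncertainty."""
    evidence = F.softplus(logits)           # e_k >= 0
    alpha = evidence + 1.0                  # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)
    probs = alpha / strength                # expected class probabilities
    k = logits.shape[-1]
    uncertainty = k / strength              # total (vacuity) uncertainty in (0, 1]
    return probs, uncertainty

logits = torch.randn(2, 4)                  # e.g., 2 pixels, 4 classes
probs, u = dirichlet_uncertainty(logits)
print(probs.sum(-1), u.squeeze(-1))         # probs sum to 1; u is high when evidence is low
```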

Updated: 2024-10-24 06:16:04

标题: 生物医学分割中证据深度学习模型中的不确定性-误差相关性

摘要: 在这项工作中,我们研究了一种被称为证据深度学习的不确定性量化框架在生物医学图像分割领域的有效性。这类模型涉及将Dirichlet分布指定为分割标签的先验,并且能够提供一些不同的模型不确定性定义。通过在医学分割十项挑战中可用的心脏和前列腺MRI图像进行验证,我们发现具有U-Net骨干的证据深度学习模型通常在预测错误和不确定性之间产生更优越的相关性,相对于配备Shannon熵度量、蒙特卡罗Dropout和深度集合方法的传统基线。我们还研究了这些模型在主动学习中的有效性,发现相对于标准的基于Shannon熵的采样,它们产生了更高的点二列不确定性-错误相关性,同时在Dice-Sorensen系数上达到了类似的性能。这些EDL模型的优越特性使它们非常适合需要在检测大型模型错误时具有关键敏感性的分割任务。

更新时间: 2024-10-24 06:16:04

领域: eess.IV,cs.CV,cs.LG,physics.med-ph

下载: http://arxiv.org/abs/2410.18461v1

Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare

Large Language Models (LLMs) have gained significant attention in the medical domain for their human-level capabilities, leading to increased efforts to explore their potential in various healthcare applications. However, despite such a promising future, there are multiple challenges and obstacles that remain for their real-world uses in practical settings. This work discusses key challenges for LLMs in medical applications from four unique aspects: operational vulnerabilities, ethical and social considerations, performance and assessment difficulties, and legal and regulatory compliance. Addressing these challenges is crucial for leveraging LLMs to their full potential and ensuring their responsible integration into healthcare.

Updated: 2024-10-24 06:12:03

标题: 超越多项选择准确性:在医疗保健领域实施大型语言模型的现实挑战

摘要: 大型语言模型(LLMs)在医疗领域引起了重大关注,因其人类水平的能力而导致了对在各种医疗应用中探索其潜力的增加努力。然而,尽管有如此光明的前景,但在实际环境中使用中仍存在多个挑战和障碍。本文从四个独特的方面讨论了LLMs在医疗应用中的关键挑战:运营漏洞、伦理和社会考虑、性能和评估困难,以及法律和监管合规性。解决这些挑战对于充分利用LLMs的潜力并确保其负责任地整合到医疗保健中至关重要。

更新时间: 2024-10-24 06:12:03

领域: cs.AI

下载: http://arxiv.org/abs/2410.18460v1

Integrating Deep Feature Extraction and Hybrid ResNet-DenseNet Model for Multi-Class Abnormality Detection in Endoscopic Images

This paper presents a deep learning framework for the multi-class classification of gastrointestinal abnormalities in Video Capsule Endoscopy (VCE) frames. The aim is to automate the identification of ten GI abnormality classes, including angioectasia, bleeding, and ulcers, thereby reducing the diagnostic burden on gastroenterologists. Utilizing an ensemble of DenseNet and ResNet architectures, the proposed model achieves an overall accuracy of 94% across a well-structured dataset. Precision scores range from 0.56 for erythema to 1.00 for worms, with recall rates peaking at 98% for normal findings. This study emphasizes the importance of robust data preprocessing techniques, including normalization and augmentation, in enhancing model performance. The contributions of this work lie in developing an effective AI-driven tool that streamlines the diagnostic process in gastroenterology, ultimately improving patient care and clinical outcomes.
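
A minimal sketch of a DenseNet-ResNet ensemble of the kind described is shown below; the specific backbones, the 10-class head, and plain probability averaging are assumptions, since the abstract does not specify the exact fusion.

```python
import torch
import torchvision.models as models

num_classes = 10  # ten GI abnormality classes
resnet = models.resnet50(weights=None)
resnet.fc = torch.nn.Linear(resnet.fc.in_features, num_classes)
densenet = models.densenet121(weights=None)
densenet.classifier = torch.nn.Linear(densenet.classifier.in_features, num_classes)

@torch.no_grad()
def ensemble_predict(x: torch.Tensor) -> torch.Tensor:
    """Average the class probabilities of both backbones (simple late fusion)."""
    resnet.eval()
    densenet.eval()
    p1 = torch.softmax(resnet(x), dim=1)
    p2 = torch.softmax(densenet(x), dim=1)
    return (p1 + p2) / 2

frames = torch.randn(4, 3, 224, 224)        # a batch of VCE frames
print(ensemble_predict(frames).argmax(dim=1))
```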

Updated: 2024-10-24 06:10:31

标题: 将深度特征提取与混合ResNet-DenseNet模型集成,用于内窥镜图像中多类别异常检测

摘要: 本文提出了一种深度学习框架,用于视频胶囊内窥镜(VCE)帧中胃肠异常的多类分类。其目的是自动化识别十种胃肠异常类别,包括血管扩张、出血和溃疡,从而减轻胃肠病专家的诊断负担。利用DenseNet和ResNet架构的集成,所提出的模型在一个结构良好的数据集上实现了94%的总体准确度。精确度分数从红斑的0.56到蠕虫的1.00不等,正常结果的召回率最高达到98%。该研究强调了强大的数据预处理技术的重要性,包括归一化和增强,以提高模型性能。本研究的贡献在于开发了一种有效的人工智能驱动工具,简化了胃肠病学诊断过程,最终提高了患者护理和临床结果。

更新时间: 2024-10-24 06:10:31

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18457v1

Multi-Stage Airway Segmentation in Lung CT Based on Multi-scale Nested Residual UNet

Accurate and complete segmentation of airways in chest CT images is essential for the quantitative assessment of lung diseases and the facilitation of pulmonary interventional procedures. Although deep learning has led to significant advancements in medical image segmentation, maintaining airway continuity remains particularly challenging. This difficulty arises primarily from the small and dispersed nature of airway structures, as well as class imbalance in CT scans. To address these challenges, we designed a Multi-scale Nested Residual U-Net (MNR-UNet), incorporating multi-scale inputs and Residual Multi-scale Modules (RMM) into a nested residual framework to enhance information flow, effectively capturing the intricate details of small airways and mitigating gradient vanishing. Building on this, we developed a three-stage segmentation pipeline to optimize the training of the MNR-UNet. The first two stages prioritize high accuracy and sensitivity, while the third stage focuses on repairing airway breakages to balance topological completeness and correctness. To further address class imbalance, we introduced a weighted Breakage-Aware Loss (wBAL) to heighten focus on challenging samples, penalizing breakages and thereby extending the length of the airway tree. Additionally, we proposed a hierarchical evaluation framework to offer more clinically meaningful analysis. Validation on both in-house and public datasets demonstrates that our approach achieves superior performance in detecting more accurate airway voxels and identifying additional branches, significantly improving airway topological completeness. The code will be released publicly following the publication of the paper.
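
As a simplified stand-in for the weighted Breakage-Aware Loss idea, the sketch below upweights missed airway voxels, the ones that break continuity; the actual wBAL identifies breakages on the airway-tree topology, so this voxel-wise proxy is only illustrative.

```python
import torch

def weighted_breakage_aware_loss(pred, target, w_break=5.0, eps=1e-6):
    """Voxel-wise BCE where false-negative airway voxels (a proxy for
    continuity-breaking voxels) receive a larger weight w_break.
    pred: sigmoid probabilities, target: binary mask, same shape."""
    bce = -(target * torch.log(pred + eps) + (1 - target) * torch.log(1 - pred + eps))
    missed = (target == 1) & (pred < 0.5)          # assumed breakage proxy
    weights = torch.where(missed, torch.full_like(pred, w_break), torch.ones_like(pred))
    return (weights * bce).mean()

pred = torch.sigmoid(torch.randn(1, 1, 8, 8, 8))   # toy 3D prediction volume
target = (torch.rand(1, 1, 8, 8, 8) > 0.7).float()
print(weighted_breakage_aware_loss(pred, target))
```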

Updated: 2024-10-24 06:10:09

标题: 基于多尺度嵌套残差UNet的肺部CT多阶段气道分割

摘要: 在胸部CT图像中准确完整地分割气道对于定量评估肺部疾病和促进肺部介入手术至关重要。虽然深度学习在医学图像分割方面取得了显著进展,但保持气道的连续性仍然是一个特别具有挑战性的问题。这种困难主要源于气道结构的小型和分散性,以及CT扫描中的类别不平衡。为了解决这些挑战,我们设计了一个多尺度嵌套残差U-Net(MNR-UNet),将多尺度输入和残差多尺度模块(RMM)结合到嵌套残差框架中,以增强信息流动,有效捕捉小气道的复杂细节并减轻梯度消失。在此基础上,我们开发了一个三阶段分割流程来优化MNR-UNet的训练。前两个阶段优先考虑高准确度和敏感性,而第三阶段专注于修复气道断裂,以平衡拓扑完整性和正确性。为了进一步解决类别不平衡问题,我们引入了加权断裂感知损失(wBAL)来加强对具有挑战性样本的关注,惩罚断裂并从而延长气道树的长度。此外,我们提出了一个分层评估框架,提供更具临床意义的分析。在内部和公共数据集上的验证表明,我们的方法在检测更准确的气道体素和识别额外分支方面表现出优越性能,显着改善了气道的拓扑完整性。论文发表后,代码将公开发布。

更新时间: 2024-10-24 06:10:09

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.18456v1

Verifying Non-friendly Formal Verification Designs: Can We Start Earlier?

The design of Systems on Chips (SoCs) is becoming more and more complex due to technological advancements. Missed bugs can cause drastic failures in safety-critical environments leading to the endangerment of lives. To overcome these drastic failures, formal property verification (FPV) has been applied in the industry. However, there exist multiple hardware designs where the results of FPV are not conclusive even for long runtimes of model-checking tools. For this reason, the use of High-level Equivalence Checking (HLEC) tools has been proposed in the last few years. However, the procedure for how to use it inside an industrial toolchain has not been defined. For this reason, we propose an automated methodology based on metamodeling techniques that consists of two main steps. First, an untimed algorithmic description written in C++ is verified in an early stage using generated assertions; the advantage of this step is that the assertions at the software level run in seconds, so we can obtain conclusive results about our algorithm before starting to write the RTL (Register Transfer Level) design. Second, this algorithmic description is verified against its sequential design using HLEC and the respective metamodel parameters. The results show that the presented methodology can find bugs related to the algorithmic description early and prepare the setup for the HLEC verification. This helps to reduce the verification effort of setting up the tool and writing the properties manually, which is always error-prone. The proposed framework can help teams working on datapaths to verify and make decisions in an early stage of the verification flow.

Updated: 2024-10-24 06:09:40

标题: 验证非友好形式验证设计:我们可以更早开始吗?

摘要: 由于技术进步,片上系统(SoCs)的设计变得越来越复杂。错过的错误可能会在安全关键环境中导致严重故障,危及生命。为了克服这些严重故障,行业中已经应用了形式属性验证(FPV)。然而,存在多个硬件设计,在模型检查工具长时间运行后,FPV的结果仍然不确定。因此,近年来提出了高级等效性检查(HLEC)工具的使用。然而,如何在工业工具链中使用它的程序尚未定义。因此,我们提出了一种基于元模型技术的自动化方法,包括两个主要步骤。首先,使用生成的断言对用C++编写的无时序算法描述进行早期验证;这一步的优势在于软件级别的断言在几秒内运行,我们可以在开始编写RTL(寄存器传输级)设计之前获得关于算法的确定结果。其次,通过HLEC验证算法描述与其顺序设计以及相应的元模型参数之间的一致性。结果表明,所提出的方法可以在早期发现与算法描述相关的错误,并为HLEC验证准备设置。这有助于减少验证工作量,设置工具并手动编写属性,这总是容易出错的。所提出的框架可以帮助团队在验证流程的早期阶段验证数据通路并做出决策。

更新时间: 2024-10-24 06:09:40

领域: cs.AR,cs.AI

下载: http://arxiv.org/abs/2410.18454v1

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
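
A toy sketch of data-centric preference filtering in this spirit: drop pairs with an unclear quality margin or a strong length imbalance. The concrete thresholds and criteria are assumptions, not the paper's exact selection rules.

```python
def filter_preference_pairs(pairs, min_margin=0.5, max_len_ratio=3.0):
    """Keep pairs with a clear quality margin and without extreme length imbalance
    between chosen and rejected responses (a common source of length bias)."""
    kept = []
    for p in pairs:
        margin = p["chosen_score"] - p["rejected_score"]
        len_ratio = max(len(p["chosen"]), 1) / max(len(p["rejected"]), 1)
        if margin >= min_margin and 1.0 / max_len_ratio <= len_ratio <= max_len_ratio:
            kept.append(p)
    return kept

pairs = [
    {"chosen": "A clear, helpful answer.", "rejected": "A vague answer.",
     "chosen_score": 4.2, "rejected_score": 1.0},
    {"chosen": "ok", "rejected": "ok!", "chosen_score": 2.1, "rejected_score": 2.0},
]
print(len(filter_preference_pairs(pairs)))  # -> 1; the second pair's margin is too small
```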

Updated: 2024-10-24 06:06:26

标题: Skywork-Reward: LLMs中奖励建模的技巧集

摘要: 在本报告中,我们介绍了一系列增强LLMs奖励建模的方法,重点放在数据中心技术上。我们提出了有效的数据选择和过滤策略,用于筛选高质量的开源偏好数据集,最终形成了Skywork-Reward数据集,其中仅包含80K偏好对--明显小于现有数据集。利用这个筛选过的数据集,我们开发了Skywork-Reward模型系列--Skywork-Reward-Gemma-27B和Skywork-Reward-Llama-3.1-8B--前者目前在RewardBench排行榜上处于领先地位。值得注意的是,我们的技术和数据集直接提升了许多在RewardBench上排名靠前的模型的性能,突显了我们在真实世界偏好学习应用中贡献的实际影响。

更新时间: 2024-10-24 06:06:26

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.18451v1

On the Noise Robustness of In-Context Learning for Text Generation

Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significantly hurt the performance of in-context learning. To circumvent the issue, we propose a simple and effective approach called Local Perplexity Ranking (LPR), which replaces the "noisy" candidates with their nearest neighbors that are more likely to be clean. Our method is motivated by analyzing the perplexity deviation caused by noisy labels and decomposing perplexity into inherent perplexity and matching perplexity. Our key idea behind LPR is thus to decouple the matching perplexity by performing the ranking among the neighbors in semantic space. Our approach can prevent the selected demonstrations from including mismatched input-label pairs while preserving the effectiveness of the original selection methods. Extensive experiments demonstrate the effectiveness of LPR, improving the EM score by up to 18.75 on common benchmarks with noisy annotations. Our code is available at https://github.com/ml-stat-Sustech/Local-Perplexity-Ranking.
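
A rough sketch of the local-ranking idea follows: each candidate's perplexity is compared against its k nearest semantic neighbours, and a locally worst candidate is swapped for its cleanest neighbour. Embeddings and perplexities are assumed precomputed; this is not the released implementation.

```python
import numpy as np

def local_perplexity_ranking(embeddings, perplexities, k=5):
    """For each candidate demonstration, rank its perplexity among its k nearest
    semantic neighbours; a locally worst candidate is replaced by its
    lowest-perplexity neighbour (simplified rendering of the LPR idea)."""
    n = len(perplexities)
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    replacement = np.arange(n)
    for i in range(n):
        neigh = np.argsort(dists[i])[1:k + 1]            # skip self (distance 0)
        pool = np.concatenate(([i], neigh))
        if perplexities[i] >= perplexities[pool].max():  # locally the noisiest
            replacement[i] = pool[np.argmin(perplexities[pool])]
    return replacement

rng = np.random.default_rng(0)
emb, ppl = rng.standard_normal((20, 8)), rng.random(20)
print(local_perplexity_ranking(emb, ppl)[:10])
```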

Updated: 2024-10-24 06:05:03

标题: 关于文本生成中上下文学习的噪声鲁棒性

摘要: 大型语言模型(LLMs)已经展示出在下游任务中通过上下文学习(ICL)取得了令人印象深刻的表现,这在很大程度上依赖于从大量注释示例中选择的演示质量。最近的研究声称,在文本分类中,上下文学习对嘈杂的演示是稳健的。在这项工作中,我们展示了,在文本生成任务中,嘈杂的注释显著影响了上下文学习的性能。为了避免这个问题,我们提出了一种简单而有效的方法,称为本地困惑度排名(LPR),它用更有可能是干净的最近邻替换“嘈杂”的候选项。我们的方法受到对由嘈杂标签引起的困惑度偏差进行分析以及将困惑度分解为固有困惑度和匹配困惑度的启发。因此,我们LPR背后的关键思想是通过在语义空间中对邻居进行排名来解耦匹配困惑度。我们的方法可以防止所选演示包含不匹配的输入-标签对,同时保持原始选择方法的有效性。大量实验证明了LPR的有效性,在带有嘈杂注释的常见基准上将EM分数提高了高达18.75。我们的代码可在https://github.com/ml-stat-Sustech/Local-Perplexity-Ranking找到。

更新时间: 2024-10-24 06:05:03

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.17264v3

Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted from this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.

Updated: 2024-10-24 05:49:49

标题: 通过标准化基线和评估消除离线多智能体强化学习进展的幻象

摘要: 离线多智能体强化学习(MARL)是一个具有极大应用潜力的新兴领域。不幸的是,目前离线MARL研究的现状受到基线和评估协议的不一致性困扰,这最终使得准确评估进展、信任新提出的创新以及让研究人员轻松地在先前工作的基础上继续建设变得困难。在本文中,我们首先通过对已发表的离线MARL工作的代表性研究,识别了现有方法论中衡量新算法性能的显著缺陷。其次,通过与此前工作直接比较,我们展示了简单、良好实施的基线在各种任务中可以实现最先进(SOTA)的结果。具体来说,我们发现在先前工作中使用的47个数据集中的35个(几乎75%的情况下),我们与当前所谓的SOTA的性能相匹敌或超越。引人注目的是,我们的基线通常明显优于这些更复杂的算法。最后,我们通过引入一个简单的标准化评估方法来纠正这个先前工作中指出的缺陷,并提供我们的基线实现在几个场景中具有统计上的稳健结果,有助于未来工作中的比较。我们的提议包括简单而明智的步骤,易于采纳,结合可靠的基线和比较结果,可以显著提高离线MARL中实证科学的严谨性。

更新时间: 2024-10-24 05:49:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.09068v2

The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI

In this paper, we give an in-depth analysis of the mathematical problem formulations and the probabilistic optimization explorations for some of the key components of the Transformer model [33] in the field of generative AI. We explore and discuss potential further enhancements to current state-of-the-art methods for some key underlying technologies of generative AI models from an algorithmic and probabilistic optimization perspective. In particular, we present an optimal solution for sub-word encoding (SWE) based on initial settings similar to those of the byte-pair encoding (BPE) algorithm in [9], with objectives similar to those of the WordPiece approach in [28, 31], to maximize the likelihood of the training data. We also present a cross-entropy optimization method to optimize hyperparameters for the word2vec model [17]. In addition, we propose a factored combination of rotary positional encoding (RoPE) [32] and attention with linear biases (ALiBi) [23] using a harmonic series. We also present a probabilistic FlashAttention [6, 7] (PrFlashAttention) method with a probability distribution over block distances in the matrix to decide which block is likely to participate in a given round of attention computation, while maintaining the lower-triangular shape of the tensor for autoregressive language models by re-shaping the tensors. Finally, we present staircase adaptive quantization (SAQ) of the key-value (KV) cache for multi-query attention (MQA), based on the framework presented in [16], to achieve gradual quantization degradation while maintaining reasonable model quality and cost savings.

Updated: 2024-10-24 05:29:20

标题: 生成式人工智能中数学建模与概率优化工程的本质

摘要: 在这篇论文中,我们对生成式人工智能领域中Transformer模型[33]的一些关键组件的数学问题形式化和概率优化探索进行了深入分析。我们从算法和概率优化的角度探讨并讨论了生成式人工智能模型若干关键底层技术在当前最新方法基础上的潜在进一步增强。具体而言,我们提出了一种子词编码(SWE)的最优解,其初始设置与字节对编码(BPE)算法[9]类似,目标与WordPiece方法[28, 31]类似,以最大化训练数据的似然。我们还提出了一种交叉熵优化方法,用于优化word2vec模型[17]的超参数。此外,我们提出了一种利用谐波级数将旋转位置编码(RoPE)[32]与带线性偏置的注意力(ALiBi)[23]进行因子化组合的方法。我们还提出了一种基于概率的FlashAttention[6, 7](PrFlashAttention)方法,通过对矩阵中块距离的概率分布来决定哪个块可能参与给定轮次的注意力计算,同时通过重新塑造张量来保持自回归语言模型张量的下三角形状。最后,我们提出了基于[16]中框架的多查询注意力(MQA)键值(KV)缓存的阶梯自适应量化(SAQ),在实现合理的模型质量和成本节约的同时实现逐渐的量化退化。

更新时间: 2024-10-24 05:29:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.18441v1

Feedback Schrödinger Bridge Matching

Recent advancements in diffusion bridges for distribution transport problems have heavily relied on matching frameworks, yet existing methods often face a trade-off between scalability and access to optimal pairings during training. Fully unsupervised methods make minimal assumptions but incur high computational costs, limiting their practicality. On the other hand, imposing full supervision of the matching process with optimal pairings improves scalability, however, it can be infeasible in many applications. To strike a balance between scalability and minimal supervision, we introduce Feedback Schrödinger Bridge Matching (FSBM), a novel semi-supervised matching framework that incorporates a small portion (less than 8% of the entire dataset) of pre-aligned pairs as state feedback to guide the transport map of non coupled samples, thereby significantly improving efficiency. This is achieved by formulating a static Entropic Optimal Transport (EOT) problem with an additional term capturing the semi-supervised guidance. The generalized EOT objective is then recast into a dynamic formulation to leverage the scalability of matching frameworks. Extensive experiments demonstrate that FSBM accelerates training and enhances generalization by leveraging coupled pairs guidance, opening new avenues for training matching frameworks with partially aligned datasets.

Updated: 2024-10-24 05:28:15

标题: 反馈Schrödinger桥匹配

摘要: 最近在分布式传输问题的扩散桥梁方面取得了重大进展,主要依赖于匹配框架,然而现有方法往往在可伸缩性和训练期间获取最佳配对之间面临权衡。完全无监督的方法假设最少,但会产生高计算成本,限制了其实用性。另一方面,通过完全监督匹配过程并获得最佳配对可以提高可伸缩性,但在许多应用中可能不可行。为了在可伸缩性和最小监督之间取得平衡,我们引入了反馈薛定谔桥匹配(FSBM),这是一种新颖的半监督匹配框架,将少量(不超过整个数据集的8%)预对齐对作为状态反馈,引导非耦合样本的传输映射,从而显著提高效率。这是通过制定一个静态熵最优传输(EOT)问题,其中包含一个额外项来捕获半监督指导来实现的。然后,将广义EOT目标重新构造为动态形式,以利用匹配框架的可伸缩性。广泛的实验表明,FSBM通过利用耦合对的指导加快了训练速度并增强了泛化能力,为利用部分对齐数据集训练匹配框架开辟了新途径。

更新时间: 2024-10-24 05:28:15

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.14055v2

A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy

Privacy protection of users' entire contribution of samples is important in distributed systems. The most effective approach is the two-stage scheme, which finds a small interval first and then gets a refined estimate by clipping samples into the interval. However, the clipping operation induces bias, which is serious if the sample distribution is heavy-tailed. Besides, users with large local sample sizes can make the sensitivity much larger, thus the method is not suitable for imbalanced users. Motivated by these challenges, we propose a Huber loss minimization approach to mean estimation under user-level differential privacy. The connecting points of Huber loss can be adaptively adjusted to deal with imbalanced users. Moreover, it avoids the clipping operation, thus significantly reducing the bias compared with the two-stage approach. We provide a theoretical analysis of our approach, which gives the noise strength needed for privacy protection, as well as the bound of mean squared error. The result shows that the new method is much less sensitive to the imbalance of user-wise sample sizes and the tail of sample distributions. Finally, we perform numerical experiments to validate our theoretical analysis.
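
A minimal sketch of the core estimator: replace clipping with an M-estimator that minimises the summed Huber loss, then add calibrated noise. The noise scale below is a placeholder, not the sensitivity bound derived in the paper, and the per-user aggregation is simplified.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_loss(r, delta):
    """Quadratic near zero, linear in the tails."""
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def huber_mean(samples, delta):
    """M-estimator: minimise the summed Huber loss instead of clipping samples."""
    res = minimize_scalar(lambda m: huber_loss(samples - m, delta).sum())
    return res.x

rng = np.random.default_rng(0)
# Imbalanced users with heavy-tailed samples (Student-t, df=2).
users = [rng.standard_t(df=2, size=rng.integers(5, 50)) for _ in range(100)]
estimates = np.array([huber_mean(u, delta=1.0) for u in users])
epsilon, sensitivity = 1.0, 0.1            # placeholder privacy parameters
private_mean = estimates.mean() + rng.laplace(scale=sensitivity / epsilon)
print(private_mean)
```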

Updated: 2024-10-24 05:26:18

标题: 一种在用户级差分隐私下的均值估计的Huber损失最小化方法

摘要: 用户完整样本的隐私保护在分布式系统中至关重要。最有效的方法是采用两阶段方案,首先找到一个小区间,然后通过将样本剪裁到该区间来得到精确估计。然而,剪裁操作会引入偏差,如果样本分布是重尾的,则偏差会很严重。此外,具有大量本地样本的用户可能会使灵敏度变得更大,因此该方法不适用于不平衡的用户。受到这些挑战的启发,我们提出了一种Huber损失最小化方法来进行用户级差分隐私下的均值估计。Huber损失的连接点可以被自适应调整以处理不平衡的用户。此外,它避免了剪裁操作,因此与两阶段方法相比,显著减少了偏差。我们对我们的方法进行了理论分析,给出了隐私保护所需的噪声强度,以及均方误差的界限。结果显示,这种新方法对用户样本大小不平衡和样本分布的尾部不太敏感。最后,我们进行了数值实验来验证我们的理论分析。

更新时间: 2024-10-24 05:26:18

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2405.13453v2

Bonsai: Gradient-free Graph Distillation for Node Classification

Graph distillation has emerged as a promising avenue to enable scalable training of GNNs by compressing the training dataset while preserving essential graph characteristics. Our study uncovers significant shortcomings in current graph distillation techniques. First, the majority of the algorithms paradoxically require training on the full dataset to perform distillation. Second, due to their gradient-emulating approach, these methods require fresh distillation for any change in hyperparameters or GNN architecture, limiting their flexibility and reusability. Finally, they fail to achieve substantial size reduction due to synthesizing fully-connected, edge-weighted graphs. To address these challenges, we present Bonsai, a novel graph distillation method empowered by the observation that computation trees form the fundamental processing units of message-passing GNNs. Bonsai distills datasets by encoding a careful selection of exemplar trees that maximize the representation of all computation trees in the training set. This unique approach makes Bonsai the first linear-time, model-agnostic graph distillation algorithm for node classification that outperforms existing baselines across 6 real-world datasets on accuracy, while being 22 times faster on average. Bonsai is grounded in rigorous mathematical guarantees on the adopted approximation strategies, making it robust to GNN architectures, datasets, and parameters.
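
The exemplar-selection step can be caricatured as greedy maximum coverage: repeatedly pick the tree that represents the most not-yet-covered computation trees. Bonsai's actual selection and its tree-similarity measure are more involved; this shows only the shape of the idea.

```python
def greedy_exemplars(coverage: dict, budget: int):
    """Greedy max-coverage: coverage[t] = set of computation-tree ids that
    exemplar t represents; pick the exemplar with the largest marginal gain."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(coverage, key=lambda t: len(coverage[t] - covered), default=None)
        if best is None or not (coverage[best] - covered):
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 5}}
print(greedy_exemplars(coverage, budget=2))  # -> (['t1', 't2'], {1, 2, 3, 4})
```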

Updated: 2024-10-24 05:24:53

标题: Bonsai:用于节点分类的无梯度图蒸馏

摘要: 图蒸馏已经成为一种有希望的途径,可以通过压缩训练数据集来训练GNN,同时保留基本的图特征。我们的研究揭示了当前图蒸馏技术存在显著的缺陷。首先,大多数算法矛盾地要求在完整数据集上进行训练才能进行蒸馏。其次,由于它们模拟梯度的方法,这些方法对超参数或GNN架构的任何更改都需要重新进行蒸馏,限制了它们的灵活性和可重用性。最后,由于合成的是全连接的带边权图,它们无法实现实质性的尺寸减小。为了解决这些挑战,我们提出了Bonsai,一种新颖的图蒸馏方法,其灵感来自于计算树是消息传递GNN的基本处理单元这一观察。Bonsai通过编码精心选择的示例树来蒸馏数据集,这些树最大化了训练集中所有计算树的表示。这种独特的方法使Bonsai成为第一个线性时间、与模型无关的节点分类图蒸馏算法,在6个真实世界数据集上的准确性优于现有基线,同时平均速度快22倍。Bonsai所采用的逼近策略具有严格的数学保证,使其对GNN架构、数据集和参数具有稳健性。

更新时间: 2024-10-24 05:24:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.17579v2

VHELM: A Holistic Evaluation of Vision Language Models

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety. In doing so, we produce a comprehensive, multi-dimensional view of the capabilities of the VLMs across these important factors. In addition, we standardize the standard inference parameters, methods of prompting, and evaluation metrics to enable fair comparisons across models. Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast. Our initial run evaluates 22 VLMs on 21 existing datasets to provide a holistic snapshot of the models. We uncover new key findings, such as the fact that efficiency-focused models (e.g., Claude 3 Haiku or Gemini 1.5 Flash) perform significantly worse than their full models (e.g., Claude 3 Opus or Gemini 1.5 Pro) on the bias benchmark but not when evaluated on the other aspects. For transparency, we release the raw model generations and complete results on our website (https://crfm.stanford.edu/helm/vhelm/v2.0.1). VHELM is intended to be a living benchmark, and we hope to continue adding new datasets and models over time.

Updated: 2024-10-24 05:17:36

标题: VHELM:视觉语言模型的整体评估

摘要: 目前用于评估视觉语言模型(VLMs)的基准通常集中在它们的感知或问题解决能力上,忽略了其他关键方面,如公平性、多语言性或毒性。此外,它们在评估程序和评估范围上存在差异,使得比较模型变得困难。为了解决这些问题,我们将HELM框架扩展到VLMs,提出了全面评估视觉语言模型(VHELM)。VHELM整合了各种数据集,涵盖了9个方面中的一个或多个:视觉感知、知识、推理、偏见、公平性、多语言性、鲁棒性、毒性和安全性。通过这样做,我们为VLMs在这些重要因素上的能力提供了全面的、多维的视图。此外,我们标准化了标准推理参数、提示方法和评估指标,以便在模型之间进行公平比较。我们的框架设计轻便且自动化,使评估运行便宜且快速。我们的初步运行评估了22个VLMs在21个现有数据集上,提供了模型的全面快照。我们发现了一些新的关键发现,例如,以效率为重点的模型(例如,Claude 3 Haiku或Gemini 1.5 Flash)在偏见基准上表现明显不如它们的完整模型(例如,Claude 3 Opus或Gemini 1.5 Pro),但在其他方面评估时并非如此。为了透明度,我们在我们的网站上发布了原始模型生成和完整结果(https://crfm.stanford.edu/helm/vhelm/v2.0.1)。VHELM旨在成为一个活跃的基准,我们希望随着时间的推移继续添加新的数据集和模型。

更新时间: 2024-10-24 05:17:36

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.07112v2

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

Deep models have demonstrated remarkable performance in time series forecasting. However, due to the partially-observed nature of real-world applications, solely focusing on the target of interest, so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded into multiple variables, where the exogenous variables can provide valuable external information for endogenous variables. Thus, unlike well-established multivariate or univariate forecasting paradigms that either treat all the variables equally or ignore exogenous information, this paper focuses on a more practical setting: time series forecasting with exogenous variables. We propose a novel approach, TimeXer, to ingest external information to enhance the forecasting of endogenous variables. With deftly designed embedding layers, TimeXer empowers the canonical Transformer with the ability to reconcile endogenous and exogenous information, where patch-wise self-attention and variate-wise cross-attention are used simultaneously. Moreover, global endogenous tokens are learned to effectively bridge the causal information underlying exogenous series into endogenous temporal patches. Experimentally, TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks and exhibits notable generality and scalability. Code is available at this repository: https://github.com/thuml/TimeXer.
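
A minimal single-block sketch of the described design, combining patch-wise self-attention over the endogenous series with a learned global token that cross-attends to variate-level exogenous embeddings; all dimensions and the one-block structure are illustrative assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class TimeXerBlock(nn.Module):
    """Sketch: endogenous patches go through self-attention together with a
    global token; that token then gathers exogenous information via
    cross-attention over variate-wise embeddings."""
    def __init__(self, d_model=64, n_heads=4, patch_len=16, exog_len=96):
        super().__init__()
        self.patch_embed = nn.Linear(patch_len, d_model)
        self.exog_embed = nn.Linear(exog_len, d_model)   # one token per exogenous variate
        self.global_token = nn.Parameter(torch.randn(1, 1, d_model))
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, endo_patches, exog_series):
        # endo_patches: (B, n_patches, patch_len); exog_series: (B, n_exog, exog_len)
        x = self.patch_embed(endo_patches)
        g = self.global_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([x, g], dim=1)
        tokens, _ = self.self_attn(tokens, tokens, tokens)   # patch-wise self-attention
        exog = self.exog_embed(exog_series)                  # variate-wise tokens
        g2, _ = self.cross_attn(tokens[:, -1:], exog, exog)  # global token reads exogenous info
        return torch.cat([tokens[:, :-1], g2], dim=1)

block = TimeXerBlock()
out = block(torch.randn(2, 6, 16), torch.randn(2, 3, 96))
print(out.shape)  # torch.Size([2, 7, 64])
```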

Updated: 2024-10-24 05:13:24

标题: TimeXer:利用外生变量增强Transformer进行时间序列预测

摘要: 深度模型在时间序列预测中表现出卓越的性能。然而,由于现实世界应用的部分观察性质,仅关注感兴趣的目标,即所谓的内生变量,通常不足以保证准确的预测。值得注意的是,系统通常被记录为多个变量,其中外生变量可以为内生变量提供有价值的外部信息。因此,与既定的多变量或单变量预测范式不同,这篇论文关注更实用的情境:带有外生变量的时间序列预测。我们提出了一种新颖的方法,称为TimeXer,用于吸收外部信息以增强内生变量的预测能力。通过巧妙设计的嵌入层,TimeXer赋予了经典的Transformer能力,使其能够调和内生和外生信息,同时使用基于补丁的自注意力和变量级的交叉注意力。此外,全局内生标记被学习用来有效地将潜在的因果信息从外生系列传递到内生时间补丁中。在实验中,TimeXer在十二个真实世界的预测基准上实现了稳定的最新性能,并展现出显著的普适性和可扩展性。代码可在以下存储库中找到:https://github.com/thuml/TimeXer。

更新时间: 2024-10-24 05:13:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.19072v2

RediSwap: MEV Redistribution Mechanism for CFMMs

Automated Market Makers (AMMs) are essential to decentralized finance, offering continuous liquidity and enabling intermediary-free trading on blockchains. However, participants in AMMs are vulnerable to Maximal Extractable Value (MEV) exploitation. Users face threats such as front-running, back-running, and sandwich attacks, while liquidity providers (LPs) incur the loss-versus-rebalancing (LVR). In this paper, we introduce RediSwap, a novel AMM designed to capture MEV at the application level and refund it fairly among users and liquidity providers. At its core, RediSwap features an MEV-redistribution mechanism that manages arbitrage opportunities within the AMM pool. We formalize the mechanism design problem and the desired game-theoretical properties. A central insight underpinning our mechanism is the interpretation of the maximal MEV value as the sum of LVR and individual user losses. We prove that our mechanism is incentive-compatible and Sybil-proof, and demonstrate that it is easy for arbitrageurs to participate. We empirically compared RediSwap with existing solutions by replaying historical AMM trades. Our results suggest that RediSwap can achieve better execution than UniswapX in 89% of trades and reduce LPs' loss to under 0.5% of the original LVR in most cases.
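
The accounting insight, maximal MEV as the sum of LVR and individual user losses, suggests a pro-rata refund, sketched below; the split rule is an illustrative assumption rather than RediSwap's exact mechanism.

```python
def redistribute(captured_mev: float, lvr: float, user_losses: dict):
    """Split captured MEV between LPs (in proportion to LVR) and users
    (in proportion to their individual losses)."""
    total_user_loss = sum(user_losses.values())
    total = lvr + total_user_loss
    if total == 0:
        return {"lps": 0.0, "users": {u: 0.0 for u in user_losses}}
    lp_refund = captured_mev * lvr / total
    user_refunds = {u: captured_mev * loss / total for u, loss in user_losses.items()}
    return {"lps": lp_refund, "users": user_refunds}

print(redistribute(10.0, lvr=6.0, user_losses={"alice": 3.0, "bob": 1.0}))
# -> LPs receive 6.0, alice 3.0, bob 1.0
```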

Updated: 2024-10-24 05:11:41

标题: RediSwap:CFMMs的MEV重新分配机制

摘要: 自动做市商(AMMs)对去中心化金融至关重要,提供持续流动性并在区块链上实现无中介交易。然而,AMM参与者容易受到最大可提取价值(MEV)的剥削。用户面临前置交易、后置交易和夹击攻击等威胁,而流动性提供者(LPs)则承担损失与再平衡(LVR)。 在本文中,我们介绍了一种新型AMM RediSwap,旨在在应用级别捕获MEV,并公平地将其退还给用户和流动性提供者。RediSwap的核心特点是一个管理AMM池内套利机会的MEV重新分配机制。我们形式化了机制设计问题和所需的博弈理论属性。支持我们机制的一个核心见解是将最大MEV值解释为LVR和个体用户损失之和。我们证明我们的机制是激励兼容的和Sybil-proof,并且证明套利者容易参与。 我们通过重新播放历史AMM交易,与现有解决方案进行了实证比较。我们的结果表明,RediSwap在89%的交易中可以比UniswapX实现更好的执行,并在大多数情况下将LPs的损失降低到原始LVR的0.5%以下。

更新时间: 2024-10-24 05:11:41

领域: cs.GT,cs.CR

下载: http://arxiv.org/abs/2410.18434v1

RegExplainer: Generating Explanations for Graph Neural Networks in Regression Tasks

Graph regression is a fundamental task that has gained significant attention in various graph learning tasks. However, the inference process is often not easily interpretable. Current explanation techniques are limited to understanding Graph Neural Network (GNN) behaviors in classification tasks, leaving an explanation gap for graph regression models. In this work, we propose a novel explanation method to interpret the graph regression models (XAIG-R). Our method addresses the distribution shift problem and the continuously ordered decision boundary issue that prevent existing methods from being applied to regression tasks. We introduce a novel objective based on the graph information bottleneck theory (GIB) and a new mix-up framework, which can support various GNNs and explainers in a model-agnostic manner. Additionally, we present a self-supervised learning strategy to tackle the continuously ordered labels in regression tasks. We evaluate our proposed method on three benchmark datasets and a real-life dataset introduced by us, and extensive experiments demonstrate its effectiveness in interpreting GNN models in regression tasks.

Updated: 2024-10-24 05:11:13

标题: RegExplainer:在回归任务中为图神经网络生成解释

摘要: 图回归是一个在各种图学习任务中引起了重要关注的基本任务。然而,推断过程通常不容易解释。目前的解释技术仅限于理解图神经网络(GNN)在分类任务中的行为,为图回归模型留下了解释空白。在这项工作中,我们提出了一种新颖的解释方法来解释图回归模型(XAIG-R)。我们的方法解决了阻碍现有方法应用于回归任务的分布偏移问题和连续有序决策边界问题。我们基于图信息瓶颈理论(GIB)提出了一种新颖的目标,并引入了一个新的混合框架,可以以模型无关的方式支持各种GNN和解释器。此外,我们提出了一种自监督学习策略来解决回归任务中的连续有序标签。我们在三个基准数据集和我们介绍的一个真实数据集上评估了我们提出的方法,广泛的实验表明了它在解释回归任务中GNN模型的有效性。

更新时间: 2024-10-24 05:11:13

领域: cs.LG,cs.AI,I.2.0

下载: http://arxiv.org/abs/2307.07840v4

Scalable Multi-Domain Adaptation of Language Models using Modular Experts

Domain-specific adaptation is critical to maximizing the performance of pre-trained language models (PLMs) on one or multiple targeted tasks, especially under resource-constrained use cases, such as edge devices. However, existing methods often struggle to balance domain-specific performance, retention of general knowledge, and efficiency for training and inference. To address these challenges, we propose Modular Domain Experts (MoDE). MoDE is a mixture-of-experts architecture that augments a general PLM with modular, domain-specialized experts. These experts are trained independently and composed together via a lightweight training process. In contrast to standard low-rank adaptation methods, each MoDE expert consists of several transformer layers that scale better with more training examples and larger parameter counts. Our evaluation demonstrates that MoDE achieves comparable target performances to full parameter fine-tuning while achieving 1.65% better retention performance. Moreover, MoDE's architecture enables flexible sharding configurations and improves training speeds by up to 38% over state-of-the-art distributed training configurations.
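
A minimal sketch of composing a base model with independently trained experts through a learned router is shown below; the per-domain MLP experts and the soft routing rule are stand-ins, since MoDE's experts are stacks of transformer layers.

```python
import torch
import torch.nn as nn

class ModularExperts(nn.Module):
    """Router mixes the outputs of independently trained domain experts and
    adds them residually to the base model's hidden states."""
    def __init__(self, d_model=64, n_domains=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_domains)
        )
        self.router = nn.Linear(d_model, n_domains)

    def forward(self, h):                       # h: (B, T, d) hidden states from the base PLM
        weights = torch.softmax(self.router(h.mean(dim=1)), dim=-1)   # (B, n_domains)
        outs = torch.stack([e(h) for e in self.experts], dim=1)       # (B, n_domains, T, d)
        return h + torch.einsum("be,betd->btd", weights, outs)        # residual expert mix

mod = ModularExperts()
print(mod(torch.randn(2, 10, 64)).shape)        # torch.Size([2, 10, 64])
```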

Updated: 2024-10-24 05:04:57

标题: 使用模块化专家实现语言模型的可扩展多领域适应

摘要: 领域特定的适应性对于最大化预训练语言模型(PLMs)在一个或多个目标任务上的性能至关重要,特别是在资源受限的使用情况下,如边缘设备。然而,现有方法常常难以平衡领域特定性能、一般知识的保留以及训练和推理的效率。为了解决这些挑战,我们提出了模块化领域专家(MoDE)。MoDE是一种混合专家体系结构,通过模块化、领域专业化的专家来增强一般的PLMs。这些专家独立训练,并通过轻量级训练过程组合在一起。与标准的低秩适应方法相反,每个MoDE专家由多个变压器层组成,随着更多的训练示例和更大的参数计数而扩展得更好。我们的评估表明,MoDE在实现与全参数微调相当的目标性能的同时,保留性能更好,提高了1.65%。此外,MoDE的体系结构支持灵活的分片配置,并且比最先进的分布式训练配置提高了高达38%的训练速度。

更新时间: 2024-10-24 05:04:57

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.10181v2

SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network

Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on manually provided images. However, these approaches often fall short in achieving satisfactory results for tasks requiring long-term planning. Concurrently, we observe that integrating a self-correction module can partially alleviate such issues. Motivated by this concern, we introduce the Single-Step Assembly Error Correction Task, which involves identifying and rectifying misassembled components. To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures. Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel method to address this task. SCANet treats assembled components as queries, determining their correctness in manual images and providing corrections when necessary. Finally, we utilize SCANet to correct the assembly results of MEPNet. Experimental results demonstrate that SCANet can identify and correct MEPNet's misassembled results, significantly improving the correctness of assembly. Our code and dataset could be found at https://scanet-iros2024.github.io/.

Updated: 2024-10-24 04:59:25

标题: SCANet:使用自校正组装网络纠正LEGO组装错误

摘要: 自主装配在机器人和3D视觉中存在重大挑战,特别是在确保装配正确性方面。目前,主要的方法如MEPNet专注于根据手动提供的图像组装组件。然而,这些方法通常在需要长期规划的任务中难以取得令人满意的结果。同时,我们观察到整合自我纠正模块可以部分缓解这些问题。基于这一关注点,我们引入了单步装配错误修正任务,涉及识别和纠正错误组件的组装。为支持这一领域的研究,我们提出了LEGO错误校正装配数据集(LEGO-ECA),包括组装步骤的手动图像和组装失败实例。此外,我们提出了自我纠正装配网络(SCANet),一种解决这一任务的新方法。SCANet将已组装的组件视为查询,在手动图像中确定其正确性,并在必要时提供纠正。最后,我们利用SCANet来纠正MEPNet的组装结果。实验结果表明,SCANet能够识别和纠正MEPNet的错误组装结果,显著提高了组装的正确性。我们的代码和数据集可以在https://scanet-iros2024.github.io/找到。

更新时间: 2024-10-24 04:59:25

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2403.18195v3

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

Recently, research on denoising diffusion models has expanded its application to the field of image restoration. Traditional diffusion-based image restoration methods utilize degraded images as conditional input to effectively guide the reverse generation process, without modifying the original denoising diffusion process. However, since the degraded images already include low-frequency information, starting from Gaussian white noise will result in increased sampling steps. We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starting the reverse process directly from the noisy degraded images. The form of our inference process is consistent with the DDPM. We introduced a weighted residual noise, named resnoise, as the prediction target and explicitly provide the quantitative relationship between the residual term and the noise term in resnoise. By leveraging a smooth equivalence transformation, Resfusion determine the optimal acceleration step and maintains the integrity of existing noise schedules, unifying the training and inference processes. The experimental results demonstrate that Resfusion exhibits competitive performance on ISTD dataset, LOL dataset and Raindrop dataset with only five sampling steps. Furthermore, Resfusion can be easily applied to image generation and emerges with strong versatility. Our code and model are available at https://github.com/nkicsl/Resfusion.
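
To make the "resnoise" idea concrete, the sketch below builds a schedule-weighted combination of the residual and Gaussian noise such that the forward state is centred on the degraded image; the specific coefficients are an assumption chosen for self-consistency, not the paper's derived ones.

```python
import torch

def resnoise_target(x_clean, x_degraded, alpha_bar_t):
    """Illustrative resnoise construction: the forward state x_t folds the
    residual into the usual DDPM forward process, so that
    x_t = sqrt(a) * x_degraded + sqrt(1-a) * eps, and a network predicting
    `target` can recover x_clean from x_t."""
    eps = torch.randn_like(x_clean)
    residual = x_degraded - x_clean
    a = torch.as_tensor(alpha_bar_t)
    x_t = a.sqrt() * x_clean + (1 - a).sqrt() * eps + a.sqrt() * residual
    target = (a / (1 - a)).sqrt() * residual + eps      # assumed weighting
    return x_t, target

x0 = torch.randn(1, 3, 32, 32)
xd = x0 + 0.3 * torch.randn_like(x0)                    # e.g., a shadowed/degraded image
x_t, target = resnoise_target(x0, xd, alpha_bar_t=0.7)
print(x_t.shape, target.shape)
```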

Updated: 2024-10-24 04:55:59

标题: Resfusion:基于先验残差噪声的图像恢复去噪扩散概率模型

摘要: 最近,对去噪扩散模型的研究已经将其应用拓展到图像恢复领域。传统的基于扩散的图像恢复方法利用退化图像作为条件输入,有效地引导逆向生成过程,而不修改原始的去噪扩散过程。然而,由于退化图像已经包含低频信息,从高斯白噪声开始会导致采样步骤增加。我们提出了Resfusion,一个通用框架,将剩余项纳入扩散正向过程中,直接从嘈杂的退化图像开始逆向过程。我们推断过程的形式与DDPM一致。我们引入了加权剩余噪声,命名为resnoise,作为预测目标,并明确提供了剩余项与噪声项在resnoise中的定量关系。通过利用平滑等效变换,Resfusion确定了最佳加速步骤,并保持了现有噪声计划的完整性,统一了训练和推断过程。实验结果表明,Resfusion在ISTD数据集、LOL数据集和Raindrop数据集上表现出竞争性能,仅需五个采样步骤。此外,Resfusion可以轻松应用于图像生成,并具有强大的通用性。我们的代码和模型可在https://github.com/nkicsl/Resfusion 上获得。

更新时间: 2024-10-24 04:55:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2311.14900v4

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. These queries are drawn from naturally occurring and carefully curated human data. Extensive evaluation reveals that even state-of-the-art retrieval models perform poorly on BRIGHT. The leading model on the MTEB leaderboard (Muennighoff et al., 2023), which achieves a score of 59.0 nDCG@10, produces an nDCG@10 of 18.3 on BRIGHT. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points. Moreover, incorporating retrieved documents from the top-performing retriever boosts question-answering performance by over 6.6 points. We believe that BRIGHT paves the way for future research on retrieval systems in more realistic and challenging settings.

Updated: 2024-10-24 04:51:21

标题: BRIGHT:一个面向推理密集型检索的现实且具有挑战性的基准

摘要: 现有的检索基准主要由信息检索查询(例如,来自搜索引擎的聚合问题)组成,其中关键词或基于语义的检索通常足够。然而,许多复杂的现实世界查询需要深入推理,以识别超越表面形式匹配的相关文档。例如,查找编码问题的文档需要理解涉及的函数的逻辑和语法。为了更好地对这些具有挑战性的查询进行检索基准测试,我们介绍了BRIGHT,这是第一个需要深入推理才能检索相关文档的文本检索基准。我们的数据集包括1,384个跨越经济学、心理学、数学和编码等多个领域的真实世界查询。这些查询来自自然发生和精心策划的人类数据。广泛的评估显示,即使是最先进的检索模型在BRIGHT上表现不佳。在MTEB排行榜上领先的模型(Muennighoff等,2023年)在MTEB上的得分为59.0 nDCG@10,在BRIGHT上的得分为18.3 nDCG@10。我们表明,将查询的明确推理纳入检索性能可以提高高达12.2个点。此外,将从表现最好的检索器检索的文档纳入到问题回答性能中,可以提高超过6.6个点。我们相信,BRIGHT为未来在更现实和具有挑战性的环境中进行检索系统研究铺平了道路。

更新时间: 2024-10-24 04:51:21

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2407.12883v3

Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators

Large Language Models (LLMs) have recently demonstrated impressive capabilities across various real-world applications. However, due to the current text-in-text-out paradigm, it remains challenging for LLMs to handle dynamic and complex application constraints, let alone devise general solutions that meet predefined system goals. Current common practices like model finetuning and reflection-based reasoning often address these issues case-by-case, limiting their generalizability. To address this issue, we propose a flexible framework that enables LLMs to interact with system interfaces, summarize constraint concepts, and continually optimize performance metrics by collaborating with human experts. As a case in point, we initialized a travel planner agent by establishing constraints from evaluation interfaces. Then, we employed both LLM-based and human discriminators to identify critical cases and continuously improve agent performance until the desired outcomes were achieved. After just one iteration, our framework achieved a 7.78% pass rate with the human discriminator (a 40.2% improvement over baseline) and a 6.11% pass rate with the LLM-based discriminator. Given the adaptability of our proposal, we believe this framework can be applied to a wide range of constraint-based applications and lay a solid foundation for model finetuning with performance-sensitive data samples.

Updated: 2024-10-24 04:46:32

标题: 通过人在回路鉴别器优化大型语言模型以满足动态约束

摘要: 最近,大型语言模型(LLMs)在各种实际应用中展现出了令人印象深刻的能力。然而,由于当前的文本输入-文本输出范式,LLMs难以处理动态和复杂的应用约束,更不用说设计满足预定义系统目标的通用解决方案了。目前常见的做法,如模型微调和基于反射的推理,通常是逐个案例解决这些问题,限制了它们的泛化能力。为了解决这个问题,我们提出了一个灵活的框架,使LLMs能够与系统接口互动,总结约束概念,并通过与人类专家合作不断优化性能指标。以旅行规划代理为例,我们通过评估接口建立约束,然后使用基于LLM和人类的鉴别器来识别关键案例,并不断改进代理的性能,直到达到期望的结果。经过一次迭代后,我们的框架在人类鉴别器下实现了7.78%的通过率(比基线提高了40.2%),在基于LLM的鉴别器下实现了6.11%的通过率。考虑到我们提案的适应性,我们相信这一框架可以应用于各种基于约束的应用,并为使用性能敏感数据样本进行模型微调奠定坚实基础。

更新时间: 2024-10-24 04:46:32

领域: cs.AI

下载: http://arxiv.org/abs/2410.15163v2

Effects of Scale on Language Model Robustness

Language models exhibit scaling laws, whereby increasing model and dataset size yields predictable decreases in negative log likelihood, unlocking a dazzling array of capabilities. This phenomenon spurs many companies to train ever larger models in pursuit of ever improved performance. Yet, these models are vulnerable to adversarial inputs such as ``jailbreaks'' and prompt injections that induce models to perform undesired behaviors, posing a growing risk as models become more capable. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically in the classification setting, finding that without explicit defense training, larger models tend to be modestly more robust on most tasks, though the effect is not reliable. Even with the advantage conferred by scale, undefended models remain easy to attack in absolute terms, and we thus turn our attention to explicitly training models for adversarial robustness, which we show to be a much more compute-efficient defense than scaling model size alone. In this setting, we also observe that adversarially trained larger models generalize faster and better to modified attacks not seen during training when compared with smaller models. Finally, we analyze the offense/defense balance of increasing compute, finding parity in some settings and an advantage for offense in others, suggesting that adversarial training alone is not sufficient to solve robustness, even at greater model scales.

Updated: 2024-10-24 04:40:06

标题: 规模对语言模型鲁棒性的影响

摘要: 语言模型展现出按规模定律,增加模型和数据集大小会导致负对数似然性下降,从而释放出一系列令人眼花缭乱的能力。这一现象促使许多公司培训规模越来越大的模型,以追求性能的不断提升。然而,这些模型容易受到“越狱”和提示注入等对抗性输入的影响,导致模型执行不良行为,随着模型变得更加强大,这种风险也在增加。先前的研究表明,计算机视觉模型随着模型和数据规模的增加变得更加稳健,这引发了一个问题:随着规模的扩大,语言模型的稳健性是否也会提高? 我们在分类设置中对这个问题进行了实证研究,发现在大多数任务中,没有明确的防御训练,规模更大的模型往往在稳健性方面稍微更好,尽管效果并不稳定。即使在规模带来的优势下,未经防御训练的模型在绝对意义上仍然容易受到攻击,因此我们将注意力转向明确训练模型以提高对抗性稳健性,我们展示这比单纯扩大模型规模要节省大量计算资源。在这种情况下,我们还观察到,经过对抗性训练的更大型模型在面对未在训练中见过的修改攻击时更快更好地泛化。最后,我们分析了增加计算资源的攻守平衡,在某些情况下取得平衡,在其他情况下取得进攻优势,这表明单纯进行对抗性训练并不足以解决稳健性问题,即使在更大的模型规模下也是如此。

更新时间: 2024-10-24 04:40:06

领域: cs.LG,cs.AI,cs.CL,cs.CR,I.2.7

下载: http://arxiv.org/abs/2407.18213v3

Doubly Non-Central Beta Matrix Factorization for Stable Dimensionality Reduction of Bounded Support Matrix Data

We consider the problem of developing interpretable and computationally efficient matrix decomposition methods for matrices whose entries have bounded support. Such matrices are found in large-scale DNA methylation studies and many other settings. Our approach decomposes the data matrix into a Tucker representation wherein the number of columns in the constituent factor matrices is not constrained. We derive a computationally efficient sampling algorithm to solve for the Tucker decomposition. We evaluate the performance of our method using three criteria: predictability, computability, and stability. Empirical results show that our method has similar performance as other state-of-the-art approaches in terms of held-out prediction and computational complexity, but has significantly better performance in terms of stability to changes in hyper-parameters. The improved stability results in higher confidence in the results in applications where the constituent factors are used to generate and test scientific hypotheses such as DNA methylation analysis of cancer samples.

Updated: 2024-10-24 04:24:47

标题: 双重非中心贝塔矩阵分解用于有界支持矩阵数据的稳定降维

摘要: 我们考虑了为具有有界支撑的矩阵开发解释性和计算效率的矩阵分解方法的问题。这样的矩阵在大规模DNA甲基化研究和许多其他设置中被发现。我们的方法将数据矩阵分解为一个Tucker表示,其中构成因子矩阵的列数没有限制。我们推导出一个计算效率高的采样算法来解决Tucker分解问题。我们使用三个标准来评估我们的方法的性能:可预测性、可计算性和稳定性。实证结果表明,我们的方法在保持预测和计算复杂性方面与其他最先进方法具有类似的性能,但在超参数变化稳定性方面性能显著更好。改进的稳定性导致在使用构成因子生成和测试科学假设的应用中对结果更有信心,例如对癌症样本的DNA甲基化分析。

更新时间: 2024-10-24 04:24:47

领域: cs.LG

下载: http://arxiv.org/abs/2410.18425v1

A Causal Graph-Enhanced Gaussian Process Regression for Modeling Engine-out NOx

The stringent regulatory requirements on nitrogen oxides (NOx) emissions from diesel compression ignition engines require accurate and reliable models for real-time monitoring and diagnostics. Although traditional methods such as physical sensors and virtual engine control module (ECM) sensors provide essential data, they are only used for estimation. Ubiquitous literature primarily focuses on deterministic models with little emphasis on capturing the uncertainties due to sensors. The lack of probabilistic frameworks restricts the applicability of these models for robust diagnostics. The objective of this paper is to develop and validate a probabilistic model to predict engine-out NOx emissions using Gaussian process regression. Our approach is as follows. We employ three variants of Gaussian process models: the first with a standard radial basis function kernel with input window, the second incorporating a deep kernel using convolutional neural networks to capture temporal dependencies, and the third enriching the deep kernel with a causal graph derived via graph convolutional networks. The causal graph embeds physics knowledge into the learning process. All models are compared against a virtual ECM sensor using both quantitative and qualitative metrics. We conclude that our model provides an improvement in predictive performance when using an input window and a deep kernel structure. Even more compelling is the further enhancement achieved by the incorporation of a causal graph into the deep kernel. These findings are corroborated across different validation datasets.
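
The first model variant, a GP with a standard RBF kernel over an input window, can be sketched with scikit-learn as below; the window length and the stand-in engine signals are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 400)
signals = np.stack([np.sin(t), np.cos(0.5 * t)], axis=1)     # stand-ins for speed, fuel rate
nox = 2.0 * np.sin(t - 0.3) + 0.1 * rng.standard_normal(t.size)

window = 5                                                   # assumed input-window length
X = np.stack([signals[i - window:i].ravel() for i in range(window, len(t))])
y = nox[window:]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X[:300], y[:300])
mean, std = gp.predict(X[300:], return_std=True)             # probabilistic NOx estimate
print(mean[:3], std[:3])                                     # predictive uncertainty per point
```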

Updated: 2024-10-24 04:23:57

标题: 一个因果图增强的高斯过程回归模型,用于建模发动机排放的NOx

摘要: 柴油压燃发动机氮氧化物(NOx)排放的严格监管要求需要准确可靠的模型进行实时监测和诊断。虽然传统方法如物理传感器和虚拟发动机控制模块(ECM)传感器提供了必要的数据,但它们仅用于估计。广泛的文献主要关注确定性模型,对于捕捉传感器引起的不确定性的重要性却很少强调。缺乏概率框架限制了这些模型用于健壮诊断的适用性。本文旨在开发和验证一个使用高斯过程回归预测发动机排放NOx的概率模型。我们的方法如下。我们采用三种高斯过程模型的变体:第一种采用具有输入窗口的标准径向基函数核,第二种结合使用卷积神经网络的深度核来捕捉时间依赖性,第三种通过图卷积网络衍生的因果图丰富深度核。因果图将物理知识嵌入学习过程中。所有模型均与虚拟ECM传感器进行了定量和定性指标的比较。我们得出结论,当使用输入窗口和深度核结构时,我们的模型在预测性能上提供了改进。更令人信服的是,通过将因果图纳入深度核,进一步增强了性能。这些发现在不同的验证数据集中得到了证实。

更新时间: 2024-10-24 04:23:57

领域: cs.LG

下载: http://arxiv.org/abs/2410.18424v1

Advancing Interpretability in Text Classification through Prototype Learning

Deep neural networks have achieved remarkable performance in various text-based tasks but often lack interpretability, making them less suitable for applications where transparency is critical. To address this, we propose ProtoLens, a novel prototype-based model that provides fine-grained, sub-sentence level interpretability for text classification. ProtoLens uses a Prototype-aware Span Extraction module to identify relevant text spans associated with learned prototypes and a Prototype Alignment mechanism to ensure prototypes are semantically meaningful throughout training. By aligning the prototype embeddings with human-understandable examples, ProtoLens provides interpretable predictions while maintaining competitive accuracy. Extensive experiments demonstrate that ProtoLens outperforms both prototype-based and non-interpretable baselines on multiple text classification benchmarks. Code and data are available at https://anonymous.4open.science/r/ProtoLens-CE0B/
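
A bare-bones sketch of prototype-based scoring: each class owns a few prototype vectors, and a span embedding's best prototype match yields that class's logit. ProtoLens's prototype-aware span extraction and its prototype alignment mechanism are omitted here.

```python
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """Score a span embedding against learned per-class prototypes; the
    best-matching prototype per class becomes that class's logit."""
    def __init__(self, d=128, n_classes=2, protos_per_class=4):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, protos_per_class, d))

    def forward(self, span_emb):                     # (B, d) span embeddings
        sims = torch.einsum("bd,cpd->bcp", span_emb, self.prototypes)
        return sims.max(dim=-1).values               # (B, n_classes)

clf = PrototypeClassifier()
logits = clf(torch.randn(8, 128))
print(logits.shape)                                  # torch.Size([8, 2])
```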

Updated: 2024-10-24 04:21:54

标题: 通过原型学习推进文本分类的可解释性

摘要: 深度神经网络在各种基于文本的任务中取得了显著的性能,但通常缺乏可解释性,使它们不太适用于透明度至关重要的应用。为了解决这个问题,我们提出了ProtoLens,一种新颖的基于原型的模型,为文本分类提供了细粒度的、子句级别的可解释性。ProtoLens使用一个基于原型感知的跨度提取模块来识别与学习原型相关的文本跨度,并使用原型对齐机制来确保在整个训练过程中原型具有语义意义。通过将原型嵌入与人可理解的示例对齐,ProtoLens提供可解释的预测结果,同时保持竞争性的准确性。大量实验表明,ProtoLens在多个文本分类基准测试中优于基于原型和不可解释基线。代码和数据可在https://anonymous.4open.science/r/ProtoLens-CE0B/上获取。

更新时间: 2024-10-24 04:21:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.17546v2

Knowledge-Assisted Privacy Preserving in Semantic Communication

Semantic communication (SC) offers promising advancements in data transmission efficiency and reliability by focusing on delivering true meaning rather than solely binary bits of messages. However, privacy concerns in SC may become prominent. Eavesdroppers equipped with advanced semantic coding models and extensive knowledge could be capable of correctly decoding and reasoning about sensitive semantics from just a few stolen bits. To this end, this article explores utilizing knowledge to enhance data privacy in SC networks. Specifically, we first identify the potential attacks in SC based on the analysis of knowledge. Then, we propose a knowledge-assisted privacy preserving SC framework, which consists of a data transmission layer for precisely encoding and decoding source messages, and a knowledge management layer responsible for injecting appropriate knowledge into the transmission pair. Moreover, we elaborate on the transceiver design in the proposed SC framework to explain how knowledge should be utilized properly. Finally, some challenges of the proposed SC framework are discussed to expedite the practical implementation.

Updated: 2024-10-24 04:05:20

标题: 基于知识的语义通信中的隐私保护

摘要: 语义通信(SC)通过着眼于传递真正的含义而不仅仅是二进制位的消息,为数据传输效率和可靠性带来了有希望的进展。然而,在SC中可能存在隐私问题。装备有先进语义编码模型和广泛知识的窃听者可能能够从仅有几个窃取的位正确解码和推理敏感语义。为此,本文探讨了利用知识增强SC网络数据隐私的方法。具体来说,我们首先通过对知识分析来识别SC中潜在的攻击。然后,我们提出了一个知识辅助的隐私保护SC框架,该框架包括一个用于精确编码和解码源消息的数据传输层,以及一个负责将适当知识注入传输对中的知识管理层。此外,我们详细阐述了在提出的SC框架中的收发器设计,以解释如何正确利用知识。最后,讨论了提出的SC框架的一些挑战,以加快实际实施。

更新时间: 2024-10-24 04:05:20

领域: cs.CR

下载: http://arxiv.org/abs/2410.18418v1

Large Language Models Reflect the Ideology of their Creators

Large language models (LLMs) are trained on vast amounts of data to generate natural language, enabling them to perform tasks like text summarization and question answering. These models have become popular in artificial intelligence (AI) assistants like ChatGPT and already play an influential role in how humans access information. However, the behavior of LLMs varies depending on their design, training, and use. In this paper, we uncover notable diversity in the ideological stance exhibited across different LLMs and languages in which they are accessed. We do this by prompting a diverse panel of popular LLMs to describe a large number of prominent and controversial personalities from recent world history, both in English and in Chinese. By identifying and analyzing moral assessments reflected in the generated descriptions, we find consistent normative differences between how the same LLM responds in Chinese compared to English. Similarly, we identify normative disagreements between Western and non-Western LLMs about prominent actors in geopolitical conflicts. Furthermore, popularly hypothesized disparities in political goals among Western models are reflected in significant normative differences related to inclusion, social inequality, and political scandals. Our results show that the ideological stance of an LLM often reflects the worldview of its creators. This raises important concerns around technological and regulatory efforts with the stated aim of making LLMs ideologically `unbiased', and it poses risks for political instrumentalization.

Updated: 2024-10-24 04:02:30

标题: 大型语言模型反映了其创作者的意识形态

摘要: 大型语言模型(LLMs)是在大量数据上进行训练以生成自然语言的,使它们能够执行文本摘要和问题回答等任务。这些模型在人工智能(AI)助手中变得流行,如ChatGPT,并且已经在人们获取信息的方式中发挥了影响力。然而,LLMs的行为取决于它们的设计、训练和使用。在本文中,我们发现不同LLMs以及访问它们所用的语言在表达意识形态立场上存在显著的多样性。我们通过要求一组多样化的流行LLMs,分别用英文和中文描述最近世界历史上许多知名和有争议的人物来实现这一点。通过识别和分析生成的描述中反映的道德评估,我们发现相同LLM在中文和英文中的响应存在一致的规范差异。同样,我们发现西方和非西方LLMs在地缘政治冲突中的知名行为者上存在规范分歧。此外,人们普遍假设的西方模型之间的政治目标差异,也反映在与包容性、社会不平等和政治丑闻相关的显著规范差异中。我们的研究结果表明,LLM的意识形态立场往往反映了其创作者的世界观。这引发了围绕旨在使LLMs在意识形态上“无偏见”的技术和监管努力的重要担忧,并对政治工具化构成风险。

更新时间: 2024-10-24 04:02:30

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.18417v1

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions

Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free environment interaction. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover diverse states. However, in complex environments with many state factors (e.g., household environments with many objects), learning skills that cover all possible states is impossible, and naively encouraging state diversity often leads to simple skills that are not ideal for solving downstream tasks. This work introduces Skill Discovery from Local Dependencies (SkiLD), which leverages state factorization as a natural inductive bias to guide the skill learning process. The key intuition guiding SkiLD is that skills that induce diverse interactions between state factors are often more valuable for solving downstream tasks. To this end, SkiLD develops a novel skill learning objective that explicitly encourages the mastering of skills that effectively induce different interactions within an environment. We evaluate SkiLD in several domains with challenging, long-horizon sparse reward tasks including a realistic simulated household robot domain, where SkiLD successfully learns skills with clear semantic meaning and shows superior performance compared to existing unsupervised reinforcement learning methods that only maximize state coverage.

Updated: 2024-10-24 04:01:59

标题: SkiLD:由因素相互作用引导的无监督技能发现

摘要: 无监督的技能发现承诺通过自主、无奖励的环境互动,智能代理可以学习可重复使用的技能。现有的无监督技能发现方法通过鼓励区分行为来学习技能,这些行为覆盖了多样的状态。然而,在具有许多状态因素的复杂环境中(例如,具有许多物体的家庭环境),学习覆盖所有可能状态的技能是不可能的,而单纯地鼓励状态多样性通常会导致简单的技能,这些技能不太适合解决下游任务。本文介绍了一种称为Skild的技能发现方法,它利用状态分解作为一种自然的归纳偏见来指导技能学习过程。Skild的关键直觉是,能够在状态因素之间引发多样化交互的技能通常对解决下游任务更有价值。为此,Skild开发了一种新颖的技能学习目标,明确鼓励掌握能够在环境中有效引发不同交互的技能。我们在几个具有挑战性、长期稀疏奖励任务的领域中评估了Skild,包括一个现实的模拟家庭机器人领域,在那里Skild成功地学习了具有明确语义意义的技能,并表现出比现有的仅最大化状态覆盖的无监督强化学习方法更优异的性能。

更新时间: 2024-10-24 04:01:59

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.18416v1

MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases

The improvement in translating natural language to structured query language (SQL) can be attributed to the advancements in large language models (LLMs). Open-source LLMs, tailored for specific database dialects such as MySQL, have shown great performance. However, cloud service providers are looking for a unified database manager service (e.g., Cosmos DB from Azure, Amazon Aurora from AWS, Lindorm from AlibabaCloud) that can support multiple dialects. This requirement has led to the concept of multi-dialect query generation, which presents challenges to LLMs. These challenges include syntactic differences among dialects and imbalanced data distribution across multiple dialects. To tackle these challenges, we propose MoMQ, a novel Mixture-of-Experts-based multi-dialect query generation framework across both relational and non-relational databases. MoMQ employs a dialect expert group for each dialect and a multi-level routing strategy to handle dialect-specific knowledge, reducing interference during query generation. Additionally, a shared expert group is introduced to address data imbalance, facilitating the transfer of common knowledge from high-resource dialects to low-resource ones. Furthermore, we have developed a high-quality multi-dialect query generation benchmark that covers relational and non-relational databases such as MySQL, PostgreSQL, Cypher for Neo4j, and nGQL for NebulaGraph. Extensive experiments have shown that MoMQ performs effectively and robustly even in resource-imbalanced scenarios.
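
As a rough sketch of the expert-group idea described above, the toy PyTorch layer below routes tokens to a dialect-specific expert group plus a shared group; all module names, sizes, and two-level routing details are our assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DialectMoELayer(nn.Module):
    def __init__(self, d_model, n_dialects, experts_per_group, n_shared):
        super().__init__()
        self.groups = nn.ModuleList([
            nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(experts_per_group)])
            for _ in range(n_dialects)
        ])
        self.shared = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_shared)])
        # First-level routing is taken from the known dialect id; the gate
        # below scores experts inside the selected group (second level).
        self.gate = nn.Linear(d_model, experts_per_group)

    def forward(self, x, dialect_id):
        group = self.groups[dialect_id]
        weights = torch.softmax(self.gate(x), dim=-1)              # (batch, experts)
        group_out = sum(w.unsqueeze(-1) * expert(x)
                        for w, expert in zip(weights.unbind(-1), group))
        shared_out = sum(expert(x) for expert in self.shared) / len(self.shared)
        return group_out + shared_out  # dialect-specific + transferable knowledge

layer = DialectMoELayer(d_model=64, n_dialects=4, experts_per_group=2, n_shared=1)
print(layer(torch.randn(8, 64), dialect_id=2).shape)               # torch.Size([8, 64])
```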

Updated: 2024-10-24 03:42:43

标题: MoMQ: 混合专家模型增强跨关系和非关系数据库的多方言查询生成

摘要: 自然语言到结构化查询语言(SQL)的翻译改进可以归因于大型语言模型(LLMs)的进步。针对特定数据库方言(如MySQL)定制的开源LLMs表现出色。然而,云服务提供商正在寻求能够支持多个方言的统一数据库管理服务(例如来自Azure的Cosmos DB,来自AWS的Amazon Aurora,来自阿里云的Lindorm)。这一要求促使了多方言查询生成的概念,这给LLMs带来了挑战。这些挑战包括不同方言之间的语法差异和多个方言之间的数据分布不均衡。为了解决这些挑战,我们提出了MoMQ,这是一种基于专家混合的跨关系和非关系数据库的多方言查询生成框架。MoMQ为每种方言使用一个方言专家组和多级路由策略来处理方言特定知识,减少查询生成过程中的干扰。此外,引入了一个共享专家组来解决数据不平衡问题,促进将高资源方言的通用知识传递到低资源方言。此外,我们开发了一个高质量的多方言查询生成基准,涵盖了关系和非关系数据库,如MySQL,PostgreSQL,Neo4j的Cypher和NebulaGraph的nGQL。大量实验表明,即使在资源不平衡的情况下,MoMQ的性能和稳健性也表现出色。

更新时间: 2024-10-24 03:42:43

领域: cs.CL,cs.AI,cs.DB,cs.LG

下载: http://arxiv.org/abs/2410.18406v1

Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy

Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties. However, LDP applies uniform protection to all data features, including less sensitive ones, which degrades performance of downstream tasks. To overcome this limitation, we propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification. This more nuanced approach complements LDP by adjusting privacy protection according to the sensitivity of each feature, enabling improved performance of downstream tasks without compromising privacy. We characterize the properties of BCDP and articulate its connections with standard non-Bayesian privacy frameworks. We further apply our BCDP framework to the problems of private mean estimation and ordinary least-squares regression. The BCDP-based approach obtains improved accuracy compared to a purely LDP-based approach, without compromising on privacy.
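
The coordinate-wise idea can be illustrated with a generic per-feature Laplace mechanism: each feature gets its own budget, so less sensitive features receive less noise. This is a minimal sketch of feature-specific budgets only; BCDP's Bayesian calibration of those budgets is the paper's contribution and is not reproduced here:

```python
import numpy as np

def per_feature_laplace(x, epsilons, sensitivity=1.0, rng=None):
    """Privatize one user's feature vector with per-coordinate budgets."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / np.asarray(epsilons, dtype=float)  # big epsilon -> small noise
    return x + rng.laplace(0.0, scale, size=x.shape)

x = np.array([0.2, 5.0, -1.3])
# loose budgets for non-sensitive features, a tight budget for a sensitive one
print(per_feature_laplace(x, epsilons=[8.0, 8.0, 0.5]))
```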

Updated: 2024-10-24 03:39:55

标题: 通过贝叶斯坐标差分隐私增强特征特定的数据保护

摘要: 局部差分隐私(LDP)在不需要用户信任外部方的情况下提供了强大的隐私保护。然而,LDP对所有数据特征(包括不太敏感的特征)应用统一保护,这会降低下游任务的性能。为了克服这一限制,我们提出了一个贝叶斯框架,即贝叶斯坐标差分隐私(BCDP),它能够实现特征特定的隐私量化。这种更加细致的方法通过根据每个特征的敏感性调整隐私保护,可以改善下游任务的性能而不损害隐私。我们表征了BCDP的特性,并阐明了它与标准非贝叶斯隐私框架的联系。我们进一步将我们的BCDP框架应用于私密均值估计和普通最小二乘回归问题。与纯粹基于LDP的方法相比,基于BCDP的方法获得了更高的准确性,而不会损害隐私。

更新时间: 2024-10-24 03:39:55

领域: cs.LG,cs.CR,stat.ML

下载: http://arxiv.org/abs/2410.18404v1

Structure Language Models for Protein Conformation Generation

Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with sampling equilibrium conformations and are computationally expensive. Recently, deep generative models have shown promise in generating protein conformations as a more efficient alternative. However, these methods predominantly rely on the diffusion process within a 3D geometric space, which typically centers around the vicinity of metastable states and is often inefficient in terms of runtime. In this paper, we introduce Structure Language Modeling (SLM) as a novel framework for efficient protein conformation generation. Specifically, the protein structures are first encoded into a compact latent space using a discrete variational auto-encoder, followed by conditional language modeling that effectively captures sequence-specific conformation distributions. This enables a more efficient and interpretable exploration of diverse ensemble modes compared to existing methods. Based on this general framework, we instantiate SLM with various popular LM architectures as well as proposing the ESMDiff, a novel BERT-like structure language model fine-tuned from ESM3 with masked diffusion. We verify our approach in various scenarios, including the equilibrium dynamics of BPTI, conformational change pairs, and intrinsically disordered proteins. SLM provides a highly efficient solution, offering a 20-100x speedup than existing methods in generating diverse conformations, shedding light on promising avenues for future research.

Updated: 2024-10-24 03:38:51

标题: 用于蛋白质构象生成的结构语言模型

摘要: 蛋白质采用多种结构构象来执行其多样的生物功能,了解这些构象对于推进药物发现至关重要。传统基于物理的模拟方法经常难以采样平衡构象,并且计算成本高昂。最近,深度生成模型显示出在生成蛋白质构象方面具有更高效的替代方法的潜力。然而,这些方法主要依赖于三维几何空间内的扩散过程,通常集中在亚稳态附近,并且在运行时通常效率低下。在本文中,我们介绍了结构语言建模(SLM)作为一种用于高效生成蛋白质构象的新框架。具体而言,蛋白质结构首先被编码为一个紧凑的潜在空间,使用离散变分自编码器,然后进行有效捕获特定序列构象分布的条件语言建模。与现有方法相比,这使得对各种集合模式的更高效和可解释的探索成为可能。基于这一通用框架,我们使用各种流行的LM架构对SLM进行实例化,并提出了ESMDiff,一种从ESM3微调而来的新型类似BERT的结构语言模型,具有掩蔽扩散。我们在各种场景中验证了我们的方法,包括BPTI的平衡动力学、构象变化对和固有无序蛋白质。SLM提供了一个高效的解决方案,在生成多样的构象方面比现有方法快20-100倍,为未来研究提供了有希望的途径。

更新时间: 2024-10-24 03:38:51

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2410.18403v1

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about their true capabilities. In this work, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims on end-to-end automation. To ensure the scientific authenticity and real-world relevance of our benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them. We unify the target output for every task to a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs. Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility. We also propose two effective strategies to mitigate data contamination concerns. Using our benchmark, we evaluate five open-weight and proprietary LLMs, each with three frameworks: direct prompting, OpenHands CodeAct, and self-debug. Given three attempts for each task, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. In addition, we evaluate OpenAI o1 with direct prompting and self-debug, which demonstrates the effectiveness of increasing inference-time compute. Still, our results underscore the limitations of current language agents in generating code for data-driven discovery, let alone end-to-end automation for scientific research.

Updated: 2024-10-24 03:37:05

标题: ScienceAgentBench:面向数据驱动科学发现的语言代理的严格评估

摘要: 语言模型(LLMs)的进展引起了对开发基于LLM的语言代理以自动化科学发现的日益关注,这引发了人们对它们真正能力的兴奋和怀疑。在这项工作中,我们呼吁在对整个自动化过程进行大胆声明之前,对科学工作流中的每个任务的代理进行严格评估。为了确保我们基准的科学真实性和现实相关性,我们从四个学科的44篇同行评审的出版物中提取了102个任务,并邀请了九位学科专家对其进行验证。我们将每个任务的目标输出统一为一个独立的Python程序文件,并使用一系列评估指标来检查生成的程序、执行结果和成本。每个任务都经过多轮手动验证,由注释者和学科专家来确保其注释质量和科学可信度。我们还提出了两种有效的策略来减轻数据污染的担忧。借助我们的基准,我们评估了五种开放权重和专有LLMs,每种LLM都有三个框架:直接提示、OpenHands CodeAct和自我调试。对于每个任务进行三次尝试,表现最佳的代理只能独立解决32.4%的任务,与专家提供的知识一起解决34.3%的任务。此外,我们评估了OpenAI o1,并使用直接提示和自我调试,证明了增加推理时间计算的有效性。然而,我们的结果强调了当前语言代理在为数据驱动的发现生成代码方面的局限性,更不用说用于科学研究的端到端自动化。

更新时间: 2024-10-24 03:37:05

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05080v2

Low-Rank Tensor Learning by Generalized Nonconvex Regularization

In this paper, we study the problem of low-rank tensor learning, where only a few training samples are observed and the underlying tensor has a low-rank structure. The existing methods are based on the sum of nuclear norms of unfolding matrices of a tensor, which may be suboptimal. In order to explore the low-rankness of the underlying tensor effectively, we propose a nonconvex model based on transformed tensor nuclear norm for low-rank tensor learning. Specifically, a family of nonconvex functions are employed onto the singular values of all frontal slices of a tensor in the transformed domain to characterize the low-rankness of the underlying tensor. An error bound between the stationary point of the nonconvex model and the underlying tensor is established under restricted strong convexity on the loss function (such as least squares loss and logistic regression) and suitable regularity conditions on the nonconvex penalty function. By reformulating the nonconvex function into the difference of two convex functions, a proximal majorization-minimization (PMM) algorithm is designed to solve the resulting model. Then the global convergence and convergence rate of PMM are established under very mild conditions. Numerical experiments are conducted on tensor completion and binary classification to demonstrate the effectiveness of the proposed method over other state-of-the-art methods.
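
One plausible reading of the model sketched in this abstract (our reconstruction; the paper's exact transform and penalty family may differ) is

```latex
\min_{\mathcal{X}} \; \mathcal{L}(\mathcal{X})
  \;+\; \lambda \sum_{i} \sum_{j}
  \phi\!\left(\sigma_j\!\big((L\mathcal{X})^{(i)}\big)\right),
```

where $\mathcal{L}$ is the loss (e.g., least squares or logistic), $(L\mathcal{X})^{(i)}$ is the $i$-th frontal slice of the tensor in the transformed domain, $\sigma_j$ its singular values, and $\phi$ a nonconvex sparsity-inducing penalty; writing $\phi = \phi_1 - \phi_2$ as a difference of convex functions is what makes the PMM majorization step tractable.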

Updated: 2024-10-24 03:33:20

标题: 广义非凸正则化下的低秩张量学习

摘要: 在这篇论文中,我们研究了低秩张量学习的问题,其中只观察到少量训练样本,而底层张量具有低秩结构。现有的方法基于张量的展开矩阵的核范数之和,这可能是次优的。为了有效探索底层张量的低秩性,我们提出了一种基于转换张量核范数的非凸模型,用于低秩张量学习。具体而言,在转换域中将一组非凸函数应用于张量的所有前向切片的奇异值,以表征底层张量的低秩性。在损失函数上施加受限强凸性(如最小二乘损失和逻辑回归)以及非凸惩罚函数的适当正则条件下,建立了非凸模型的局部极小点与底层张量之间的误差界。通过将非凸函数重新表述为两个凸函数的差值,设计了一种用于解决所得模型的近端主要化减极小(PMM)算法。然后,在非常温和的条件下建立了PMM的全局收敛性和收敛速度。在张量完成和二元分类问题上进行了数值实验,以证明所提方法相对于其他最先进方法的有效性。

更新时间: 2024-10-24 03:33:20

领域: cs.LG

下载: http://arxiv.org/abs/2410.18402v1

A Survey on LoRA of Large Language Models

Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the volume of related literature has grown exponentially. It is necessary to conduct a comprehensive overview of the current progress on LoRA. This survey categorizes and reviews the progress from the perspectives of (1) downstream adaptation improving variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency-improving methods that boost the computation-efficiency of LoRA; (4) data privacy-preserving methods that use LoRA in federated learning; (5) applications. Besides, this survey also discusses the future directions in this field. At last, we provide a GitHub page (https://github.com/ZJU-LLMs/Awesome-LoRAs.git) for readers to check the updates and initiate discussions on this survey paper.
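
For readers new to the topic, the LoRA update itself is compact enough to sketch in a few lines of PyTorch (rank, scaling, and initialization below are illustrative defaults, not prescriptions from the survey):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)        # stands in for a pretrained layer
        self.base.weight.requires_grad_(False)    # frozen dense weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: no drift at step 0
        self.scale = alpha / rank

    def forward(self, x):
        # W x + (alpha/r) * B A x  -- only A and B are trained
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(128, 128)
print(layer(torch.randn(4, 128)).shape)  # torch.Size([4, 128])
```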

Updated: 2024-10-24 03:30:46

标题: 大规模语言模型的LoRA调查

摘要: 低秩适应(LoRA)通过可插拔的低秩矩阵更新密集神经网络层,是性能最佳的参数高效微调范式之一。此外,它在跨任务泛化和隐私保护方面具有显著优势。因此,LoRA最近引起了广泛关注,相关文献数量呈指数增长。有必要对LoRA的当前进展进行全面概述。本调查从以下角度对进展进行分类和审查:(1)改进下游适应的变体,提高LoRA在下游任务上的性能;(2)跨任务泛化方法,混合多个LoRA插件以实现跨任务泛化;(3)提高效率的方法,提高LoRA的计算效率;(4)数据隐私保护方法,利用LoRA进行联邦学习;(5)应用。此外,本调查还讨论了该领域的未来方向。最后,我们提供了一个Github页面供读者查看更新并启动关于本调查论文的讨论。

更新时间: 2024-10-24 03:30:46

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.11046v4

FDF: Flexible Decoupled Framework for Time Series Forecasting with Conditional Denoising and Polynomial Modeling

Time series forecasting is vital in numerous web applications, influencing critical decision-making across industries. While diffusion models have recently gained increasing popularity for this task, we argue they suffer from a significant drawback: indiscriminate noise addition to the original time series followed by denoising, which can obscure the underlying evolving trend and complicate forecasting. To address this limitation, we propose a novel flexible decoupled framework (FDF) that learns high-quality time series representations for enhanced forecasting performance. A key characteristic of our approach is that it leverages the inherent inductive bias of time series data by decomposing it into trend and seasonal components, each modeled separately to enable decoupled analysis and modeling. Specifically, we propose an innovative Conditional Denoising Seasonal Module (CDSM) within the diffusion model, which leverages statistical information from the historical window to conditionally model the complex seasonal component. Notably, we incorporate a Polynomial Trend Module (PTM) to effectively capture the smooth trend component, thereby enhancing the model's ability to represent temporal dependencies. Extensive experiments validate the effectiveness of our framework, demonstrating superior performance over existing methods and highlighting its flexibility in time series forecasting. The source code is available at https://github.com/zjt-gpu/FDF.
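
A toy version of the trend/seasonal decoupling looks like this (NumPy; the polynomial degree is an assumption, and the residual here merely stands in for the input that the paper's conditional denoising seasonal module would model):

```python
import numpy as np

def decompose(series, degree=2):
    t = np.arange(len(series), dtype=float)
    coefs = np.polyfit(t, series, degree)   # polynomial trend (PTM analogue)
    trend = np.polyval(coefs, t)
    seasonal = series - trend               # residual left for the seasonal model
    return trend, seasonal

t = np.arange(200, dtype=float)
y = 0.01 * t**2 + np.sin(2 * np.pi * t / 24) \
    + np.random.default_rng(0).normal(0, 0.1, size=200)
trend, seasonal = decompose(y)
print(trend[:3].round(3), round(float(seasonal.std()), 3))
```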

Updated: 2024-10-24 03:27:06

标题: FDF:具有条件去噪和多项式建模的时间序列预测的灵活解耦框架

摘要: 时间序列预测在许多网络应用中至关重要,影响跨行业的关键决策。虽然扩散模型最近在这项任务中越来越受欢迎,但我们认为它们存在一个重大缺点:在原始时间序列上不加选择地添加噪声,然后进行去噪,这可能会掩盖潜在的动态发展趋势并复杂化预测。为了解决这一限制,我们提出了一种新颖的灵活解耦框架(FDF),该框架学习高质量的时间序列表示,以提高预测性能。我们方法的一个关键特征利用了时间序列数据的固有归纳偏差,对其分解的趋势和季节成分进行分别建模,以实现解耦分析和建模。具体地,我们在扩散模型中提出了一种创新的条件去噪季节模块(CDSM),它利用历史窗口中的统计信息有条件地建模复杂的季节成分。值得注意的是,我们还加入了一个多项式趋势模块(PTM),以有效捕捉平滑的趋势成分,从而增强模型表示时间依赖性的能力。广泛的实验证实了我们框架的有效性,表明其在时间序列预测方面优于现有方法,并突显了其灵活性。源代码可在https://github.com/zjt-gpu/FDF找到。

更新时间: 2024-10-24 03:27:06

领域: cs.LG

下载: http://arxiv.org/abs/2410.13253v3

Exogenous Matching: Learning Good Proposals for Tractable Counterfactual Estimation

We propose an importance sampling method for tractable and efficient estimation of counterfactual expressions in general settings, named Exogenous Matching. By minimizing a common upper bound of counterfactual estimators, we transform the variance minimization problem into a conditional distribution learning problem, enabling its integration with existing conditional distribution modeling approaches. We validate the theoretical results through experiments under various types and settings of Structural Causal Models (SCMs) and demonstrate the outperformance on counterfactual estimation tasks compared to other existing importance sampling methods. We also explore the impact of injecting structural prior knowledge (counterfactual Markov boundaries) on the results. Finally, we apply this method to identifiable proxy SCMs and demonstrate the unbiasedness of the estimates, empirically illustrating the applicability of the method to practical scenarios.

Updated: 2024-10-24 03:20:20

标题: 外生匹配:为可处理的反事实估计学习良好的提议分布

摘要: 我们提出了一种重要性抽样方法,用于在一般情况下对反事实表达进行可操作和高效的估计,命名为外生匹配。通过最小化反事实估计器的常用上界,我们将方差最小化问题转化为条件分布学习问题,使其能够与现有的条件分布建模方法集成。我们通过在各种类型和设置的结构因果模型(SCMs)下的实验验证了理论结果,并展示了在反事实估计任务上相较于其他现有的重要性抽样方法的表现优越性。我们还探讨了注入结构先验知识(反事实马尔可夫边界)对结果的影响。最后,我们将这种方法应用于可识别的代理SCMs,并展示了估计的无偏性,从而在实证上说明了该方法在实际场景中的适用性。

更新时间: 2024-10-24 03:20:20

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.13914v3

Revisiting Differentiable Structure Learning: Inconsistency of $\ell_1$ Penalty and Beyond

Recent advances in differentiable structure learning have framed the combinatorial problem of learning directed acyclic graphs as a continuous optimization problem. Various aspects, including data standardization, have been studied to identify factors that influence the empirical performance of these methods. In this work, we investigate critical limitations in differentiable structure learning methods, focusing on settings where the true structure can be identified up to Markov equivalence classes, particularly in the linear Gaussian case. While Ng et al. (2024) highlighted potential non-convexity issues in this setting, we demonstrate and explain why the use of $\ell_1$-penalized likelihood in such cases is fundamentally inconsistent, even if the global optimum of the optimization problem can be found. To resolve this limitation, we develop a hybrid differentiable structure learning method based on $\ell_0$-penalized likelihood with a hard acyclicity constraint, where the $\ell_0$ penalty can be approximated by different techniques including Gumbel-Softmax. Specifically, we first estimate the underlying moral graph, and use it to restrict the search space of the optimization problem, which helps alleviate the non-convexity issue. Experimental results show that the proposed method enhances empirical performance both before and after data standardization, providing a more reliable path for future advancements in differentiable structure learning, especially for learning Markov equivalence classes.
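
As a minimal sketch of the Gumbel-Softmax approximation of the $\ell_0$ penalty mentioned above, each candidate edge can carry a relaxed Bernoulli gate (binary-concrete); the temperature and parameterization are illustrative, and the acyclicity constraint and moral-graph restriction are omitted:

```python
import torch

def sample_edge_mask(logits, tau=0.5):
    """logits: (d, d) log-odds that each directed edge exists."""
    g1 = -torch.log(-torch.log(torch.rand_like(logits)))
    g0 = -torch.log(-torch.log(torch.rand_like(logits)))
    # two-class Gumbel-Softmax; mass on the "edge present" class
    return torch.sigmoid((logits + g1 - g0) / tau)

logits = torch.zeros(5, 5, requires_grad=True)
mask = sample_edge_mask(logits)
l0_surrogate = mask.sum()     # differentiable stand-in for the edge count
l0_surrogate.backward()
print(mask.shape, logits.grad is not None)
```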

Updated: 2024-10-24 03:17:14

标题: 重新审视可微结构学习:$\ell_1$惩罚的不一致性及其他可能

摘要: 最近在可微结构学习方面取得了进展,将学习有向无环图的组合问题框定为连续优化问题。已经研究了各种方面,包括数据标准化,以确定影响这些方法经验性能的因素。在这项工作中,我们调查了可微结构学习方法中的关键限制,重点关注真实结构可以被识别为马尔科夫等价类的情况,特别是在线性高斯情况下。虽然Ng等人(2024年)强调了这种情况下潜在的非凸问题,但我们展示并解释了为什么在这种情况下使用$\ell_1$惩罚似然是基本不一致的,即使可以找到优化问题的全局最优解。为了解决这个限制,我们开发了一种基于$\ell_0$惩罚似然的混合可微结构学习方法,带有硬无环约束,其中$\ell_0$惩罚可以通过不同的技术来近似,包括Gumbel-Softmax。具体而言,我们首先估计潜在的道德图,并使用它来限制优化问题的搜索空间,有助于缓解非凸性问题。实验结果显示,提出的方法在数据标准化前后都提高了经验性能,为未来可微结构学习的进展提供了更可靠的途径,特别是对于学习马尔科夫等价类。

更新时间: 2024-10-24 03:17:14

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.18396v1

RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment

Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions. However, these approaches have faced challenges in precisely aligning the generated visual content with the textual concepts described in the prompts. In this paper, we propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff, aimed at improving the alignment between text and images in text-to-image diffusion models. In the coarse semantic re-alignment phase, a novel caption reward, leveraging the BLIP-2 model, is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt. Subsequently, the fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view. Experimental results on the MS-COCO and ViLG-300 datasets demonstrate that the proposed two-stage coarse-to-fine semantic re-alignment method outperforms other baseline re-alignment techniques by a substantial margin in both visual quality and semantic similarity with the input prompt.

Updated: 2024-10-24 03:14:22

标题: RealignDiff:利用粗到细语义重新对齐增强文本到图像扩散模型

摘要: 最近,文本到图像扩散模型取得了显著的成功,能够从文本描述中生成高质量、逼真的图像。然而,这些方法在将生成的视觉内容与文本描述的概念精确对齐方面面临挑战。本文提出了一种名为RealignDiff的两阶段粗到细的语义重新对齐方法,旨在改善文本到图像扩散模型中文本和图像之间的对齐。在粗语义重新对齐阶段,提出了一种利用BLIP-2模型的新颖标题奖励,用于评估生成的图像标题与给定文本提示之间的语义差异。随后,精细语义重新对齐阶段采用本地密集标题生成模块和重新加权注意力调制模块,从本地语义视角优化先前生成的图像。在MS-COCO和ViLG-300数据集上的实验结果表明,提出的两阶段粗到细的语义重新对齐方法在视觉质量和与输入提示的语义相似性方面明显优于其他基线重新对齐技术。

更新时间: 2024-10-24 03:14:22

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2305.19599v5

A contrastive-learning approach for auditory attention detection

Carrying on conversations in multi-sound environments is one of the more challenging tasks, since the sounds overlap across time and frequency, making it difficult to understand a single sound source. One proposed approach to help isolate an attended speech source is to decode the electroencephalogram (EEG) and identify the attended audio source using statistical or machine learning techniques. However, the limited amount of data in comparison to other machine learning problems and the distributional shift between different EEG recordings emphasize the need for a self-supervised approach that works with limited data to achieve a more robust solution. In this paper, we propose a method based on self-supervised learning to minimize the difference between the latent representations of an attended speech signal and the corresponding EEG signal. This network is further finetuned for the auditory attention classification task. We compare our results with previously published methods and achieve state-of-the-art performance on the validation set.
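
The matching objective can be illustrated with a standard InfoNCE loss over time-aligned EEG and speech embeddings, with the rest of the batch serving as negatives (the encoders and batch construction here are assumptions, not the paper's architecture):

```python
import torch
import torch.nn.functional as F

def infonce(eeg_z, speech_z, temperature=0.1):
    eeg_z = F.normalize(eeg_z, dim=-1)
    speech_z = F.normalize(speech_z, dim=-1)
    logits = eeg_z @ speech_z.T / temperature   # (B, B) similarities
    targets = torch.arange(len(eeg_z))          # matched pairs on the diagonal
    return F.cross_entropy(logits, targets)

print(infonce(torch.randn(16, 64), torch.randn(16, 64)).item())
```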

Updated: 2024-10-24 03:13:53

标题: 一种用于听觉注意力检测的对比学习方法

摘要: 在多声音环境中进行对话是一项更具挑战性的任务,因为声音在时间和频率上重叠,使得理解单一声源变得困难。为帮助隔离一个注意力集中的语音源,一种提出的方法是通过解码脑电图(EEG)并使用统计或机器学习技术识别关注的音频源。然而,与其他机器学习问题相比,数据量有限,不同EEG记录之间的分布变化强调了需要一种自监督方法,以应对有限数据实现更稳健的解决方案。在本文中,我们提出了一种基于自监督学习的方法,以最小化注意力集中的语音信号和相应EEG信号之间的潜在表示之间的差异。然后,进一步微调该网络以进行听觉注意力分类任务。我们将我们的结果与先前发表的方法进行比较,并在验证集上取得了最先进的性能。

更新时间: 2024-10-24 03:13:53

领域: cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.18395v1

Faster Algorithms for User-Level Private Stochastic Convex Optimization

We study private stochastic convex optimization (SCO) under user-level differential privacy (DP) constraints. In this setting, there are $n$ users (e.g., cell phones), each possessing $m$ data items (e.g., text messages), and we need to protect the privacy of each user's entire collection of data items. Existing algorithms for user-level DP SCO are impractical in many large-scale machine learning scenarios because: (i) they make restrictive assumptions on the smoothness parameter of the loss function and require the number of users to grow polynomially with the dimension of the parameter space; or (ii) they are prohibitively slow, requiring at least $(mn)^{3/2}$ gradient computations for smooth losses and $(mn)^3$ computations for non-smooth losses. To address these limitations, we provide novel user-level DP algorithms with state-of-the-art excess risk and runtime guarantees, without stringent assumptions. First, we develop a linear-time algorithm with state-of-the-art excess risk (for a non-trivial linear-time algorithm) under a mild smoothness assumption. Our second algorithm applies to arbitrary smooth losses and achieves optimal excess risk in $\approx (mn)^{9/8}$ gradient computations. Third, for non-smooth loss functions, we obtain optimal excess risk in $n^{11/8} m^{5/4}$ gradient computations. Moreover, our algorithms do not require the number of users to grow polynomially with the dimension.

Updated: 2024-10-24 03:02:33

标题: 用户级私有随机凸优化的更快算法

摘要: 我们研究了在用户级差分隐私(DP)约束条件下的私人随机凸优化(SCO)。在这种情况下,有$n$个用户(例如,手机),每个用户都拥有$m$个数据项(例如,文本消息),我们需要保护每个用户整个数据项集合的隐私。现有的用户级DP SCO算法在许多大规模机器学习场景中是不实际的,因为:(i)它们对损失函数的平滑参数进行了限制性假设,并要求用户数量与参数空间的维度呈多项式增长;或者(ii)它们速度过慢,至少需要$(mn)^{3/2}$梯度计算来处理平滑损失,以及$(mn)^3$次计算来处理非平滑损失。为了解决这些限制,我们提供了具有最先进的超额风险和运行时间保证的新型用户级DP算法,而无需严格的假设。首先,我们开发了一个线性时间算法,在轻微平滑假设下具有最先进的超额风险(对于一个非平凡的线性时间算法)。我们的第二个算法适用于任意平滑损失,并在$\approx (mn)^{9/8}$个梯度计算中实现最优超额风险。第三,对于非平滑损失函数,我们在$n^{11/8} m^{5/4}$个梯度计算中获得最优超额风险。此外,我们的算法不要求用户数量随维度的增长而呈多项式增长。

更新时间: 2024-10-24 03:02:33

领域: cs.LG,cs.CR,math.OC

下载: http://arxiv.org/abs/2410.18391v1

CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention to enhance model performance on unseen tasks. Attempts have been made on automatic construction and effective selection for IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could be further enhanced by leveraging the capabilities of LLMs themselves. In this paper, we propose CoEvol, an LLM-based multi-agent cooperation framework for the improvement of responses to instructions. To effectively refine the responses, we develop an iterative framework following a debate-advise-edit-judge paradigm. A two-stage multi-agent debate strategy is further devised to ensure the diversity and reliability of editing suggestions within the framework. Empirically, models equipped with CoEvol outperform competitive baselines evaluated by MT-Bench and AlpacaEval, demonstrating its effectiveness in enhancing instruction-following capabilities for LLMs.
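
The debate-advise-edit-judge loop can be sketched as plain control flow; `call_llm` below is a hypothetical placeholder for any chat-completion client, and the prompts are illustrative rather than the paper's exact templates:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def coevol_round(instruction: str, response: str) -> str:
    pro = call_llm(f"Argue that this response serves the instruction well.\n"
                   f"Instruction: {instruction}\nResponse: {response}")
    con = call_llm(f"Argue that this response could be improved.\n"
                   f"Instruction: {instruction}\nResponse: {response}")
    advice = call_llm(f"Given both debate arguments, list concrete edits.\n{pro}\n{con}")
    edited = call_llm(f"Apply these edits to the response.\n"
                      f"Edits: {advice}\nResponse: {response}")
    verdict = call_llm(f"Answer KEEP or REVERT: is the edited response better?\n"
                       f"Original: {response}\nEdited: {edited}")
    return edited if verdict.strip().upper().startswith("KEEP") else response
```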

Updated: 2024-10-24 02:59:46

标题: CoEvol:通过多智能体合作为指令微调构建更好的响应

摘要: 近年来,对大型语言模型(LLMs)进行指令微调(IFT)已经引起了相当大的关注,以增强模型在未知任务上的性能。人们已经尝试自动构建和有效选择IFT数据。然而,我们认为先前的方法并没有充分利用LLMs提升数据质量的潜力。通过利用LLMs本身的能力,IFT数据中的响应可以进一步增强。在本文中,我们提出了CoEvol,一种基于LLM的多智能体合作框架,用于改进对指令的响应。为了有效地优化响应,我们开发了一个遵循辩论-建议-编辑-评判范式的迭代框架。进一步设计了一个两阶段多智能体辩论策略,以确保框架内编辑建议的多样性和可靠性。在经验上,配备CoEvol的模型在MT-Bench和AlpacaEval评估的竞争基线上表现优异,证明了它在提升LLMs的指令遵循能力方面的有效性。

更新时间: 2024-10-24 02:59:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.07054v2

Link, Synthesize, Retrieve: Universal Document Linking for Zero-Shot Information Retrieval

Despite the recent advancements in information retrieval (IR), zero-shot IR remains a significant challenge, especially when dealing with new domains, languages, and newly-released use cases that lack historical query traffic from existing users. For such cases, it is common to use query augmentations followed by fine-tuning pre-trained models on the document data paired with synthetic queries. In this work, we propose a novel Universal Document Linking (UDL) algorithm, which links similar documents to enhance synthetic query generation across multiple datasets with different characteristics. UDL leverages entropy for the choice of similarity models and named entity recognition (NER) for the link decision of documents using similarity scores. Our empirical studies demonstrate the effectiveness and universality of UDL across diverse datasets and IR models, surpassing state-of-the-art methods in zero-shot cases. The code for reproducibility is available at https://github.com/eoduself/UDL.

Updated: 2024-10-24 02:52:19

标题: 链接、综合、检索:零样本信息检索的通用文档链接

摘要: 尽管信息检索(IR)方面近年来取得了显著进展,但零样本IR仍然是一个重大挑战,特别是在处理新领域、语言和缺乏现有用户历史查询流量的新发布用例时。对于这种情况,通常会使用查询增强,然后在与合成查询配对的文档数据上对预训练模型进行微调。在这项工作中,我们提出了一种新颖的Universal Document Linking(UDL)算法,将相似的文档进行链接,以增强跨多个具有不同特征的数据集的合成查询生成。UDL利用熵来选择相似性模型,并利用命名实体识别(NER)来使用相似性分数做出文档链接决策。我们的实证研究证明了UDL在不同数据集和IR模型中的有效性和普适性,在零样本情况下超越了最先进的方法。为了可重现性,开发的代码已包含在https://github.com/eoduself/UDL中。

更新时间: 2024-10-24 02:52:19

领域: cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2410.18385v1

Mind Scramble: Unveiling Large Language Model Psychology Via Typoglycemia

Research into the external behaviors and internal mechanisms of large language models (LLMs) has shown promise in addressing complex tasks in the physical world. Studies suggest that powerful LLMs, like GPT-4, are beginning to exhibit human-like cognitive abilities, including planning, reasoning, and reflection. In this paper, we introduce a research line and methodology called LLM Psychology, leveraging human psychology experiments to investigate the cognitive behaviors and mechanisms of LLMs. We migrate the Typoglycemia phenomenon from psychology to explore the "mind" of LLMs. Unlike human brains, which rely on context and word patterns to comprehend scrambled text, LLMs use distinct encoding and decoding processes. Through Typoglycemia experiments at the character, word, and sentence levels, we observe: (I) LLMs demonstrate human-like behaviors on a macro scale, such as lower task accuracy and higher token/time consumption; (II) LLMs exhibit varying robustness to scrambled input, making Typoglycemia a benchmark for model evaluation without new datasets; (III) Different task types have varying impacts, with complex logical tasks (e.g., math) being more challenging in scrambled form; (IV) Each LLM has a unique and consistent "cognitive pattern" across tasks, revealing general mechanisms in its psychology process. We provide an in-depth analysis of hidden layers to explain these phenomena, paving the way for future research in LLM Psychology and deeper interpretability.
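
Word-level Typoglycemia scrambling is easy to reproduce: keep the first and last letters, shuffle the interior (punctuation handling is simplified in this sketch):

```python
import random

def typoglycemia(text: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = []
    for word in text.split():
        if len(word) > 3:
            mid = list(word[1:-1])
            rng.shuffle(mid)
            word = word[0] + "".join(mid) + word[-1]
        words.append(word)
    return " ".join(words)

print(typoglycemia("large language models exhibit consistent cognitive patterns"))
```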

Updated: 2024-10-24 02:49:36

标题: 思维混乱:通过Typoglycemia揭示大型语言模型的心理学

摘要: 研究大型语言模型(LLMs)的外部行为和内部机制已经显示出在处理现实世界中复杂任务方面的潜力。研究表明,像GPT-4这样强大的LLMs开始展现出类似于人类的认知能力,包括规划、推理和反思。在本文中,我们引入了一条名为LLM心理学的研究线和方法论,利用人类心理学实验来探究LLMs的认知行为和机制。我们将Typoglycemia现象从心理学迁移到探索LLMs的“心智”。与依赖上下文和单词模式来理解混乱文本的人类大脑不同,LLMs使用不同的编码和解码过程。通过在字符、单词和句子级别进行Typoglycemia实验,我们观察到:(一)LLMs在宏观尺度上展现出类似于人类的行为,如较低的任务准确性和更高的令牌/时间消耗;(二)LLMs对于混乱输入表现出不同程度的鲁棒性,使得Typoglycemia成为模型评估的基准,而无需新的数据集;(三)不同类型的任务对LLMs的影响不同,复杂的逻辑任务(如数学)在混乱形式下更具挑战性;(四)每个LLM在任务中都有一个独特且一致的“认知模式”,揭示了其心理过程中的通用机制。我们提供了对隐藏层的深入分析,以解释这些现象,为未来LLM心理学和更深层次的可解释性研究铺平道路。

更新时间: 2024-10-24 02:49:36

领域: cs.AI

下载: http://arxiv.org/abs/2410.01677v3

Differentially Private Federated Learning without Noise Addition: When is it Possible?

Federated Learning (FL) with Secure Aggregation (SA) has gained significant attention as a privacy preserving framework for training machine learning models while preventing the server from learning information about users' data from their individual encrypted model updates. Recent research has extended privacy guarantees of FL with SA by bounding the information leakage through the aggregate model over multiple training rounds thanks to leveraging the "noise" from other users' updates. However, the privacy metric used in that work (mutual information) measures the on-average privacy leakage, without providing any privacy guarantees for worst-case scenarios. To address this, in this work we study the conditions under which FL with SA can provide worst-case differential privacy guarantees. Specifically, we formally identify the necessary condition for SA to provide DP without additional noise. We then prove that when the randomness inside the aggregated model update is Gaussian with a non-singular covariance matrix, SA can provide differential privacy guarantees with the privacy level $\epsilon$ bounded by the reciprocal of the minimum eigenvalue of the covariance matrix. However, we further demonstrate that in practice these conditions are unlikely to hold, and hence additional noise added to model updates is still required for SA in FL to achieve DP. Lastly, we discuss the potential of leveraging the inherent randomness inside the aggregated model update to reduce the amount of additional noise required for a DP guarantee.
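
Read literally, the positive result above has the shape (constants omitted; this is our rendering of the claim, not the paper's exact theorem statement)

```latex
\text{update noise} \sim \mathcal{N}(\mu, \Sigma),\ \Sigma \succ 0
\quad \Longrightarrow \quad
\epsilon \;\lesssim\; \frac{1}{\lambda_{\min}(\Sigma)},
```

so a well-conditioned covariance (large $\lambda_{\min}$) gives a meaningful guarantee, while a near-singular $\Sigma$ drives $\epsilon$ toward infinity, which is why the authors conclude that added noise is still needed in practice.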

Updated: 2024-10-24 02:49:14

标题: 无噪声添加的差分隐私联邦学习:何时可能?

摘要: 具有安全聚合(SA)的联邦学习(FL)作为一个隐私保护框架,在训练机器学习模型的同时防止服务器从用户的个别加密模型更新中学习到信息,已经引起了广泛关注。最近的研究通过利用其他用户更新的“噪音”,扩展了FL与SA的隐私保证,通过约束多轮训练中聚合模型的信息泄露。然而,在该研究中使用的隐私度量(互信息)衡量了平均隐私泄露,但并未提供最坏情况下的隐私保证。为了解决这个问题,在本研究中我们研究了FL与SA可以提供最坏情况差分隐私保证的条件。具体地,我们正式确定了SA能够在不添加噪音的情况下提供差分隐私所需的条件。然后,我们证明了当聚合模型更新中的随机性服从具有非奇异协方差矩阵的高斯分布时,SA可以提供差分隐私保证,其隐私水平$\epsilon$由协方差矩阵的最小特征值的倒数限制。然而,我们进一步证明了在实践中,这些条件几乎不太可能成立,因此仍然需要在模型更新中添加额外的噪音才能实现SA在FL中实现差分隐私。最后,我们讨论了利用聚合模型更新中固有的随机性来减少需要用于差分隐私保证的额外噪音的潜在解决方案。

更新时间: 2024-10-24 02:49:14

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.04551v3

System Safety Monitoring of Learned Components Using Temporal Metric Forecasting

In learning-enabled autonomous systems, safety monitoring of learned components is crucial to ensure their outputs do not lead to system safety violations, given the operational context of the system. However, developing a safety monitor for practical deployment in real-world applications is challenging. This is due to limited access to internal workings and training data of the learned component. Furthermore, safety monitors should predict safety violations with low latency, while consuming a reasonable amount of computation. To address the challenges, we propose a safety monitoring method based on probabilistic time series forecasting. Given the learned component outputs and an operational context, we empirically investigate different Deep Learning (DL)-based probabilistic forecasting to predict the objective measure capturing the satisfaction or violation of a safety requirement (safety metric). We empirically evaluate safety metric and violation prediction accuracy, and inference latency and resource usage of four state-of-the-art models, with varying horizons, using autonomous aviation and autonomous driving case studies. Our results suggest that probabilistic forecasting of safety metrics, given learned component outputs and scenarios, is effective for safety monitoring. Furthermore, for both case studies, Temporal Fusion Transformer (TFT) was the most accurate model for predicting imminent safety violations, with acceptable latency and resource consumption.

Updated: 2024-10-24 02:49:11

标题: 使用时间度量预测对学习组件进行系统安全监测

摘要: 在学习启用的自主系统中,考虑到系统的运行环境,对学习组件进行安全监控对于确保它们的输出不会导致系统安全违规至关重要。然而,开发可在现实世界应用中实际部署的安全监控器具有挑战性,因为对学习组件的内部工作和训练数据的访问是有限的。此外,安全监控器应该以低延迟预测安全违规,同时消耗合理数量的计算资源。为了解决这些挑战,我们提出了一种基于概率时间序列预测的安全监控方法。给定学习组件的输出和运行环境,我们经验性地研究了不同的基于深度学习(DL)的概率预测方法,以预测刻画安全要求满足或违规情况的客观度量(安全度量)。我们使用自主航空和自主驾驶案例研究,在不同的预测时域下,经验性地评估了四种最先进模型的安全度量和违规预测准确性,以及推理延迟和资源使用。我们的结果表明,给定学习组件的输出和场景,对安全度量进行概率预测对于安全监控是有效的。此外,在两个案例研究中,时间融合变压器(TFT)是预测即将发生的安全违规最准确的模型,同时具有可接受的延迟和资源消耗。

更新时间: 2024-10-24 02:49:11

领域: cs.LG,cs.AI,cs.RO,cs.SE

下载: http://arxiv.org/abs/2405.13254v2

Harnessing PU Learning for Enhanced Cloud-based DDoS Detection: A Comparative Analysis

This paper explores the application of Positive-Unlabeled (PU) learning for enhanced Distributed Denial-of-Service (DDoS) detection in cloud environments. Utilizing the BCCC-cPacket-Cloud-DDoS-2024 dataset, we implement PU learning with four machine learning algorithms: XGBoost, Random Forest, Support Vector Machine, and Naïve Bayes. Our results demonstrate the superior performance of ensemble methods, with XGBoost and Random Forest achieving $F_{1}$ scores exceeding 98%. We quantify the efficacy of each approach using metrics including $F_{1}$ score, ROC AUC, Recall, and Precision. This study bridges the gap between PU learning and cloud-based anomaly detection, providing a foundation for addressing Context-Aware DDoS Detection in multi-cloud environments. Our findings highlight the potential of PU learning in scenarios with limited labeled data, offering valuable insights for developing more robust and adaptive cloud security mechanisms.
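
As one concrete way to run PU learning with an off-the-shelf classifier, here is a minimal Elkan-Noto-style sketch in scikit-learn (the synthetic data stands in for the DDoS dataset, and the paper's exact PU strategy may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)          # hidden ground truth
labeled = (y_true == 1) & (rng.random(1000) < 0.3)    # only some positives are labeled
s = labeled.astype(int)                               # observed: positive vs. unlabeled

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, s)
c = clf.predict_proba(X[labeled])[:, 1].mean()        # estimate of P(s=1 | y=1)
p_pos = np.clip(clf.predict_proba(X)[:, 1] / c, 0, 1) # corrected P(y=1 | x)
print(f"estimated label frequency c = {c:.2f}")
```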

Updated: 2024-10-24 02:44:56

标题: 利用PU学习增强基于云的DDoS检测:一项比较分析

摘要: 本文探讨了在云环境中应用正未标记(PU)学习以增强分布式拒绝服务(DDoS)检测的方法。利用BCCC-cPacket-Cloud-DDoS-2024数据集,我们使用四种机器学习算法实施PU学习:XGBoost、随机森林、支持向量机和朴素贝叶斯。我们的结果表明,集成方法的性能优越,XGBoost和随机森林的$F_{1}$得分超过98%。我们使用$F_{1}$得分、ROC AUC、召回率和精确度等指标量化每种方法的有效性。这项研究弥合了PU学习和基于云的异常检测之间的差距,为解决多云环境中的上下文感知DDoS检测奠定了基础。我们的发现突显了PU学习在标记数据有限的场景中的潜力,为开发更强大和更具适应性的云安全机制提供了宝贵的见解。

更新时间: 2024-10-24 02:44:56

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.18380v1

Send Message to the Future? Blockchain-based Time Machines for Decentralized Reveal of Locked Information

Conditional information reveal systems automate the release of information upon meeting specific predefined conditions, such as time or location. This paper introduces a breakthrough in the understanding, design, and application of conditional information reveal systems that are highly secure and decentralized. By designing a new practical timed-release cryptography system and a secret sharing scheme with reveal-verifiability, a novel data sharing system is devised on the blockchain that "sends messages in the future" with highly accurate decryption times. Notably, the proposed secret sharing scheme applies to other applications requiring verifiability of revealed secret shares. This paper provides a complete evaluation portfolio of this pioneering paradigm, including analytical results, a validation of its robustness in the Tamarin Prover and a performance evaluation of a real-world, open-source system prototype deployed across the globe. Using real-world election data, we also demonstrate the applicability of this innovative system in e-voting, illustrating its capacity to secure and ensure fair electronic voting processes.

Updated: 2024-10-24 02:39:08

标题: 将信息发送到未来?基于区块链的时间机器用于分散揭示被锁定信息

摘要: 条件信息揭示系统自动化地在满足特定预定义条件时释放信息,比如时间或位置。本文介绍了对高度安全和分散化的条件信息揭示系统的理解、设计和应用方面的突破。通过设计一种新型的实用定时释放加密系统和带有揭示可验证性的秘密分享方案,基于区块链构建了一种新颖的数据共享系统,可以“未来发送消息”,并具有高度准确的解密时间。值得注意的是,所提出的秘密分享方案适用于其他需要揭示秘密份额可验证性的应用程序。本文提供了这一开创性范式的完整评估组合,包括分析结果、在Tamarin Prover中对其稳健性的验证以及在全球范围内部署的一个真实世界开源系统原型的性能评估。利用真实世界的选举数据,我们还展示了这一创新系统在电子投票中的适用性,展示了其保障和确保公平的电子投票流程的能力。

更新时间: 2024-10-24 02:39:08

领域: cs.CR

下载: http://arxiv.org/abs/2401.05947v3

Delta: A Cloud-assisted Data Enrichment Framework for On-Device Continual Learning

In modern mobile applications, users frequently encounter various new contexts, necessitating on-device continual learning (CL) to ensure consistent model performance. While existing research predominantly focused on developing lightweight CL frameworks, we identify that data scarcity is a critical bottleneck for on-device CL. In this work, we explore the potential of leveraging abundant cloud-side data to enrich scarce on-device data, and propose a private, efficient and effective data enrichment framework Delta. Specifically, Delta first introduces a directory dataset to decompose the data enrichment problem into device-side and cloud-side sub-problems without sharing sensitive data. Next, Delta proposes a soft data matching strategy to effectively solve the device-side sub-problem with sparse user data, and an optimal data sampling scheme for cloud server to retrieve the most suitable dataset for enrichment with low computational complexity. Further, Delta refines the data sampling scheme by jointly considering the impact of enriched data on both new and past contexts, mitigating the catastrophic forgetting issue from a new aspect. Comprehensive experiments across four typical mobile computing tasks with varied data modalities demonstrate that Delta could enhance the overall model accuracy by an average of 15.1%, 12.4%, 1.1% and 5.6% for visual, IMU, audio and textual tasks compared with few-shot CL, and consistently reduce the communication costs by over 90% compared to federated CL.

Updated: 2024-10-24 02:38:09

标题: Delta:一种云辅助数据丰富框架,用于设备上的持续学习

摘要: 在现代移动应用程序中,用户经常遇到各种新的背景,需要进行设备上的持续学习(CL)以确保模型性能的一致。虽然现有的研究主要集中在开发轻量级的CL框架上,但我们认为数据稀缺是设备上CL的一个关键瓶颈。在这项工作中,我们探讨了利用丰富的云端数据来丰富稀缺的设备上数据的潜力,并提出了一个私密、高效和有效的数据丰富框架Delta。具体来说,Delta首先引入一个目录数据集,将数据丰富问题分解为设备端和云端子问题,而不共享敏感数据。接下来,Delta提出了一种软数据匹配策略,以有效解决设备端子问题,使用稀疏用户数据,并提出了一种优化的数据采样方案,用于云服务器检索最适合丰富的数据集,具有低计算复杂度。此外,Delta通过共同考虑对新旧背景的影响来优化数据采样方案,从一个新的角度缓解了灾难性遗忘问题。通过对四个典型的移动计算任务进行全面实验,涵盖了不同的数据模态,表明与少样本CL相比,Delta可以将整体模型准确性平均提高15.1%,12.4%,1.1%和5.6%,对于视觉,IMU,音频和文本任务,并且与联邦CL相比,Delta可以持续减少通信成本超过90%。

更新时间: 2024-10-24 02:38:09

领域: cs.LG

下载: http://arxiv.org/abs/2410.18378v1

Integrating Canonical Neural Units and Multi-Scale Training for Handwritten Text Recognition

The segmentation-free research efforts for addressing handwritten text recognition can be divided into three categories: connectionist temporal classification (CTC), hidden Markov model, and encoder-decoder methods. In this paper, inspired by the above three modeling methods, we propose a new recognition network by using a novel three-dimensional (3D) attention module and global-local context information. Based on the feature maps of the last convolutional layer, a series of 3D blocks with different resolutions are split. Then, these 3D blocks are fed into the 3D attention module to generate sequential visual features. Finally, by integrating the visual features and the corresponding global-local context features, a well-designed representation can be obtained. Main canonical neural units, including attention mechanisms, fully-connected layers, recurrent units, and convolutional layers, are efficiently organized into a network and can be jointly trained by the CTC loss and the cross-entropy loss. Experiments on the latest Chinese handwritten text datasets (the SCUT-HCCDoc and the SCUT-EPT) and one English handwritten text dataset (the IAM) show that the proposed method sets a new milestone.

Updated: 2024-10-24 02:33:12

标题: 整合规范神经单元和多尺度训练用于手写文本识别

摘要: 解决手写文本识别的无分割研究工作可分为三类:连接主义时间分类(CTC)、隐马尔可夫模型和编码器-解码器方法。在本文中,受上述三种建模方法的启发,我们提出了一种新的识别网络,使用了新颖的三维(3D)注意力模块和全局-局部上下文信息。基于最后一个卷积层的特征图,一系列不同分辨率的3D块被分割。然后,这些3D块被送入3D注意力模块生成序列视觉特征。最后,通过整合视觉特征和相应的全局-局部上下文特征,可以得到一个精心设计的表示。主要的神经单元包括注意力机制、全连接层、循环单元和卷积层被有效地组织到一个网络中,并可以通过CTC损失和交叉熵损失进行联合训练。在最新的中文手写文本数据集(SCUT-HCCDoc和SCUT-EPT)和一个英文手写文本数据集(IAM)上的实验证明,提出的方法可以开创新的里程碑。

更新时间: 2024-10-24 02:33:12

领域: cs.AI

下载: http://arxiv.org/abs/2410.18374v1

A Unimodal Speaker-Level Membership Inference Detector for Contrastive Pretraining

Audio can disclose personally identifiable information (PII), particularly when combined with related text data. Therefore, it is essential to develop tools to detect privacy leakage in Contrastive Language-Audio Pretraining (CLAP). Existing membership inference attacks (MIAs) need audio as input, risking exposure of the voiceprint and requiring costly shadow models. To address these challenges, we propose USMID, a textual unimodal speaker-level membership inference detector for CLAP models, which queries the target model using only text data and does not require training shadow models. We randomly generate textual gibberish that is clearly not in the training dataset. Then we extract feature vectors from these texts using the CLAP model and train a set of anomaly detectors on them. During inference, the feature vector of each test text is input into the anomaly detector to determine if the speaker is in the training set (anomalous) or not (normal). If available, USMID can further enhance detection by integrating real audio of the tested speaker. Extensive experiments on various CLAP model architectures and datasets demonstrate that USMID outperforms baseline methods using only text data.
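
The detection recipe can be sketched in a few lines; `embed_text` is a hypothetical placeholder for the target CLAP model's text encoder, and the gibberish generator and choice of IsolationForest are illustrative:

```python
import random
import string
import numpy as np
from sklearn.ensemble import IsolationForest

def embed_text(text: str) -> np.ndarray:
    raise NotImplementedError("replace with the target CLAP text encoder")

def make_gibberish(n, length=30, seed=0):
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]

def fit_detector(n_ref=500):
    # gibberish is certainly outside the training set -> reference "normal" class
    ref = np.stack([embed_text(t) for t in make_gibberish(n_ref)])
    return IsolationForest(random_state=0).fit(ref)

def is_member(detector, text) -> bool:
    z = embed_text(text).reshape(1, -1)
    return detector.predict(z)[0] == -1   # anomalous vs. gibberish -> likely member
```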

Updated: 2024-10-24 02:26:57

标题: 一种用于对比预训练的单模态说话者级别成员推理检测器

摘要: 音频可以泄露个人身份信息,尤其是当与相关文本数据结合时。因此,开发工具来检测Contrastive Language-Audio Pretraining(CLAP)中的隐私泄漏至关重要。现有的成员推理攻击(MIAs)需要音频作为输入,存在暴露声纹并需要昂贵的影子模型的风险。为了解决这些挑战,我们提出了USMID,一种针对CLAP模型的文本单模态说话者级成员推理检测器,仅使用文本数据查询目标模型,并不需要训练影子模型。我们随机生成明显不在训练数据集中的文本胡言乱语。然后,我们使用CLAP模型从这些文本中提取特征向量,并在其上训练一组异常检测器。在推理过程中,将每个测试文本的特征向量输入到异常检测器中,以确定说话者是否在训练集中(异常)或不在(正常)。如果有音频可用,USMID可以通过集成被测试说话者的真实音频进一步增强检测能力。对各种CLAP模型架构和数据集进行的大量实验证明,USMID在仅使用文本数据时优于基线方法。

更新时间: 2024-10-24 02:26:57

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.18371v1

Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need

Design space exploration (DSE) enables architects to systematically evaluate various design options, guiding decisions on the most suitable configurations to meet specific objectives such as optimizing performance, power, and area. However, the growing complexity of modern CPUs has dramatically increased the number of micro-architectural parameters and expanded the overall design space, making DSE more challenging and time-consuming. Existing DSE frameworks struggle in large-scale design spaces due to inaccurate models and limited insights into parameter impact, hindering efficient identification of optimal micro-architectures within tight timeframes. In this work, we introduce AttentionDSE. Its key idea is to use the attention mechanism to establish a direct mapping of micro-architectural parameters to their contributions to predicted performance. This approach enhances both the prediction accuracy and interpretability of the performance model. Furthermore, the weights are dynamically adjusted, enabling the model to respond to design changes and effectively pinpoint the key micro-architectural parameters/components responsible for performance bottlenecks. Thus, AttentionDSE accurately, purposefully, and rapidly discovers optimal designs. Experiments on SPEC 2017 demonstrate that AttentionDSE significantly reduces exploration time by over 80% and achieves 3.9% improvement in Pareto Hypervolume compared to state-of-the-art DSE frameworks while maintaining superior prediction accuracy and efficiency with an increasing number of parameters.

Updated: 2024-10-24 02:20:17

标题: CPU设计空间探索中的多目标优化:注意力就是一切

摘要: 设计空间探索(DSE)使架构师能够系统地评估各种设计选项,指导决策以满足特定目标,如优化性能、功耗和面积。然而,现代CPU的日益复杂化大大增加了微架构参数的数量,并扩展了整体设计空间,使得DSE变得更具挑战性和耗时。现有的DSE框架在大规模设计空间中遇到困难,因为存在不准确的模型和对参数影响的见解有限,阻碍了在紧迫时间框架内高效识别最佳微架构的能力。在这项工作中,我们介绍了AttentionDSE。其关键思想是利用注意力机制将微架构参数直接映射到对预测性能的贡献。这种方法提高了性能模型的预测准确性和可解释性。此外,权重是动态调整的,使模型能够对设计变化做出响应,并有效地确定导致性能瓶颈的关键微架构参数/组件。因此,AttentionDSE能够准确、有目的地、快速地发现最佳设计。在SPEC 2017上的实验表明,与最先进的DSE框架相比,AttentionDSE将探索时间减少了80%以上,并使Pareto超体积提高了3.9%,同时在参数数量增加的情况下保持了较高的预测准确性和效率。

更新时间: 2024-10-24 02:20:17

领域: cs.LG,cs.AR

下载: http://arxiv.org/abs/2410.18368v1

Out-of-Distribution Detection with a Single Unconditional Diffusion Model

Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples. Traditionally, unsupervised methods utilize a deep generative model for OOD detection. However, such approaches require a new model to be trained for each inlier dataset. This paper explores whether a single model can perform OOD detection across diverse tasks. To that end, we introduce Diffusion Paths (DiffPath), which uses a single diffusion model originally trained to perform unconditional generation for OOD detection. We introduce a novel technique of measuring the rate-of-change and curvature of the diffusion paths connecting samples to the standard normal. Extensive experiments show that with a single model, DiffPath is competitive with prior work using individual models on a variety of OOD tasks involving different distributions. Our code is publicly available at https://github.com/clear-nus/diffpath.

Updated: 2024-10-24 02:17:00

标题: 使用单个无条件扩散模型进行分布外检测

摘要: Out-of-distribution(OOD)检测是机器学习中的一个关键任务,旨在识别异常样本。传统上,无监督方法利用深度生成模型进行OOD检测。然而,这种方法需要为每个正常数据集训练一个新模型。本文探讨了单个模型是否可以跨多个任务执行OOD检测。为此,我们引入了Diffusion Paths(DiffPath),它使用一个最初训练用于无条件生成的扩散模型进行OOD检测。我们引入了一种新颖的技术,用于测量连接样本到标准正态分布的扩散路径的变化率和曲率。大量实验表明,DiffPath使用单个模型,在涉及不同分布的各种OOD任务上与使用独立模型的先前工作竞争力强。我们的代码可以在https://github.com/clear-nus/diffpath上公开获取。

更新时间: 2024-10-24 02:17:00

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.11881v3

The Road to Trust: Building Enclaves within Confidential VMs

Integrity is critical for maintaining system security, as it ensures that only genuine software is loaded onto a machine. Although confidential virtual machines (CVMs) function within isolated environments separate from the host, it is important to recognize that users still encounter challenges in maintaining control over the integrity of the code running within the trusted execution environments (TEEs). The presence of a sophisticated operating system (OS) raises the possibility of dynamically creating and executing any code, making user applications within TEEs vulnerable to interference or tampering if the guest OS is compromised. To address this issue, this paper introduces NestedSGX, a framework which leverages virtual machine privilege level (VMPL), a recent hardware feature available on AMD SEV-SNP to enable the creation of hardware enclaves within the guest VM. Similar to Intel SGX, NestedSGX considers the guest OS untrusted for loading potentially malicious code. It ensures that only trusted and measured code executed within the enclave can be remotely attested. To seamlessly protect existing applications, NestedSGX aims for compatibility with Intel SGX by simulating SGX leaf functions. We have also ported the SGX SDK and the Occlum library OS to NestedSGX, enabling the use of existing SGX toolchains and applications in the system. Performance evaluations show that context switches in NestedSGX take about 32,000 to 34,000 cycles, approximately $1.9\times$ to $2.1\times$ higher than that of Intel SGX. NestedSGX incurs minimal overhead in most real-world applications, with an average overhead below 2% for computation and memory intensive workloads and below 15.68% for I/O intensive workloads.

Updated: 2024-10-24 02:02:40

标题: 通往信任之路:在保密虚拟机内构建飞地

摘要: 完整性对于维护系统安全至关重要,因为它确保只有真实的软件被加载到计算机上。尽管机密虚拟机(CVM)在与主机分离的隔离环境中运行,但重要的是要认识到用户仍然在维护对在受信任执行环境(TEEs)中运行的代码的完整性方面遇到挑战。复杂操作系统(OS)的存在可能会动态创建和执行任何代码,如果客户OS受损,那么TEE中的用户应用程序容易受到干扰或篡改。为了解决这个问题,本文介绍了NestedSGX,这是一个利用AMD SEV-SNP上最新硬件功能虚拟机特权级别(VMPL)的框架,以便在客户VM内创建硬件隔离区(enclave)。类似于Intel SGX,NestedSGX将客户OS视为不可信,因为它可能加载潜在的恶意代码。它确保只有在隔离区内执行的受信任且经过测量的代码才能被远程验证。为了无缝保护现有应用程序,NestedSGX旨在通过模拟SGX叶函数来与Intel SGX兼容。我们还将SGX SDK和Occlum库OS移植到NestedSGX,从而使系统中可以使用现有的SGX工具链和应用程序。性能评估显示,在NestedSGX中的上下文切换大约需要32,000-34,000个周期,大约比Intel SGX高约1.9-2.1倍。对于大多数现实世界的应用程序,NestedSGX几乎没有额外开销,在计算和内存密集型工作负载下的平均开销低于2%,在I/O密集型工作负载下低于15.68%。

更新时间: 2024-10-24 02:02:40

领域: cs.CR,cs.AR

下载: http://arxiv.org/abs/2402.11438v3

Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model

OpenAI's Whisper Automated Speech Recognition model excels in generalizing across diverse datasets and domains. However, this broad adaptability can lead to diminished performance in tasks requiring recognition of specific vocabularies. Addressing this challenge typically involves fine-tuning the model, which demands extensive labeled audio data that is often difficult to acquire and unavailable for specific domains. In this study, we propose a method to enhance transcription accuracy without explicit fine-tuning or altering model parameters, using a relatively small training dataset. Our method leverages contextual biasing, to direct Whisper model's output towards a specific vocabulary by integrating a neural-symbolic prefix tree structure to guide the model's transcription output. To validate our approach, we conducted experiments using a validation dataset comprising maritime data collected within a simulated training environment. A comparison between the original Whisper models of varying parameter sizes and our biased model revealed a notable reduction in transcription word error rate and enhanced performance of downstream applications. Our findings suggest that this methodology holds promise for improving speech-to-text translation performance in domains characterized by limited vocabularies.
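
Trie-based contextual biasing can be sketched generically: while decoding, tokens that extend a currently matched domain phrase get a logit bonus. The token ids and bonus below are toy values, and the paper's neural-symbolic integration with Whisper's decoder is more involved than this:

```python
class BiasTrie:
    def __init__(self, phrases):              # phrases: lists of token ids
        self.root = {}
        for toks in phrases:
            node = self.root
            for t in toks:
                node = node.setdefault(t, {})

def bias_logits(logits, node, bonus=3.0):
    for tok in node:                          # boost every valid continuation
        logits[tok] += bonus
    return logits

def advance(trie, node, tok):
    if tok in node:                           # keep extending the matched phrase
        return node[tok]
    return trie.root.get(tok, trie.root)      # restart (or re-enter) at the root

trie = BiasTrie([[5, 9, 2], [5, 7]])          # two domain phrases as token ids
node = trie.root
step1 = bias_logits([0.0] * 12, node)         # boosts token 5
node = advance(trie, node, 5)
step2 = bias_logits([0.0] * 12, node)         # boosts tokens 9 and 7
print(step1[5], step2[9], step2[7])
```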

Updated: 2024-10-24 01:58:11

标题: 上下文偏差以改进领域特定定制词汇的音频转录,无需显式微调Whisper模型

摘要: OpenAI的Whisper自动语音识别模型在泛化跨多样化数据集和领域方面表现出色。然而,这种广泛的适应性可能导致在需要识别特定词汇的任务中性能下降。解决这一挑战通常涉及微调模型,这需要大量标记的音频数据,通常难以获取并且在特定领域不可用。在本研究中,我们提出了一种方法,可以在不显式微调或改变模型参数的情况下,使用相对较小的训练数据集来提高转录准确性。我们的方法利用上下文偏见,通过集成神经符号前缀树结构来引导模型的转录输出,将Whisper模型的输出定向到特定词汇。为了验证我们的方法,我们使用一个验证数据集,其中包含在模拟训练环境中收集的海事数据进行了实验。对比不同参数大小的原始Whisper模型和我们的偏倚模型,发现转录单词错误率显著降低,并且下游应用性能得到增强。我们的发现表明,这种方法可能有助于在词汇有限的领域改善语音到文本的翻译性能。

更新时间: 2024-10-24 01:58:11

领域: cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.18363v1

Boosting X-formers with Structured Matrix for Long Sequence Time Series Forecasting

Transformer-based models for long sequence time series forecasting problems have gained significant attention due to their exceptional forecasting precision. However, the self-attention mechanism introduces challenges in terms of computational efficiency due to its quadratic time complexity. To address these issues, we propose a novel architectural framework that enhances Transformer models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB). They replace the self-attention and feed-forward layer by leveraging structured matrices that reduce both time and space complexity while maintaining the expressive power of the original self-attention mechanism and feed-forward network. The equivalence of this substitution is fully demonstrated. Extensive experiments on nine Transformer variants across five distinct time series tasks demonstrate an average performance improvement of 9.45%, alongside a 46% reduction in model size. These results confirm the efficacy of our surrogate-based approach in maintaining prediction accuracy while significantly boosting computational efficiency.

Updated: 2024-10-24 01:52:17

标题: 利用结构化矩阵提升X-formers以进行长序列时间序列预测

摘要: 基于Transformer的模型在长序列时间序列预测问题中获得了显著关注,因为其出色的预测精度。然而,自注意机制引入了挑战,因为其二次时间复杂度导致计算效率问题。为了解决这些问题,我们提出了一个新颖的架构框架,通过集成替代注意块(SAB)和替代前馈神经网络块(SFB)来增强Transformer模型。它们通过利用结构化矩阵替换自注意和前馈层,从而减少时间和空间复杂性,同时保持原始自注意机制和前馈网络的表达能力。这种替代的等效性得到了充分证明。在五个不同的时间序列任务上对九种Transformer变体进行的广泛实验表明,平均性能提高了9.45%,同时模型尺寸减少了46%。这些结果证实了我们基于替代的方法在保持预测准确性的同时显著提升了计算效率的有效性。

更新时间: 2024-10-24 01:52:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.12462v3

Assessing Alcohol Use Disorder: Insights from Lifestyle, Background, and Family History with Machine Learning Techniques

This study explored how lifestyle, personal background, and family history contribute to the risk of developing Alcohol Use Disorder (AUD). Survey data from the All of Us Program was utilized to extract information on AUD status, lifestyle, personal background, and family history for 6,016 participants. Key determinants of AUD were identified using decision trees including annual income, recreational drug use, length of residence, sex/gender, marital status, education level, and family history of AUD. Data visualization and Chi-Square Tests of Independence were then used to assess associations between identified factors and AUD. Afterwards, machine learning techniques including decision trees, random forests, and Naive Bayes were applied to predict an individual's likelihood of developing AUD. Random forests were found to achieve the highest accuracy (82%), compared to Decision Trees and Naive Bayes. Findings from this study can offer insights that help parents, healthcare professionals, and educators develop strategies to reduce AUD risk, enabling early intervention and targeted prevention efforts.
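
The modeling setup reduces to a standard tabular pipeline; here is a minimal scikit-learn sketch in which the column names and rows are hypothetical stand-ins for the All of Us survey fields:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({                      # hypothetical survey extract
    "annual_income_k": [30, 85, 40, 120, 55, 20, 95, 60],
    "recreational_drug_use": [1, 0, 1, 0, 0, 1, 0, 0],
    "family_history_aud": [1, 0, 1, 0, 1, 1, 0, 0],
    "aud": [1, 0, 1, 0, 0, 1, 0, 0],
})
X, y = df.drop(columns="aud"), df["aud"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, model.predict(X_te)))
```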

Updated: 2024-10-24 01:30:54

标题: 评估酒精使用障碍:利用机器学习技术从生活方式、背景和家庭史中获得的见解

摘要: 这项研究探讨了生活方式、个人背景和家庭史如何影响发展酒精使用障碍(AUD)的风险。利用“我们所有人”计划的调查数据,提取了6,016名参与者的AUD状况、生活方式、个人背景和家庭史的信息。使用决策树识别了AUD的关键决定因素,包括年收入、娱乐药物使用、居住时间、性别、婚姻状况、教育水平和家庭中是否有AUD史。然后使用数据可视化和卡方独立性检验来评估确定的因素与AUD之间的关联。随后,应用决策树、随机森林和朴素贝叶斯等机器学习技术来预测个体发展AUD的可能性。发现随机森林的准确率最高(82%),相比决策树和朴素贝叶斯。这项研究的发现可以为父母、医疗专业人员和教育工作者提供见解,帮助他们制定减少AUD风险的策略,实现早期干预和有针对性的预防工作。

更新时间: 2024-10-24 01:30:54

领域: cs.LG

下载: http://arxiv.org/abs/2410.18354v1

Precision Soil Quality Analysis Using Transformer-based Data Fusion Strategies: A Systematic Review

This review explores the most recent advancements in transformer-based data fusion techniques in agricultural remote sensing (RS), with a particular focus on soil analysis. Utilizing a systematic, data-driven approach, we demonstrate that transformers have significantly outperformed conventional deep learning and machine learning methods since 2022, achieving prediction performance between 92% and 97%. The review is specifically focused on soil analysis, due to the importance of soil condition in optimizing crop productivity and ensuring sustainable farming practices. Transformer-based models have shown remarkable capabilities in handling complex multivariate soil data, improving the accuracy of soil moisture prediction, soil element analysis, and other soil-related applications. This systematic review primarily focuses on 1) analysing research trends and patterns in the literature, both chronologically and technically, and 2) conducting a comparative analysis of data fusion approaches, considering factors such as data types, techniques, and RS applications. Finally, we propose a roadmap for implementing data fusion methods in agricultural RS.

Updated: 2024-10-24 01:26:21

标题: Transformer-based数据融合策略在精准土壤质量分析中的应用:一项系统性综述

摘要: 这篇综述探讨了农业遥感(RS)中基于Transformer的数据融合技术的最新进展,特别关注土壤分析。通过采用系统化、数据驱动的方法,我们展示了自2022年以来,Transformer已显著优于传统的深度学习和机器学习方法,预测性能达到92%至97%。该综述专注于土壤分析,因为土壤状况对于优化作物生产力和确保可持续农业实践至关重要。基于Transformer的模型在处理复杂的多变量土壤数据方面表现出显著能力,提高了土壤湿度预测、土壤元素分析及其他土壤相关应用的准确性。这篇系统性综述主要关注:1)从时间和技术两个维度分析文献中的研究趋势和模式;2)对数据融合方法进行比较分析,考虑数据类型、技术和遥感应用等因素。最后,我们提出了在农业遥感中实施数据融合方法的路线图。

更新时间: 2024-10-24 01:26:21

领域: cs.LG

下载: http://arxiv.org/abs/2410.18353v1

Distribution-Aware Compensation Design for Sustainable Data Rights in Machine Learning

Modern distributed learning systems face a critical challenge when clients request the removal of their data influence from trained models, as this process can significantly destabilize system performance and affect remaining participants. We propose an innovative mechanism that views this challenge through the lens of game theory, establishing a leader-follower framework where a central coordinator provides strategic incentives to maintain system stability during data removal operations. Our approach quantifies the ripple effects of data removal through a comprehensive analytical model that captures both system-wide and participant-specific impacts. We establish mathematical foundations for measuring participant utility and system outcomes, revealing critical insights into how data diversity influences both individual decisions and overall system stability. The framework incorporates a computationally efficient solution method that addresses the inherent complexity of optimizing participant interactions and resource allocation.
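
To make the leader-follower structure concrete, here is a toy sketch in which a coordinator grid-searches a single incentive level and participants best-respond by staying or leaving; the paper's analytical model and efficient solver are not reproduced, and all quantities below are invented for illustration.

```python
# A toy Stackelberg sketch: the leader (coordinator) picks an incentive,
# followers (participants) stay if it covers their cost, and the leader
# trades system stability against total payments.
import numpy as np

rng = np.random.default_rng(1)
stay_cost = rng.uniform(0.0, 1.0, size=20)   # hypothetical per-client costs
value = rng.uniform(0.5, 1.5, size=20)       # stability value of each client

def followers_stay(incentive):
    return incentive >= stay_cost            # best response per client

def leader_objective(incentive):
    stay = followers_stay(incentive)
    stability = value[stay].sum()            # system benefit from stayers
    payments = incentive * stay.sum()        # total incentive outlay
    return stability - payments

best = max(np.linspace(0, 1, 101), key=leader_objective)
print("best incentive:", round(best, 2))
```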

Updated: 2024-10-24 01:25:51

标题: 面向机器学习中可持续数据权利的分布感知补偿设计

摘要: 现代分布式学习系统面临一个关键挑战:当客户端要求从已训练模型中消除其数据的影响时,这一过程可能显著破坏系统性能并影响其余参与者。我们提出了一种创新机制,从博弈论的视角看待这一挑战,建立了一个领导者-追随者框架,由中央协调者提供战略性激励,以在数据删除操作期间维持系统稳定性。我们的方法通过一个全面的分析模型量化数据删除的连锁效应,同时刻画系统层面和参与者层面的影响。我们为衡量参与者效用和系统结果建立了数学基础,揭示了数据多样性如何影响个体决策和整体系统稳定性的关键见解。该框架还包含一种计算高效的求解方法,以应对优化参与者交互和资源分配的固有复杂性。

更新时间: 2024-10-24 01:25:51

领域: cs.GT,cs.AI

下载: http://arxiv.org/abs/2410.15045v2

Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner

Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system. Existing methods aim to reconstruct STTD using low-dimensional models. However, they are limited to data-specific dimensions or source-dependent patterns, restricting them from unifying representations. Here, we present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation. To discern the underlying dynamics in low-dimensional regimes, coordinate-based neural networks that can encode high-frequency structures are employed to directly map coordinates to traffic variables. To unravel the entangled spatial-temporal interactions, the variability is decomposed into separate processes. We further enable modeling in irregular spaces such as sensor graphs using spectral embedding. Through continuous representations, our approach enables the modeling of a variety of STTD with a unified input, thereby serving as a generalized learner of the underlying traffic dynamics. It is also shown that it can learn implicit low-rank priors and smoothness regularization from the data, making it versatile for learning different dominating data patterns. We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales. Empirical results not only indicate that our model has significant superiority over conventional low-rank models, but also highlight that the versatility of the approach extends to different data domains, output resolutions, and network topologies. Comprehensive model analyses provide further insight into the inductive bias of STTD. We anticipate that this pioneering modeling perspective could lay the foundation for universal representation of STTD in various real-world tasks. Code is available at https://github.com/tongnie/traffic_dynamics.
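
A minimal sketch of the coordinate-based backbone might look as follows: random Fourier features feeding an MLP that maps (x, y, t) to a traffic variable. The factorized spatial-temporal processes and spectral graph embeddings described above are omitted, and the frequency scale is an assumed hyperparameter.

```python
# A minimal implicit-neural-representation sketch for traffic data:
# coordinates are lifted with random Fourier features (to capture
# high-frequency structure) and mapped to a traffic variable by an MLP.
import torch
import torch.nn as nn

class TrafficINR(nn.Module):
    def __init__(self, n_freq=64, hidden=128):
        super().__init__()
        # fixed random projection; scale 10.0 is an assumed hyperparameter
        self.B = nn.Parameter(torch.randn(3, n_freq) * 10.0, requires_grad=False)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):               # coords: (batch, 3) = (x, y, t)
        proj = coords @ self.B
        feats = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        return self.mlp(feats)               # predicted traffic variable

coords = torch.rand(1024, 3)                 # hypothetical normalized coords
print(TrafficINR()(coords).shape)
```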

Updated: 2024-10-24 01:20:59

标题: 时空隐式神经表示作为广义交通数据学习者

摘要: 时空交通数据(STTD)测量了多尺度交通系统的复杂动态行为。现有方法旨在使用低维模型重建STTD。然而,它们受限于特定数据维度或源相关模式,限制了它们统一表示的能力。在这里,我们提出了一种新的范例来解决STTD学习问题,将STTD参数化为隐式神经表示。为了识别低维度情况下的潜在动力学,我们采用了基于坐标的神经网络,可以编码高频结构,直接将坐标映射到交通变量。为了揭示错综复杂的时空交互作用,变异性被分解为独立的过程。我们进一步通过使用谱嵌入在不规则空间(如传感器图)中进行建模。通过连续表示,我们的方法使得能够使用统一输入对各种STTD进行建模,从而作为潜在交通动态的通用学习器。还表明它可以从数据中学习隐式低秩先验和平滑正则化,使其适用于学习不同主导数据模式。我们通过在真实场景中进行大量实验验证了其有效性,展示了从走廊到网络规模的应用。实证结果不仅表明我们的模型明显优于传统的低秩模型,还突显了该方法的通用性扩展到不同数据域、输出分辨率和网络拓扑结构。全面的模型分析进一步深入了解STTD的归纳偏差。我们预计这种开创性的建模视角可能为各种真实世界任务中STTD的通用表示奠定基础。代码可在https://github.com/tongnie/traffic_dynamics找到。

更新时间: 2024-10-24 01:20:59

领域: cs.LG

下载: http://arxiv.org/abs/2405.03185v2

FedBaF: Federated Learning Aggregation Biased by a Foundation Model

Foundation models are now a major focus of leading technology organizations due to their ability to generalize across diverse tasks. Existing approaches for adapting foundation models to new applications often rely on Federated Learning (FL) and disclose the foundation model weights to clients when using it to initialize the global model. While these methods ensure client data privacy, they compromise model and information security. In this paper, we introduce Federated Learning Aggregation Biased by a Foundation Model (FedBaF), a novel method for dynamically integrating pre-trained foundation model weights during the FL aggregation phase. Unlike conventional methods, FedBaF preserves the confidentiality of the foundation model while still leveraging its power to train more accurate models, especially in non-IID and adversarial scenarios. Our comprehensive experiments use Pre-ResNet and foundation models like Vision Transformer to demonstrate that FedBaF not only matches, but often surpasses the test accuracy of traditional weight initialization methods by up to 11.4\% in IID and up to 15.8\% in non-IID settings. Additionally, FedBaF applied to a Transformer-based language model significantly reduced perplexity by up to 39.2\%.
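
A heavily simplified sketch of the aggregation idea appears below: client models are averaged as usual, and the server then pulls the result toward the server-side foundation weights, which are never sent to clients. The exact weighting schedule FedBaF uses may differ; the `bias` parameter here is an assumption.

```python
# A minimal sketch of foundation-model-biased aggregation: a FedAvg step
# followed by a convex pull toward the (confidential, server-side)
# foundation weights. The paper's actual biasing rule may differ.
import numpy as np

def fedbaf_aggregate(client_weights, foundation_weights, bias=0.1):
    """Average client models, then bias the result toward the foundation model."""
    avg = np.mean(client_weights, axis=0)          # standard FedAvg step
    return (1.0 - bias) * avg + bias * foundation_weights

clients = [np.random.randn(1000) for _ in range(8)]   # flattened client params
foundation = np.random.randn(1000)                    # stays on the server
global_w = fedbaf_aggregate(clients, foundation, bias=0.1)
print(global_w.shape)
```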

Updated: 2024-10-24 01:14:23

标题: FedBaF:由基础模型偏置的联邦学习聚合

摘要: 基础模型因其能够在多样任务之间泛化,已成为领先科技组织的主要关注点。现有的将基础模型适配到新应用的方法通常依赖联邦学习(FL),并在用其初始化全局模型时向客户端披露基础模型权重。虽然这些方法确保了客户端数据的隐私,但损害了模型和信息安全。在本文中,我们介绍了由基础模型偏置的联邦学习聚合(FedBaF),这是一种在FL聚合阶段动态整合预训练基础模型权重的新方法。与传统方法不同,FedBaF在保护基础模型机密性的同时,仍能利用其能力训练更精确的模型,尤其是在非IID和对抗性场景中。我们的综合实验使用Pre-ResNet和Vision Transformer等基础模型,证明FedBaF不仅能够匹配传统权重初始化方法的测试准确率,而且在IID设置下最多超出11.4%,在非IID设置下最多超出15.8%。此外,将FedBaF应用于基于Transformer的语言模型,可将困惑度显著降低多达39.2%。

更新时间: 2024-10-24 01:14:23

领域: cs.LG,cs.CR,cs.DC

下载: http://arxiv.org/abs/2410.18352v1

AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability

Speculative decoding is a powerful technique that attempts to circumvent the autoregressive constraint of modern Large Language Models (LLMs). The aim of speculative decoding techniques is to improve the average inference time of a large, target model without sacrificing its accuracy, by using a more efficient draft model to propose draft tokens which are then verified in parallel. The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance rate statistics of the draft tokens. However, setting a static draft length can negatively impact performance, especially in scenarios where drafting is expensive and there is a high variance in the number of tokens accepted. Adaptive Entropy-based Draft Length (AdaEDL) is a simple, training- and parameter-free criterion which allows for early stopping of the token drafting process by approximating a lower bound on the expected acceptance probability of the drafted token based on the currently observed entropy of the drafted logits. We show that AdaEDL consistently outperforms static draft-length speculative decoding by 10%-57%, as well as other training-free draft-stopping techniques by up to 10%, in a variety of settings and datasets. At the same time, we show that AdaEDL is more robust than these techniques and preserves performance in high-sampling-temperature scenarios. Since it is training-free, in contrast to techniques that rely on the training of dataset-specific draft-stopping predictors, AdaEDL can seamlessly be integrated into a variety of pre-existing LLM systems.
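
A minimal sketch of an entropy-based stopping rule follows; note that the mapping from entropy to an acceptance-probability lower bound shown here is a placeholder surrogate, not the bound derived in the paper, and the threshold value is an assumption.

```python
# A minimal sketch of entropy-based draft stopping: drafting halts when
# an entropy-derived proxy for token acceptance probability drops below
# a threshold. The paper's exact lower bound is not reproduced here.
import numpy as np

def entropy(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum()

def should_stop_drafting(draft_logits, vocab_size, threshold=0.4):
    h = entropy(draft_logits)
    h_max = np.log(vocab_size)          # entropy of a uniform distribution
    acceptance_proxy = 1.0 - h / h_max  # placeholder lower-bound surrogate
    return acceptance_proxy < threshold

logits = np.random.randn(32000)          # hypothetical draft-model logits
print(should_stop_drafting(logits, vocab_size=32000))
```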

Updated: 2024-10-24 01:13:43

标题: AdaEDL:基于熵的标记接受概率下限的大语言模型推测解码的早期草稿停止

摘要: 推测解码是一种强大的技术,旨在规避现代大型语言模型(LLMs)的自回归约束。推测解码技术的目标是通过使用更高效的草稿模型提出草稿标记,然后并行验证这些标记,从而在不损害准确性的前提下缩短大型目标模型的平均推理时间。每一轮起草产生的草稿标记数量称为草稿长度,通常是基于草稿标记的接受率统计选择的静态超参数。然而,设置静态草稿长度可能对性能产生负面影响,特别是在起草开销较大且被接受的标记数量方差较高的情况下。基于熵的自适应草稿长度(AdaEDL)是一种简单的、无需训练和参数的准则,它根据当前观察到的草稿logits的熵,近似计算所起草标记的预期接受概率下界,从而允许提前停止标记起草过程。我们表明,在各种设置和数据集中,AdaEDL始终比静态草稿长度的推测解码高出10%-57%,并比其他无需训练的草稿停止技术高出多达10%。同时,我们表明AdaEDL比这些技术更加稳健,并在高采样温度场景下保持性能。由于AdaEDL无需训练,与依赖训练数据集特定草稿停止预测器的技术不同,它可以无缝集成到各种现有LLM系统中。

更新时间: 2024-10-24 01:13:43

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.18351v1

Geometric Feature Enhanced Knowledge Graph Embedding and Spatial Reasoning

Geospatial Knowledge Graphs (GeoKGs) model geoentities (e.g., places and natural features) and spatial relationships in an interconnected manner, providing strong knowledge support for geographic applications, including data retrieval, question-answering, and spatial reasoning. However, existing methods for mining and reasoning from GeoKGs, such as popular knowledge graph embedding (KGE) techniques, lack geographic awareness. This study aims to enhance general-purpose KGE by developing new strategies and integrating geometric features of spatial relations, including topology, direction, and distance, to infuse the embedding process with geographic intuition. The new model is tested on downstream link prediction tasks, and the results show that the inclusion of geometric features, particularly topology and direction, improves prediction accuracy for both geoentities and spatial relations. Our research offers new perspectives for integrating spatial concepts and principles into the GeoKG mining process, providing customized GeoAI solutions for geospatial challenges.
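
As an illustration of injecting one geometric feature into a KGE objective, the sketch below adds a direction-consistency penalty to a TransE-style score; the paper's treatment of topology and distance, and its training strategy, are not shown, and all inputs below are hypothetical.

```python
# A minimal sketch of a geometry-aware KGE score: a TransE-style
# plausibility term plus a penalty when a predicted bearing between two
# geoentities disagrees with the known compass direction.
import numpy as np

def geo_score(h, r, t, bearing_pred, bearing_true, alpha=0.5):
    transe = -np.linalg.norm(h + r - t)                 # plausibility term
    # wrapped angular difference in (-pi, pi]
    ang = np.abs(np.angle(np.exp(1j * (bearing_pred - bearing_true))))
    return transe - alpha * ang                         # direction penalty

h, r, t = (np.random.randn(50) for _ in range(3))       # hypothetical embeddings
print(geo_score(h, r, t, bearing_pred=1.2, bearing_true=1.0))
```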

Updated: 2024-10-24 00:53:48

标题: 几何特征增强的知识图嵌入与空间推理

摘要: 地理空间知识图谱(GeoKGs)以互联的方式对地理实体(例如地点和自然特征)及空间关系进行建模,为地理应用提供强大的知识支持,包括数据检索、问答和空间推理。然而,现有的从GeoKGs中挖掘和推理的方法,如流行的知识图谱嵌入(KGE)技术,缺乏地理意识。本研究旨在通过开发新策略并整合空间关系的几何特征(包括拓扑、方向和距离)来增强通用KGE,从而为嵌入过程注入地理直觉。新模型在下游链接预测任务上进行了测试,结果表明,纳入几何特征(尤其是拓扑和方向)可以提高对地理实体和空间关系的预测准确性。我们的研究为将空间概念和原则整合到GeoKG挖掘过程提供了新的视角,为地理空间挑战提供定制的GeoAI解决方案。

更新时间: 2024-10-24 00:53:48

领域: cs.AI

下载: http://arxiv.org/abs/2410.18345v1

Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models

This paper introduces a novel approach to enhancing closed-domain Question Answering (QA) systems, focusing on the specific needs of the Lawrence Berkeley National Laboratory (LBL) Science Information Technology (ScienceIT) domain. Utilizing a rich dataset derived from the ScienceIT documentation, our study embarks on a detailed comparison of two fine-tuned large language models and five retrieval-augmented generation (RAG) models. Through data processing techniques, we transform the documentation into structured context-question-answer triples, leveraging the latest Large Language Models (AWS Bedrock, GCP PaLM2, Meta LLaMA2, OpenAI GPT-4, Google Gemini-Pro) for data-driven insights. Additionally, we introduce the Aggregated Knowledge Model (AKM), which synthesizes responses from the seven models mentioned above using K-means clustering to select the most representative answers. The evaluation of these models across multiple metrics offers a comprehensive look into their effectiveness and suitability for the LBL ScienceIT environment. The results demonstrate the potential benefits of integrating fine-tuning and retrieval-augmented strategies, highlighting significant performance improvements achieved with the AKM. The insights gained from this study can be applied to develop specialized QA systems tailored to specific domains.
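
A minimal sketch of the AKM selection step might look like this: embed the seven candidate answers, cluster them with K-means, and return the answer nearest the centroid of the largest cluster. The random embeddings stand in for a real sentence encoder, and choosing the largest cluster is an assumption about the selection rule.

```python
# A minimal sketch of K-means-based answer selection over candidate
# answers from multiple models; embeddings are random stand-ins for a
# real sentence encoder.
import numpy as np
from sklearn.cluster import KMeans

def select_representative(answers, embeddings, k=3):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    biggest = np.bincount(km.labels_).argmax()           # largest cluster
    centroid = km.cluster_centers_[biggest]
    members = np.where(km.labels_ == biggest)[0]
    dists = np.linalg.norm(embeddings[members] - centroid, axis=1)
    return answers[members[dists.argmin()]]              # nearest to centroid

answers = [f"candidate answer {i}" for i in range(7)]    # one per model
embeddings = np.random.randn(7, 384)                     # stand-in embeddings
print(select_representative(answers, embeddings))
```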

Updated: 2024-10-24 00:49:46

标题: 聚合知识模型:通过微调和检索增强的生成模型增强特定领域的问答系统

摘要: 本文介绍了一种新颖的方法,用于增强封闭领域的问答(QA)系统,重点关注劳伦斯伯克利国家实验室(LBL)科学信息技术(ScienceIT)领域的特定需求。利用从ScienceIT文档中提取的丰富数据集,我们的研究进行了两种经过精细调整的大型语言模型和五种检索增强生成(RAG)模型的详细比较。通过数据处理技术,我们将文档转换为结构化的上下文-问题-答案三元组,利用最新的大型语言模型(AWS Bedrock、GCP PaLM2、Meta LLaMA2、OpenAI GPT-4、Google Gemini-Pro)进行数据驱动的洞察。此外,我们引入了聚合知识模型(AKM),通过K-means聚类从上述七个模型中综合响应,选择最具代表性的答案。对这些模型在多个指标上的评估提供了对它们的有效性和适用性于LBL ScienceIT环境的全面了解。结果显示了整合精细调整和检索增强策略的潜在益处,突出了AKM所实现的显著性能改进。从这项研究中获得的见解可以应用于开发针对特定领域定制的专业QA系统。

更新时间: 2024-10-24 00:49:46

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18344v1

PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures

Graph Neural Networks (GNNs) are emerging ML models used to analyze graph-structured data. Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs tailored for real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed in processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.

Updated: 2024-10-24 00:44:49

标题: PyGim:面向真实存内处理架构的高效图神经网络库

摘要: 图神经网络(GNNs)是用于分析图结构数据的新兴机器学习模型。GNN的执行同时涉及计算密集型和内存密集型内核,其中后者占据总时间的主导地位,并明显受制于内存与处理器之间的数据移动。存内处理(PIM)系统可以通过将简单处理器放置在内存阵列附近或内部来缓解这一数据移动瓶颈。在这项工作中,我们介绍了PyGim,一种在真实PIM系统上加速GNN的高效机器学习库。我们为真实PIM系统量身定制了GNN内存密集型内核的智能并行化技术,并为其开发了便捷的Python API。我们提供混合GNN执行,其中计算密集型和内存密集型内核分别在以处理器为中心和以内存为中心的计算系统中执行。我们在一个拥有1992个PIM核心的真实PIM系统上使用新兴GNN模型对PyGim进行了广泛评估,结果表明其性能平均比英特尔Xeon上最先进的CPU对应方案高3.04倍,并且资源利用率高于CPU和GPU系统。我们的工作为软件、系统和硬件设计者提供了有用的建议。PyGim可在https://github.com/CMU-SAFARI/PyGim公开获取。

更新时间: 2024-10-24 00:44:49

领域: cs.AR,cs.DC,cs.LG,cs.PF

下载: http://arxiv.org/abs/2402.16731v4

Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems

The mathematical capabilities of AI systems are complex and multifaceted. Most existing research has predominantly focused on the correctness of AI-generated solutions to mathematical problems. In this work, we argue that beyond producing correct answers, AI systems should also be capable of, or assist humans in, developing novel solutions to mathematical challenges. This study explores the creative potential of Large Language Models (LLMs) in mathematical reasoning, an aspect that has received limited attention in prior research. We introduce a novel framework and benchmark, CreativeMath, which encompasses problems ranging from middle school curricula to Olympic-level competitions, designed to assess LLMs' ability to propose innovative solutions after some known solutions have been provided. Our experiments demonstrate that, while LLMs perform well on standard mathematical tasks, their capacity for creative problem-solving varies considerably. Notably, the Gemini-1.5-Pro model outperformed other LLMs in generating novel solutions. This research opens a new frontier in evaluating AI creativity, shedding light on both the strengths and limitations of LLMs in fostering mathematical innovation, and setting the stage for future developments in AI-assisted mathematical discovery.

Updated: 2024-10-24 00:12:49

标题: 评估LLMs在提出数学问题的新颖解决方案中的创造力

摘要: 人工智能系统的数学能力是复杂且多方面的。大多数现有研究主要集中在人工智能生成的数学问题解决方案的正确性上。在这项工作中,我们认为,除了产生正确答案外,人工智能系统还应该能够或者协助人类开发数学挑战的新解决方案。这项研究探讨了大型语言模型(LLMs)在数学推理中的创造潜力,这一方面在先前的研究中受到了有限的关注。我们引入了一个新的框架和基准,CreativeMath,涵盖了从中学课程到奥林匹克级别比赛的问题,旨在评估LLMs在提供一些已知解决方案后提出创新解决方案的能力。我们的实验表明,虽然LLMs在标准数学任务上表现良好,但它们在创造性问题解决能力方面差异很大。值得注意的是,Gemini-1.5-Pro模型在生成新颖解决方案方面优于其他LLMs。这项研究开辟了评估人工智能创造性的新领域,揭示了LLMs在促进数学创新方面的优势和局限性,并为未来AI辅助数学发现的发展奠定了基础。

更新时间: 2024-10-24 00:12:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.18336v1

Search-Based Path Planning among Movable Obstacles

This paper investigates Path planning Among Movable Obstacles (PAMO), which seeks a minimum-cost collision-free path among static obstacles from start to goal while allowing the robot to push away movable obstacles (i.e., objects) along its path when needed. A planner that is complete and optimal for PAMO has to search a giant state space involving both the location of the robot and the locations of the objects, which grows exponentially with respect to the number of objects. The main idea in this paper is that only a small fraction of this giant state space needs to be explored during planning, as guided by a heuristic, while most of the objects far away from the robot remain intact, which leads to runtime-efficient algorithms. Based on this idea, this paper introduces two PAMO formulations, i.e., bi-objective and resource-constrained problems in an occupancy grid, and develops PAMO*, a search method with completeness and solution optimality guarantees, to solve the two problems. We then further extend PAMO* to hybrid-state PAMO* to plan in continuous spaces with high-fidelity interaction between the robot and the objects. Our results show that PAMO* can often find optimal solutions within a second in cluttered environments with up to 400 objects.
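
To illustrate the coupled robot-object state space, here is a toy breadth-first search on a small grid where moving into a movable object pushes it one cell if the cell behind is free; this is a didactic sketch, not the paper's PAMO* with its heuristics, bi-objective costs, or hybrid-state extension.

```python
# A toy sketch of search among movable obstacles: the state couples the
# robot cell with the set of movable-object positions, so the state
# space grows exponentially with the number of objects.
from collections import deque

def pamo_bfs(size, start, goal, static, movables):
    init = (start, frozenset(movables))
    queue, seen = deque([(init, 0)]), {init}
    while queue:
        (robot, objs), cost = queue.popleft()
        if robot == goal:
            return cost
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (robot[0] + dx, robot[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in static:
                continue
            new_objs = objs
            if nxt in objs:                       # try to push the object
                behind = (nxt[0] + dx, nxt[1] + dy)
                if (not (0 <= behind[0] < size and 0 <= behind[1] < size)
                        or behind in static or behind in objs):
                    continue                      # push blocked
                new_objs = (objs - {nxt}) | {behind}
            state = (nxt, new_objs)
            if state not in seen:
                seen.add(state)
                queue.append((state, cost + 1))
    return None

print(pamo_bfs(5, (0, 0), (4, 4), static={(2, 2)}, movables={(1, 0), (0, 1)}))
```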

Updated: 2024-10-24 00:02:58

标题: 移动障碍物间的基于搜索的路径规划

摘要: 本文研究了可移动障碍物间的路径规划(PAMO),该问题在静态障碍物之间寻找从起点到目标的最小成本无碰撞路径,同时允许机器人在需要时沿途推开可移动障碍物(即物体)。对PAMO完备且最优的规划器必须搜索一个同时涉及机器人位置和物体位置的巨大状态空间,其规模随物体数量呈指数增长。本文的主要思想是:在启发式的引导下,规划过程中只需探索这个巨大状态空间的一小部分,而远离机器人的大多数物体保持不动,从而得到运行时高效的算法。基于这一思想,本文提出了两种PAMO形式,即占用栅格中的双目标问题和资源受限问题,并开发了PAMO*——一种具有完备性和解最优性保证的搜索方法来求解这两个问题。随后,我们进一步将PAMO*扩展为混合状态PAMO*,以在机器人与物体之间具有高保真交互的连续空间中进行规划。我们的结果表明,在包含多达400个物体的杂乱环境中,PAMO*通常可以在一秒内找到最优解。

更新时间: 2024-10-24 00:02:58

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.18333v1

By Xinhai (Sean) Zou.