    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics

Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly from text-based descriptions alone. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to better align robot behavior with user intentions. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and optimally match the demonstrated behaviors. ELEMENTAL also introduces an iterative feedback loop through self-reflection to improve feature, reward, and policy learning. Our experimental results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success, and achieves 41.3% better generalization on out-of-distribution tasks, highlighting its robustness in learning from demonstration (LfD).
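
A minimal sketch of the feature-matching idea behind IRL-based reward balancing: the reward is a weighted sum of features, and the weights move toward features that the demonstrations exhibit more strongly than the current policy does. The function names and update rule here are illustrative, not taken from the paper.

```python
import numpy as np

def irl_feature_matching(mu_demo, rollout_features, lr=0.1, iters=100):
    """Weight update for a linear reward r(s) = w . phi(s).

    mu_demo: (d,) average feature vector of the user demonstrations.
    rollout_features: callable w -> (d,) expected features of the policy
        obtained (or re-evaluated) under reward weights w.
    """
    w = np.zeros_like(mu_demo, dtype=float)
    for _ in range(iters):
        mu_pi = rollout_features(w)
        w += lr * (mu_demo - mu_pi)  # upweight features the demos show more of
    return w
```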

Updated: 2024-11-27 23:58:32

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2411.18825v1

RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data

We present RelCon, a novel self-supervised \textit{Rel}ative \textit{Con}trastive learning approach that uses a learnable distance measure in combination with a softened contrastive loss to train a motion foundation model from wearable sensors. The learnable distance measure captures motif similarity and domain-specific semantic information such as rotation invariance. The learned distance provides a measure of semantic similarity between a pair of accelerometer time-series segments, which is used to measure the distance between an anchor and various other sampled candidate segments. The self-supervised model is trained on 1 billion segments from 87,376 participants in a large wearables dataset. The model achieves strong performance across multiple downstream tasks, encompassing both classification and regression. To our knowledge, we are the first to show the generalizability of a self-supervised learning model trained on motion data from wearables across distinct evaluation tasks.
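
One way such a softened contrastive objective can look, as a hedged sketch: instead of a single hard positive, every candidate receives a soft target weight derived from the (learnable) distance measure, and the embedding similarities are trained to match those weights. The stub `dist_fn` stands in for RelCon's learned measure; nothing here is the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(anchor, candidates, dist_fn, tau=0.1):
    """anchor: (d,) embedding; candidates: (n, d) embeddings.
    dist_fn: callable returning (n,) semantic distances to the anchor."""
    sims = candidates @ anchor / tau                          # similarity logits
    targets = F.softmax(-dist_fn(candidates, anchor), dim=0)  # closer -> heavier
    return -(targets * F.log_softmax(sims, dim=0)).sum()      # soft cross-entropy
```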

Updated: 2024-11-27 23:51:53

Categories: eess.SP,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.18822v1

Unifying Generative and Dense Retrieval for Sequential Recommendation

Sequential dense retrieval models utilize advanced sequence learning techniques to compute item and user representations, which are then used to rank relevant items for a user through inner-product computation between the user representation and all item representations. However, this approach requires storing a unique representation for each item, resulting in significant memory requirements as the number of items grows. In contrast, the recently proposed generative retrieval paradigm offers a promising alternative by directly predicting item indices using a generative model trained on semantic IDs that encapsulate items' semantic information. Despite its potential for large-scale applications, a comprehensive comparison between generative retrieval and sequential dense retrieval under fair conditions is still lacking, leaving open questions regarding performance and computation trade-offs. To address this, we compare these two approaches under controlled conditions on academic benchmarks and propose LIGER (LeveragIng dense retrieval for GEnerative Retrieval), a hybrid model that combines the strengths of these two widely used methods. LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences and enhancing cold-start item recommendation in the datasets evaluated. This hybrid approach provides insights into the trade-offs between these approaches and demonstrates improvements in efficiency and effectiveness for recommendation systems in small-scale benchmarks.
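
For concreteness, here is the memory-hungry step the abstract contrasts against: dense retrieval must store a vector for every item and rank by inner product, whereas generative retrieval instead decodes a short semantic-ID token sequence. A minimal sketch of the dense side, with illustrative names:

```python
import numpy as np

def dense_rank(user_vec, item_matrix, k=10):
    """Score every item by inner product with the user representation and
    return the top-k indices. Storage is O(num_items * dim), which is the
    cost generative retrieval avoids by predicting semantic IDs instead."""
    scores = item_matrix @ user_vec
    return np.argsort(-scores)[:k]
```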

Updated: 2024-11-27 23:36:59

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2411.18814v1

NewsEdits 2.0: Learning the Intentions Behind Updating News

As events progress, news articles often update with new information: if we are not cautious, we risk propagating outdated facts. In this work, we hypothesize that linguistic features indicate factual fluidity, and that we can predict which facts in a news article will update using solely the text of the article (i.e., without external resources like search engines). We test this hypothesis, first, by isolating fact updates in large news revision corpora. News articles may update for many reasons (e.g., factual, stylistic, narrative). We introduce the NewsEdits 2.0 taxonomy, an edit-intentions schema that separates fact updates from stylistic and narrative updates in news writing. We annotate over 9,200 pairs of sentence revisions and train high-scoring ensemble models to apply this schema. Then, taking a large dataset of silver-labeled pairs, we show that we can predict with high precision when facts will update in older article drafts. Finally, to demonstrate the usefulness of these findings, we construct a language model question answering (LLM-QA) abstention task. We wish the LLM to abstain from answering questions when information is likely to become outdated. Using our predictions, we show that LLM abstention reaches near-oracle levels of accuracy.
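
The abstention task reduces to a simple wrapper, sketched below with illustrative names: `fact_update_prob` stands in for the fact-fluidity classifier's output, and the threshold is a tunable operating point, not a value from the paper.

```python
def answer_or_abstain(question, fact_update_prob, llm_answer, threshold=0.5):
    """Decline to answer when the supporting fact is predicted to update."""
    if fact_update_prob >= threshold:
        return "I can't answer reliably: this information may be outdated."
    return llm_answer(question)  # llm_answer: callable question -> answer
```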

Updated: 2024-11-27 23:35:23

Categories: cs.CL,cs.AI,cs.DL

Download: http://arxiv.org/abs/2411.18811v1

Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds

Text-to-image diffusion models have demonstrated remarkable capability in generating realistic images from arbitrary text prompts. However, they often produce inconsistent results for compositional prompts such as "two dogs" or "a penguin on the right of a bowl". Understanding these inconsistencies is crucial for reliable image generation. In this paper, we highlight the significant role of initial noise in these inconsistencies, where certain noise patterns are more reliable for compositional prompts than others. Our analyses reveal that different initial random seeds tend to guide the model to place objects in distinct image areas, potentially adhering to specific patterns of camera angles and image composition associated with the seed. To improve the model's compositional ability, we propose a method for mining these reliable cases, resulting in a curated training set of generated images without requiring any manual annotation. By fine-tuning text-to-image models on these generated images, we significantly enhance their compositional capabilities. For numerical composition, we observe relative increases of 29.3% and 19.5% for Stable Diffusion and PixArt-$\alpha$, respectively. Spatial composition sees even larger gains, with 60.7% for Stable Diffusion and 21.1% for PixArt-$\alpha$.
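
A hedged sketch of the mining loop described above: `generate` stands in for a diffusion pipeline called with a fixed seed, and `composition_ok` for an automatic checker (e.g., an object detector verifying counts or spatial relations); neither name comes from the paper.

```python
def mine_reliable_seeds(prompts, generate, composition_ok, seeds=range(1000)):
    """Keep generations that satisfy their compositional prompt; the kept
    images form a fine-tuning set with no manual annotation."""
    curated = []
    for seed in seeds:
        for prompt in prompts:
            image = generate(prompt, seed=seed)
            if composition_ok(image, prompt):
                curated.append((seed, prompt, image))
    return curated
```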

Updated: 2024-11-27 23:32:54

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.18810v1

Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis

QUIC, an increasingly adopted transport protocol, addresses limitations of TCP by offering improved security, performance, and features such as stream multiplexing and connection migration. However, these enhancements also introduce challenges for network operators in monitoring and analyzing web traffic, especially due to QUIC's encryption. Existing datasets are inadequate: they are often outdated, lack diversity, anonymize critical information, or exclude essential features like SSL keys, limiting comprehensive research and development in this area. We introduce VisQUIC, a publicly available dataset of over 100,000 labeled QUIC traces with corresponding SSL keys, collected from more than 40,000 websites over four months. By generating visual representations of the traces, we facilitate advanced machine learning (ML) applications and in-depth analysis of encrypted QUIC traffic. To demonstrate the dataset's potential, we estimate the number of HTTP/3 request-response pairs in a QUIC connection using only encrypted traffic, achieving up to 92% accuracy. This estimation provides insights into server behavior, client-server interactions, and connection load, which are crucial for tasks like load balancing and intrusion detection. Our dataset enables comprehensive studies of the QUIC and HTTP/3 protocols and supports the development of tools for encrypted traffic analysis.

Updated: 2024-11-27 23:27:20

Categories: cs.NI,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.03728v4

One-Step Early Stopping Strategy using Neural Tangent Kernel Theory and Rademacher Complexity

The early stopping strategy consists of halting the training process of a neural network (NN) on a set $S$ of input data before the training error reaches its minimum. The advantage is that the NN then retains good generalization properties, i.e., it gives good predictions on data outside $S$, and a good estimate of the statistical error (``population loss'') is obtained. We give here an analytical estimate of the optimal stopping time involving essentially the initial training error vector and the eigenvalues of the ``neural tangent kernel''. This yields an upper bound on the population loss that is well suited to the underparameterized context (where the number of parameters is moderate compared with the number of data points). Our method is illustrated on the example of an NN simulating the MPC control of a Van der Pol oscillator.
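
To make the recipe concrete: under gradient flow with a fixed NTK, the training residual decays componentwise in the kernel's eigenbasis, so the training loss at any virtual time t is available in closed form and can be traded off against a generalization term. The sketch below stubs out the paper's Rademacher-based bound with a generic `complexity` callable and scans a grid rather than reproducing the paper's one-step formula.

```python
import numpy as np

def optimal_stopping_time(K, r0, complexity, t_grid):
    """K: (n, n) neural tangent kernel on the training set; r0: (n,)
    initial training error vector; complexity: t -> generalization-gap
    term (the paper's bound, stubbed here as an assumption)."""
    lam, U = np.linalg.eigh(K)
    c = U.T @ r0  # residual expressed in the NTK eigenbasis
    def train_loss(t):
        return np.sum((c * np.exp(-lam * t)) ** 2) / len(r0)
    bounds = [train_loss(t) + complexity(t) for t in t_grid]
    return t_grid[int(np.argmin(bounds))]
```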

Updated: 2024-11-27 23:22:28

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2411.18806v1

To bootstrap or to rollout? An optimal and adaptive interpolation

Bootstrapping and rollout are two fundamental principles for value function estimation in reinforcement learning (RL). We introduce a novel class of Bellman operators, called subgraph Bellman operators, that interpolate between bootstrapping and rollout methods. Our estimator, derived by solving the fixed point of the empirical subgraph Bellman operator, combines the strengths of the bootstrapping-based temporal difference (TD) estimator and the rollout-based Monte Carlo (MC) methods. Specifically, the error upper bound of our estimator approaches the optimal variance achieved by TD, with an additional term depending on the exit probability of a selected subset of the state space. At the same time, the estimator exhibits the finite-sample adaptivity of MC, with sample complexity depending only on the occupancy measure of this subset. We complement the upper bound with an information-theoretic lower bound, showing that the additional term is unavoidable given a reasonable sample size. Together, these results establish subgraph Bellman estimators as an optimal and adaptive framework for reconciling TD and MC methods in policy evaluation.
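
A toy rendering of the interpolation, under the simplifying assumption of tabular values on a single trajectory: inside the chosen subset the target bootstraps like TD, and as soon as the trajectory exits the subset the target falls back to the Monte Carlo return. This illustrates the idea only; it is not the paper's estimator.

```python
def subgraph_targets(trajectory, V, in_subset, gamma=0.99):
    """trajectory: list of (state, reward, next_state); V: dict of current
    value estimates on the subset; in_subset: callable state -> bool."""
    G, returns = 0.0, [0.0] * len(trajectory)
    for i in reversed(range(len(trajectory))):  # Monte Carlo returns-to-go
        G = trajectory[i][1] + gamma * G
        returns[i] = G
    targets = {}
    for i, (s, r, s_next) in enumerate(trajectory):
        if in_subset(s):
            if in_subset(s_next):
                targets[s] = r + gamma * V.get(s_next, 0.0)  # TD bootstrap
            else:
                targets[s] = returns[i]                      # MC after exit
    return targets
```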

Updated: 2024-11-27 23:19:41

Categories: cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2411.09731v2

Stratified Non-Negative Tensor Factorization

Non-negative matrix factorization (NMF) and non-negative tensor factorization (NTF) decompose non-negative high-dimensional data into non-negative low-rank components. NMF and NTF methods are popular for their intrinsic interpretability and effectiveness on large-scale data. Recent work developed Stratified-NMF, which applies NMF to regimes where data may come from different sources (strata) with different underlying distributions, and seeks to recover both strata-dependent information and global topics shared across strata. Applying Stratified-NMF to multi-modal data requires flattening across modes, and therefore loses geometric structure contained implicitly within the tensor. To address this problem, we extend Stratified-NMF to the tensor setting by developing a multiplicative update rule and demonstrating the method on text and image data. We find that Stratified-NTF can identify interpretable topics with lower memory requirements than Stratified-NMF. We also introduce a regularized version of the method and demonstrate its effects on image data.
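
As background for the update rule mentioned above, the classic multiplicative updates for plain NMF are sketched below; Stratified-NTF derives updates of the same multiplicative form for the stratified tensor setting, so this plain-NMF version is a mental model only, not the paper's algorithm.

```python
import numpy as np

def nmf_multiplicative(V, rank, iters=200, eps=1e-9):
    """Lee-Seung updates for V ~ W @ H with all factors non-negative."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # ratio updates keep entries >= 0
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # and monotonically reduce the error
    return W, H
```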

Updated: 2024-11-27 23:16:00

Categories: cs.LG,cs.NA,math.NA,G.1.6; I.5.3; I.5.4

Download: http://arxiv.org/abs/2411.18805v1

Formal Verification of Digital Twins with TLA and Information Leakage Control

Verifying the correctness of a digital twin provides a formal guarantee that the digital twin operates as intended. Digital twin verification is challenging due to the presence of uncertainties in the virtual representation, the physical environment, and the bidirectional flow of information between physical and virtual. A further challenge is that a digital twin of a complex system is composed of distributed components. This paper presents a methodology to specify and verify digital twin behavior, translating uncertain processes into a formally verifiable finite state machine. We use the Temporal Logic of Actions (TLA) to create a specification, an implementation abstraction that defines the properties required for correct system behavior. Our approach includes a novel weakening of formal security properties, allowing controlled information leakage while preserving theoretical guarantees. We demonstrate this approach on a digital twin of an unmanned aerial vehicle, verifying synchronization of physical-to-virtual and virtual-to-physical data flows to detect unintended misalignments.

Updated: 2024-11-27 22:52:36

Categories: cs.CR,cs.DC,cs.IT,cs.SY,eess.SY,math.IT

Download: http://arxiv.org/abs/2411.18798v1

OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework

The integration of Large Language Models (LLMs) into autonomous driving systems offers promising enhancements in environmental understanding and decision-making. However, the substantial computational demands of deploying LLMs locally on vehicles render this approach unfeasible for real-world automotive applications. To address this challenge, we introduce OWLed, the Outlier-Weighed Layerwise Pruning for Efficient Autonomous Driving Framework that leverages outlier-weighted layerwise sparsity for model compression. Our method assigns non-uniform sparsity ratios to different layers based on the distribution of outlier features, significantly reducing the model size without the need for fine-tuning. To ensure the compressed model adapts well to autonomous driving tasks, we incorporate driving environment data into both the calibration and pruning processes. Our empirical studies reveal that the encoder component is more sensitive to pruning than the LLM, highlighting its critical role in the system. Experimental results demonstrate that OWLed outperforms existing methods in perception, action prediction, and language understanding while substantially lowering computational requirements. These findings underscore the potential of combining advanced pruning techniques with LLMs to develop efficient and robust autonomous driving systems capable of handling complex scenarios. Code will be made publicly available.
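
In the spirit of the non-uniform allocation described above, a hedged sketch: each layer's share of outlier weights (entries far above the layer's mean magnitude) nudges its pruning ratio below or above the global target. The constants and scaling are illustrative assumptions, not the paper's values.

```python
import torch

def layerwise_sparsity(layer_weights, target=0.7, m=5.0, lam=0.08):
    """layer_weights: list of weight tensors, one per layer; returns a
    per-layer pruning ratio where more outliers -> less pruning."""
    frac = torch.tensor([(w.abs() > m * w.abs().mean()).float().mean().item()
                         for w in layer_weights])
    spread = (frac - frac.mean()) / (frac.max() - frac.min() + 1e-12)
    return (target - 2 * lam * spread).clamp(0.0, 1.0).tolist()
```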

Updated: 2024-11-27 22:49:56

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2411.07711v2

UOE: Unlearning One Expert Is Enough for Mixture-of-Experts LLMs

Recent advancements in large language model (LLM) unlearning have shown remarkable success in removing unwanted data-model influences while preserving the model's utility for legitimate knowledge. However, despite these strides, sparse Mixture-of-Experts (MoE) LLMs--a key subset of the LLM family--have received little attention and remain largely unexplored in the context of unlearning. As MoE LLMs are celebrated for their exceptional performance and highly efficient inference processes, we ask: How can unlearning be performed effectively and efficiently on MoE LLMs? And will traditional unlearning methods be applicable to MoE architectures? Our pilot study shows that the dynamic routing nature of MoE LLMs introduces unique challenges, leading to substantial utility drops when existing unlearning methods are applied. Specifically, unlearning disrupts the router's expert selection, causing a significant shift from the experts most related to the unlearning target to irrelevant ones. As a result, more experts than necessary are affected, leading to excessive forgetting and loss of control over which knowledge is erased. To address this, we propose a novel single-expert unlearning framework, referred to as UOE, for MoE LLMs. Through expert attribution, unlearning is concentrated on the expert most actively engaged with the specified knowledge. Concurrently, an anchor loss is applied to the router to stabilize the active state of this targeted expert, ensuring focused and controlled unlearning that preserves model utility. The proposed UOE framework is also compatible with various unlearning algorithms. Extensive experiments demonstrate that UOE improves forget quality by up to 5% and model utility by 35% on MoE LLMs across various benchmarks and LLM architectures, while unlearning only 0.06% of the model parameters.

Updated: 2024-11-27 22:46:08

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.18797v1

Graph-Based Biomarker Discovery and Interpretation for Alzheimer's Disease

Early diagnosis and discovery of therapeutic drug targets are crucial objectives for the effective management of Alzheimer's Disease (AD). Current approaches for AD diagnosis and treatment planning are based on radiological imaging and are largely inaccessible for population-level screening due to prohibitive costs and limited availability. Recently, blood tests have shown promise in diagnosing AD and highlighting possible biomarkers that can be used as drug targets for AD management. Blood tests are significantly more accessible to disadvantaged populations, cost-effective, and minimally invasive. However, biomarker discovery in the context of AD diagnosis is complex, as there exist important associations between various biomarkers. Here, we introduce BRAIN (Biomarker Representation, Analysis, and Interpretation Network), a novel machine learning (ML) framework that jointly optimizes the diagnostic accuracy and biomarker discovery processes to identify all relevant biomarkers that contribute to AD diagnosis. Using a holistic graph-based representation for biomarkers, we highlight their inter-dependencies and explain why different ML models identify different discriminative biomarkers. We apply BRAIN to a publicly available blood biomarker dataset, revealing three novel biomarker sub-networks whose interactions vary between the control and AD groups, offering a new paradigm for drug discovery and biomarker analysis for AD.

Updated: 2024-11-27 22:45:19

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2411.18796v1

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts. We mathematically explore this property by studying how transformers, the building blocks of LLMs, can complete such memory tasks. We study a simple latent concept association problem with a one-layer transformer and we show theoretically and empirically that the transformer gathers information using self-attention and uses the value matrix for associative memory.
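
The mechanism is easy to state in code: self-attention pools the context tokens that act as clues, and the value matrix maps the pooled clue vector to the stored association. The sketch below is just standard single-layer self-attention, included to fix notation; it is not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def one_layer_recall(x, W_q, W_k, W_v):
    """x: (seq, d) token embeddings; W_q, W_k, W_v: (d, d) projections.
    Attention gathers clue tokens; W_v realizes the associative lookup."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    att = F.softmax(q @ k.T / x.shape[1] ** 0.5, dim=-1)
    return att @ v  # row t is the association retrieved for token t
```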

Updated: 2024-11-27 22:41:03

Categories: cs.CL,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18400v2

Graph Max Shift: A Hill-Climbing Method for Graph Clustering

We present a method for graph clustering that is analogous to gradient ascent methods previously proposed for clustering points in space. We show that, when applied to a random geometric graph with data drawn i.i.d. from some density with Morse regularity, the method is asymptotically consistent. Here, consistency is understood with respect to a density-level clustering defined by the partition of the support of the density induced by the basins of attraction of the density modes.

Updated: 2024-11-27 22:32:26

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2411.18794v1

Quantum Advantage via Solving Multivariate Quadratics

In this work, we propose a new way to (non-interactively, verifiably) demonstrate Quantum Advantage by solving the average-case $\mathsf{NP}$ search problem of finding a solution to a system of (underdetermined) multivariate quadratic equations over the finite field $\mathbb{F}_2$ drawn from a specified distribution. In particular, we design a distribution of degree-2 polynomials $\{p_i(x_1,\ldots,x_n)\}_{i\in [m]}$ for $m<n$ over $\mathbb{F}_2$ for which we show that there is a quantum polynomial-time algorithm that simultaneously solves $\{p_i(x_1,\ldots,x_n)=y_i\}_{i\in [m]}$ for a random vector $(y_1,\ldots,y_m)$. On the other hand, while a solution exists with high probability, we conjecture that it is classically hard to find one based on classical cryptanalysis that we provide, including a comprehensive review of all known relevant classical algorithms for solving multivariate quadratics. Our approach proceeds by examining the Yamakawa-Zhandry (FOCS 2022) quantum advantage scheme and replacing the role of the random oracle with our multivariate quadratic equations. Our work therefore gives several new perspectives: First, our algorithm gives a counterexample to the conventional belief that generic classically hard multivariate quadratic systems are also quantumly hard. Second, based on cryptanalytic evidence, our work gives an explicit simple replacement for the random oracle from the work of Yamakawa and Zhandry. We show how to instantiate the random oracle with families of just degree two multivariate polynomials over $\mathbb{F}_2$.
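
To fix what "solving" means here: a solution is any assignment over $\mathbb{F}_2$ satisfying all $m$ equations at once. The brute-force checker below (usable only for toy $n$) makes the search problem concrete; the paper's point is that a quantum polynomial-time solver exists for its distribution, while known classical algorithms appear to fail.

```python
from itertools import product
import numpy as np

def solve_mq_gf2(polys, y, n):
    """polys: list of m callables p_i mapping a 0/1 vector to an integer;
    y: list of m target bits; n: number of variables."""
    for bits in product((0, 1), repeat=n):
        x = np.array(bits)
        if all(p(x) % 2 == yi for p, yi in zip(polys, y)):
            return x  # a satisfying assignment in F_2^n
    return None
```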

Updated: 2024-11-27 22:29:46

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2411.14697v2

Representative Social Choice: From Learning Theory to AI Alignment

Social choice theory is the study of preference aggregation across a population, used both in mechanism design for human agents and in the democratic alignment of language models. In this study, we propose the representative social choice framework for the modeling of democratic representation in collective decisions, where the number of issues and individuals are too large for mechanisms to consider all preferences directly. These scenarios are widespread in real-world decision-making processes, such as jury trials, indirect elections, legislation processes, corporate governance, and, more recently, language model alignment. In representative social choice, the population is represented by a finite sample of individual-issue pairs based on which social choice decisions are made. We show that many of the deepest questions in representative social choice can be naturally formulated as statistical learning problems, and prove the generalization properties of social choice mechanisms using the theory of machine learning. We further formulate axioms for representative social choice, and prove Arrow-like impossibility theorems with new combinatorial tools of analysis. Our framework introduces the representative approach to social choice, opening up research directions at the intersection of social choice, learning theory, and AI alignment.

Updated: 2024-11-27 22:25:58

Categories: cs.LG,cs.AI,cs.CL,cs.CY,cs.GT

Download: http://arxiv.org/abs/2410.23953v2

ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation

We propose ManiPose, a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting. We provide theoretical and empirical evidence that, due to the depth ambiguity inherent to monocular 3D human pose estimation, traditional regression models suffer from pose-topology consistency issues, which standard evaluation metrics (MPJPE, P-MPJPE and PCK) fail to assess. ManiPose addresses depth ambiguity by proposing multiple candidate 3D poses for each 2D input, each with its estimated plausibility. Unlike previous multi-hypothesis approaches, ManiPose forgoes generative models, greatly facilitating its training and usage. By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses, in contrast to previous works. We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency by a large margin while being very competitive on the MPJPE metric.

Updated: 2024-11-27 22:24:02

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2312.06386v2

Investigating Plausibility of Biologically Inspired Bayesian Learning in ANNs

Catastrophic forgetting has been the leading issue in the domain of lifelong learning in artificial systems. Current artificial systems are reasonably good at learning domains they have seen before; however, as soon as they encounter something new, they either go through significant performance deterioration or, if one tries to teach them the new distribution of data, they forget what they have learned before. Additionally, they are prone to being overly confident when performing inference on seen as well as unseen data, causing significant reliability issues when lives are at stake. Therefore, it is extremely important to dig into this problem and formulate an approach that will be continually adaptable as well as reliable. If we move away from the engineering domain of such systems and look into biological systems, we can see that these very systems are very efficient at computing the reliability as well as the uncertainty of accurate predictions, which further helps them refine inference in a lifelong setting. These systems are not perfect; however, they do give us a solid understanding of reasoning under uncertainty, which takes us to the domain of Bayesian reasoning. We incorporate this Bayesian inference with a thresholding mechanism to mimic more biologically inspired models, but only at the spatial level. Further, we reproduce a recent study on Bayesian inference with spiking neural networks for continual learning to compare against as a suitable biologically inspired Bayesian framework. Overall, we investigate the plausibility of biologically inspired Bayesian learning in artificial systems on a vision dataset, MNIST, and show a relative performance improvement between the condition in which the model is forced to predict and the condition in which it is not.

Updated: 2024-11-27 22:19:27

Categories: cs.LG

Download: http://arxiv.org/abs/2411.18788v1

Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching

Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework. Unlike conventional RL which aims solely to maximize cumulative rewards, CRL incorporates additional constraints that represent specific mission requirements or limitations that the agent must comply with during the learning process. In this paper, we address a type of CRL problem where an agent aims to learn the optimal policy to maximize reward while ensuring a desired level of temporal logic constraint satisfaction throughout the learning process. We propose a novel framework that relies on switching between pure learning (reward maximization) and constraint satisfaction. This framework estimates the probability of constraint satisfaction based on earlier trials and properly adjusts the probability of switching between learning and constraint satisfaction policies. We theoretically validate the correctness of the proposed algorithm and demonstrate its performance through comprehensive simulations.
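
A hedged sketch of the switching rule: the satisfaction probability is estimated from earlier episodes, and the next episode runs the pure-learning policy only when that estimate comfortably exceeds the desired level. The smoothing and margin below are illustrative choices, not the paper's.

```python
import random

def choose_policy(successes, trials, target=0.9, margin=0.05):
    p_hat = (successes + 1) / (trials + 2)  # smoothed satisfaction estimate
    if p_hat >= target + margin:
        return "reward_policy"       # comfortably satisfying: maximize reward
    if p_hat <= target - margin:
        return "constraint_policy"   # falling behind: restore satisfaction
    mix = (p_hat - (target - margin)) / (2 * margin)  # interpolate in between
    return "reward_policy" if random.random() < mix else "constraint_policy"
```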

Updated: 2024-11-27 22:08:00

Categories: cs.AI,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2410.08022v2

Fall Leaf Adversarial Attack on Traffic Sign Classification

Adversarial input image perturbation attacks have emerged as a significant threat to machine learning algorithms, particularly in the image classification setting. These attacks involve subtle perturbations to input images that cause neural networks to misclassify them, even though the images remain easily recognizable to humans. One critical area where adversarial attacks have been demonstrated is automotive systems, where traffic sign classification and recognition are critical and misclassified images can cause autonomous systems to take wrong actions. This work presents a new class of adversarial attacks. Unlike existing work that has focused on perturbations produced with human-made artifacts, such as stickers, paint, or flashlights shone at traffic signs, this work leverages nature-made artifacts: tree leaves. By leveraging nature-made artifacts, the new class of attacks has plausible deniability: a fall leaf stuck to a street sign could have come from a nearby tree rather than having been placed there by a malicious human attacker. To evaluate this new class of adversarial input image perturbation attacks, this work analyzes how fall leaves can cause misclassification of street signs. The work evaluates leaves from different species of trees and considers various parameters such as size, color (which varies with leaf type), and rotation. The work demonstrates a high success rate for misclassification. It also explores the correlation between successful attacks and their effect on edge detection, which is critical in many image classification algorithms.

Updated: 2024-11-27 22:02:38

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2411.18776v1

Multi-Task Learning for Integrated Automated Contouring and Voxel-Based Dose Prediction in Radiotherapy

Deep learning-based automated contouring and treatment planning has been proven to improve the efficiency and accuracy of radiotherapy. However, the conventional radiotherapy treatment planning process treats automated contouring and treatment planning as separate tasks. Moreover, in deep learning (DL), the contouring and dose prediction tasks for automated treatment planning are done independently. In this study, we applied a multi-task learning (MTL) approach to seamlessly integrate the automated contouring and voxel-based dose prediction tasks, as MTL can leverage common information between the two tasks and increase the efficiency of the automated pipeline. We developed our MTL framework using two datasets: an in-house prostate cancer dataset and the publicly available head and neck cancer dataset, OpenKBP. Compared with sequential DL contouring and treatment planning, our proposed MTL method improved the mean absolute difference of dose-volume histogram metrics for the prostate and head and neck sites by 19.82% and 16.33%, respectively. Our MTL model for automated contouring and dose prediction demonstrated enhanced dose prediction performance while maintaining, and sometimes even improving, contouring accuracy. Compared with the baseline automated contouring model, which achieved Dice score coefficients of 0.818 on the prostate and 0.674 on the head and neck datasets, our MTL approach achieved average scores of 0.824 and 0.716, respectively. Our study highlights the potential of the proposed MTL-based automated contouring and planning to support the development of efficient and accurate automated treatment planning for radiotherapy.

Updated: 2024-11-27 21:45:03

Categories: physics.med-ph,cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.18767v1

CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding

Graphic visual content helps promote information communication and inspire divergent thinking. However, the interpretation of visual content currently relies mainly on humans' personal knowledge backgrounds, which affects the quality and efficiency of information acquisition and understanding. To improve the quality and efficiency of visual information transmission, and to keep observers from being limited by their information cocoons, we propose CoVis, a collaborative framework for fine-grained visual understanding. By designing and implementing a cascaded dual-layer segmentation network coupled with a large-language-model (LLM) based content generator, the framework extracts as much knowledge as possible from an image. It then generates visual analytics for the image, assisting observers in comprehending the imagery from a more holistic perspective. Quantitative and qualitative experiments with 32 human participants indicate that CoVis outperforms current methods in feature extraction and can generate more comprehensive and detailed visual descriptions than current general-purpose large models.

Updated: 2024-11-27 21:38:04

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.18764v1

Classification of Deceased Patients from Non-Deceased Patients using Random Forest and Support Vector Machine Classifiers

Analyzing large datasets and summarizing them into useful information is the heart of the data mining process. In healthcare, information can be converted into knowledge about patients' historical patterns and possible future trends. During the COVID-19 pandemic, mining COVID-19 patient information offered an opportunity to discover patterns that may signal a patient is at high risk of death. COVID-19 patients die from sepsis, a complex disease process involving multiple organ systems. We extracted the variables physicians are most concerned about regarding viral septic infections. With the aim of distinguishing COVID-19 patients who survive their hospital stay from those who do not, we utilize the Support Vector Machine (SVM) and Random Forest (RF) classification techniques to classify patients according to their demographics, laboratory test results, and preexisting health conditions. After conducting a 10-fold cross-validation procedure, we assessed classification performance with a Receiver Operating Characteristic (ROC) curve, and a confusion matrix was used to determine the accuracy of the classifiers. We also performed a cluster analysis using, as predictors, binary factors such as whether the patient had a preexisting condition and whether sepsis was identified, together with the numeric values from patient demographics and laboratory test results.
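
The described pipeline maps directly onto standard scikit-learn calls; the sketch below uses library defaults rather than the study's hyperparameters, with X holding demographics, laboratory results, and preexisting-condition flags, and y marking deceased vs. non-deceased.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, confusion_matrix

def evaluate(X, y):
    models = {"RF": RandomForestClassifier(random_state=0),
              "SVM": SVC(probability=True, random_state=0)}
    results = {}
    for name, clf in models.items():
        # Out-of-fold probabilities from 10-fold cross-validation.
        proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
        results[name] = {"auc": roc_auc_score(y, proba),
                         "confusion": confusion_matrix(y, (proba >= 0.5).astype(int))}
    return results
```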

Updated: 2024-11-27 21:27:54

Categories: cs.LG

Download: http://arxiv.org/abs/2411.18759v1

Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models

Understanding the attack patterns associated with a cyberattack is crucial for comprehending the attacker's behaviors and implementing the right mitigation measures. However, the majority of the information regarding new attacks is typically presented in unstructured text, posing significant challenges for security analysts in collecting necessary information. In this paper, we present a sentence classification system that can identify the attack techniques described in natural language sentences from cyber threat intelligence (CTI) reports. We propose a new method for utilizing auxiliary data with the same labels to improve classification for the low-resource cyberattack classification task. The system first trains the model using the augmented training data and then trains further using only the primary data. We validate our model using the TRAM dataset and the MITRE ATT&CK framework. Experiments show that our method improves Macro-F1 by 5 to 9 percentage points and keeps Micro-F1 scores competitive when compared with the baseline performance on the TRAM dataset.
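
The two-stage schedule is simple enough to state directly; in this sketch, `fit_one_epoch` stands in for an ordinary supervised training epoch, and the loaders and epoch counts are illustrative assumptions, not the paper's settings.

```python
def two_stage_train(model, augmented_loader, primary_loader, fit_one_epoch,
                    stage1_epochs=3, stage2_epochs=3):
    """Stage 1: train on the label-sharing auxiliary data mixed with the
    primary data; stage 2: continue on the primary CTI sentences alone."""
    for _ in range(stage1_epochs):
        fit_one_epoch(model, augmented_loader)
    for _ in range(stage2_epochs):
        fit_one_epoch(model, primary_loader)
    return model
```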

Updated: 2024-11-27 21:09:02

Categories: cs.LG,cs.CL,cs.CR

Download: http://arxiv.org/abs/2411.18755v1

DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos

Human mesh recovery (HMR) provides rich human body information for various real-world applications. While image-based HMR methods have achieved impressive results, they often struggle to recover humans in dynamic scenarios, leading to temporal inconsistencies and non-smooth 3D motion predictions due to the absence of human motion information. In contrast, video-based approaches leverage temporal information to mitigate this issue. In this paper, we present DiffMesh, an innovative motion-aware diffusion-like framework for video-based HMR. DiffMesh establishes a bridge between diffusion models and human motion, efficiently generating accurate and smooth output mesh sequences by incorporating human motion within the forward and reverse processes of the diffusion model. Extensive experiments on the widely used Human3.6M and 3DPW datasets demonstrate the effectiveness and efficiency of DiffMesh. Visual comparisons in real-world scenarios further highlight DiffMesh's suitability for practical applications.

Updated: 2024-11-27 21:05:33

Categories: cs.CV,cs.AI,cs.HC,cs.MM

Download: http://arxiv.org/abs/2303.13397v5

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities, but their outputs can sometimes be unreliable or factually incorrect. To address this, we introduce Self Logits Evolution Decoding (SLED), a novel decoding framework that enhances the truthfulness of LLMs without relying on external knowledge bases or requiring further fine-tuning. From an optimization perspective, our SLED framework leverages the latent knowledge embedded within the LLM by contrasting the output logits from the final layer with those from early layers. It then utilizes an approximate gradient approach to enable latent knowledge to guide the self-refinement of outputs, thereby effectively improving factual accuracy. Extensive experiments have been conducted on established benchmarks across a diverse range of model families (LLaMA 2, LLaMA 3, Gemma) and scales (from 2B to 70B), including more advanced architectural configurations such as the mixture of experts (MoE). Our evaluation spans a wide variety of tasks, including multi-choice, open-generation, and adaptations to chain-of-thought reasoning tasks. The results demonstrate that SLED consistently improves factual accuracy by up to 20% compared to existing decoding methods while maintaining natural language fluency and negligible latency overhead. Furthermore, it can be flexibly combined with other decoding methods to further enhance their performance.

Updated: 2024-11-27 20:59:05

Categories: cs.CL,cs.AI,stat.ML

Download: http://arxiv.org/abs/2411.02433v2

Locally Differentially Private Online Federated Learning With Correlated Noise

We introduce a locally differentially private (LDP) algorithm for online federated learning that employs temporally correlated noise to improve utility while preserving privacy. To address challenges posed by the correlated noise and local updates with streaming non-IID data, we develop a perturbed iterate analysis that controls the impact of the noise on the utility. Moreover, we demonstrate how the drift errors from local updates can be effectively managed for several classes of nonconvex loss functions. Subject to an $(\epsilon,\delta)$-LDP budget, we establish a dynamic regret bound that quantifies the impact of key parameters and the intensity of changes in the dynamic environment on the learning performance. Numerical experiments confirm the efficacy of the proposed algorithm.
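
The key ingredient is noise whose rounds are correlated rather than independent; an AR(1) process is one standard way to produce it, sketched below. Calibrating `sigma` to a given $(\epsilon,\delta)$-LDP budget is the paper's subject and is not reproduced here.

```python
import numpy as np

class CorrelatedGaussianNoise:
    """Stationary AR(1) Gaussian noise: each round reuses a fraction rho
    of the previous round's noise, yielding temporal correlation."""
    def __init__(self, dim, sigma, rho=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        self.sigma, self.rho = sigma, rho
        self.z = self.rng.normal(0.0, sigma, dim)

    def next(self):
        fresh = self.rng.normal(0.0, self.sigma, self.z.shape)
        self.z = self.rho * self.z + np.sqrt(1.0 - self.rho ** 2) * fresh
        return self.z  # add to the local update before sharing it
```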

Updated: 2024-11-27 20:56:43

Categories: cs.LG,cs.DC,stat.ML

Download: http://arxiv.org/abs/2411.18752v1

"Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks

As the application of large language models expands across fields, it poses growing challenges to the effectiveness of guardrail mechanisms for identifying harmful content generation. This research evaluates the guardrail effectiveness of GPT-4o, Grok-2 Beta, Llama 3.1 (405B), Gemini 1.5, and Claude 3.5 Sonnet through black-box testing of seemingly ethical multi-step jailbreak prompts. It conducts ethical attacks using an identical multi-step prompt that simulates the scenario of "corporate middle managers competing for promotions." The results show that the guardrails of the above-mentioned LLMs were bypassed and verbal attack content was generated. Claude 3.5 Sonnet showed the most pronounced resistance to the multi-step jailbreak prompts. To ensure objectivity, the experimental process, black-box test code, and enhanced guardrail code are uploaded to the GitHub repository: https://github.com/brucewang123456789/GeniusTrail.git.

Updated: 2024-11-27 20:49:44

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.16730v2

Inference Privacy: Properties and Mechanisms

Ensuring privacy during inference stage is crucial to prevent malicious third parties from reconstructing users' private inputs from outputs of public models. Despite a large body of literature on privacy preserving learning (which ensures privacy of training data), there is no existing systematic framework to ensure the privacy of users' data during inference. Motivated by this problem, we introduce the notion of Inference Privacy (IP), which can allow a user to interact with a model (for instance, a classifier, or an AI-assisted chat-bot) while providing a rigorous privacy guarantee for the users' data at inference. We establish fundamental properties of the IP privacy notion and also contrast it with the notion of Local Differential Privacy (LDP). We then present two types of mechanisms for achieving IP: namely, input perturbations and output perturbations which are customizable by the users and can allow them to navigate the trade-off between utility and privacy. We also demonstrate the usefulness of our framework via experiments and highlight the resulting trade-offs between utility and privacy during inference.

Updated: 2024-11-27 20:47:28

Categories: cs.CR,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2411.18746v1

Iso-Diffusion: Improving Diffusion Probabilistic Models Using the Isotropy of the Additive Gaussian Noise

Denoising Diffusion Probabilistic Models (DDPMs) have accomplished much in the realm of generative AI. With the tremendous popularity generative AI algorithms have achieved, the demand for higher levels of performance continues to increase. Against this backdrop, careful scrutiny of algorithm performance under sample-fidelity measures is essential to ascertain how effectively the underlying structures of the data distribution were learned. In this context, minimizing the mean squared error between the additive and predicted noise alone does not impose structural constraints on the predicted noise, such as isotropy. On this premise, we were motivated to utilize the isotropy of the additive noise as a constraint on the objective function to enhance the fidelity of DDPMs. Our approach is simple and can be applied to any DDPM variant. We validate our approach with experiments conducted on four synthetic 2D datasets as well as on unconditional image generation. As the results demonstrate, incorporating this constraint improves the fidelity metrics, Precision and Density, and clearly indicates that the structural imposition was effective.
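
One plausible instantiation of such a constraint (an illustration of the idea, not the paper's exact objective): penalize the deviation of the predicted noise's empirical covariance from a scaled identity, so that no direction is systematically favored, and add the penalty to the usual noise-prediction MSE.

```python
import torch

def isotropy_penalty(eps_pred):
    """eps_pred: (batch, d) predicted noise, flattened per sample."""
    x = eps_pred - eps_pred.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / (x.shape[0] - 1)  # empirical covariance of the noise
    target = torch.eye(cov.shape[0], device=cov.device) * cov.diagonal().mean()
    return ((cov - target) ** 2).sum() / cov.shape[0]

# total_loss = mse(eps_pred, eps_true) + lam * isotropy_penalty(eps_pred)
```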

Updated: 2024-11-27 20:40:08

标题: Iso-Diffusion: 利用加性高斯噪声的各向同性改进扩散概率模型

摘要: 去噪扩散概率模型(DDPMs)在生成式人工智能领域取得了很大进展。随着生成式人工智能算法的巨大流行,对更高性能水平的需求不断增加。在这种背景下,仔细审查算法在样本保真度类型度量下的性能是必不可少的,以确定数据分布的基本结构是如何有效地学习的。在这种情况下,仅仅最小化附加噪声和预测噪声之间的均方误差不会对预测噪声的结构完整性施加约束,例如各向同性。在这个前提下,我们被激发利用附加噪声的各向同性作为目标函数的约束,以增强DDPMs的保真度。我们的方法简单易行,并可应用于任何DDPM变体。我们通过在四个合成2D数据集上进行的实验以及无条件图像生成来验证我们的方法。正如结果所示,这种约束的引入改善了保真度指标,如精度和密度,结果明确表明结构施加是有效的。

更新时间: 2024-11-27 20:40:08

领域: cs.LG

下载: http://arxiv.org/abs/2403.16790v2

A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control-theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time over state-of-the-art methods.

Updated: 2024-11-27 20:34:29

标题: 一个通用的控制论方法用于强化学习:理论与算法

摘要: 我们设计了一个控制理论强化学习方法,支持直接学习最优策略。我们建立了该方法的各种理论性质,例如我们类似贝尔曼算子和Q-learning的收敛性和最优性,一个新的控制策略变量梯度定理,以及基于该定理的特定梯度上升算法在特定控制理论框架内的应用。我们在几个经典强化学习任务上对我们的控制理论方法进行了实证评估,展示了我们的方法在解决方案质量、样本复杂度和运行时间方面相比最先进方法的显著改进。

更新时间: 2024-11-27 20:34:29

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2406.14753v3

Adaptive Random Fourier Features Training Stabilized By Resampling With Applications in Image Regression

This paper presents an enhanced adaptive random Fourier features (ARFF) training algorithm for shallow neural networks, building upon the work introduced in "Adaptive Random Fourier Features with Metropolis Sampling", Kammonen et al., \emph{Foundations of Data Science}, 2(3):309--332, 2020. This improved method uses a particle filter-type resampling technique to stabilize the training process and reduce the sensitivity to parameter choices. The Metropolis test can also be omitted when resampling is used, reducing the number of hyperparameters by one and reducing the computational cost per iteration compared to the ARFF method. We present comprehensive numerical experiments demonstrating the efficacy of the proposed algorithm in function regression tasks as a stand-alone method and as a pretraining step before gradient-based optimization, using the Adam optimizer. Furthermore, we apply the proposed algorithm to a simple image regression problem, illustrating its utility in sampling frequencies for the random Fourier features (RFF) layer of coordinate-based multilayer perceptrons. In this context, we use the proposed algorithm to sample the parameters of the RFF layer in an automated manner.
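
A rough sketch of the resampling idea under stated assumptions: complex Fourier features, amplitudes obtained from a least-squares solve, and frequencies resampled in proportion to their amplitudes and then jittered in place of the Metropolis test. Hyperparameters and the jitter scheme are illustrative:

import numpy as np

def arff_resample(x, y, K=64, iters=50, jitter=0.1, seed=0):
    # x: (N, d) inputs, y: (N,) targets.
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=complex)
    omega = rng.normal(size=(K, x.shape[1]))          # initial frequencies
    for _ in range(iters):
        phi = np.exp(1j * x @ omega.T)                # (N, K) Fourier features
        beta, *_ = np.linalg.lstsq(phi, y, rcond=None)
        w = np.abs(beta) / np.abs(beta).sum()         # amplitude-based weights
        idx = rng.choice(K, size=K, p=w)              # particle-filter resampling
        omega = omega[idx] + jitter * rng.normal(size=omega.shape)
    phi = np.exp(1j * x @ omega.T)                    # refit amplitudes once more
    beta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return omega, beta

Predictions then follow as the real part of np.exp(1j * x_new @ omega.T) @ beta.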

Updated: 2024-11-27 20:24:36

标题: 通过重新采样稳定的自适应随机傅里叶特征训练及在图像回归中的应用

摘要: 本文提出了一种增强的适应性随机傅立叶特征(ARFF)训练算法,用于浅层神经网络,建立在“自适应随机傅立叶特征与Metropolis抽样”一文的基础上,作者为Kammonen等人,发表于《数据科学基础》,2020年,第2卷第3期,页码为309-332。这种改进的方法使用了一种粒子滤波器类型的重采样技术来稳定训练过程,并减少对参数选择的敏感性。当使用重采样时,也可以省略Metropolis测试,从而将超参数数量减少一个,并减少每次迭代的计算成本,相较于ARFF方法。我们展示了全面的数值实验,证明了所提出的算法在函数回归任务中作为独立方法以及作为梯度优化的预训练步骤(使用Adam优化器)的有效性。此外,我们将所提出的算法应用于一个简单的图像回归问题,展示了其在为基于坐标的多层感知器的随机傅立叶特征(RFF)层采样频率方面的实用性。在这种情况下,我们使用所提出的算法来自动化地采样RFF层的参数。

更新时间: 2024-11-27 20:24:36

领域: cs.LG

下载: http://arxiv.org/abs/2410.06399v2

The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data

An intriguing question is how good the machine and deep learning models generated by LLMs are at conducting automated scientific data analysis, where a data analyst may not have enough expertise to manually code and optimize complex deep learning models and may therefore opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with prevalent applications in many domains, including finance and the stock market. This research conducts a set of controlled experiments in which the prompts for generating deep learning-based models are varied across sensitivity levels of four criteria: 1) Clarity and Specificity, 2) Objective and Intent, 3) Contextual Information, and 4) Format and Style. While the results are relatively mixed, we observe some distinct patterns. We notice that, using LLMs, we are able to generate deep learning-based models with executable code for each dataset separately whose performance is comparable with manually crafted and optimized LSTM models for predicting the whole time series dataset. We also notice that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, we observe that the quality of the generated models varies with the ``temperature'' parameter used when configuring the LLMs. The results can be beneficial for data analysts and practitioners who would like to leverage generative AI to produce prediction models of acceptable quality.
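
For concreteness, a minimal PyTorch LSTM forecaster of the kind the surveyed LLMs are asked to produce; the window length, hidden size, and training step below are illustrative choices, not the output of any particular model:

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # one-step-ahead prediction

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 20, 1), torch.randn(32, 1)   # dummy windows and targets
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()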

Updated: 2024-11-27 20:18:36

标题: 基于大型语言模型(LLMs)生成的LSTM代码在时间序列数据预测中的表现

摘要: 在进行自动化科学数据分析时,机器和深度学习模型的优劣成为一个引人注目的案例。在这种情况下,数据分析师可能没有足够的专业知识来手动编码和优化复杂的深度学习模型和代码,因此可能选择利用LLMs生成所需的模型。本文研究并比较了主流的LLMs,例如ChatGPT、PaLM、LLama和Falcon,在生成用于分析时间序列数据的深度学习模型时的性能。时间序列数据是一种重要且流行的数据类型,在许多应用领域中具有广泛的应用,包括金融和股票市场。这项研究进行了一系列受控实验,其中用于生成基于深度学习的模型的提示受到四个标准的敏感性水平的控制,包括1)澄清和具体性、2)客观和意图、3)上下文信息、以及4)格式和风格。尽管结果相对混合,我们观察到了一些明显的模式。我们注意到,使用LLMs,我们能够为每个数据集生成具有可执行代码的深度学习模型,其性能与手动制作和优化的LSTM模型用于预测整个时间序列数据集的性能相当。我们还注意到,ChatGPT在生成更准确的模型方面优于其他LLMs。此外,我们观察到生成的模型的优劣取决于配置LLMS中使用的“温度”参数。这些结果对于希望利用生成式人工智能产生具有可接受性优势的良好预测模型的数据分析师和实践者可能有益。

更新时间: 2024-11-27 20:18:36

领域: cs.AI,cs.SE

下载: http://arxiv.org/abs/2411.18731v1

Foundation Models in Radiology: What, How, When, Why and Why Not

Recent advances in artificial intelligence have witnessed the emergence of large-scale deep learning models capable of interpreting and generating both textual and imaging data. Such models, typically referred to as foundation models, are trained on extensive corpora of unlabeled data and demonstrate high performance across various tasks. Foundation models have recently received extensive attention from academic, industry, and regulatory bodies. Given the potentially transformative impact that foundation models can have on the field of radiology, this review aims to establish a standardized terminology concerning foundation models, with a specific focus on the requirements of training data, model training paradigms, model capabilities, and evaluation strategies. We further outline potential pathways to facilitate the training of radiology-specific foundation models, with a critical emphasis on elucidating both the benefits and challenges associated with such models. Overall, we envision that this review can unify technical advances and clinical needs in the training of foundation models for radiology in a safe and responsible manner, for ultimately benefiting patients, providers, and radiologists.

Updated: 2024-11-27 20:13:01

标题: 放射学基础模型:什么、如何、何时、为什么以及为什么不

摘要: 最近人工智能的进展见证了大规模深度学习模型的出现,这些模型能够解释和生成文本和图像数据。这种模型通常被称为基础模型,它们是在大量未标记数据的语料库上训练的,并在各种任务中表现出高性能。基础模型最近受到学术界、工业界和监管机构的广泛关注。鉴于基础模型对放射学领域可能产生的变革性影响,本综述旨在建立关于基础模型的标准术语,特别关注训练数据的要求、模型训练范式、模型能力和评估策略。我们进一步概述了促进放射学特定基础模型训练的潜在途径,着重阐明了与这种模型相关的好处和挑战。总体上,我们设想这篇综述可以以安全负责的方式统一技术进步和临床需求,为放射学基础模型的培训带来益处,最终惠及患者、医疗提供者和放射科医生。

更新时间: 2024-11-27 20:13:01

领域: cs.LG

下载: http://arxiv.org/abs/2411.18730v1

Multi-Task Model Merging via Adaptive Weight Disentanglement

Model merging has gained increasing attention as an efficient and effective technique for integrating task-specific weights from various tasks into a unified multi-task model without retraining or additional data. As a representative approach, Task Arithmetic (TA) has demonstrated that combining task vectors through arithmetic operations facilitates efficient capability transfer between different tasks. In this framework, task vectors are obtained by subtracting the parameter values of a pre-trained model from those of individually fine-tuned models initialized from it. Despite the notable effectiveness of TA, interference among task vectors can adversely affect the performance of the merged model. In this paper, we relax the constraints of the Task Arithmetic Property and propose the Task Consistency Property, which can be regarded as being free from task interference. Through theoretical derivation, we show that such a property can be approximately achieved by seeking orthogonal task vectors. Guided by this insight, we propose Adaptive Weight Disentanglement (AWD), which decomposes traditional task vectors into a redundant vector and several disentangled task vectors. The primary optimization objective of AWD is to achieve orthogonality among the disentangled task vectors, thereby closely approximating the desired solution. Notably, these disentangled task vectors can be seamlessly integrated into existing merging methodologies. Experimental results demonstrate that our AWD consistently and significantly improves upon previous merging approaches, achieving state-of-the-art results. Our code is available at \href{https://github.com/FarisXiong/AWD.git}{https://github.com/FarisXiong/AWD.git}.
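
A small sketch of the two ingredients named above, task vectors and an orthogonality objective, assuming all models share one architecture; the cosine-based penalty is an illustrative stand-in for AWD's actual optimization:

import torch
from torch.nn.utils import parameters_to_vector

def task_vectors(pretrained, finetuned_models):
    # Task vector = fine-tuned weights minus pre-trained weights (flattened).
    p0 = parameters_to_vector(pretrained.parameters())
    return [parameters_to_vector(m.parameters()) - p0 for m in finetuned_models]

def orthogonality_loss(vectors):
    # Penalize pairwise cosine similarity among disentangled task vectors;
    # driving this toward zero approximates freedom from task interference.
    V = torch.stack([v / v.norm() for v in vectors])
    gram = V @ V.T
    return (gram - torch.eye(len(vectors))).pow(2).sum()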

Updated: 2024-11-27 20:08:55

标题: 通过自适应权重解缠合并多任务模型

摘要: 模型合并作为一种有效和高效的技术,用于将来自不同任务的特定权重集成到统一的多任务模型中,无需重新训练或额外的数据。作为一种代表性方法,任务算术(TA)已经证明通过算术操作将任务向量组合起来,可以促进不同任务之间的有效能力转移。在这个框架中,任务向量是通过从一个预训练模型的参数值中减去单独微调的模型的参数值来获得的。尽管TA的有效性值得称赞,任务向量之间的干扰可能会对合并模型的性能产生不利影响。在本文中,我们放宽了任务算术属性的约束,并提出了任务一致性属性,可以看作是不受任务干扰的。通过理论推导,我们表明可以通过寻求正交任务向量来近似实现这种属性。在这种洞察力的指导下,我们提出了自适应权重解缠(AWD),它将传统任务向量分解为冗余向量和几个解缠的任务向量。AWD的主要优化目标是实现解缠任务向量之间的正交性,从而紧密逼近所需的解决方案。值得注意的是,这些解缠的任务向量可以无缝集成到现有的合并方法中。实验结果表明,我们的AWD始终且显著地改进了先前的合并方法,实现了最先进的结果。我们的代码可在\href{https://github.com/FarisXiong/AWD.git}{https://github.com/FarisXiong/AWD.git}找到。

更新时间: 2024-11-27 20:08:55

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2411.18729v1

The Last Mile to Supervised Performance: Semi-Supervised Domain Adaptation for Semantic Segmentation

Supervised deep learning requires massive labeled datasets, but obtaining annotations is not always easy or possible, especially for dense tasks like semantic segmentation. To overcome this issue, numerous works explore Unsupervised Domain Adaptation (UDA), which uses a labeled dataset from another domain (source), or Semi-Supervised Learning (SSL), which trains on a partially labeled set. Despite the success of UDA and SSL, reaching supervised performance at a low annotation cost remains a notoriously elusive goal. To address this, we study the promising setting of Semi-Supervised Domain Adaptation (SSDA). We propose a simple SSDA framework that combines consistency regularization, pixel contrastive learning, and self-training to effectively utilize a few target-domain labels. Our method outperforms prior art in the popular GTA-to-Cityscapes benchmark and shows that as little as 50 target labels can suffice to achieve near-supervised performance. Additional results on Synthia-to-Cityscapes, GTA-to-BDD and Synthia-to-BDD further demonstrate the effectiveness and practical utility of the method. Lastly, we find that existing UDA and SSL methods are not well-suited for the SSDA setting and discuss design patterns to adapt them.
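
As one concrete ingredient, a FixMatch-style consistency-regularization loss on unlabeled target images is sketched below; the confidence threshold and the weak/strong augmentation split are assumptions about the recipe, not the paper's exact design:

import torch
import torch.nn.functional as F

def consistency_loss(model, x_weak, x_strong, threshold=0.95):
    # Pseudo-label the weakly augmented view without gradients...
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)       # (B, C, H, W) for segmentation
        conf, pseudo = probs.max(dim=1)               # per-pixel confidence and label
    # ...and train the strongly augmented view to match confident pixels.
    loss = F.cross_entropy(model(x_strong), pseudo, reduction="none")
    return (loss * (conf >= threshold).float()).mean()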

Updated: 2024-11-27 20:07:42

标题: 通往监督性能的最后一英里:用于语义分割的半监督域适应

摘要: 监督深度学习需要大量标记数据集,但获取注释并不总是容易或可能的,特别是对于像语义分割这样的密集任务。为了解决这个问题,许多研究探索了无监督域自适应(UDA),它使用来自另一个领域(源领域)的标记数据集,或者半监督学习(SSL),它在部分标记的数据集上进行训练。尽管UDA和SSL取得了成功,但以低标注成本达到监督性能仍然是一个难以实现的目标。为了解决这个问题,我们研究了半监督域自适应(SSDA)这一有前途的设置。我们提出了一个简单的SSDA框架,结合一致性正则化、像素对比学习和自我训练,有效利用少量目标领域标签。我们的方法在流行的GTA-to-Cityscapes基准测试中优于先前的方法,并表明只需50个目标标签就足以实现接近监督性能。在Synthia-to-Cityscapes、GTA-to-BDD和Synthia-to-BDD上的额外结果进一步展示了该方法的有效性和实用性。最后,我们发现现有的UDA和SSL方法并不适用于SSDA设置,并讨论了如何设计模式来适应它们。

更新时间: 2024-11-27 20:07:42

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18728v1

Generative Visual Communication in the Era of Vision-Language Models

Visual communication, dating back to prehistoric cave paintings, is the use of visual elements to convey ideas and information. In today's visually saturated world, effective design demands an understanding of graphic design principles, visual storytelling, human psychology, and the ability to distill complex information into clear visuals. This dissertation explores how recent advancements in vision-language models (VLMs) can be leveraged to automate the creation of effective visual communication designs. Although generative models have made great progress in generating images from text, they still struggle to simplify complex ideas into clear, abstract visuals and are constrained by pixel-based outputs, which lack flexibility for many design tasks. To address these challenges, we constrain the models' operational space and introduce task-specific regularizations. We explore various aspects of visual communication, namely, sketches and visual abstraction, typography, animation, and visual inspiration.

Updated: 2024-11-27 20:04:31

标题: 在视觉语言模型时代的生成视觉交流

摘要: 视觉传播可以追溯到史前洞穴壁画,是利用视觉元素传达思想和信息的方式。在当今充斥着视觉内容的世界中,有效的设计需要对图形设计原则、视觉叙事、人类心理学有一定了解,并具备将复杂信息简化为清晰可见的能力。本文探讨了如何利用最新的视觉-语言模型(VLMs)来自动化创建有效的视觉传播设计。尽管生成模型在从文本生成图像方面取得了很大进展,但它们仍然难以将复杂的思想简化为清晰的抽象视觉,并且受限于基于像素的输出,这对于许多设计任务缺乏灵活性。为了解决这些挑战,我们限制了模型的操作空间并引入了特定任务的正则化。我们探讨了视觉传播的各个方面,包括素描和视觉抽象、排版、动画和视觉灵感。

更新时间: 2024-11-27 20:04:31

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18727v1

Timing Matters: Enhancing User Experience through Temporal Prediction in Smart Homes

Have you ever considered the sheer volume of actions we perform using IoT (Internet of Things) devices within our homes, offices, and daily environments? From the mundane act of flicking a light switch to the precise adjustment of room temperatures, we are surrounded by a wealth of data, each representing a glimpse into user behaviour. While existing research has sought to decipher user behaviours from these interactions and their timestamps, a critical dimension still needs to be explored: the timing of these actions. Despite extensive efforts to understand and forecast user behaviours, the temporal dimension of these interactions has received scant attention. However, the timing of actions holds profound implications for user experience, efficiency, and overall satisfaction with intelligent systems. In our paper, we venture into the less-explored realm of human-centric AI by endeavoring to predict user actions and their timing. To achieve this, we contribute a meticulously synthesized dataset comprising 11k sequences of actions paired with their respective date and time stamps. Building upon this dataset, we propose our model, which employs advanced machine learning techniques for k-class classification over time intervals within a day. To the best of our knowledge, this is the first attempt at time prediction for smart homes. We achieve a 40% (96-class) accuracy across all datasets and an 80% (8-class) accuracy on the dataset containing exact timestamps, showcasing the efficacy of our approach in predicting the temporal dynamics of user actions within smart environments.
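
The 96-class setup mentioned above corresponds to discretizing a day into 15-minute bins; a tiny helper makes the label construction concrete (the bin width is inferred from the 96-class figure, not stated explicitly here):

from datetime import datetime

def time_to_class(ts: datetime, minutes_per_bin: int = 15) -> int:
    # 24h / 15min = 96 classes; bin 0 covers 00:00-00:14, and so on.
    return (ts.hour * 60 + ts.minute) // minutes_per_bin

assert time_to_class(datetime(2024, 1, 1, 7, 40)) == 30   # 07:40 falls in bin 30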

Updated: 2024-11-27 19:49:11

标题: 时间很重要:通过智能家居中的时间预测提升用户体验

摘要: 您是否曾考虑过我们在家庭、办公室和日常环境中使用物联网(IoT)设备执行的动作数量之巨大?从简单的开关灯到精确调节室内温度,我们被大量数据所包围,每个数据代表着对用户行为的一瞥。尽管现有研究已尝试从这些交互和时间戳中解读用户行为,但仍有一个关键维度需要探索:这些动作的时间。尽管已经做出了大量努力来了解和预测用户行为,但这些交互的时间维度却受到了很少的关注。然而,动作的时间对用户体验、效率和智能系统的整体满意度具有深远的影响。在我们的论文中,我们冒险进入人本人工智能的较少探索领域,努力预测用户的动作和时间。为了实现这一目标,我们贡献了一个精心合成的数据集,包括11k个动作序列,以及它们各自的日期和时间戳。基于这个数据集,我们提出了我们的模型,该模型利用先进的机器学习技术,对一天内的时间间隔进行k类分类。据我们所知,这是对智能家居进行时间预测的首次尝试。我们在所有数据集上实现了40%(96类)的准确率,在包含确切时间戳的数据集上实现了80%(8类)的准确率,展示了我们的方法在预测智能环境中用户动作的时间动态方面的有效性。

更新时间: 2024-11-27 19:49:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18719v1

Addressing bias in Recommender Systems: A Case Study on Data Debiasing Techniques in Mobile Games

The mobile gaming industry, particularly the free-to-play sector, has existed for more than a decade, yet it still experiences rapid growth. The concept of games-as-a-service requires game developers to pay much more attention to content recommendations in their games. With recommender systems (RS) comes the inevitable problem of bias in the data. A lot of research has been done on bias in RS for online retail or services, but much less is available for the specific case of the game industry. Also, in previous works, various debiasing techniques were tested on explicit-feedback datasets, whereas mobile gaming data much more commonly contains only implicit feedback. This case study aims to identify and categorize potential bias within datasets specific to model-based recommendations in mobile games, review debiasing techniques in the existing literature, and assess their effectiveness on real-world data gathered through implicit feedback. The effectiveness of these methods is then evaluated based on their debiasing quality, data requirements, and computational demands.
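
One standard debiasing technique of the kind such a study would review is inverse propensity scoring (IPS); a minimal sketch, assuming popularity-based propensity estimates, which is a common but not the only choice:

import numpy as np

def ips_weights(item_exposure_counts):
    # Estimate propensity from exposure popularity, then weight each observed
    # interaction by the inverse propensity of its item when fitting the model.
    propensity = item_exposure_counts / item_exposure_counts.sum()
    return 1.0 / np.maximum(propensity, 1e-6)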

Updated: 2024-11-27 19:45:17

标题: 解决推荐系统中的偏见:移动游戏数据去偏见技术案例研究

摘要: 移动游戏行业,特别是免费游戏领域,已经存在超过十年,但仍然经历着快速增长。游戏作为服务的概念要求游戏开发者更加关注游戏内容的推荐。通过推荐系统(RS),数据中的偏见是不可避免的问题。已经有很多关于在线零售或服务中RS偏见的研究,但对于游戏行业的具体情况的研究较少。此外,在先前的研究中,各种去偏见技术都是在显式反馈数据集上进行测试的,而在移动游戏数据中更常见的是只有隐式反馈。本案例研究旨在识别和分类与移动游戏中基于模型推荐相关的数据集中潜在的偏见,审查现有文献中的去偏见技术,并评估它们在通过隐式反馈收集的真实数据上的有效性。然后基于它们的去偏见质量、数据需求和计算要求评估这些方法的有效性。

更新时间: 2024-11-27 19:45:17

领域: cs.LG

下载: http://arxiv.org/abs/2411.18716v1

Explainable deep learning improves human mental models of self-driving cars

Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. However, the opacity of such black-box motion planners makes it challenging for the human behind the wheel to accurately anticipate when they will fail, with potentially catastrophic consequences. Here, we introduce concept-wrapper network (i.e., CW-Net), a method for explaining the behavior of black-box motion planners by grounding their reasoning in human-interpretable concepts. We deploy CW-Net on a real self-driving car and show that the resulting explanations refine the human driver's mental model of the car, allowing them to better predict its behavior and adjust their own behavior accordingly. Unlike previous work using toy domains or simulations, our study presents the first real-world demonstration of how to build authentic autonomous vehicles (AVs) that give interpretable, causally faithful explanations for their decisions, without sacrificing performance. We anticipate our method could be applied to other safety-critical systems with a human in the loop, such as autonomous drones and robotic surgeons. Overall, our study suggests a pathway to explainability for autonomous agents as a whole, which can help make them more transparent, their deployment safer, and their usage more ethical.

Updated: 2024-11-27 19:38:43

标题: 可解释深度学习改进了人类对自动驾驶汽车的认知模型

摘要: 自动驾驶汽车越来越依赖深度神经网络实现类似人类驾驶的功能。然而,这种黑匣子运动规划器的不透明性使得驾驶座后的人难以准确预测它们何时会失败,可能造成灾难性后果。在这里,我们介绍了概念包装网络(即CW-Net),一种通过将黑匣子运动规划器的推理基于人可解释的概念来解释其行为的方法。我们在一辆真实的自动驾驶汽车上部署了CW-Net,并展示了由此产生的解释如何完善了人类驾驶员对汽车的心理模型,使他们能够更好地预测其行为并相应调整自己的行为。与以往利用玩具领域或模拟的研究不同,我们的研究展示了如何构建真正的自主车辆(AVs),这些车辆给出可解释、因果忠实的决策解释,而不牺牲性能。我们预计我们的方法可以应用于其他具有人类参与的安全关键系统,如自主无人机和机器人外科医生。总的来说,我们的研究提出了一种解释自主代理的可行途径,这可以帮助使它们更透明,其部署更安全,使用更合乎道德。

更新时间: 2024-11-27 19:38:43

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18714v1

Creating Scalable AGI: the Open General Intelligence Framework

Recent advancements in Artificial Intelligence (AI), particularly with Large Language Models (LLMs), have led to significant progress in narrow tasks such as image classification, language translation, coding, and writing. However, these models face limitations in reliability and scalability due to their siloed architectures, which are designed to handle only one data modality (data type) at a time. This single modal approach hinders their ability to integrate the complex set of data points required for real-world challenges and problem-solving tasks like medical diagnosis, quality assurance, equipment troubleshooting, and financial decision-making. Addressing these real-world challenges requires a more capable Artificial General Intelligence (AGI) system. Our primary contribution is the development of the Open General Intelligence (OGI) framework, a novel systems architecture that serves as a macro design reference for AGI. The OGI framework adopts a modular approach to the design of intelligent systems, based on the premise that cognition must occur across multiple specialized modules that can seamlessly operate as a single system. OGI integrates these modules using a dynamic processing system and a fabric interconnect, enabling real-time adaptability, multi-modal integration, and scalable processing. The OGI framework consists of three key components: (1) Overall Macro Design Guidance that directs operational design and processing, (2) a Dynamic Processing System that controls routing, primary goals, instructions, and weighting, and (3) Framework Areas, a set of specialized modules that operate cohesively to form a unified cognitive system. By incorporating known principles from human cognition into AI systems, the OGI framework aims to overcome the challenges observed in today's intelligent systems, paving the way for more holistic and context-aware problem-solving capabilities.

Updated: 2024-11-27 19:25:31

标题: 创建可扩展的AGI:开放通用智能框架

摘要: 最近人工智能(AI)领域的进展,尤其是大型语言模型(LLMs),在狭窄任务中取得了显著进展,如图像分类、语言翻译、编码和写作。然而,这些模型面临可靠性和可扩展性方面的限制,因为它们的独立架构只设计用于处理一种数据模态(数据类型)。这种单模态方法阻碍了它们整合真实世界挑战和问题解决任务所需的复杂数据点的能力,如医学诊断、质量保证、设备故障排除和金融决策。解决这些真实世界挑战需要更强大的通用人工智能(AGI)系统。我们的主要贡献是开发了开放通用智能(OGI)框架,这是一个作为AGI宏观设计参考的新型系统架构。OGI框架采用模块化方法设计智能系统,基于认知必须跨越多个专门模块进行的前提,这些模块可以无缝地作为一个单一系统运作。OGI使用动态处理系统和织物互连集成这些模块,实现实时适应性、多模态整合和可扩展处理。OGI框架包括三个关键组件:(1)指导整体宏观设计的指导方针,(2)控制路由、主要目标、指令和权重的动态处理系统,以及(3)框架区域,一组专门模块共同运作形成统一的认知系统。通过将人类认知原理纳入AI系统,OGI框架旨在克服今天智能系统中观察到的挑战,为更全面和上下文感知的问题解决能力铺平道路。

更新时间: 2024-11-27 19:25:31

领域: cs.AI,I.2; C.5

下载: http://arxiv.org/abs/2411.15832v2

Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students

The impressive essay writing and problem-solving capabilities of large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education. Our goal is to gain insights into the widespread use of LLMs among secondary students to inform their future development. Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs, higher than the usage percentage among young adults, and this percentage remains consistent across 7th to 12th grade. Students also reported using LLMs for multiple subjects, including language arts, history, and math assignments, but expressed mixed thoughts on their effectiveness due to occasional hallucinations in historical contexts and incorrect answers for lack of rigorous reasoning. The survey feedback called for LLMs better adapted for students, and also raised questions to developers and educators on how to help students from underserved communities leverage LLMs' capabilities for equal access to advanced education resources. We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.

Updated: 2024-11-27 19:19:34

标题: 拥抱人工智能在教育领域的应用:理解中学生在大型语言模型使用中的增长

摘要: 大型语言模型(LLMs)如OpenAI的ChatGPT展示出了出色的论文写作和问题解决能力,为教育开辟了新的途径。我们的目标是深入了解中学生广泛使用LLMs的情况,以指导其未来发展。尽管受到学校限制,我们对300多名初高中学生进行的调查显示,有70%的学生使用过LLMs,高于年轻成人的使用比例,并且这一比例在7至12年级之间保持一致。学生们还报告使用LLMs进行多个学科的作业,包括语言艺术、历史和数学,但由于在历史背景下偶发的幻觉和缺乏严谨推理导致的错误答案,他们对其有效性表示了不同看法。调查反馈呼吁开发出更适合学生的LLMs,并提出了如何帮助来自弱势群体的学生利用LLMs能力获得平等接触先进教育资源的问题。我们提出了一些解决这些问题的想法,包括专门的学科模型、个性化学习和人工智能课堂。

更新时间: 2024-11-27 19:19:34

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2411.18708v1

Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits

Weight averaging of Stochastic Gradient Descent (SGD) iterates is a popular method for training deep learning models. While it is often used as part of complex training pipelines to improve generalization or serve as a `teacher' model, weight averaging lacks proper evaluation on its own. In this work, we present a systematic study of the Exponential Moving Average (EMA) of weights. We first explore the training dynamics of EMA, give guidelines for hyperparameter tuning, and highlight its good early performance, partly explaining its success as a teacher. We also observe that EMA requires less learning rate decay compared to SGD since averaging naturally reduces noise, introducing a form of implicit regularization. Through extensive experiments, we show that EMA solutions differ from last-iterate solutions. EMA models not only generalize better but also exhibit improved i) robustness to noisy labels, ii) prediction consistency, iii) calibration and iv) transfer learning. Therefore, we suggest that an EMA of weights is a simple yet effective plug-in to improve the performance of deep learning models.
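
For reference, the EMA update itself is a one-liner applied after each optimizer step; the decay value below is a typical choice, not a recommendation from the paper:

import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # theta_ema <- decay * theta_ema + (1 - decay) * theta
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

# keep ema_model = copy.deepcopy(model) frozen, and call
# ema_update(ema_model, model) after every optimizer step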

Updated: 2024-11-27 19:14:27

标题: 深度学习中权重的指数移动平均:动态和好处

摘要: 随机梯度下降(SGD)迭代的权重平均是训练深度学习模型的一种流行方法。虽然它经常作为复杂训练流程的一部分用于改善泛化性能或作为“教师”模型,但权重平均缺乏适当的评估。在这项工作中,我们对权重的指数移动平均(EMA)进行了系统研究。我们首先探讨了EMA的训练动态,给出了超参数调整的指导方针,并强调了其良好的早期性能,部分解释了其作为教师模型的成功。我们还观察到,与SGD相比,EMA需要更少的学习率衰减,因为平均自然减少了噪声,引入了一种隐式正则化形式。通过大量实验,我们展示了EMA解决方案与最后迭代解决方案不同。EMA模型不仅泛化能力更好,而且在噪声标签、预测一致性、校准和迁移学习方面表现出改进。因此,我们建议将权重的EMA作为一种简单而有效的插件,用于提高深度学习模型的性能。

更新时间: 2024-11-27 19:14:27

领域: cs.LG

下载: http://arxiv.org/abs/2411.18704v1

Random Walks with Tweedie: A Unified Framework for Diffusion Models

We present a simple template for designing generative diffusion model algorithms based on an interpretation of diffusion sampling as a sequence of random walks. Score-based diffusion models are widely used to generate high-quality images. Diffusion models have also been shown to yield state-of-the-art performance in many inverse problems. While these algorithms are often surprisingly simple, the theory behind them is not, and multiple complex theoretical justifications exist in the literature. Here, we provide a simple and largely self-contained theoretical justification for score-based-diffusion models that avoids using the theory of Markov chains or reverse diffusion, instead centering the theory of random walks and Tweedie's formula. This approach leads to unified algorithmic templates for network training and sampling. In particular, these templates cleanly separate training from sampling, e.g., the noise schedule used during training need not match the one used during sampling. We show that several existing diffusion models correspond to particular choices within this template and demonstrate that other, more straightforward algorithmic choices lead to effective diffusion models. The proposed framework has the added benefit of enabling conditional sampling without any likelihood approximation.
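
A compact sketch of the template under stated assumptions: a pretrained score function, an annealed random walk over a noise schedule, and a final Tweedie step E[x0 | x] = x + sigma^2 * score(x, sigma); the schedule and step size are illustrative:

import math
import torch

def tweedie_denoise(x, score_fn, sigma):
    # Tweedie's formula: E[x0 | x] = x + sigma^2 * score(x, sigma)
    return x + sigma ** 2 * score_fn(x, sigma)

def random_walk_sample(score_fn, sigmas, shape, step=0.5):
    x = torch.randn(shape) * sigmas[0]                # start from pure noise
    for sigma in sigmas:                              # anneal from large to small
        eps = step * sigma ** 2
        noise = math.sqrt(2 * eps) * torch.randn(shape)
        x = x + eps * score_fn(x, sigma) + noise      # one random-walk step
    return tweedie_denoise(x, score_fn, sigmas[-1])   # final posterior-mean step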

Updated: 2024-11-27 19:13:20

标题: 使用Tweedie分布的随机漫步:扩散模型的统一框架

摘要: 我们提出了一个简单的模板,用于设计基于扩散抽样的生成扩散模型算法,该模板基于将扩散抽样解释为一系列随机漫步。基于评分的扩散模型被广泛用于生成高质量图像。扩散模型还显示出在许多反问题中具有最先进的性能。尽管这些算法通常令人惊讶地简单,但它们背后的理论并非如此,文献中存在多个复杂的理论证明。在这里,我们提供了一个简单且基本独立的理论证明,用于评分扩散模型,避免使用马尔可夫链或逆向扩散的理论,而是集中于随机漫步和Tweedie公式的理论。这种方法导致了网络训练和抽样的统一算法模板。特别是,这些模板清晰地将训练与抽样分开,例如,在训练期间使用的噪声计划不需要与抽样期间使用的相匹配。我们展示了几种现有的扩散模型对应于此模板中的特定选择,并证明其他更为直接的算法选择可以导致有效的扩散模型。所提出的框架还有一个额外的好处,即在不需要任何可能性近似的情况下实现条件抽样。

更新时间: 2024-11-27 19:13:20

领域: cs.CV,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2411.18702v1

Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction

Federated learning is known for its capability to safeguard the participants' data privacy. However, recently emerged model inversion attacks (MIAs) have shown that a malicious parameter server can reconstruct individual users' local data samples from model updates. The state-of-the-art attacks either rely on computation-intensive iterative optimization methods to reconstruct each input batch, making scaling difficult, or involve the malicious parameter server adding extra modules before the global model architecture, rendering the attacks too conspicuous and easily detectable. To overcome these limitations, we propose Scale-MIA, a novel MIA capable of efficiently and accurately reconstructing local training samples from the aggregated model updates, even when the system is protected by a robust secure aggregation (SA) protocol. Scale-MIA utilizes the inner architecture of models and identifies the latent space as the critical layer for breaching privacy. Scale-MIA decomposes the complex reconstruction task into an innovative two-step process. The first step is to reconstruct the latent space representations (LSRs) from the aggregated model updates using a closed-form inversion mechanism, leveraging specially crafted linear layers. Then in the second step, the LSRs are fed into a fine-tuned generative decoder to reconstruct the whole input batch. We implemented Scale-MIA on commonly used machine learning models and conducted comprehensive experiments across various settings. The results demonstrate that Scale-MIA achieves excellent performance on different datasets, exhibiting high reconstruction rates, accuracy, and attack efficiency on a larger scale compared to state-of-the-art MIAs. Our code is available at https://github.com/unknown123489/Scale-MIA.

Updated: 2024-11-27 19:12:50

标题: 规模MIA:通过潜在空间重建对安全联合学习的可扩展模型反演攻击

摘要: 联邦学习以其保护参与者数据隐私的能力而闻名。然而,最近出现的模型反演攻击(MIAs)表明,恶意参数服务器可以从模型更新中重建单个用户的本地数据样本。目前的攻击要么依赖于计算密集型的迭代优化方法来重建每个输入批次,使得扩展困难,要么涉及恶意参数服务器在全局模型架构之前添加额外模块,使得攻击过于显眼,容易被检测到。 为了克服这些限制,我们提出了Scale-MIA,一种新型的MIA,能够从聚合的模型更新中高效准确地重建本地训练样本,即使系统受到强大的安全聚合(SA)协议的保护。Scale-MIA利用模型的内部架构,并将潜在空间识别为侵犯隐私的关键层。Scale-MIA将复杂的重建任务分解为创新的两步过程。第一步是利用特别设计的线性层,通过闭合形式反演机制从聚合的模型更新中重建潜在空间表示(LSRs)。然后在第二步中,LSRs被馈送到经过微调的生成解码器中,以重建整个输入批次。 我们在常用的机器学习模型上实现了Scale-MIA,并在各种设置下进行了全面实验。结果表明,与最先进的MIAs相比,Scale-MIA在不同数据集上取得了出色的性能,展现出更高的重建率、准确性和攻击效率。我们的代码可在https://github.com/unknown123489/Scale-MIA找到。

更新时间: 2024-11-27 19:12:50

领域: cs.LG

下载: http://arxiv.org/abs/2311.05808v3

On the Effectiveness of Incremental Training of Large Language Models

Training large language models is a computationally intensive process that often requires substantial resources to achieve state-of-the-art results. Incremental layer-wise training has been proposed as a potential strategy to optimize the training process by progressively introducing layers, with the expectation that this approach would lead to faster convergence and more efficient use of computational resources. In this paper, we investigate the effectiveness of incremental training for LLMs, dividing the training process into multiple stages where layers are added progressively. Our experimental results indicate that while the incremental approach initially demonstrates some computational efficiency, it ultimately requires greater overall computational costs to reach comparable performance to traditional full-scale training. Although the incremental training process can eventually close the performance gap with the baseline, it does so only after significantly extended continual training. These findings suggest that incremental layer-wise training may not be a viable alternative for training large language models, highlighting its limitations and providing valuable insights into the inefficiencies of this approach.
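
A schematic of the staged setup being evaluated, assuming transformer-style blocks appended between stages; layer counts and widths are illustrative, and the paper's exact growth schedule may differ:

import torch.nn as nn

def grow(layers, n_new=4, width=256):
    # Append n_new blocks for the next stage; earlier blocks keep training.
    for _ in range(n_new):
        layers.append(nn.TransformerEncoderLayer(d_model=width, nhead=8))
    return nn.Sequential(*layers)

blocks = []
stage1 = grow(blocks)      # train the 4-layer model, then...
stage2 = grow(blocks)      # ...continue training the 8-layer model, and so on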

Updated: 2024-11-27 19:11:49

标题: 关于大型语言模型增量训练的有效性

摘要: 训练大型语言模型是一个计算密集型的过程,通常需要大量资源才能达到最先进的结果。逐层递增训练被提出作为一种潜在策略来优化训练过程,通过逐步引入层,希望这种方法能够实现更快的收敛速度和更有效地利用计算资源。本文研究了逐步训练对LLM的有效性,将训练过程分为多个阶段,逐渐添加层。我们的实验结果表明,虽然逐步方法最初展示了一些计算效率,但最终需要更大的计算成本才能达到与传统全面训练相当的性能。尽管逐步训练过程最终可以缩小与基线之间的性能差距,但只有经过显著延长的持续训练才能实现这一点。这些发现表明,逐层逐层训练可能不是训练大型语言模型的可行替代方案,突出了其局限性,并为这种方法的低效提供了宝贵的见解。

更新时间: 2024-11-27 19:11:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.18700v1

An indicator for effectiveness of text-to-image guardrails utilizing the Single-Turn Crescendo Attack (STCA)

The Single-Turn Crescendo Attack (STCA), first introduced in Aqrawi and Abbasi [2024], is an innovative method designed to bypass the ethical safeguards of text-to-text AI models, compelling them to generate harmful content. This technique leverages a strategic escalation of context within a single prompt, combined with trust-building mechanisms, to subtly deceive the model into producing unintended outputs. Extending the application of STCA to text-to-image models, we demonstrate its efficacy by compromising the guardrails of a widely-used model, DALL-E 3, achieving outputs comparable to outputs from the uncensored model Flux Schnell, which served as a baseline control. This study provides a framework for researchers to rigorously evaluate the robustness of guardrails in text-to-image models and benchmark their resilience against adversarial attacks.

Updated: 2024-11-27 19:09:16

标题: 一种利用单旋转渐强攻击(STCA)的文本到图像护栏有效性指标

摘要: 单圈渐强攻击(STCA)最初由Aqrawi和Abbasi [2024]引入,是一种创新方法,旨在规避文本到文本AI模型的道德保障,迫使其生成有害内容。该技术利用单个提示中上下文的战略升级,结合建立信任的机制,巧妙地欺骗模型产生意外的输出。将STCA的应用扩展到文本到图像模型,我们通过破坏广泛使用的模型DALL-E 3的防护栏,展示了其有效性,实现了与未经审查的模型Flux Schnell相当的输出,后者作为基准对照。本研究为研究人员提供了一个框架,以严格评估文本到图像模型中防护栏的强度,并对抗攻击的韧性进行基准测试。

更新时间: 2024-11-27 19:09:16

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2411.18699v1

Regularizing Explanations in Bayesian Convolutional Neural Networks

Neural networks are powerful function approximators with tremendous potential in learning complex distributions. However, they are prone to overfitting on spurious patterns. Bayesian inference provides a principled way to regularize neural networks and give well-calibrated uncertainty estimates. It allows us to specify prior knowledge on weights. However, specifying domain knowledge via distributions over weights is infeasible. Furthermore, it is unable to correct models when they focus on spurious or irrelevant features. New methods within explainable artificial intelligence allow us to regularize explanations in the form of feature importance to add domain knowledge and correct the models' focus. Nevertheless, they are incompatible with Bayesian neural networks, as they require us to modify the loss function. We propose a new explanation regularization method that is compatible with Bayesian inference. Consequently, we can quantify uncertainty and, at the same time, have correct explanations. We test our method using four different datasets. The results show that our method improves predictive performance when models overfit on spurious features or are uncertain of which features to focus on. Moreover, our method performs better than augmenting training data with samples where spurious features are removed through masking. We provide code, data, trained weights, and hyperparameters.

Updated: 2024-11-27 19:06:09

标题: 在贝叶斯卷积神经网络中规范解释

摘要: 神经网络是强大的函数逼近器,在学习复杂分布方面具有巨大潜力。然而,它们容易在虚假模式上过拟合。贝叶斯推断提供了一种原则性的方法来正则化神经网络,并提供良好校准的不确定性估计。它允许我们在权重上指定先验知识。然而,通过权重分布指定领域知识是不可行的。此外,当模型专注于虚假或无关紧要的特征时,它无法纠正模型。在可解释人工智能领域的新方法允许我们在特征重要性的形式中规范解释,以增加领域知识并纠正模型的焦点。然而,它们与贝叶斯神经网络不兼容,因为它们需要我们修改损失函数。我们提出了一种与贝叶斯推断兼容的新解释正则化方法。因此,我们可以量化不确定性,并同时具有正确的解释。我们使用四个不同的数据集测试我们的方法。结果显示,当模型过拟合虚假特征或不确定要关注哪些特征时,我们的方法提高了预测性能。此外,我们的方法比通过遮蔽删除虚假特征的样本来增加训练数据表现更好。我们提供代码、数据、训练权重和超参数。

更新时间: 2024-11-27 19:06:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2105.02653v3

GazeSearch: Radiology Findings Search Benchmark

Medical eye-tracking data is an important information source for understanding how radiologists visually interpret medical images. This information not only improves the accuracy of deep learning models for X-ray analysis but also their interpretability, enhancing transparency in decision-making. However, the current eye-tracking data is dispersed, unprocessed, and ambiguous, making it difficult to derive meaningful insights. Therefore, there is a need to create a new dataset with more focus and purposeful eyetracking data, improving its utility for diagnostic applications. In this work, we propose a refinement method inspired by the target-present visual search challenge: there is a specific finding and fixations are guided to locate it. After refining the existing eye-tracking datasets, we transform them into a curated visual search dataset, called GazeSearch, specifically for radiology findings, where each fixation sequence is purposefully aligned to the task of locating a particular finding. Subsequently, we introduce a scan path prediction baseline, called ChestSearch, specifically tailored to GazeSearch. Finally, we employ the newly introduced GazeSearch as a benchmark to evaluate the performance of current state-of-the-art methods, offering a comprehensive assessment for visual search in the medical imaging domain. Code is available at \url{https://github.com/UARK-AICV/GazeSearch}.

Updated: 2024-11-27 19:01:53

标题: 注视搜索:放射学发现搜索基准

摘要: 医学眼动数据是了解放射科医师如何视觉解释医学图像的重要信息来源。这些信息不仅提高了X射线分析的深度学习模型的准确性,还增强了它们的可解释性,提高了决策透明度。然而,当前的眼动数据分散、未经处理且含糊不清,难以得出有意义的见解。因此,有必要创建一个新的数据集,更专注和有目的地收集眼动数据,以提高其在诊断应用中的效用。在这项工作中,我们提出了一种受到目标出现的视觉搜索挑战启发的精炼方法:有一个特定的发现,注视点被引导到该位置。在对现有眼动数据集进行精炼后,我们将其转化为一个经过策划的视觉搜索数据集,称为GazeSearch,专门用于放射学发现,其中每个注视序列都被有意地对齐到寻找特定发现的任务上。随后,我们引入了一个名为ChestSearch的扫描路径预测基线,专门针对GazeSearch。最后,我们利用新引入的GazeSearch作为基准来评估当前最先进方法的性能,为医学成像领域的视觉搜索提供全面评估。代码可在\url{https://github.com/UARK-AICV/GazeSearch} 上找到。

更新时间: 2024-11-27 19:01:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.05780v2

Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications

Tiny machine learning (TinyML) for low-power devices lacks robust datasets for development. We present Wake Vision, a large-scale dataset for person detection that contains over 6 million quality-filtered images. We provide two variants: Wake Vision (Large) and Wake Vision (Quality), leveraging the large variant for pretraining and knowledge distillation, while the higher-quality labels drive final model performance. The manually labeled validation and test sets reduce error rates from 7.8% to 2.2% compared to previous standards. In addition, we introduce five detailed benchmark sets to evaluate model performance in real-world scenarios, including varying lighting, camera distances, and demographic characteristics. Training with Wake Vision improves accuracy by 1.93% over existing datasets, demonstrating the importance of dataset quality for low-capacity models and dataset size for high-capacity models. The dataset, benchmarks, code, and models are available under the CC-BY 4.0 license at WakeVision.ai, maintained by the Edge AI Foundation.

Updated: 2024-11-27 19:01:52

标题: 唤醒视觉:为TinyML计算机视觉应用量身定制的数据集和基准套件

摘要: 小型机器学习(TinyML)用于低功耗设备缺乏稳健的数据集以进行开发。我们提出Wake Vision,一个包含超过600万张经过质量筛选的人体检测图像的大规模数据集。我们提供两个变种:Wake Vision(Large)和Wake Vision(Quality),利用大型变种进行预训练和知识蒸馏,而更高质量的标签推动最终模型性能。手动标记的验证和测试集将错误率从7.8%降低到2.2%,相比之前的标准。此外,我们引入了五个详细的基准集,以评估模型在真实世界场景中的性能,包括不同的光照、摄像机距离和人口特征。使用Wake Vision进行训练可将准确率提高1.93%,显示了数据集质量对低容量模型和数据集大小对高容量模型的重要性。该数据集、基准、代码和模型可在WakeVision.ai上以CC-BY 4.0许可证获得,由Edge AI Foundation维护。

更新时间: 2024-11-27 19:01:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.00892v3

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

With the widespread deployment of Multimodal Large Language Models (MLLMs) for visual-reasoning tasks, improving their safety has become crucial. Recent research indicates that despite training-time safety alignment, these models remain vulnerable to jailbreak attacks: carefully crafted image-prompt pairs that compel the model to generate harmful content. In this work, we first highlight a critical safety gap, demonstrating that alignment achieved solely through safety training may be insufficient against jailbreak attacks. To address this vulnerability, we propose Immune, an inference-time defense framework that leverages a safe reward model during decoding to defend against jailbreak attacks. Additionally, we provide a rigorous mathematical characterization of Immune, offering provable guarantees against jailbreaks. Extensive evaluations on diverse jailbreak benchmarks using recent MLLMs reveal that Immune effectively enhances model safety while preserving the model's original capabilities. For instance, against text-based jailbreak attacks on LLaVA-1.6, Immune reduces the attack success rate by 57.82% and 16.78% compared to the base MLLM and state-of-the-art defense strategy, respectively.
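
A generic reward-guided decoding step in the spirit of the approach above: re-rank the top-k next-token candidates by a safety reward model. The scoring rule, reward_fn, and alpha are assumptions for illustration, not Immune's exact formulation:

import torch
import torch.nn.functional as F

def safe_decode_step(logits, reward_fn, prefix_ids, k=20, alpha=1.0):
    # logits: (vocab,) next-token logits; prefix_ids: 1-D LongTensor so far.
    logp = F.log_softmax(logits, dim=-1)
    top_logp, top_ids = logp.topk(k)
    scores = torch.stack([
        top_logp[i] + alpha * reward_fn(torch.cat([prefix_ids, top_ids[i:i + 1]]))
        for i in range(k)
    ])
    return int(top_ids[scores.argmax()])     # safest high-likelihood token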

Updated: 2024-11-27 19:00:10

标题: 《免疫:通过推理时间对齐提高多模式LLMs防止越狱的安全性》

摘要: 随着多模态大型语言模型(MLLMs)在视觉推理任务中的广泛部署,提高它们的安全性变得至关重要。最近的研究表明,尽管在训练时进行了安全对齐,这些模型仍然容易受到越狱攻击的影响:精心设计的图像提示对导致模型生成有害内容。在这项工作中,我们首先强调了一个关键的安全漏洞,证明单纯通过安全训练实现的对齐可能不足以抵御越狱攻击。为了解决这种脆弱性,我们提出了Immune,这是一个推理时间的防御框架,在解码过程中利用安全奖励模型来抵御越狱攻击。此外,我们对Immune进行了严格的数学建模,提供了针对越狱攻击的可证明保证。使用最近的MLLMs对各种越狱基准进行广泛评估,结果表明Immune有效地增强了模型的安全性,同时保留了模型的原始能力。例如,在LLaVA-1.6上针对基于文本的越狱攻击,Immune将攻击成功率分别与基础MLLM和最先进的防御策略相比分别降低了57.82%和16.78%。

更新时间: 2024-11-27 19:00:10

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18688v1

MatchDiffusion: Training-free Generation of Match-cuts

Match-cuts are powerful cinematic tools that create seamless transitions between scenes, delivering strong visual and metaphorical connections. However, crafting match-cuts is a challenging, resource-intensive process requiring deliberate artistic planning. In MatchDiffusion, we present the first training-free method for match-cut generation using text-to-video diffusion models. MatchDiffusion leverages a key property of diffusion models: early denoising steps define the scene's broad structure, while later steps add details. Guided by this insight, MatchDiffusion employs "Joint Diffusion" to initialize generation for two prompts from shared noise, aligning structure and motion. It then applies "Disjoint Diffusion", allowing the videos to diverge and introduce unique details. This approach produces visually coherent videos suited for match-cuts. User studies and metrics demonstrate MatchDiffusion's effectiveness and potential to democratize match-cut creation.

Updated: 2024-11-27 18:59:59

标题: MatchDiffusion:无需训练的匹配切割生成

摘要: Match-cuts是一种强大的电影工具,可以在场景之间创建无缝过渡,传达强烈的视觉和隐喻联系。然而,制作match-cuts是一个具有挑战性且需要大量资源的过程,需要精心的艺术规划。在MatchDiffusion中,我们提出了第一个无需训练的方法,使用文本到视频扩散模型生成match-cuts。MatchDiffusion利用扩散模型的一个关键特性:早期去噪步骤定义了场景的整体结构,而后续步骤则添加了细节。在这一洞察力的指导下,MatchDiffusion使用“联合扩散”从共享的噪声初始化两个提示的生成,对齐结构和运动。然后应用“不相交扩散”,允许视频分歧并引入独特的细节。这种方法产生了适合match-cuts的视觉连贯的视频。用户研究和指标显示了MatchDiffusion的有效性和潜力,可以使match-cut的创作民主化。

更新时间: 2024-11-27 18:59:59

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18677v1

Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data

In the 21st-century information age, with the development of big data technology, effectively extracting valuable information from massive data has become a key issue. Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data. Especially when labeled data is scarce, their performance is greatly limited. This study optimizes data mining algorithms by introducing semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data, thereby achieving more accurate data analysis and pattern recognition under limited labeled data conditions. Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification, and continuously improve the model prediction performance through an iterative process. The experimental results demonstrate that the proposed method significantly outperforms traditional machine learning techniques such as Support Vector Machine (SVM), XGBoost, and Multi-Layer Perceptron (MLP) on the CIFAR-10 image classification dataset. Notable improvements were observed in key performance metrics, including accuracy, recall, and F1 score. Furthermore, the robustness and noise-resistance capabilities of the semi-supervised CNN model were validated through experiments under varying noise levels, confirming its practical applicability in real-world scenarios.
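
A minimal self-training round of the kind described, written against a scikit-learn-style classifier; the confidence threshold is illustrative:

import numpy as np

def self_training_round(model, x_lab, y_lab, x_unlab, threshold=0.9):
    # Fit on the current labeled pool, pseudo-label confident unlabeled points,
    # and move them into the labeled pool for the next round.
    model.fit(x_lab, y_lab)
    probs = model.predict_proba(x_unlab)
    conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
    keep = conf >= threshold
    x_lab = np.concatenate([x_lab, x_unlab[keep]])
    y_lab = np.concatenate([y_lab, pseudo[keep]])
    return model, x_lab, y_lab, x_unlab[~keep]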

Updated: 2024-11-27 18:59:50

标题: 利用半监督学习增强在有限标记数据条件下的图像分类数据挖掘

摘要: 在21世纪信息时代,随着大数据技术的发展,有效地从海量数据中提取有价值信息已成为一个关键问题。传统的数据挖掘方法在面对大规模、高维度和复杂数据时显得不足。特别是在标记数据稀缺的情况下,它们的性能受到极大限制。本研究通过引入半监督学习方法优化数据挖掘算法,旨在提高算法利用未标记数据的能力,从而在有限标记数据条件下实现更准确的数据分析和模式识别。具体而言,我们采用自训练方法,并将其与卷积神经网络(CNN)结合,用于图像特征提取和分类,并通过迭代过程持续改善模型预测性能。实验结果表明,所提出的方法在CIFAR-10图像分类数据集上明显优于传统的机器学习技术,如支持向量机(SVM)、XGBoost和多层感知器(MLP)。关键性能指标,包括准确率、召回率和F1分数,均有显著提高。此外,通过在不同噪声水平下的实验验证了半监督CNN模型的稳健性和抗噪声能力,确认了其在实际场景中的实用性。

更新时间: 2024-11-27 18:59:50

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18622v1

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

The applications of LLM Agents are becoming increasingly complex and diverse, leading to a high demand for structured outputs that can be parsed into code, structured function calls, and embodied agent commands. These developments bring significant demands for structured generation in LLM inference. Context-free grammar is a flexible approach to enable structured generation via constrained decoding. However, executing context-free grammar requires going through several stack states over all tokens in the vocabulary during runtime, bringing non-negligible overhead for structured generation. In this paper, we propose XGrammar, a flexible and efficient structure generation engine for large language models. XGrammar accelerates context-free grammar execution by dividing the vocabulary into context-independent tokens that can be prechecked and context-dependent tokens that need to be interpreted during runtime. We further build transformations to expand the grammar context and reduce the number of context-independent tokens. Additionally, we build an efficient persistent stack to accelerate the context-dependent token checks. Finally, we co-design the grammar engine with the LLM inference engine to overlap grammar computation with GPU execution. Evaluation results show that XGrammar can achieve up to 100x speedup over existing solutions. Combined with an LLM inference engine, it can achieve near-zero-overhead structured generation in end-to-end low-latency LLM serving.
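
The core masking step of grammar-constrained decoding, stripped of XGrammar's optimizations, looks roughly like this; allowed_ids would come from the grammar's current state, and the context-independent/context-dependent token split is precisely what XGrammar adds on top of this baseline:

import torch

def constrained_next_token(logits, allowed_ids):
    # allowed_ids: tokens the grammar's current state permits.
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_ids] = 0.0                  # leave permitted tokens untouched
    return int((logits + mask).argmax())     # greedy pick among legal tokens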

Updated: 2024-11-27 18:59:28

标题: XGrammar:大型语言模型的灵活高效结构生成引擎

摘要: LLM代理的应用越来越复杂和多样化,导致对结构化输出的需求增加,这些输出可以被解析成代码、结构化函数调用和具体的代理命令。这些发展带来了对LLM推理中结构化生成的重大需求。无上下文语法是一种灵活的方法,可以通过受限解码实现结构化生成。然而,执行无上下文语法需要在运行时通过多个堆栈状态遍历整个词汇中的所有标记,这会给结构化生成带来不可忽视的开销。在本文中,我们提出了XGrammar,一个灵活而高效的大型语言模型结构生成引擎。XGrammar通过将词汇划分为可以预先检查的无上下文标记和需要在运行时解释的有上下文标记,加速了无上下文语法的执行。我们进一步构建了扩展语法上下文和减少无上下文标记数量的转换。另外,我们构建了一个高效的持久堆栈来加速有上下文标记的检查。最后,我们与LLM推理引擎共同设计了语法引擎,以将语法计算与GPU执行重叠。评估结果显示,XGrammar可以比现有解决方案提高多达100倍的速度。与LLM推理引擎结合使用,它可以在端到端低LLM服务中实现几乎零开销的结构生成。

更新时间: 2024-11-27 18:59:28

领域: cs.CL,cs.AI,cs.PL

下载: http://arxiv.org/abs/2411.15100v2

Cross-modal Information Flow in Multimodal Large Language Models

The recent advancements in auto-regressive multimodal large language models (MLLMs) have demonstrated promising progress for vision-language tasks. While there exists a variety of studies investigating the processing of linguistic information within large language models, little is currently known about the inner working mechanism of MLLMs and how linguistic and visual information interact within these models. In this study, we aim to fill this gap by examining the information flow between different modalities -- language and vision -- in MLLMs, focusing on visual question answering. Specifically, given an image-question pair as input, we investigate where in the model and how the visual and linguistic information are combined to generate the final prediction. Conducting experiments with a series of models from the LLaVA series, we find that there are two distinct stages in the process of integration of the two modalities. In the lower layers, the model first transfers the more general visual features of the whole image into the representations of (linguistic) question tokens. In the middle layers, it once again transfers visual information about specific objects relevant to the question to the respective token positions of the question. Finally, in the higher layers, the resulting multimodal representation is propagated to the last position of the input sequence for the final prediction. Overall, our findings provide a new and comprehensive perspective on the spatial and functional aspects of image and language processing in the MLLMs, thereby facilitating future research into multimodal information localization and editing.

Updated: 2024-11-27 18:59:26

标题: 多模态大语言模型中的跨模态信息流

摘要: 最近自回归多模态大语言模型(MLLMs)的发展展示了在视觉语言任务中取得了令人期待的进展。虽然存在大量研究调查大语言模型内部处理语言信息的情况,但目前对MLLMs的内部工作机制以及语言和视觉信息在这些模型中如何相互作用知之甚少。在本研究中,我们旨在通过研究MLLMs中不同模态之间的信息流动,即语言和视觉,在视觉问答方面填补这一空白。具体而言,给定一对图像-问题作为输入,我们研究模型在何处以及如何将视觉和语言信息相结合以生成最终预测。通过使用LLaVA系列中的一系列模型进行实验,我们发现在整合两种模式的过程中存在两个明显阶段。在较低层中,模型首先将整个图像的更一般的视觉特征转化为(语言的)问题标记的表示。在中间层中,它再次将与问题相关的特定对象的视觉信息传输到问题的相应标记位置。最后,在更高层中,得到的多模态表示被传播到输入序列的最后位置进行最终预测。总的来说,我们的发现为MLLMs中图像和语言处理的空间和功能方面提供了新的全面视角,从而促进了未来对多模态信息定位和编辑的研究。

更新时间: 2024-11-27 18:59:26

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2411.18620v1

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preservation generation tasks, without requiring test-time optimization.

Updated: 2024-11-27 18:58:52

标题: 扩散自蒸馏用于零射定制图像生成

摘要: 文本到图像扩散模型产生令人印象深刻的结果,但对于希望精细控制的艺术家来说是令人沮丧的工具。例如,一个常见的用例是在新颖的情境中创建特定实例的图像,即“保持身份生成”。这种设置,以及许多其他任务(例如,重新照明),是图像+文本条件生成模型的自然适用范围。然而,目前还没有足够高质量的配对数据来直接训练这样的模型。我们提出了扩散自蒸馏(Diffusion Self-Distillation)方法,用于利用预训练的文本到图像模型为文本条件的图像到图像任务生成自己的数据集。我们首先利用文本到图像扩散模型的上下文生成能力创建图像网格,并借助视觉-语言模型的帮助筛选出一个大型配对数据集。然后,我们使用筛选后的配对数据集将文本到图像模型微调为文本+图像到图像模型。我们证明了扩散自蒸馏方法在广泛的保持身份生成任务上优于现有的零样本方法,并与每个实例调整技术竞争,在不需要测试时优化的情况下。

更新时间: 2024-11-27 18:58:52

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2411.18616v1

DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring

Coronary artery disease (CAD), one of the leading causes of mortality worldwide, necessitates effective risk assessment strategies, with coronary artery calcium (CAC) scoring via computed tomography (CT) being a key method for prevention. Traditional methods, primarily based on UNET architectures implemented on pre-built models, face challenges like the scarcity of annotated CT scans containing CAC and imbalanced datasets, leading to reduced performance in segmentation and scoring tasks. In this study, we address these limitations by incorporating the self-supervised learning (SSL) technique of DINO (self-distillation with no labels), which trains without requiring CAC-specific annotations, enhancing its robustness in generating distinct features. The DINO-LG model, which leverages label guidance to focus on calcified areas, achieves significant improvements, with a sensitivity of 89% and specificity of 90% for detecting CAC-containing CT slices, compared to the standard DINO model's sensitivity of 79% and specificity of 77%. Additionally, false-negative and false-positive rates are reduced by 49% and 59%, respectively, instilling greater confidence in clinicians when ruling out calcification in low-risk patients and minimizing unnecessary imaging reviews by radiologists. Further, CAC scoring and segmentation tasks are conducted using a basic UNET architecture, applied specifically to CT slices identified by the DINO-LG model as containing calcified areas. This targeted approach enhances CAC scoring accuracy by feeding the UNET model with relevant slices, significantly improving diagnostic precision, reducing both false positives and false negatives, and ultimately lowering overall healthcare costs by minimizing unnecessary tests and treatments, presenting a valuable advancement in CAD risk assessment.

Updated: 2024-11-27 18:58:41

标题: DINO-LG:一种用于冠状动脉钙化评分的任务特定的DINO模型

摘要: 冠状动脉疾病(CAD)是全球死亡的主要原因之一,需要有效的风险评估策略,冠状动脉钙化(CAC)评分通过计算机断层扫描(CT)是预防的关键方法。传统方法主要基于UNET架构在预构建模型上实现,面临着CT扫描中CAC稀缺和数据集不平衡等挑战,导致分割和评分任务的性能降低。在这项研究中,我们通过整合自监督学习(SSL)技术DINO(无标签的自蒸馏),来解决这些限制,该技术无需CAC特定注释即可进行训练,增强其生成独特特征的鲁棒性。DINO-LG模型利用标签指导关注钙化区域,实现了显著改进,对于检测含CAC的CT切片,灵敏度达到了89%,特异性达到了90%,而标准DINO模型的灵敏度为79%,特异性为77%。此外,假阴性和假阳性率分别降低了49%和59%,使临床医生在排除低风险患者的钙化时更加有信心,并通过减少放射科医生的不必要影像复查,最终降低整体医疗成本。此外,使用基本的UNET架构进行CAC评分和分割任务,专门应用于DINO-LG模型识别出含有钙化区域的CT切片。这种有针对性的方法通过向UNET模型提供相关切片来增强CAC评分的准确性,显着提高诊断精度,减少假阳性和假阴性,最终通过最小化不必要的检查和治疗降低整体医疗成本,为CAD风险评估提供了有价值的进展。

更新时间: 2024-11-27 18:58:41

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.07976v5

Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective

Advancing towards generalist agents necessitates the concurrent processing of multiple tasks using a unified model, thereby underscoring the growing significance of simultaneous model training on multiple downstream tasks. A common issue in multi-task learning is the occurrence of gradient conflict, which leads to potential competition among different tasks during joint training. This competition often results in improvements in one task at the expense of deterioration in another. Although several optimization methods have been developed to address this issue by manipulating task gradients for better task balancing, they cannot decrease the incidence of gradient conflict. In this paper, we systematically investigate the occurrence of gradient conflict across different methods and propose a strategy to reduce such conflicts through sparse training (ST), wherein only a portion of the model's parameters are updated during training while keeping the rest unchanged. Our extensive experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance. Furthermore, ST can be easily integrated with gradient manipulation techniques, thus enhancing their effectiveness.
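
A minimal sketch of sparse training as described: fix a parameter mask once and zero out all other gradients after every backward pass. Random mask selection is an illustrative assumption; the paper may choose the trainable subset differently:

import torch

def make_sparse_grad_hook(model, keep_ratio=0.1, seed=0):
    g = torch.Generator().manual_seed(seed)
    masks = [(torch.rand(p.shape, generator=g) < keep_ratio).float()
             for p in model.parameters()]             # fixed once, never changes
    def zero_masked_grads():
        for p, m in zip(model.parameters(), masks):
            if p.grad is not None:
                p.grad.mul_(m.to(p.grad.device))
    return zero_masked_grads

# usage per step: loss.backward(); zero_masked_grads(); optimizer.step()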

Updated: 2024-11-27 18:58:22

标题: 多任务学习中的主动梯度冲突缓解:稀疏训练视角

摘要: 朝着通用代理的发展需要使用统一模型同时处理多个任务,从而强调在多个下游任务上同时进行模型训练的日益重要性。多任务学习中的一个常见问题是梯度冲突的发生,这导致在联合训练过程中不同任务之间的潜在竞争。这种竞争通常会导致一个任务的改善以牺牲另一个任务的恶化。尽管已经开发了几种优化方法来处理这个问题,通过操纵任务梯度以实现更好的任务平衡,但它们不能减少梯度冲突的发生。在本文中,我们系统地研究了不同方法中梯度冲突的发生,并提出了通过稀疏训练(ST)来减少这种冲突的策略,其中在训练过程中只更新模型的一部分参数,同时保持其余部分不变。我们的广泛实验表明,ST有效地减轻了冲突梯度,并导致了更优越的性能。此外,ST可以轻松地与梯度操纵技术结合,从而增强它们的有效性。

更新时间: 2024-11-27 18:58:22

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.18615v1

Embodied Red Teaming for Auditing Robotic Foundation Models

Language-conditioned robot models (i.e., robotic foundation models) enable robots to perform a wide range of tasks based on natural language instructions. Despite strong performance on existing benchmarks, evaluating the safety and effectiveness of these models is challenging due to the complexity of testing all possible language variations. Current benchmarks have two key limitations: they rely on a limited set of human-generated instructions, missing many challenging cases, and they focus only on task performance without assessing safety, such as avoiding damage. To address these gaps, we introduce Embodied Red Teaming (ERT), a new evaluation method that generates diverse and challenging instructions to test these models. ERT uses automated red teaming techniques with Vision Language Models (VLMs) to create contextually grounded, difficult instructions. Experimental results show that state-of-the-art models frequently fail or behave unsafely on ERT tests, underscoring the shortcomings of current benchmarks in evaluating real-world performance and safety. Code and videos are available at: https://sites.google.com/view/embodiedredteam.

Updated: 2024-11-27 18:57:26

标题: 具体实施的红队行动用于审计机器人基础模型

摘要: 语言条件机器人模型(即,机器人基础模型)使机器人能够根据自然语言指令执行各种任务。尽管在现有基准测试中表现出色,但由于测试所有可能的语言变化的复杂性,评估这些模型的安全性和有效性具有挑战性。当前的基准测试存在两个关键限制:它们依赖于有限的人工生成的指令集,错过了许多具有挑战性的案例,并且它们仅关注任务绩效而不评估安全性,如避免损坏。为了解决这些差距,我们引入了具有实体红队(ERT)的新评估方法,该方法生成多样化且具有挑战性的指令以测试这些模型。ERT使用自动化的红队技术与视觉语言模型(VLMs)创建具有语境基础、困难的指令。实验结果表明,最先进的模型在ERT测试中经常失败或表现不安全,突显了当前基准测试在评估真实世界性能和安全性方面的缺陷。代码和视频可在以下网址获得:https://sites.google.com/view/embodiedredteam。

更新时间: 2024-11-27 18:57:26

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18676v1

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization

The Distributionally Robust Markov Decision Process (DRMDP) is a popular framework for addressing dynamics shift in reinforcement learning by learning policies robust to the worst-case transition dynamics within a constrained set. However, solving its dual optimization oracle poses significant challenges, limiting theoretical analysis and computational efficiency. The recently proposed Robust Regularized Markov Decision Process (RRMDP) replaces the uncertainty set constraint with a regularization term on the value function, offering improved scalability and theoretical insights. Yet, existing RRMDP methods rely on unstructured regularization, often leading to overly conservative policies by considering transitions that are unrealistic. To address these issues, we propose a novel framework, the $d$-rectangular linear robust regularized Markov decision process ($d$-RRMDP), which introduces a linear latent structure into both transition kernels and regularization. For the offline RL setting, where an agent learns robust policies from a pre-collected dataset in the nominal environment, we develop a family of algorithms, Robust Regularized Pessimistic Value Iteration (R2PVI), employing linear function approximation and $f$-divergence based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, showing these bounds depend on how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. This term is further shown to be fundamental to $d$-RRMDPs via information-theoretic lower bounds. Finally, numerical experiments validate that R2PVI learns robust policies and is computationally more efficient than methods for constrained DRMDPs.
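
For intuition, the tabular toy below uses the KL instance of $f$-divergence regularization, for which the regularized worst-case backup has a well-known closed form; R2PVI itself adds linear function approximation and pessimism over an offline dataset, which this sketch omits:

    import numpy as np

    def robust_backup(P_row, V, beta):
        # Closed-form KL-regularized worst case of the next-state value:
        #   inf_{P'} E_{P'}[V] + beta * KL(P' || P) = -beta * log E_P[exp(-V/beta)]
        return -beta * np.log(P_row @ np.exp(-V / beta))

    def robust_value_iteration(P, R, gamma=0.9, beta=1.0, iters=200):
        # P: (S, A, S) nominal transitions, R: (S, A) rewards.
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(iters):
            Q = np.empty((S, A))
            for s in range(S):
                for a in range(A):
                    Q[s, a] = R[s, a] + gamma * robust_backup(P[s, a], V, beta)
            V = Q.max(axis=1)
        return V, Q

    rng = np.random.default_rng(1)
    P = rng.dirichlet(np.ones(4), size=(4, 2))   # random 4-state, 2-action MDP
    R = rng.uniform(size=(4, 2))
    V, _ = robust_value_iteration(P, R)
    print(V)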

Updated: 2024-11-27 18:57:03

标题: 具有线性结构的$f$-散度正则化的强健离线强化学习

摘要: 分布鲁棒马尔可夫决策过程(DRMDP)是解决强化学习中动态转移的流行框架,通过学习对于约束集内最坏情况转移动态具有鲁棒性的策略。然而,解决其双重优化预言的过程存在重大挑战,限制了理论分析和计算效率。最近提出的鲁棒正则化马尔可夫决策过程(RRMDP)用值函数上的正则化项取代了不确定性集合约束,提供了更好的可伸缩性和理论洞见。然而,现有的RRMDP方法依赖于非结构化的正则化,通常会考虑到不现实的转移,导致过于保守的策略。为了解决这些问题,我们提出了一个新的框架,即$d$-矩形线性鲁棒正则化马尔可夫决策过程($d$-RRMDP),它在转移核和正则化中引入了线性潜在结构。对于离线强化学习设置,在这种设置中代理从预先收集的数据集中在名义环境中学习鲁棒策略,我们开发了一系列算法,即鲁棒正则化悲观值迭代(R2PVI),采用线性函数逼近和基于$f$-散度的正则化项在转移核上。我们提供了R2PVI策略的实例相关上界,显示这些上界取决于数据集如何覆盖最优鲁棒策略在鲁棒可接受转移下访问的状态-动作空间。通过信息论下界,进一步表明这一术语对于$d$-RRMDPs是基本的。最后,数值实验验证了R2PVI学习鲁棒策略并且在计算效率上比约束DRMDPs方法更高效。

更新时间: 2024-11-27 18:57:03

领域: cs.LG,cs.AI,cs.RO,stat.ML

下载: http://arxiv.org/abs/2411.18612v1

Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks

Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one source of uncertainty. This paper presents the first benchmark of uncertainty disentanglement. We reimplement and evaluate a comprehensive range of uncertainty estimators, from Bayesian over evidential to deterministic ones, across a diverse range of uncertainty tasks on ImageNet. We find that, despite recent theoretical endeavors, no existing approach provides pairs of disentangled uncertainty estimators in practice. We further find that specialized uncertainty tasks are harder than predictive uncertainty tasks, where we observe saturating performance. Our results provide both practical advice for which uncertainty estimators to use for which specific task, and reveal opportunities for future research toward task-centric and disentangled uncertainties. All our reimplementations and Weights & Biases logs are available at https://github.com/bmucsanyi/untangle.
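
One estimator pair that such benchmarks re-implement decomposes ensemble predictive entropy into aleatoric and epistemic parts; the sketch below is illustrative only, and this is exactly the kind of decomposition the paper finds is rarely disentangled in practice:

    import numpy as np

    def entropy(p, axis=-1):
        return -(p * np.log(p + 1e-12)).sum(axis=axis)

    # probs: (n_members, n_samples, n_classes) from an ensemble or MC dropout.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(10, 5, 3))
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

    total = entropy(probs.mean(0))       # predictive uncertainty
    aleatoric = entropy(probs).mean(0)   # expected data uncertainty
    epistemic = total - aleatoric        # mutual information (model uncertainty)
    print(total, aleatoric, epistemic)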

Updated: 2024-11-27 18:54:22

标题: 基准不确定性分解:专门任务的专门不确定性

摘要: 不确定性量化,曾经是一个独立的任务,已经发展成为一系列任务的光谱,包括放弃预测、超出分布检测和随机不确定性量化。最新的目标是解缠:构建多个估计器,每个估计器都针对一种且仅有一种不确定性来源。本文提出了不确定性解缠的第一个基准。我们重新实现并评估了从贝叶斯到确定性的广泛范围的不确定性估计器,跨越了ImageNet上多样的不确定性任务。我们发现,尽管最近进行了理论努力,但实践中没有任何现有方法提供了解缠的不确定性估计器对。我们进一步发现,专门的不确定性任务比预测不确定性任务更难,我们观察到饱和的表现。我们的结果为特定任务使用哪种不确定性估计器提供了实际建议,并揭示了未来研究面向任务中心和解缠不确定性的机会。我们所有的重新实现和权重与偏差日志都可以在 https://github.com/bmucsanyi/untangle 上找到。

更新时间: 2024-11-27 18:54:22

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.19460v2

GaussianSpeech: Audio-Driven Gaussian Avatars

We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details, including wrinkles that occur with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from audio input. Due to the absence of high-quality datasets of talking humans in correspondence with audio, we captured a new large-scale multi-view dataset of audio-visual sequences of talking humans with native English accents and diverse facial geometry. GaussianSpeech consistently achieves state-of-the-art performance with visually natural motion at real time rendering rates, while encompassing diverse facial expressions and styles.

Updated: 2024-11-27 18:54:08

标题: 高斯语音:由音频驱动的高斯头像

摘要: 我们介绍了一种新方法,名为GaussianSpeech,它从口头语音合成高保真度的、逼真的、个性化的3D人头化身的动画序列。为了捕捉人头的表现力和细节本质,包括皮肤皱褶和更精细的面部运动,我们提出将语音信号与3D高斯斑点结合起来,以创建逼真、时间连贯的运动序列。我们提出了一种基于3DGS的紧凑高效的化身表示,其生成表情相关的颜色,并利用皱纹和感知损失来合成面部细节,包括随不同表情而发生的皱纹。为了实现带有音频的3D高斯斑点的序列建模,我们设计了一种音频条件的变压器模型,能够直接从音频输入中提取唇和表情特征。由于缺乏与音频对应的高质量人类说话数据集,我们捕捉了一个新的大规模多视角数据集,其中包含具有本地英语口音和多样化面部几何的说话人的音视频序列。GaussianSpeech在实时渲染速率下稳定达到了最先进的性能,同时涵盖了多样化的面部表情和风格,具有视觉自然的动作。

更新时间: 2024-11-27 18:54:08

领域: cs.CV,cs.AI,cs.GR,cs.SD,eess.AS

下载: http://arxiv.org/abs/2411.18675v1

Task Arithmetic Through The Lens Of One-Shot Federated Learning

Task Arithmetic is a model merging technique that enables the combination of multiple models' capabilities into a single model through simple arithmetic in the weight space, without the need for additional fine-tuning or access to the original training data. However, the factors that determine the success of Task Arithmetic remain unclear. In this paper, we examine Task Arithmetic for multi-task learning by framing it as a one-shot Federated Learning problem. We demonstrate that Task Arithmetic is mathematically equivalent to the commonly used algorithm in Federated Learning, called Federated Averaging (FedAvg). By leveraging well-established theoretical results from FedAvg, we identify two key factors that impact the performance of Task Arithmetic: data heterogeneity and training heterogeneity. To mitigate these challenges, we adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic. Our experiments demonstrate that applying these algorithms can often significantly boost performance of the merged model compared to the original Task Arithmetic approach. This work bridges Task Arithmetic and Federated Learning, offering new theoretical perspectives on Task Arithmetic and improved practical methodologies for model merging.
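
The claimed equivalence is easy to check numerically: with scaling coefficient 1/N, adding the mean task vector to the pretrained weights coincides with one-shot FedAvg over the fine-tuned models (a minimal sketch with toy weights):

    import numpy as np

    theta0 = {"w": np.ones(3)}                       # shared pretrained weights
    finetuned = [{"w": np.array([1., 2., 3.])},      # task / client 1
                 {"w": np.array([3., 0., 1.])}]      # task / client 2

    # Task Arithmetic: add the scaled sum of task vectors to the initialization.
    lam = 1.0 / len(finetuned)
    merged_ta = {k: theta0[k] + lam * sum(ft[k] - theta0[k] for ft in finetuned)
                 for k in theta0}

    # One-shot FedAvg: average the client models directly.
    merged_fedavg = {k: sum(ft[k] for ft in finetuned) / len(finetuned)
                     for k in theta0}

    # With lam = 1/N the two coincide, the equivalence the paper builds on.
    assert np.allclose(merged_ta["w"], merged_fedavg["w"])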

Updated: 2024-11-27 18:53:41

标题: 通过一次性联邦学习的视角看任务算术

摘要: 任务算术是一种模型融合技术,通过在权重空间中进行简单的算术操作,将多个模型的能力合并成一个单一模型,而无需额外的微调或访问原始训练数据。然而,决定任务算术成功的因素仍不清楚。本文将任务算术视为一次性联邦学习问题,对多任务学习中的任务算术进行了研究。我们证明任务算术在数学上等效于联邦学习中常用的算法,称为联邦平均(FedAvg)。通过利用联邦平均的成熟理论结果,我们确定了影响任务算术性能的两个关键因素:数据异质性和训练异质性。为了减轻这些挑战,我们改编了几种联邦学习算法,以提高任务算术的效果。我们的实验表明,应用这些算法通常可以显著提高合并模型的性能,与原始的任务算术方法相比。这项工作将任务算术与联邦学习联系起来,为任务算术提供了新的理论视角,并改进了模型融合的实用方法。

更新时间: 2024-11-27 18:53:41

领域: cs.LG

下载: http://arxiv.org/abs/2411.18607v1

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this work we explore an alternative, yet simple approach -- active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with up to 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks.
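
A hedged sketch of online batch selection in this spirit (the exact scoring rule is the paper's; the learnability-style score below, student loss minus reference-model loss, is a common stand-in):

    import numpy as np

    def select_batch(student_losses, reference_losses, k):
        # Prefer examples that are hard for the student but easy for the
        # reference model; train only on the top-k of each super-batch.
        score = student_losses - reference_losses
        return np.argsort(-score)[:k]

    rng = np.random.default_rng(0)
    super_batch = 1024   # candidates scored per step
    k = 256              # examples actually trained on
    s_loss = rng.gamma(2.0, 1.0, size=super_batch)
    r_loss = rng.gamma(1.5, 1.0, size=super_batch)
    print(select_batch(s_loss, r_loss, k)[:10])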

Updated: 2024-11-27 18:50:15

标题: 主动数据整理有效地提炼大规模多模态模型

摘要: 知识蒸馏(KD)是将大型模型压缩为较小模型的事实标准。先前的研究探索了越来越复杂的知识蒸馏策略,涉及不同的目标函数、教师集合和权重继承。在这项工作中,我们探索了一种替代但简单的方法——作为对比多模态预训练的有效蒸馏的主动数据筛选。我们的简单在线批量选择方法ACID在各种模型、数据和计算配置下优于强大的KD基线。此外,我们发现这种主动数据筛选策略实际上是标准KD的补充,并且可以有效地结合起来训练高性能的推理高效模型。我们简单且可扩展的预训练框架ACED在27个零-shot分类和检索任务中取得了最新的结果,推理FLOPs降低了11%。我们进一步证明,我们的ACED模型为在LiT-Decoder设置中训练生成多模态模型提供了强大的视觉编码器,优于用于图像字幕和视觉问答任务的更大视觉编码器。

更新时间: 2024-11-27 18:50:15

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18674v1

Data Readiness for AI: A 360-Degree Survey

Artificial Intelligence (AI) applications critically depend on data. Poor quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluation of data readiness is a crucial step in improving the quality and appropriateness of data usage for AI. R&D efforts have been spent on improving data quality. However, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. This survey examines more than 140 papers published by ACM Digital Library, IEEE Xplore, journals such as Nature, Springer, and Science Direct, and online articles published by prominent AI experts. This survey aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that will be used for enhancing the quality, accuracy, and fairness of AI training and inference.

Updated: 2024-11-27 18:44:07

标题: AI的数据准备工作:一项360度调查

摘要: 人工智能(AI)应用严重依赖数据。质量低劣的数据会产生不准确和无效的AI模型,可能导致错误或不安全的使用。评估数据准备情况是改进数据使用质量和适用性的关键步骤。研发工作已经致力于提高数据质量。然而,评估用于AI训练的数据准备情况的标准化指标仍在不断发展。在这项研究中,我们进行了对用于验证AI训练数据准备情况的指标的全面调查。这项调查涵盖了ACM数字图书馆、IEEE Xplore、《自然》、Springer、Science Direct等期刊以及著名AI专家发表的在线文章,超过140篇。本调查旨在提出一个用于结构化和非结构化数据集的AI数据准备(DRAI)指标分类。我们预计这一分类将引领DRAI指标的新标准,用于增强AI训练和推断的质量、准确性和公平性。

更新时间: 2024-11-27 18:44:07

领域: cs.LG,cs.AI,I.2.0; E.m

下载: http://arxiv.org/abs/2404.05779v2

Biomolecular Analysis of Soil Samples and Rock Imagery for Tracing Evidence of Life Using a Mobile Robot

The search for evidence of past life on Mars presents a tremendous challenge that requires the use of very advanced robotic technologies to overcome it. Current digital microscopic imagers and spectrometers used for astrobiological examination suffer from limitations such as insufficient resolution, narrow detection range, and lack of portability. To overcome these challenges, this research study presents modifications to the Phoenix rover that expand its capability to detect a broader spectrum of biosignatures on Mars. One of the notable improvements comprises the integration of advanced digital microscopic imagers and spectrometers, enabling high-resolution examination of soil samples. Additionally, the mechanical components of the device have been reinforced to enhance maneuverability and optimize subsurface sampling capabilities. Empirical investigations have demonstrated that Phoenix can navigate diverse geological environments and procure samples for biomolecular analysis. The biomolecular instrumentation and hybrid analytical methods showcased in this study demonstrate considerable potential for future astrobiology missions on Mars. The potential for enhancing the system lies in broadening the range of detectable biomarkers and biosignatures.

Updated: 2024-11-27 18:38:05

标题: 使用移动机器人对土壤样本和岩石图像进行生命迹象的生物分子分析

摘要: 在火星上寻找过去生命的证据面临着巨大的挑战,需要利用非常先进的机器人技术来克服。目前用于天体生物学研究的数字显微成像仪和光谱仪存在分辨率不足、检测范围狭窄和缺乏便携性等限制。为了克服这些挑战,这项研究对凤凰号漫游车进行了修改,以扩展其在火星上检测生物标志物的能力。本文考察了对凤凰号漫游车实施的修改,以增强其检测更广泛生物标志物的能力。其中一个显著的改进是整合先进的数字显微成像仪和光谱仪,实现对土壤样本的高分辨率检查。此外,设备的机械部件已经得到加固,以增强机动性并优化地下取样能力。实证研究表明,凤凰号具有在不同地质环境中导航并获取样本进行生物分子分析的能力。本研究展示的生物分子仪器和混合分析方法展示了未来火星天体生物学任务的巨大潜力。系统的潜力在于扩大可检测的生物标记物和生物标志物的范围。

更新时间: 2024-11-27 18:38:05

领域: cs.LG,cs.CV,cs.RO

下载: http://arxiv.org/abs/2411.18594v1

From interpretability to inference: an estimation framework for universal approximators

We present a novel framework for estimation and inference for the broad class of universal approximators. Estimation is based on the decomposition of model predictions into Shapley values. Inference relies on analyzing the bias and variance properties of individual Shapley components. We show that Shapley value estimation is asymptotically unbiased, and we introduce Shapley regressions as a tool to uncover the true data generating process from noisy data alone. The well-known case of the linear regression is the special case in our framework if the model is linear in parameters. We present theoretical, numerical, and empirical results for the estimation of heterogeneous treatment effects as our guiding example.
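
For small feature sets the Shapley decomposition underlying the framework can be computed exactly; the sketch below (illustrative only, with absent features imputed from a baseline) also verifies the efficiency property that attributions sum to the prediction gap:

    import numpy as np
    from itertools import combinations
    from math import factorial

    def shapley_values(f, x, baseline):
        # Exact Shapley decomposition of f(x) - f(baseline) for small d.
        d = len(x)
        phi = np.zeros(d)
        for i in range(d):
            for r in range(d):
                for S in combinations([j for j in range(d) if j != i], r):
                    w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                    z_with, z_without = baseline.copy(), baseline.copy()
                    z_with[list(S) + [i]] = x[list(S) + [i]]
                    z_without[list(S)] = x[list(S)]
                    phi[i] += w * (f(z_with) - f(z_without))
        return phi

    f = lambda z: z[0] * z[1] + 2 * z[2]        # any universal approximator works
    x, base = np.array([1., 2., 3.]), np.zeros(3)
    phi = shapley_values(f, x, base)
    print(phi, phi.sum(), f(x) - f(base))       # efficiency: sums to the gap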

Updated: 2024-11-27 18:31:17

标题: 从可解释性到推断:一种用于通用逼近器的估计框架

摘要: 我们提出了一个新颖的框架,用于估计和推断广泛类别的通用逼近器。估计基于将模型预测分解为Shapley值。推断依赖于分析各个Shapley组件的偏差和方差特性。我们展示了Shapley值估计是渐近无偏的,并引入Shapley回归作为一种工具,仅通过嘈杂数据就能揭示真实的数据生成过程。如果模型在参数中是线性的,我们的框架中的著名情况是线性回归的特例。我们以异质治疗效应的估计作为我们的指导示例,提出了理论、数值和实证结果。

更新时间: 2024-11-27 18:31:17

领域: stat.ML,cs.LG,econ.EM,62G10, 62G20, 62-07, 91-08, 91A12,G.1; G.2; G.3; I.2

下载: http://arxiv.org/abs/1903.04209v5

Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation

This research presents and compares multiple approaches to automate the generation of literature reviews using several Natural Language Processing (NLP) techniques and retrieval-augmented generation (RAG) with a Large Language Model (LLM). The ever-increasing number of research articles poses a major challenge for manual literature review and has resulted in an increased demand for automation. Developing a system capable of automatically generating literature reviews from only PDF files as input is the primary objective of this research work. The effectiveness of several Natural Language Processing (NLP) strategies, such as the frequency-based method (spaCy), the transformer model (Simple T5), and retrieval-augmented generation (RAG) with a Large Language Model (GPT-3.5-turbo), is evaluated to meet the primary objective. The SciTLDR dataset is chosen for this research experiment and three distinct techniques are utilized to implement three different systems for auto-generating the literature reviews. ROUGE scores are used for the evaluation of all three systems. Based on the evaluation, the Large Language Model GPT-3.5-turbo achieved the highest ROUGE-1 score, 0.364. The transformer model comes in second place and spaCy is in last position. Finally, a graphical user interface is created for the best system, which is based on the large language model.
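
For reference, ROUGE-1, the metric used for the comparison, reduces to unigram overlap; a minimal plain-Python version:

    from collections import Counter

    def rouge1(candidate, reference):
        # Unigram-overlap precision, recall, and F1.
        c = Counter(candidate.lower().split())
        r = Counter(reference.lower().split())
        overlap = sum((c & r).values())
        p = overlap / max(sum(c.values()), 1)
        rec = overlap / max(sum(r.values()), 1)
        f1 = 2 * p * rec / (p + rec) if p + rec else 0.0
        return p, rec, f1

    print(rouge1("the model summarizes papers", "the model reviews papers"))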

Updated: 2024-11-27 18:27:07

标题: 基于自然语言处理技术和LLM的检索增强生成的自动文献综述

摘要: 本研究提出并比较了多种自动化文献综述生成方法,使用了几种自然语言处理(NLP)技术和检索增强生成(RAG)与大型语言模型(LLM)。不断增加的研究文章数量给手动文献综述带来了巨大挑战。这导致了对自动化的需求增加。开发一个能够仅通过PDF文件作为输入自动生成文献综述的系统是本研究工作的主要目标。为了达到这一主要目标,评估了几种自然语言处理(NLP)策略的有效性,如基于频率的方法(spaCy)、变压器模型(Simple T5)和检索增强生成(RAG)与大型语言模型(GPT-3.5-turbo)。本研究选择了SciTLDR数据集进行实验,并利用三种不同的技术实现了三个不同系统用于自动生成文献综述。ROUGE分数用于评估所有三个系统。根据评估结果,大型语言模型GPT-3.5-turbo获得了最高的ROUGE-1分数,为0.364。变压器模型排名第二,而spaCy排名最后。最后,基于大型语言模型创建了一个最佳系统的图形用户界面。

更新时间: 2024-11-27 18:27:07

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2411.18583v1

Surveying the space of descriptions of a composite system with machine learning

Multivariate information theory provides a general and principled framework for understanding how the components of a complex system are connected. Existing analyses are coarse in nature -- built up from characterizations of discrete subsystems -- and can be computationally prohibitive. In this work, we propose to study the continuous space of possible descriptions of a composite system as a window into its organizational structure. A description consists of specific information conveyed about each of the components, and the space of possible descriptions is equivalent to the space of lossy compression schemes of the components. We introduce a machine learning framework to optimize descriptions that extremize key information theoretic quantities used to characterize organization, such as total correlation and O-information. Through case studies on spin systems, Sudoku boards, and letter sequences from natural language, we identify extremal descriptions that reveal how system-wide variation emerges from individual components. By integrating machine learning into a fine-grained information theoretic analysis of composite random variables, our framework opens a new avenues for probing the structure of real-world complex systems.
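
The organization measures involved can be estimated from discrete samples with plug-in entropies; in the toy sketch below, redundant copies of a bit give positive O-information while an XOR triple gives negative (synergistic) O-information (assuming small alphabets and enough samples for plug-in estimates):

    import numpy as np
    from collections import Counter

    def H(samples):
        # Plug-in Shannon entropy (bits); rows are joint outcomes.
        counts = np.array(list(Counter(map(tuple, samples)).values()), float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def total_correlation(X):
        return sum(H(X[:, [i]]) for i in range(X.shape[1])) - H(X)

    def o_information(X):
        n = X.shape[1]
        return (n - 2) * H(X) + sum(
            H(X[:, [i]]) - H(np.delete(X, i, axis=1)) for i in range(n))

    rng = np.random.default_rng(0)
    z = rng.integers(0, 2, size=5000)
    X_red = np.stack([z, z, z], axis=1)            # redundant copies
    a, b = rng.integers(0, 2, (2, 5000))
    X_syn = np.stack([a, b, a ^ b], axis=1)        # XOR triple: synergy
    print(o_information(X_red), o_information(X_syn))   # > 0 vs < 0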

Updated: 2024-11-27 18:24:13

标题: 使用机器学习调查复合系统描述空间

摘要: 多元信息理论提供了一个通用和原则性的框架,用于理解复杂系统的组件是如何相互连接的。现有的分析性质粗糙,是由对离散子系统的表征构建而成的,并且在计算上可能是禁止的。在这项工作中,我们提议研究一个复合系统可能描述的连续空间,作为窥视其组织结构的一扇窗口。描述包括有关每个组件传达的具体信息,而可能描述的空间等同于组件的有损压缩方案的空间。我们引入了一个机器学习框架,用于优化描述,使其极小化用于表征组织结构的关键信息理论量,如总相关性和O-信息。通过对自旋系统、数独板和自然语言字母序列的案例研究,我们确定了揭示系统范围变化如何从个体组件中出现的极端描述。通过将机器学习整合到对复合随机变量的细粒度信息理论分析中,我们的框架为探索现实世界复杂系统的结构开辟了新的途径。

更新时间: 2024-11-27 18:24:13

领域: cs.IT,cs.LG,math.IT,physics.data-an

下载: http://arxiv.org/abs/2411.18579v1

Pruning Deep Convolutional Neural Network Using Conditional Mutual Information

Convolutional Neural Networks (CNNs) achieve high performance in image classification tasks but are challenging to deploy on resource-limited hardware due to their large model sizes. To address this issue, we leverage Mutual Information, a metric that provides valuable insights into how deep learning models retain and process information through measuring the shared information between input features or output labels and network layers. In this study, we propose a structured filter-pruning approach for CNNs that identifies and selectively retains the most informative features in each layer. Our approach successively evaluates each layer by ranking the importance of its feature maps based on Conditional Mutual Information (CMI) values, computed using a matrix-based Rényi $\alpha$-order entropy numerical method. We propose several formulations of CMI to capture correlation among features across different layers. We then develop various strategies to determine the cutoff point for CMI values to prune unimportant features. This approach allows parallel pruning in both forward and backward directions and significantly reduces model size while preserving accuracy. Tested on the VGG16 architecture with the CIFAR-10 dataset, the proposed method reduces the number of filters by more than a third, with only a 0.32% drop in test accuracy.
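
A simplified sketch of the matrix-based Rényi $\alpha$-order entropy estimator and one possible CMI score for ranking feature maps (kernel width, $\alpha$, and the specific CMI formulation are placeholders; the paper proposes several formulations):

    import numpy as np

    def gram(X, sigma=1.0):
        # Normalized Gaussian Gram matrix over n samples (rows of X); trace 1.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * sigma ** 2))
        return K / np.trace(K)

    def renyi_entropy(A, alpha=1.01):
        # S_alpha(A) = log2(sum_i lambda_i^alpha) / (1 - alpha)
        lam = np.clip(np.linalg.eigvalsh(A), 0, None)
        return np.log2((lam ** alpha).sum()) / (1 - alpha)

    def joint(*mats):
        Hm = mats[0]
        for M in mats[1:]:
            Hm = Hm * M          # Hadamard product builds joint Gram matrices
        return Hm / np.trace(Hm)

    def cmi(X, Y, Z, alpha=1.01):
        # I(X; Y | Z) = S(X,Z) + S(Y,Z) - S(Z) - S(X,Y,Z)
        A, B, C = gram(X), gram(Y), gram(Z)
        return (renyi_entropy(joint(A, C), alpha) + renyi_entropy(joint(B, C), alpha)
                - renyi_entropy(C, alpha) - renyi_entropy(joint(A, B, C), alpha))

    rng = np.random.default_rng(0)
    labels = rng.normal(size=(64, 1))
    useful = labels + 0.1 * rng.normal(size=(64, 1))   # informative feature map
    noise = rng.normal(size=(64, 1))                   # uninformative feature map
    print(cmi(useful, labels, noise), cmi(noise, labels, useful))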

Updated: 2024-11-27 18:23:59

标题: 利用条件互信息修剪深度卷积神经网络

摘要: 卷积神经网络(CNNs)在图像分类任务中取得了高性能,但由于其较大的模型大小,部署在资源有限的硬件上具有挑战性。为了解决这个问题,我们利用互信息,这是一种度量标准,可以为我们提供深度学习模型如何保留和处理信息的宝贵见解,通过测量输入特征或输出标签与网络层之间的共享信息。在本研究中,我们提出了一种针对CNNs的结构化滤波修剪方法,该方法在每个层中识别并选择性地保留最具信息量的特征。我们的方法通过使用基于矩阵的Renyi α阶熵数值方法计算的条件互信息(CMI)值对每个层的重要性进行排名,依次评估每个层。我们提出了几种CMI的表述形式,以捕捉不同层间特征之间的相关性。然后,我们开发了各种策略来确定修剪不重要特征的CMI值的截止点。这种方法允许在前向和后向方向上并行修剪,并在保持准确性的同时显着减小模型大小。在使用CIFAR-10数据集的VGG16架构上进行测试,所提出的方法将滤波器数量减少了超过三分之一,仅降低了0.32%的测试准确率。

更新时间: 2024-11-27 18:23:59

领域: cs.LG

下载: http://arxiv.org/abs/2411.18578v1

On Importance of Code-Mixed Embeddings for Hate Speech Identification

Code-mixing is the practice of using two or more languages in a single sentence, which often occurs in multilingual communities such as India where people commonly speak multiple languages. Classic NLP tools, trained on monolingual data, face challenges when dealing with code-mixed data. Extracting meaningful information from sentences containing multiple languages becomes difficult, particularly in tasks like hate speech detection, due to linguistic variation, cultural nuances, and data sparsity. To address this, we aim to analyze the significance of code-mixed embeddings and evaluate the performance of BERT and HingBERT models (trained on a Hindi-English corpus) in hate speech detection. Our study demonstrates that HingBERT models, benefiting from training on the extensive Hindi-English dataset L3Cube-HingCorpus, outperform BERT models when tested on hate speech text datasets. We also found that code-mixed Hing-FastText performs better than standard English FastText and vanilla BERT models.

Updated: 2024-11-27 18:23:57

标题: 关于代码混合嵌入在仇恨言论识别中的重要性

摘要: 代码混合是在单个句子中使用两种或多种语言的做法,这经常发生在多语社区,如印度,人们通常会说多种语言。经典的自然语言处理工具在处理混合数据时面临挑战,因为它们是在单语数据上训练的。从包含多种语言的句子中提取有意义的信息变得困难,特别是在诸如仇恨言论检测之类的任务中,由于语言变化、文化细微差别和数据稀疏性。为了解决这个问题,我们旨在分析代码混合嵌入的重要性,并评估在仇恨言论检测中训练的BERT和HingBERT模型(在一个印地语-英语语料库上)。我们的研究表明,HingBERT模型在测试仇恨言论文本数据集时表现优于BERT模型,这得益于在广泛的印地语-英语数据集L3Cube-HingCorpus上训练。我们还发现,代码混合的Hing-FastText比标准的英文FastText和普通的BERT模型表现更好。

更新时间: 2024-11-27 18:23:57

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.18577v1

Functional relevance based on the continuous Shapley value

The presence of Artificial Intelligence (AI) in our society is increasing, which brings with it the need to understand the behaviour of AI mechanisms, including machine learning predictive algorithms fed with tabular data, text, or images, among other types of data. This work focuses on the interpretability of predictive models based on functional data. Designing interpretability methods for functional data models implies working with a set of features whose size is infinite. In the context of scalar-on-function regression, we propose an interpretability method based on the Shapley value for continuous games, a mathematical formulation that allows a global payoff to be fairly distributed among a continuous set of players. The method is illustrated through a set of experiments with simulated and real data sets. The open-source Python package ShapleyFDA is also presented.

Updated: 2024-11-27 18:20:00

标题: 基于连续Shapley值的功能相关性

摘要: 我们社会中人工智能(AI)的存在正在增加,这带来了对AI机制行为的理解的需求,包括基于表格数据、文本或图像等数据类型的机器学习预测算法。本文关注基于功能数据的预测模型的可解释性。设计功能数据模型的可解释性方法意味着需要处理一个尺寸无限的特征集。在标量对函数回归的背景下,我们提出了一种基于连续游戏的 Shapley 值的可解释性方法,这是一种数学形式化,可以公平地在一个连续的玩家集合中分配全局回报。该方法通过一系列对模拟和真实数据集的实验进行了说明。同时介绍了开源Python包 ShapleyFDA。

更新时间: 2024-11-27 18:20:00

领域: stat.ML,cs.AI,cs.LG,stat.AP

下载: http://arxiv.org/abs/2411.18575v1

Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning

Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, yet challenges persist in adapting these models for low-resource languages. In this study, we investigate the effects of Low-Rank Adaptation (LoRA) Parameter-Efficient Fine-Tuning (PEFT) on multilingual Gemma models for Marathi, a language with limited resources. Using a translated Alpaca dataset with 52,000 instruction-response pairs, our findings reveal that while evaluation metrics often show a performance decline post-fine-tuning, manual assessments frequently suggest that the fine-tuned models outperform their original counterparts. The observations indicate improvements in target language generation capabilities but a reduction in reasoning abilities following language adaptation. These results underscore the need for improved evaluation methodologies and the creation of high-quality native datasets to accurately assess language-specific model performance in low-resource settings.
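
A typical LoRA/PEFT setup of the kind described might look like the following (the checkpoint name, rank, and target modules are assumptions, not the paper's exact configuration):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "google/gemma-2b"   # assumed multilingual Gemma checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Common LoRA targets for attention projections; ranks are illustrative.
    config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()   # only low-rank adapters are trainable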

Updated: 2024-11-27 18:14:38

标题: 挑战在于如何利用LoRA PEFT调整来适应低资源语言的多语言LLMs

摘要: 大型语言模型(LLMs)展示了出色的多语言能力,但在将这些模型适应低资源语言方面仍存在挑战。本研究探讨了低秩适应(LoRA)参数高效微调(PEFT)对马拉地语的多语言Gemma模型的影响,这是一种资源有限的语言。使用翻译的52,000个指令-响应对的Alpaca数据集,我们的研究发现,尽管评估指标通常显示微调后性能下降,但手动评估经常表明微调的模型胜过其原始对应物。观察结果表明目标语言生成能力得到改进,但语言适应后推理能力降低。这些结果强调了改进评估方法的必要性,以及在低资源环境中准确评估语言特定模型性能的高质量本地数据集的创建。

更新时间: 2024-11-27 18:14:38

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.18571v1

Learning to Project for Cross-Task Knowledge Distillation

Traditional knowledge distillation (KD) relies on a proficient teacher trained on the target task, which is not always available. In this setting, cross-task distillation can be used, enabling the use of any teacher model trained on a different task. However, many KD methods prove ineffective when applied to this cross-task setting. To address this limitation, we propose a simple modification: the use of an inverted projection. We show that this drop-in replacement for a standard projector is effective because it learns to disregard any task-specific features which might degrade the student's performance. We find that this simple modification is sufficient for extending many KD methods to the cross-task setting, where the teacher and student tasks can be very different. In doing so, we obtain up to a 1.9% improvement in the cross-task setting compared to the traditional projection, at no additional cost. Our method can obtain significant performance improvements (up to 7%) when using even a randomly-initialised teacher on various tasks such as depth estimation, image translation, and semantic segmentation, despite the lack of any learned knowledge to transfer. To provide conceptual and analytical insights into this result, we show that using an inverted projection allows the distillation loss to be decomposed into a knowledge transfer and a spectral regularisation component. Through this analysis we are additionally able to propose a novel regularisation loss that allows teacher-free distillation, enabling performance improvements of up to 8.57% on ImageNet with no additional training costs.

Updated: 2024-11-27 18:12:42

标题: 学习投影以进行跨任务知识蒸馏

摘要: 传统的知识蒸馏(KD)依赖于在目标任务上接受过训练的熟练教师,但并非总是可用。在这种情况下,可以使用跨任务蒸馏,从而利用在不同任务上接受训练的任何教师模型。然而,许多知识蒸馏方法在应用于这种跨任务设置时被证明是无效的。为了解决这一限制,我们提出了一个简单的修改:使用反向投影。我们展示了这种替代标准投影的方法是有效的,因为它学会忽略可能降低学生表现的任何任务特定特征。我们发现,这种简单的修改足以将许多知识蒸馏方法扩展到跨任务设置中,其中教师和学生的任务可能非常不同。通过这样做,与传统投影相比,在跨任务设置中我们获得了高达1.9%的改进,而没有额外的成本。我们的方法可以在各种任务上(如深度估计、图像转换和语义分割)上获得显著的性能提升(高达7%),即使是在使用随机初始化的教师时也是如此,尽管缺乏任何可转移的学习知识。为了为这一结果提供概念和分析见解,我们表明使用反向投影允许将蒸馏损失分解为知识传输和谱正则化组件。通过这种分析,我们还能够提出一种新型正则化损失,实现无教师蒸馏,使在ImageNet上的性能提高高达8.57%,而无需额外的训练成本。

更新时间: 2024-11-27 18:12:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.14494v2

cedar: Optimized and Unified Machine Learning Input Data Pipelines

The input data pipeline is an essential component of each machine learning (ML) training job. It is responsible for reading massive amounts of training data, processing batches of samples using complex transformations, and loading them onto training nodes at low latency and high throughput. Performant input data systems are becoming increasingly critical, driven by skyrocketing data volumes and training throughput demands. Unfortunately, current input data systems cannot fully leverage key performance optimizations, resulting in hugely inefficient infrastructures that require significant resources - or worse - underutilize expensive accelerators. To address these demands, we present cedar, an optimized and unified programming framework for ML input data pipelines. cedar allows users to define input data pipelines using composable operators that support arbitrary ML frameworks and libraries. cedar introduces an extensible optimizer that systematically applies a complex combination of optimizations (e.g., offloading, caching, prefetching, fusion, and reordering). It orchestrates processing across a customizable set of local and distributed compute resources in order to improve processing performance and efficiency, all without user input. Across eight pipelines, cedar improves performance by up to 1.87x to 10.65x compared to state-of-the-art input data systems.

Updated: 2024-11-27 18:05:57

标题: 雪松:优化和统一的机器学习输入数据管道

摘要: 输入数据管道是每个机器学习(ML)训练作业的基本组成部分。它负责读取大量的训练数据,使用复杂的转换处理样本批次,并以低延迟和高吞吐量将它们加载到训练节点上。高性能的输入数据系统变得日益关键,受到数据量激增和训练吞吐量需求的推动。不幸的是,当前的输入数据系统无法充分利用关键性能优化,导致极为低效的基础设施,需要大量资源 - 或更糟糕的是 - 浪费昂贵的加速器。 为了满足这些需求,我们提出了cedar,这是一个针对ML输入数据管道进行优化和统一编程框架。cedar允许用户使用支持任意ML框架和库的可组合操作符来定义输入数据管道。cedar引入了一个可扩展的优化器,系统地应用复杂的优化组合(例如,卸载、缓存、预取、融合和重新排序)。它在可定制的本地和分布式计算资源集上编排处理,以提高处理性能和效率,无需用户输入。在八个管道中,与最先进的输入数据系统相比,cedar的性能提高了1.87倍至10.65倍。

更新时间: 2024-11-27 18:05:57

领域: cs.LG,cs.DC,cs.PF

下载: http://arxiv.org/abs/2401.08895v4

A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models

Large Language Models (LLMs) have demonstrated impressive capabilities across various tasks. However, LLMs often struggle with spatial reasoning, which is one essential part of reasoning and inference and requires understanding complex relationships between objects in space. This paper proposes a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities. We evaluate our approach on two benchmark datasets: StepGame and SparQA, implementing three distinct strategies: (1) ASP (Answer Set Programming)-based symbolic reasoning, (2) LLM + ASP pipeline using DSPy, and (3) Fact + Logical rules. Our experiments demonstrate significant improvements over the baseline prompting methods, with accuracy increases of 40-50% on the StepGame dataset and 3-13% on the more complex SparQA dataset. The "LLM + ASP" pipeline achieves particularly strong results on Finding Relations (FR) and Finding Block (FB) questions, though performance varies across different question types. The impressive results suggest that while neural-symbolic approaches offer promising directions for enhancing spatial reasoning in LLMs, their effectiveness depends heavily on the specific task characteristics and implementation strategies. We propose an integrated, simple yet effective set of strategies using a neural-symbolic pipeline to boost spatial reasoning abilities in LLMs. This pipeline and its strategies demonstrate strong and broader applicability to other reasoning domains in LLMs, such as temporal reasoning, deductive inference, etc.
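
To make the "LLM + ASP" idea concrete, here is a minimal sketch of ours (not the paper's pipeline): the LLM's role is to translate the story into facts, and ASP rules then derive transitive spatial relations, assuming the clingo Python bindings are installed:

    import clingo  # pip install clingo

    # The LLM would emit facts like these from a natural-language story;
    # the ASP program then performs the actual spatial inference.
    program = """
    left(a, b). left(b, c).               % extracted facts
    left(X, Z) :- left(X, Y), left(Y, Z). % transitivity
    right(X, Y) :- left(Y, X).
    #show left/2.
    """

    ctl = clingo.Control()
    ctl.add("base", [], program)
    ctl.ground([("base", [])])
    ctl.solve(on_model=print)   # left(a,b) left(b,c) left(a,c)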

Updated: 2024-11-27 18:04:05

标题: 一个神经符号整合的管道,以增强大型语言模型中的空间推理

摘要: 大型语言模型(LLMs)在各种任务中展示了令人印象深刻的能力。然而,LLMs经常在空间推理方面遇到困难,这是推理和推断的一个重要部分,需要理解空间中物体之间的复杂关系。本文提出了一种新颖的神经符号框架,增强了LLMs的空间推理能力。我们在两个基准数据集StepGame和SparQA上评估了我们的方法,实现了三种不同的策略:(1)基于ASP(Answer Set Programming)的符号推理,(2)LLM + ASP管道使用DSPy,以及(3)事实+逻辑规则。我们的实验表明,与基线提示方法相比,我们的方法在StepGame数据集上的准确率提高了40-50%,在更复杂的SparQA数据集上提高了3-13%。 "LLM + ASP"管道在查找关系(FR)和查找块(FB)问题上取得了特别强大的结果,尽管性能在不同类型的问题上有所变化。这些令人印象深刻的结果表明,虽然神经符号方法为增强LLMs的空间推理提供了有希望的方向,但它们的有效性严重依赖于特定任务特征和实现策略。我们提出了一套综合的、简单而有效的策略,使用神经符号管道来增强LLMs的空间推理能力。这个管道及其策略展示了在LLMs的其他推理领域,如时间推理、演绎推理等方面的强大和更广泛的适用性。

更新时间: 2024-11-27 18:04:05

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2411.18564v1

DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simpler manipulation tasks, they often produce unrealistic ghost states (e.g., the object automatically moves without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexDiffuser, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexDiffuser models joint state-action dynamics through a dual-phase diffusion process which consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, enhancing generalizability for physical interactions and facilitating diverse goal adaptation through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, and hammer striking demonstrate DexDiffuser's effectiveness on goals outside training distributions, achieving over twice the average success rate (59.2% vs. 29.5%) compared to existing methods. Our framework achieves 70.0% success on 30-degree door opening, 40.0% and 36.7% on pen and block half-side re-orientation respectively, and 46.7% on hammer nail half drive, highlighting its robustness and flexibility in contact-rich manipulation.

Updated: 2024-11-27 18:03:26

标题: DexDiffuser:面向交互的适应性灵巧操控扩散规划

摘要: 具有接触丰富交互的灵巧操作对于先进的机器人技术至关重要。尽管最近基于扩散的规划方法在简单操作任务上表现出了潜力,但它们常常会产生不切实际的幽灵状态(例如,物体在没有手部接触的情况下自动移动)或在处理复杂的序列交互时缺乏适应性。在这项工作中,我们引入了DexDiffuser,这是一个针对自适应灵巧操作的交互感知扩散规划框架。DexDiffuser通过双相扩散过程模拟联合状态-动作动态,包括预交互接触对齐和后接触目标导向控制,实现了目标自适应的通用灵巧操作。此外,我们结合了基于动态模型的双重引导,并利用大型语言模型进行自动引导功能生成,增强了对物理交互的泛化能力,并通过语言提示促进多样化的目标适应。在门开启、笔和方块重新定向以及锤子击打等物理交互任务上的实验表明,与现有方法相比,DexDiffuser在训练分布之外的目标上实现了超过两倍的平均成功率(59.2%对29.5%)。我们的框架在30度门开启任务上实现了70.0%的成功率,在笔和方块半侧重新定向任务上分别为40.0%和36.7%,在锤子钉半驱动任务上为46.7%,突出了其在接触丰富操作中的鲁棒性和灵活性。

更新时间: 2024-11-27 18:03:26

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18562v1

Concentration of Cumulative Reward in Markov Decision Processes

In this paper, we investigate the concentration properties of cumulative rewards in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We introduce a unified approach to characterize reward concentration in MDPs, covering both infinite-horizon settings (i.e., average and discounted reward frameworks) and the finite-horizon setting. Our asymptotic results include the law of large numbers, the central limit theorem, and the law of the iterated logarithm, while our non-asymptotic bounds include Azuma-Hoeffding-type inequalities and a non-asymptotic version of the law of the iterated logarithm. Additionally, we explore two key implications of our results. First, we analyze the sample path behavior of the difference in rewards between any two stationary policies. Second, we show that two alternative definitions of regret for learning policies proposed in the literature are rate-equivalent. Our proof techniques rely on a novel martingale decomposition of cumulative rewards, properties of the solution to the policy evaluation fixed-point equation, and both asymptotic and non-asymptotic concentration results for martingale difference sequences.

Updated: 2024-11-27 17:51:39

标题: 马尔可夫决策过程中累积奖励的集中度

摘要: 在这篇论文中,我们研究了马尔可夫决策过程(MDPs)中累积奖励的集中性质,重点关注渐近和非渐近环境。我们引入了一种统一方法来表征MDPs中的奖励集中性,涵盖了无限时间跨度设置(即平均和折现奖励框架)和有限时间跨度设置。我们的渐近结果包括大数定律、中心极限定理和对数定理,而我们的非渐近界限包括Azuma-Hoeffding型不等式和对数定理的非渐近版本。此外,我们探讨了我们结果的两个关键含义。首先,我们分析了任意两个固定策略之间奖励差异的样本路径行为。其次,我们展示了文献中提出的学习策略的遗憾两种替代定义是等价的。我们的证明技术依赖于累积奖励的新型鞅分解、策略评估固定点方程解的性质,以及鞅差分序列的渐近和非渐近集中结果。

更新时间: 2024-11-27 17:51:39

领域: cs.LG,cs.SY,eess.SY,stat.ML

下载: http://arxiv.org/abs/2411.18551v1

Markov Equivalence and Consistency in Differentiable Structure Learning

Existing approaches to differentiable structure learning of directed acyclic graphs (DAGs) rely on strong identifiability assumptions in order to guarantee that global minimizers of the acyclicity-constrained optimization problem identifies the true DAG. Moreover, it has been observed empirically that the optimizer may exploit undesirable artifacts in the loss function. We explain and remedy these issues by studying the behavior of differentiable acyclicity-constrained programs under general likelihoods with multiple global minimizers. By carefully regularizing the likelihood, it is possible to identify the sparsest model in the Markov equivalence class, even in the absence of an identifiable parametrization. We first study the Gaussian case in detail, showing how proper regularization of the likelihood defines a score that identifies the sparsest model. Assuming faithfulness, it also recovers the Markov equivalence class. These results are then generalized to general models and likelihoods, where the same claims hold. These theoretical results are validated empirically, showing how this can be done using standard gradient-based optimizers, thus paving the way for differentiable structure learning under general models and losses.
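
Such differentiable acyclicity-constrained programs typically score a weighted adjacency matrix under a smooth DAG constraint; the sketch below shows the standard NOTEARS-style constraint together with a least-squares Gaussian score plus the kind of sparsity regularization the paper analyzes (a simplified stand-in, not the paper's exact objective):

    import numpy as np
    from scipy.linalg import expm

    def acyclicity(W):
        # NOTEARS constraint: h(W) = tr(exp(W * W)) - d, zero iff W is a DAG.
        return np.trace(expm(W * W)) - W.shape[0]

    def score(W, X, lam=0.1):
        # Least-squares (equal-variance Gaussian) fit + L1 sparsity penalty;
        # minimized subject to acyclicity(W) = 0.
        n = X.shape[0]
        resid = X - X @ W
        return 0.5 / n * (resid ** 2).sum() + lam * np.abs(W).sum()

    dag = np.array([[0., 1.5], [0., 0.]])     # 1 -> 2
    cyc = np.array([[0., 1.5], [0.7, 0.]])    # 1 <-> 2
    print(acyclicity(dag), acyclicity(cyc))   # ~0 vs > 0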

Updated: 2024-11-27 17:49:02

标题: 马尔科夫等价性及在可微结构学习中的一致性

摘要: 现有的针对有向无环图(DAGs)的可微结构学习方法依赖于强可辨识性假设,以确保无环约束优化问题的全局最小值能够识别出真实的DAG。此外,经验观察到优化器可能会利用损失函数中的不良特征。我们通过研究具有多个全局最小值的一般似然函数下的可微无环约束程序的行为来解释和解决这些问题。通过仔细正则化似然函数,即使在没有可辨识参数化的情况下,也可以识别出马尔可夫等价类中最稀疏的模型。我们首先详细研究了高斯情况,展示了如何适当正则化似然函数定义出一个能够识别出最稀疏模型的得分。在假设忠实性的情况下,它还能够恢复马尔可夫等价类。这些结果随后推广到一般模型和似然函数,其中相同的声明仍然成立。这些理论结果经过实证验证,展示了如何使用标准基于梯度的优化器来实现这一点,从而为在一般模型和损失下的可微结构学习铺平道路。

更新时间: 2024-11-27 17:49:02

领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2410.06163v3

DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

Data visualization (DV) is the fundamental and premise tool to improve the efficiency in conveying the insights behind the big data, which has been widely accepted in existing data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in free form (i.e. FeVisQA), and explicating tabular data (i.e., table-to-text), is vital for advancing the field. Despite their potential, the application of pre-trained language models (PLMs) like T5 and BERT in DV has been limited by high costs and challenges in handling cross-modal information, leading to few studies on PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through a hybrid objective pre-training and multi-task fine-tuning strategy, integrating text and DV datasets to effectively interpret cross-modal semantics. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical PLMs but also expand the range of applications for PLMs.

Updated: 2024-11-27 17:42:57

标题: DataVisT5:一种用于共同理解文本和数据可视化的预训练语言模型

摘要: 数据可视化(DV)是提高传达大数据背后洞见效率的基础和前提工具,在现有数据驱动世界中被广泛接受。DV中的任务自动化,如将自然语言查询转换为可视化(即文本到可视化),从可视化生成解释(即可视化到文本),以自由形式回答与DV相关的问题(即FeVisQA),以及解释表格数据(即表格到文本),对于推进该领域至关重要。尽管预训练语言模型(PLMs)如T5和BERT在DV中有潜力,但由于高成本和处理跨模态信息的挑战,导致了对PLMs在DV中的研究较少。我们介绍了DataVisT5,一种专为DV定制的新型PLM,通过混合目标预训练和多任务微调策略增强了T5架构,将文本和DV数据集整合在一起,以有效解释跨模态语义。对公共数据集的广泛评估表明,DataVisT5在各种DV相关任务上始终优于当前最先进的模型。我们预计DataVisT5不仅将激发对垂直PLMs的进一步研究,还将扩大PLMs的应用范围。

更新时间: 2024-11-27 17:42:57

领域: cs.CL,cs.AI,cs.DB

下载: http://arxiv.org/abs/2408.07401v2

An investigation of the Online Payment and Banking System Apps in Bangladesh

Presently, Bangladesh is expending substantial efforts to digitize its national infrastructure, with a significant emphasis on achieving this goal through mobile applications that facilitate online payments and banking system advancements. Despite the lack of knowledge about the security level of these systems, they are currently in frequent use without much consideration. To observe whether they follow the minimum globally accepted standards, we choose to conduct static and dynamic analysis of the applications using available open-source analyzers and open-source tools. This allows us to attempt to extract sensitive information, if possible, and determine whether the applications adhere to the standards of MASVS set by OWASP. We show how we analyzed 17 .apk files and an SDK using open-source scanners and discovered security flaws in the applications, such as weaknesses related to data storage, vulnerable cryptographic elements, insecure network communications, and unsafe utilization of WebViews, all detected by the scanner. These outputs demonstrate the need for extensive manual analysis of the applications through source code review and dynamic analysis. We further implement reverse engineering and a dynamic approach to verify the outputs and expose that some applications do not comply with the standard method of network communication. Moreover, we attempt to verify the rest of the potential vulnerabilities in the next phase of our ongoing investigation.

Updated: 2024-11-27 17:37:42

标题: 孟加拉国在线支付和银行系统应用的调查

摘要: 目前,孟加拉国正在大力推动数字化国家基础设施的工作,重点是通过移动应用程序实现在线支付和银行系统的进步。尽管对这些系统的安全级别缺乏了解,但它们目前被广泛使用而没有受到太多考虑。为了观察它们是否符合全球最低标准,我们选择使用现有的开源分析器和开源工具对这些应用程序进行静态和动态分析。这样我们可以尝试提取敏感信息(如果可能的话),并确定这些应用程序是否符合OWASP制定的MASVS标准。我们展示了如何使用开源扫描仪对17个.apk文件和一个SDK进行分析,并发现了与数据存储相关的弱点、易受攻击的加密元素、不安全的网络通信以及对WebViews的不安全利用等应用程序安全漏洞。这些输出表明需要通过源代码审查和动态分析对应用程序进行广泛的手动分析。我们进一步实施了逆向工程和动态方法来验证输出,并揭示了一些应用程序未遵守网络通信的标准方法。此外,我们将在我们正在进行的调查的下一阶段尝试验证其余潜在的漏洞。

更新时间: 2024-11-27 17:37:42

领域: cs.CR,cs.AR,cs.SE

下载: http://arxiv.org/abs/2407.07766v2

NeuroAI for AI Safety

As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.

Updated: 2024-11-27 17:18:51

标题: 神经网络人工智能用于人工智能安全

摘要: 随着人工智能系统变得越来越强大,对安全人工智能的需求变得更加迫切。人类是安全人工智能的一个吸引人的模型:作为唯一已知能够具备一般智能的代理,他们在即使在与先前经验明显偏离的情况下也能表现出稳健性,安全地探索世界,理解实用逻辑,并能够合作以实现其内在目标。智能,当与合作和安全机制结合时,可以推动持续进步和幸福。这些特性是大脑结构和其实施的学习算法的功能。神经科学可能因此拥有重要的技术人工智能安全的关键,目前这些关键被低估和未充分利用。在这个路线图中,我们突出并批判性地评估了受到神经科学启发的几条通向人工智能安全的途径:模仿大脑的表示、信息处理和架构;从模仿大脑数据和身体构建稳健的感觉和运动系统;通过脑数据对人工智能系统进行微调;利用神经科学方法推进可解释性;以及扩展认知启发的架构。我们提出了几个具体的建议,说明神经科学如何可以积极影响人工智能安全。

更新时间: 2024-11-27 17:18:51

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18526v1

Perturbation Ontology based Graph Attention Networks

In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In contrast, matrix-focused methods accelerate processing by utilizing structural cues but often overlook contextual richness. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. The core innovation of POGAT lies in our enhanced homogeneous perturbing scheme designed to generate rigorous negative samples, encouraging the model to explore minimal contextual features more thoroughly. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 10.78% in F1-score for the critical task of link prediction and 12.01% in Micro-F1 for the critical task of node classification.

Updated: 2024-11-27 17:12:14

标题: 扰动本体基于图注意力网络

摘要: 最近几年,图表示学习经历了一个范式转变,这是由图神经网络(GNNs)及其异构对应物的出现和普及所推动的。异构GNNs在从包含不同实体类型和关系的复杂图中提取低维嵌入方面取得了显著成功。虽然基于元路径的技术长期以来被认为能够捕捉节点之间的语义关联,但它们依赖于手动规范,这构成了一个重要的限制。相反,基于矩阵的方法通过利用结构线索加速处理,但往往忽视了上下文丰富性。在本文中,我们通过将本体论引入复杂图作为基本语义原语,挑战当前的范式。我们的目标是将矩阵为中心和基于元路径的方法的优势融合到一个统一的框架中。我们提出扰动本体图注意力网络(POGAT),这是一种将本体子图与先进的自监督学习范式结合起来,以实现对深层上下文的理解的新方法。POGAT的核心创新在于我们增强的同质扰动方案,旨在生成严格的负样本,鼓励模型更彻底地探索最小的上下文特征。通过大量的实证评估,我们证明POGAT明显优于最先进的基线,对于关键的链接预测任务,F1得分的改进高达10.78%,对于关键的节点分类任务,Micro-F1的改进为12.01%。

更新时间: 2024-11-27 17:12:14

领域: cs.LG

下载: http://arxiv.org/abs/2411.18520v1

Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems

Training reinforcement learning-based recommender systems is often hindered by the lack of dynamic and realistic user interactions. To address this limitation, we introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback. Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items, with profiles updated after each rating to reflect evolving user characteristics. Utilizing the MovieLens dataset as a proof of concept, we limited our implementation to the last 40 interactions for each user, representing approximately 39% and 22% of the training sets, to focus on recent user behavior. For consistency and to gain insights into the performance of traditional methods with limited data, we implemented baseline approaches using the same data subset. Our results demonstrate that Lusifer accurately emulates user behavior and preferences, even with reduced training data having an RMSE of 1.3 across various test sets. This paper presents Lusifer's operational pipeline, including prompt generation and iterative user profile updates, and compares its performance against baseline methods. The findings validate Lusifer's ability to produce realistic dynamic feedback and suggest that it offers a scalable and adjustable framework for user simulation in online reinforcement learning recommender systems for future studies, particularly when training data is limited.

Updated: 2024-11-27 17:07:41

标题: Lusifer:基于LLM的用户模拟反馈环境,用于在线推荐系统

摘要: 培训基于强化学习的推荐系统通常受到缺乏动态和真实用户交互的限制。为了解决这一限制,我们引入了Lusifer,这是一个利用大型语言模型(LLMs)生成模拟用户反馈的新颖环境。Lusifer综合用户档案和互动历史,模拟对推荐物品的响应和行为,每次评分后更新档案以反映演化中的用户特征。我们利用MovieLens数据集作为概念验证,将实现限制在每个用户的最后40次互动,约占训练集的39%和22%,以便关注最近用户行为。为了保持一致性并了解在有限数据下传统方法的性能,我们使用相同的数据子集实现了基准方法。我们的结果表明,即使在训练数据减少的情况下,Lusifer仍能准确模拟用户行为和偏好,在各种测试集上具有1.3的RMSE。本文介绍了Lusifer的操作流程,包括提示生成和迭代用户档案更新,并将其性能与基准方法进行了比较。研究结果验证了Lusifer产生真实动态反馈的能力,并建议它为未来研究中在线强化学习推荐系统中用户模拟提供了可扩展和可调整的框架,特别是在训练数据有限的情况下。

更新时间: 2024-11-27 17:07:41

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2405.13362v2

Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

The impressive capabilities of large language models (LLMs) have sparked debate over whether these models genuinely generalize to unseen tasks or predominantly rely on memorizing vast amounts of pretraining data. To explore this issue, we introduce an extended concept of memorization, distributional memorization, which measures the correlation between the LLM output probabilities and the pretraining data frequency. To effectively capture task-specific pretraining data frequency, we propose a novel task-gram language model, which is built by counting the co-occurrence of semantically related $n$-gram pairs from task inputs and outputs in the pretraining corpus. Using the Pythia models trained on the Pile dataset, we evaluate four distinct tasks: machine translation, factual question answering, world knowledge understanding, and math reasoning. Our findings reveal varying levels of memorization, with the strongest effect observed in factual question answering. Furthermore, while model performance improves across all tasks as LLM size increases, only factual question answering shows an increase in memorization, whereas machine translation and reasoning tasks exhibit greater generalization, producing more novel outputs. This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks, providing a scalable method for analyzing large pretraining corpora in greater depth. We also show the practical implications of our analysis through a novel prompt optimization algorithm.
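
A toy illustration of the statistic: count task-relevant $n$-gram frequencies in a (here, tiny) pretraining corpus and correlate them with model scores; the log-probabilities below are hypothetical placeholders for real LLM outputs:

    from collections import Counter
    from scipy.stats import spearmanr

    corpus = ["paris is the capital of france",
              "berlin is the capital of germany",
              "paris is the capital of france"]   # toy pretraining data

    def ngrams(text, n=2):
        toks = text.split()
        return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

    freq = Counter(ng for doc in corpus for ng in ngrams(doc))

    # Task-gram style statistic, here reduced to output bigrams for brevity;
    # the paper pairs semantically related input/output n-grams.
    outputs = ["capital of france", "capital of germany", "capital of spain"]
    model_logprobs = [-0.2, -0.5, -3.0]            # hypothetical LLM scores
    pretrain_freq = [sum(freq[ng] for ng in ngrams(o)) for o in outputs]

    rho, _ = spearmanr(model_logprobs, pretrain_freq)
    print(pretrain_freq, rho)   # high rho = distributional memorization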

Updated: 2024-11-27 17:05:16

标题: 泛化与记忆:将语言模型的能力追溯到预训练数据

摘要: 大型语言模型(LLMs)的强大能力引发了关于这些模型是否真正适用于未知任务或主要依赖于记忆大量预训练数据的争论。为了探讨这个问题,我们引入了一个扩展概念,即分布式记忆,它衡量了LLM输出概率与预训练数据频率之间的相关性。为了有效捕捉特定任务的预训练数据频率,我们提出了一种新颖的任务语言模型,该模型通过计算预训练语料库中任务输入和输出中语义相关的n-gram对的共现来构建。使用在Pile数据集上训练的Pythia模型,我们评估了四个不同的任务:机器翻译,事实问题回答,世界知识理解和数学推理。我们的研究结果显示不同程度的记忆效应,其中在事实问题回答中观察到最强烈的效应。此外,随着LLM大小的增加,所有任务的模型性能都有所提升,但只有在事实问题回答中才显示出记忆增加,而机器翻译和推理任务表现出更大的泛化能力,产生更多新颖的输出。这项研究表明,记忆在简单的、知识密集型的任务中发挥着更大的作用,而泛化是更难、基于推理的任务的关键,为深入分析大型预训练语料库提供了可扩展的方法。我们还通过一种新颖的提示优化算法展示了我们分析的实际意义。

更新时间: 2024-11-27 17:05:16

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.14985v4

Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection

A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti-virus developers to similarly re-purpose existing work to improve their malware detection capability. We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families, functionalities, or other markers of interest. By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples from benign ones. Our experiments demonstrate that these features add value beyond traditional features on the EMBER 2018 dataset. Manual analysis of the added sub-signatures shows a power-law behavior in a combination of features that are specific and unique, as well as features that occur often. A prior expectation may be that the features would be limited in being overly specific to unique malware families. This behavior is observed, and is apparently useful in practice. In addition, we also find sub-signatures that are dual-purpose (e.g., detecting virtual machine environments) or broadly generic (e.g., DLL imports).
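
A minimal sketch of the harvesting idea: pull string sub-signatures out of a YARA rule body and turn each into a binary feature, independent of whether the full rule's condition would fire (regex parsing here is a simplification; hex patterns would need their own matcher):

    import re

    yara_rule = r'''
    rule SuspiciousDropper {
        strings:
            $s1 = "CreateRemoteThread"
            $s2 = "cmd.exe /c"
            $h1 = { 4D 5A 90 00 }
        condition:
            any of them
    }
    '''

    # Harvest text sub-signatures (hex strings like $h1 are skipped here).
    sub_sigs = re.findall(r'\$\w+\s*=\s*"([^"]+)"', yara_rule)

    def featurize(file_bytes, signatures):
        # One binary feature per harvested sub-signature.
        return [int(sig.encode() in file_bytes) for sig in signatures]

    sample = b"...CreateRemoteThread..."
    print(sub_sigs, featurize(sample, sub_sigs))   # -> [1, 0]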

Updated: 2024-11-27 17:03:00

标题: Living off the Analyst: 从Yara规则中提取特征以进行恶意软件检测

摘要: 恶意行为者使用的一种策略是“依靠现有资源生存”,即利用受害者系统上已有的良性系统和工具,并重新用于恶意行为者的目的。在这项工作中,我们探讨是否反病毒开发人员可以类似地重新利用现有工作来提高其恶意软件检测能力。我们展示了通过YARA规则实现这一点是可行的,YARA规则使用人工编写的签名来检测特定的恶意软件系列、功能或其他感兴趣的标记。通过从公开可获得的YARA规则中提取子签名,我们组装了一组特征,可以更有效地区分恶意样本和良性样本。我们的实验表明,这些特征在EMBER 2018数据集上比传统特征增加了价值。对添加的子签名进行手动分析显示,这些特征的组合呈现幂律行为,既具有特定和独特的特征,又具有频繁出现的特征。先前的预期可能是这些特征会受限于过于特定于独特的恶意软件系列。这种行为得到观察,并且在实践中显然很有用。此外,我们还发现一些双重用途的子签名(例如,检测虚拟机环境)或广泛通用的子签名(例如,DLL导入)。

更新时间: 2024-11-27 17:03:00

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2411.18516v1

Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks

Fine-tuning large pre-trained foundation models (FMs) on distributed edge devices presents considerable computational and privacy challenges. Federated fine-tuning (FedFT) mitigates some privacy issues by facilitating collaborative model training without the need to share raw data. To lessen the computational burden on resource-limited devices, combining low-rank adaptation (LoRA) with federated learning enables parameter-efficient fine-tuning. Additionally, the split FedFT architecture partitions an FM between edge devices and a central server, reducing the necessity for complete model deployment on individual devices. However, the risk of privacy eavesdropping attacks in FedFT remains a concern, particularly in sensitive areas such as healthcare and finance. In this paper, we propose a split FedFT framework with differential privacy (DP) over wireless networks, where the inherent wireless channel noise in the uplink transmission is utilized to achieve DP guarantees without adding an extra artificial noise. We shall investigate the impact of the wireless noise on convergence performance of the proposed framework. We will also show that by updating only one of the low-rank matrices in the split FedFT with DP, the proposed method can mitigate the noise amplification effect. Simulation results will demonstrate that the proposed framework achieves higher accuracy under strict privacy budgets compared to baseline methods.
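
Schematically, one round of the mechanism can be sketched as follows: clip the low-rank update, count the uplink channel noise toward the Gaussian-mechanism budget, and only top up with artificial noise if the channel alone is too quiet (standard single-round Gaussian-mechanism accounting, not the paper's full wireless analysis):

    import numpy as np

    def dp_noisy_upload(update, clip=1.0, eps=2.0, delta=1e-5, channel_sigma=0.05):
        # Noise scale required for (eps, delta)-DP under the Gaussian mechanism.
        sigma_req = np.sqrt(2 * np.log(1.25 / delta)) * clip / eps
        g = update * min(1.0, clip / (np.linalg.norm(update) + 1e-12))
        extra = np.sqrt(max(sigma_req ** 2 - channel_sigma ** 2, 0.0))
        # Channel noise is "free"; artificial noise only makes up the deficit.
        noise = (np.random.normal(0.0, channel_sigma, g.shape)
                 + np.random.normal(0.0, extra, g.shape))
        return g + noise

    lora_B = np.random.randn(16, 4) * 0.01   # one low-rank factor, as in the
    print(dp_noisy_upload(lora_B).shape)     # paper's single-matrix DP update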

Updated: 2024-11-27 16:55:59

标题: 在无线网络上使用差分隐私的联邦低秩适应

摘要: 在分布式边缘设备上对大型预训练基础模型(FMs)进行微调存在相当大的计算和隐私挑战。联邦微调(FedFT)通过促进协作模型训练而无需共享原始数据,从而缓解了一些隐私问题。为减轻资源有限设备的计算负担,将低秩适应(LoRA)与联邦学习相结合实现了参数高效的微调。此外,分裂式FedFT架构在边缘设备和中央服务器之间分割FM,减少了在个别设备上完全部署模型的必要性。然而,在FedFT中隐私窃听攻击的风险仍然令人担忧,特别是在敏感领域如医疗保健和金融领域。在本文中,我们提出了一个带微分隐私(DP)的分裂式FedFT框架,应用于无线网络,利用上行传输中固有的无线信道噪声来实现DP保证,而不需要额外添加人工噪声。我们将调查无线噪声对所提出框架的收敛性能的影响。我们还将展示,在分裂式FedFT中只更新一个低秩矩阵时,所提出的方法可以减轻噪声放大效应。仿真结果将证明,与基准方法相比,所提出的框架在严格的隐私预算下实现了更高的准确性。

更新时间: 2024-11-27 16:55:59

领域: cs.LG,cs.CR,eess.SP

下载: http://arxiv.org/abs/2411.07806v2

Simulation-based inference with scattering representations: scattering is all you need

We demonstrate the successful use of scattering representations without further compression for simulation-based inference (SBI) with images (i.e. field-level), illustrated with a cosmological case study. Scattering representations provide a highly effective representational space for subsequent learning tasks, although the higher dimensional compressed space introduces challenges. We overcome these through spatial averaging, coupled with more expressive density estimators. Compared to alternative methods, such an approach does not require additional simulations for either training or computing derivatives, is interpretable, and resilient to covariate shift. As expected, we show that a scattering only approach extracts more information than traditional second order summary statistics.

Updated: 2024-11-27 16:52:44

标题: 基于散射表示的仿真推断:散射就是你所需要的

摘要: 我们展示了在基于模拟的推断(SBI)中使用散射表示而无需进一步压缩的成功案例,该案例以宇宙学为案例进行说明。散射表示为后续学习任务提供了高效的表征空间,尽管更高维的压缩空间引入了挑战。我们通过空间平均和更具表现力的密度估计器克服了这些挑战。与替代方法相比,这种方法既不需要额外的模拟进行训练或计算导数,又具有可解释性,并且对协变量转移具有抗性。正如预期的那样,我们展示了仅使用散射方法提取的信息比传统的二阶摘要统计信息更多。

更新时间: 2024-11-27 16:52:44

领域: cs.LG,astro-ph.CO,astro-ph.IM,stat.ML

下载: http://arxiv.org/abs/2410.11883v2

LLM-ABBA: Understand time series via symbolic approximation

The success of large language models (LLMs) for time series has been demonstrated in previous work. Utilizing a symbolic time series representation, one can efficiently bridge the gap between LLMs and time series. However, the remaining challenge is to exploit the semantic information hidden in time series by using symbols or existing tokens of LLMs, while aligning the embedding space of LLMs according to the hidden information of time series. The symbolic time series approximation (STSA) method called adaptive Brownian bridge-based symbolic aggregation (ABBA) shows outstanding efficacy in preserving salient time series features by modeling time series patterns in terms of amplitude and period while using existing tokens of LLMs. In this paper, we introduce a method, called LLM-ABBA, that integrates ABBA into large language models for various downstream time series tasks. By symbolizing time series, LLM-ABBA compares favorably to the recent state-of-the-art (SOTA) in UCR and three medical time series classification tasks. Meanwhile, a fixed-polygonal chain trick in ABBA is introduced to avoid obvious drifting during prediction tasks by significantly mitigating the effects of cumulative error arising from misused symbols during the transition from symbols to numerical values. In time series regression tasks, LLM-ABBA achieves the new SOTA on Time Series Extrinsic Regression (TSER) benchmarks. LLM-ABBA also shows competitive prediction capability compared to recent SOTA time series prediction results. We believe this framework can also seamlessly extend to other time series tasks.
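
A heavily simplified ABBA-style pipeline (compress the series into (length, increment) pieces, then cluster the pieces into symbols) conveys the idea; the real method uses adaptive tolerances and Brownian-bridge analysis, which this toy omits:

    import numpy as np
    from sklearn.cluster import KMeans

    def compress(ts, tol=0.4):
        # Greedy piecewise-linear compression: extend each segment while the
        # series stays within tol of the straight line through its endpoints.
        pieces, start = [], 0
        for end in range(1, len(ts)):
            line = np.linspace(ts[start], ts[end], end - start + 1)
            if np.max(np.abs(ts[start:end + 1] - line)) > tol:
                pieces.append((end - 1 - start, ts[end - 1] - ts[start]))
                start = end - 1
        pieces.append((len(ts) - 1 - start, ts[-1] - ts[start]))
        return np.array(pieces, dtype=float)

    def symbolize(pieces, k=3):
        # Cluster (length, increment) pairs; emit one letter per piece.
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(pieces)
        return "".join(chr(ord("a") + lab) for lab in labels)

    ts = np.sin(np.linspace(0, 6 * np.pi, 120)) + 0.05 * np.random.randn(120)
    print(symbolize(compress(ts)))   # a short symbol string an LLM can consume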

Updated: 2024-11-27 16:48:24

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.18506v1

Calibrated Adaptive Teacher for Domain Adaptive Intelligent Fault Diagnosis

Intelligent Fault Diagnosis (IFD) based on deep learning has proven to be an effective and flexible solution, attracting extensive research. Deep neural networks can learn rich representations from vast amounts of representative labeled data for various applications. In IFD, they achieve high classification performance from signals in an end-to-end manner, without requiring extensive domain knowledge. However, deep learning models usually only perform well on the data distribution they have been trained on. When applied to a different distribution, they may experience performance drops. This is also observed in IFD, where assets are often operated in working conditions different from those in which labeled data have been collected. Unsupervised domain adaptation (UDA) deals with the scenario where labeled data are available in a source domain, and only unlabeled data are available in a target domain, where domains may correspond to operating conditions. Recent methods rely on training with confident pseudo-labels for target samples. However, the confidence-based selection of pseudo-labels is hindered by poorly calibrated confidence estimates in the target domain, primarily due to over-confident predictions, which limits the quality of pseudo-labels and leads to error accumulation. In this paper, we propose a novel UDA method called Calibrated Adaptive Teacher (CAT), where we propose to calibrate the predictions of the teacher network throughout the self-training process, leveraging post-hoc calibration techniques. We evaluate CAT on domain-adaptive IFD and perform extensive experiments on the Paderborn benchmark for bearing fault diagnosis under varying operating conditions. Our proposed method achieves state-of-the-art performance on most transfer tasks.
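
One standard post-hoc calibrator matching the abstract's description is temperature scaling; the sketch below (an illustrative choice, not necessarily the paper's exact calibrator) fits a temperature on held-out labeled source data and uses it before confidence-thresholding the teacher's pseudo-labels.

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, iters=200, lr=0.01):
    """Fit a single temperature on held-out labeled source data
    (post-hoc calibration)."""
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().detach()

def confident_pseudo_labels(teacher_logits, temperature, threshold=0.9):
    """Threshold calibrated (rather than raw, over-confident) probabilities
    when selecting target-domain pseudo-labels."""
    probs = F.softmax(teacher_logits / temperature, dim=1)
    confidence, pseudo = probs.max(dim=1)
    keep = confidence > threshold
    return pseudo[keep], keep
```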

Updated: 2024-11-27 16:47:10

Fields: cs.LG,cs.AI,eess.SP,stat.ML,68T07, 62H30,I.2.6; J.2

Download: http://arxiv.org/abs/2312.02826v2

Unveiling the optimization process of Physics Informed Neural Networks: How accurate and competitive can PINNs be?

This study investigates the potential accuracy boundaries of physics-informed neural networks, contrasting their approach with previous similar works and traditional numerical methods. We find that selecting improved optimization algorithms significantly enhances the accuracy of the results. Simple modifications to the loss function may also improve precision, offering an additional avenue for enhancement. Despite optimization algorithms having a greater impact on convergence than adjustments to the loss function, practical considerations often favor tweaking the latter due to ease of implementation. On a global scale, the integration of an enhanced optimizer and a marginally adjusted loss function enables a reduction in the loss function by several orders of magnitude across diverse physical problems. Consequently, our results obtained using compact networks (typically comprising 2 or 3 layers of 20-30 neurons) achieve accuracies comparable to finite difference schemes employing thousands of grid points. This study encourages the continued advancement of PINNs and associated optimization techniques for broader applications across various fields.
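
A common recipe consistent with this finding is a short Adam warm-up followed by L-BFGS refinement of a compact network; the sketch below uses a toy ODE residual (u'' + u = 0 with u(0)=0, u'(0)=1, i.e. u = sin x) as a stand-in for the paper's benchmarks.

```python
import math
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)
x = torch.linspace(0.0, math.pi, 128).reshape(-1, 1).requires_grad_(True)

def pinn_loss():
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = (d2u + u).pow(2).mean()          # PDE residual on collocation points
    x0 = torch.zeros(1, 1, requires_grad=True)
    u0 = net(x0)
    du0 = torch.autograd.grad(u0.sum(), x0, create_graph=True)[0]
    return residual + u0.pow(2).mean() + (du0 - 1.0).pow(2).mean()

# Stage 1: Adam warm-up.
adam = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    adam.zero_grad()
    pinn_loss().backward()
    adam.step()

# Stage 2: L-BFGS refinement, which on smooth problems typically drives the
# loss down by several further orders of magnitude.
lbfgs = torch.optim.LBFGS(net.parameters(), max_iter=500,
                          line_search_fn="strong_wolfe")

def closure():
    lbfgs.zero_grad()
    loss = pinn_loss()
    loss.backward()
    return loss

lbfgs.step(closure)
```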

Updated: 2024-11-27 16:46:13

Fields: physics.comp-ph,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.04230v2

Isometry pursuit

Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalization method followed by multitask basis pursuit. Applied to Jacobians of putative coordinate functions, it helps identify isometric embeddings from within interpretable dictionaries. We provide theoretical and experimental results justifying this method. For problems involving coordinate selection and diversification, it offers a synergistic alternative to greedy and brute force search.

Updated: 2024-11-27 16:43:13

Fields: stat.ML,cs.AI,cs.IR,cs.LG,stat.ME

Download: http://arxiv.org/abs/2411.18502v1

Multiple Choice Learning for Efficient Speech Separation with Many Speakers

Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performances of PIT, while being computationally advantageous. This opens the door to a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting.
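
A minimal sketch of the winner-takes-all MCL objective as one might adapt it to separation (our instantiation, not necessarily the paper's exact formulation): each ground-truth source is scored against every hypothesis, and only the best hypothesis receives gradient, avoiding the factorial permutation search of naive PIT.

```python
import torch

def mcl_loss(hypotheses, sources):
    """Winner-takes-all MCL objective.

    hypotheses: (batch, n_heads, time)  candidate separated signals
    sources:    (batch, n_srcs, time)   ground-truth sources

    Each source is scored against every hypothesis; only the best hypothesis
    per source receives gradient, so scoring costs O(n_heads * n_srcs) rather
    than searching over all n_srcs! permutations.
    """
    errors = ((hypotheses.unsqueeze(1) - sources.unsqueeze(2)) ** 2).mean(-1)
    return errors.min(dim=2).values.mean()   # errors: (batch, n_srcs, n_heads)
```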

Updated: 2024-11-27 16:38:34

Fields: cs.SD,cs.LG,eess.AS,stat.ML

Download: http://arxiv.org/abs/2411.18497v1

Agent Skill Acquisition for Large Language Models via CycleQD

Training large language models to acquire specific skills remains a challenging endeavor. Conventional training approaches often struggle with data distribution imbalances and inadequacies in objective functions that do not align well with task-specific performance. To address these challenges, we introduce CycleQD, a novel approach that leverages the Quality Diversity framework through a cyclic adaptation of the algorithm, along with a model merging based crossover and an SVD-based mutation. In CycleQD, each task's performance metric is alternated as the quality measure while the others serve as the behavioral characteristics. This cyclic focus on individual tasks allows for concentrated effort on one task at a time, eliminating the need for data ratio tuning and simplifying the design of the objective function. Empirical results from AgentBench indicate that applying CycleQD to LLAMA3-8B-INSTRUCT based models not only enables them to surpass traditional fine-tuning methods in coding, operating systems, and database tasks, but also achieves performance on par with GPT-3.5-TURBO, which potentially contains much more parameters, across these domains. Crucially, this enhanced performance is achieved while retaining robust language capabilities, as evidenced by its performance on widely adopted language benchmark tasks. We highlight the key design choices in CycleQD, detailing how these contribute to its effectiveness. Furthermore, our method is general and can be applied to image segmentation models, highlighting its applicability across different domains.

Updated: 2024-11-27 16:38:33

Fields: cs.CL,cs.AI,cs.NE

Download: http://arxiv.org/abs/2410.14735v2

Initial Evidence of Elevated Reconnaissance Attacks Against Nodes in P2P Overlay Networks

We hypothesize that peer-to-peer (P2P) overlay network nodes can be attractive to attackers due to their visibility, sustained uptime, and resource potential. Towards validating this hypothesis, we investigate the state of active reconnaissance attacks on Ethereum P2P network nodes by deploying a series of honeypots alongside actual Ethereum nodes across globally distributed vantage points. We find that Ethereum nodes experience not only increased attacks, but also specific types of attacks targeting particular ports and services. Furthermore, we find evidence that the threat assessment on our nodes is applicable to the wider P2P network by having performed port scans on other reachable peers. Our findings provide insights into potential mitigation strategies to improve the security of the P2P networking layer.

Updated: 2024-11-27 16:38:18

Fields: cs.CR,cs.NI

Download: http://arxiv.org/abs/2411.14623v2

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstrations for digital tasks. Obtaining supervised data from humans is costly, and automatic data collection through exploration or reinforcement learning relies on complex environmental and content setup, resulting in datasets that lack comprehensive coverage of various scenarios. On the other hand, there is abundant knowledge that may indirectly assist task completion, such as online tutorials that were created for human consumption. In this work, we present Synatra, an approach that effectively transforms this indirect knowledge into direct supervision at scale. We define different types of indirect knowledge, and carefully study the available sources to obtain it, methods to encode the structure of direct demonstrations, and finally methods to transform indirect knowledge into direct demonstrations. We use 100k such synthetically-created demonstrations to finetune a 7B CodeLlama, and demonstrate that the resulting agent surpasses all comparably sized models on three web-based task benchmarks Mind2Web, MiniWoB++ and WebArena, as well as surpassing GPT-3.5 on WebArena and Mind2Web. In addition, while synthetic demonstrations prove to be only 3% the cost of human demonstrations (at $0.031 each), we show that the synthetic demonstrations can be more effective than an identical number of human demonstrations collected from limited domains.

Updated: 2024-11-27 16:34:52

Fields: cs.AI

Download: http://arxiv.org/abs/2409.15637v2

SPTTE: A Spatiotemporal Probabilistic Framework for Travel Time Estimation

Accurate travel time estimation is essential for navigation and itinerary planning. While existing research employs probabilistic modeling to assess travel time uncertainty and account for correlations between multiple trips, modeling the temporal variability of multi-trip travel time distributions remains a significant challenge. Capturing the evolution of joint distributions requires large, well-organized datasets; however, real-world trip data are often temporally sparse and spatially unevenly distributed. To address this issue, we propose SPTTE, a spatiotemporal probabilistic framework that models the evolving joint distribution of multi-trip travel times by formulating the estimation task as a spatiotemporal stochastic process regression problem with fragmented observations. SPTTE incorporates an RNN-based temporal Gaussian process parameterization to regularize sparse observations and capture temporal dependencies. Additionally, it employs a prior-based heterogeneity smoothing strategy to correct unreliable learning caused by unevenly distributed trips, effectively modeling temporal variability under sparse and uneven data distributions. Evaluations on real-world datasets demonstrate that SPTTE outperforms state-of-the-art deterministic and probabilistic methods by over 10.13%. Ablation studies and visualizations further confirm the effectiveness of the model components.

Updated: 2024-11-27 16:28:54

Fields: cs.LG

Download: http://arxiv.org/abs/2411.18484v1

SoK: Watermarking for AI-Generated Content

As the outputs of generative AI (GenAI) techniques improve in quality, it becomes increasingly challenging to distinguish them from human-created content. Watermarking schemes are a promising approach to address the problem of distinguishing between AI and human-generated content. These schemes embed hidden signals within AI-generated content to enable reliable detection. While watermarking is not a silver bullet for addressing all risks associated with GenAI, it can play a crucial role in enhancing AI safety and trustworthiness by combating misinformation and deception. This paper presents a comprehensive overview of watermarking techniques for GenAI, beginning with the need for watermarking from historical and regulatory perspectives. We formalize the definitions and desired properties of watermarking schemes and examine the key objectives and threat models for existing approaches. Practical evaluation strategies are also explored, providing insights into the development of robust watermarking techniques capable of resisting various attacks. Additionally, we review recent representative works, highlight open challenges, and discuss potential directions for this emerging field. By offering a thorough understanding of watermarking in GenAI, this work aims to guide researchers in advancing watermarking methods and applications, and support policymakers in addressing the broader implications of GenAI.

Updated: 2024-11-27 16:22:33

Fields: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.18479v1

Weakly Supervised Framework Considering Multi-temporal Information for Large-scale Cropland Mapping with Satellite Imagery

Accurately mapping large-scale cropland is crucial for agricultural production management and planning. Currently, the combination of remote sensing data and deep learning techniques has shown outstanding performance in cropland mapping. However, these approaches require massive numbers of precise labels, which are labor-intensive to obtain. To reduce the label cost, this study presents a weakly supervised framework considering multi-temporal information for large-scale cropland mapping. Specifically, we extract high-quality labels according to their consistency among global land cover (GLC) products to construct the supervised learning signal. On the one hand, to alleviate the overfitting problem caused by the model's over-trust of remaining errors in high-quality labels, we encode the similarity/aggregation of cropland in the visual/spatial domain to construct the unsupervised learning signal, and take it as the regularization term to constrain the supervised part. On the other hand, to sufficiently leverage the plentiful information in the samples without high-quality labels, we also incorporate the unsupervised learning signal in these samples, enriching the diversity of the feature space. After that, to capture the phenological features of croplands, we introduce dense satellite image time series (SITS) to extend the proposed framework in the temporal dimension. We also visualized the high dimensional phenological features to uncover how multi-temporal information benefits cropland extraction, and assessed the method's robustness under conditions of data scarcity. The proposed framework has been experimentally validated for strong adaptability across three study areas (Hunan Province, Southeast France, and Kansas) in large-scale cropland mapping, and the internal mechanism and temporal generalizability are also investigated.
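
The consistency-based label extraction step might look like the following sketch (binary cropland masks and a unanimous-agreement rule are our simplifying assumptions):

```python
import numpy as np

def high_quality_labels(glc_stack):
    """glc_stack: (n_products, H, W) binary cropland masks from several
    global land-cover products. Pixels on which all products agree become
    supervised labels; the rest stay unlabeled (-1) and contribute only to
    the unsupervised learning signal."""
    labels = np.full(glc_stack.shape[1:], -1, dtype=np.int8)
    labels[np.all(glc_stack == 1, axis=0)] = 1   # unanimous cropland
    labels[np.all(glc_stack == 0, axis=0)] = 0   # unanimous non-cropland
    return labels
```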

Updated: 2024-11-27 16:11:52

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.18475v1

Isolating authorship from content with semantic embeddings and contrastive learning

Authorship entangles style and content. Authors frequently write about the same topics in the same style, so when different authors write about the exact same topic, the easiest way to distinguish them is by understanding the nuances of their style. Modern neural models for authorship can pick up these features using contrastive learning; however, some amount of content leakage is always present. Our aim is to reduce the inevitable impact and correlation between content and authorship. We present a technique that uses contrastive learning (InfoNCE) with additional hard negatives synthetically created using a semantic similarity model. This disentanglement technique aims to distance the content embedding space from the style embedding space, leading to embeddings more informed by style. We demonstrate the performance with ablations on two different datasets and compare them on out-of-domain challenges. Improvements are clearly shown on challenging evaluations of prolific authors, with up to a 10% increase in accuracy when the settings are particularly hard. Trials on these challenges also demonstrate that the method, used as fine-tuning, preserves its zero-shot capabilities.
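
A minimal sketch of InfoNCE with the extra mined hard negatives (same-topic texts by other authors); tensor layouts and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(anchor, positive, hard_negatives, tau=0.07):
    """anchor, positive: (batch, dim) style embeddings of two texts by the
    same author. hard_negatives: (batch, n_neg, dim) embeddings of
    same-content texts by other authors, mined with a semantic similarity
    model. Pushing these apart discourages the encoder from relying on
    content cues."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)

    pos = (anchor * positive).sum(-1, keepdim=True)            # (batch, 1)
    neg = torch.einsum("bd,bnd->bn", anchor, hard_negatives)   # (batch, n_neg)
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(len(anchor), dtype=torch.long)        # index 0 = positive
    return F.cross_entropy(logits, labels)
```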

Updated: 2024-11-27 16:08:46

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.18472v1

Autonomous search of real-life environments combining dynamical system-based path planning and unsupervised learning

In recent years, advancements have been made towards the goal of using chaotic coverage path planners for autonomous search and traversal of spaces with limited environmental cues. However, the state of this field is still in its infancy as there has been little experimental work done. The existing experimental works have not developed robust methods to satisfactorily address the immediate set of problems a chaotic coverage path planner needs to overcome in order to scan realistic environments within reasonable coverage times. These immediate problems are as follows: (1) an obstacle avoidance technique that reduces halts or disruptions in continuous chaotic trajectories, (2) a means to spread chaotic trajectories across the environment (especially crucial for large and/or complex-shaped environments) that need to be covered, and (3) a real-time coverage calculation technique that is accurate and independent of cell size. This study addresses these problems by developing a novel applied framework for real-world applications of chaotic coverage path planners while providing techniques for effective obstacle avoidance, chaotic trajectory dispersal, and accurate real-time coverage calculation. These algorithms were created within the ROS framework and make up a newly developed chaotic path planning application. The performance of this application was comparable to that of a conventional optimal path planner. The performance tests were carried out in environments of various sizes, shapes, and obstacle densities, both in real-life and Gazebo simulations.

Updated: 2024-11-27 16:07:12

Fields: cs.RO,cs.AI

Download: http://arxiv.org/abs/2305.01834v3

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

Speculative Decoding (SD) has become an important technique in accelerating the inference speed of large language models. Conventional SD methods employ a fixed draft length, which ignores the token generation difficulty across tasks. Consequently, in this paper, we address such an issue and introduce SVIP - a difficulty-aware dynamic draft length policy for speculative decoding systems. Based on a theoretical lower bound of draft token acceptance rate and its inference-time approximation, SVIP adaptively determines the lengths of draft sequences based on the entropy of each draft token distribution. Experimental results on mainstream SD benchmarks and frameworks demonstrate the superior performance of SVIP, achieving up to 20\% walltime speedup on SpecBench over baseline SD methods and 60\% speedup on MT-Bench for long-form generation of up to 8K tokens. Moreover, SVIP is totally training-free and compatible with any existing SD methods that generate draft tokens autoregressively. Experimental results also show that SVIP yields consistent walltime improvement on top of GliDe & CaPE and EAGLE-2.
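
The core decision rule might be sketched as follows (the threshold and cap are illustrative; the paper derives its criterion from a theoretical acceptance-rate bound rather than a fixed cutoff):

```python
import torch

def should_stop_drafting(draft_logits, threshold=2.5, max_len=16):
    """draft_logits: (steps_so_far, vocab) logits of the draft tokens so far.

    Stop drafting once the entropy of the latest draft distribution is too
    high, i.e. once additional draft tokens are unlikely to be accepted by
    the target model, so the verification budget is not wasted.
    """
    log_p = torch.log_softmax(draft_logits[-1], dim=-1)
    entropy = -(log_p.exp() * log_p).sum()
    return entropy.item() > threshold or draft_logits.size(0) >= max_len
```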

Updated: 2024-11-27 15:53:17

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.18462v1

What do physics-informed DeepONets learn? Understanding and improving training for scientific computing applications

Physics-informed deep operator networks (DeepONets) have emerged as a promising approach toward numerically approximating the solution of partial differential equations (PDEs). In this work, we aim to develop further understanding of what is being learned by physics-informed DeepONets by assessing the universality of the extracted basis functions and demonstrating their potential toward model reduction with spectral methods. Results provide clarity about measuring the performance of a physics-informed DeepONet through the decays of singular values and expansion coefficients. In addition, we propose a transfer learning approach for improving training for physics-informed DeepONets between parameters of the same PDE as well as across different, but related, PDEs where these models struggle to train well. This approach results in significant error reduction and learned basis functions that are more effective in representing the solution of a PDE.

Updated: 2024-11-27 15:48:35

Fields: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2411.18459v1

GSE: Group-wise Sparse and Explainable Adversarial Attacks

Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.

Updated: 2024-11-27 15:46:34

Fields: cs.CV,cs.CR,cs.LG,math.OC

Download: http://arxiv.org/abs/2311.17434v4

Synthetic ECG Generation for Data Augmentation and Transfer Learning in Arrhythmia Classification

Deep learning models need a sufficient amount of data in order to be able to find the hidden patterns in it. It is the purpose of generative modeling to learn the data distribution, thus allowing us to sample more data and augment the original dataset. In the context of physiological data, and more specifically electrocardiogram (ECG) data, given its sensitive nature and expensive data collection, we can exploit the benefits of generative models in order to enlarge existing datasets and improve downstream tasks, in our case, classification of heart rhythm. In this work, we explore the usefulness of synthetic data generated with different generative models from Deep Learning namely Diffweave, Time-Diffusion and Time-VQVAE in order to obtain better classification results for two open source multivariate ECG datasets. Moreover, we also investigate the effects of transfer learning, by fine-tuning a synthetically pre-trained model and then progressively adding increasing proportions of real data. We conclude that although the synthetic samples resemble the real ones, the classification improvement when simply augmenting the real dataset is barely noticeable on individual datasets, but when both datasets are merged the results show an increase across all metrics for the classifiers when using synthetic samples as augmented data. From the fine-tuning results the Time-VQVAE generative model has shown to be superior to the others but not powerful enough to achieve results close to a classifier trained with real data only. In addition, methods and metrics for measuring closeness between synthetic data and the real one have been explored as a side effect of the main research questions of this study.

Updated: 2024-11-27 15:46:34

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.18456v1

S-CFE: Simple Counterfactual Explanations

We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or \emph{plausibility}) metrics. The added complexity of enforcing \emph{sparsity}, or shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex $\ell_1$ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth $\ell_p$ (where $0 \leq p < 1$) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain \emph{actionable}. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.

Updated: 2024-11-27 15:43:22

Fields: cs.LG,math.OC

Download: http://arxiv.org/abs/2410.15723v3

Advancements in Myocardial Infarction Detection and Classification Using Wearable Devices: A Comprehensive Review

Myocardial infarction (MI), commonly known as a heart attack, is a critical health condition caused by restricted blood flow to the heart. Early-stage detection through continuous ECG monitoring is essential to minimize irreversible damage. This review explores advancements in MI classification methodologies for wearable devices, emphasizing their potential in real-time monitoring and early diagnosis. It critically examines traditional approaches, such as morphological filtering and wavelet decomposition, alongside cutting-edge techniques, including Convolutional Neural Networks (CNNs) and VLSI-based methods. By synthesizing findings on machine learning, deep learning, and hardware innovations, this paper highlights their strengths, limitations, and future prospects. The integration of these techniques into wearable devices offers promising avenues for efficient, accurate, and energy-aware MI detection, paving the way for next-generation wearable healthcare solutions.

Updated: 2024-11-27 15:42:30

Fields: cs.LG

Download: http://arxiv.org/abs/2411.18451v1

Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

Autoregressive models are typically applied to sequences of discrete tokens, but recent research indicates that generating sequences of continuous embeddings in an autoregressive manner is also feasible. However, such Continuous Autoregressive Models (CAMs) can suffer from a decline in generation quality over extended sequences due to error accumulation during inference. We introduce a novel method to address this issue by injecting random noise into the input embeddings during training. This procedure makes the model robust against varying error levels at inference. We further reduce error accumulation through an inference procedure that introduces low-level noise. Experiments on musical audio generation show that CAM substantially outperforms existing autoregressive and non-autoregressive approaches while preserving audio quality over extended sequences. This work paves the way for generating continuous embeddings in a purely autoregressive setting, opening new possibilities for real-time and interactive generative applications.
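
The training-time augmentation reduces to a few lines; the per-sequence noise scale below is our assumption about how the noise level might be varied.

```python
import torch

def noisy_teacher_forcing(target_embeddings, max_noise_std=0.2):
    """target_embeddings: (batch, seq_len, dim) continuous tokens fed
    autoregressively during training. Drawing a per-sequence noise level
    exposes the model to many error regimes, so it stays robust to its own
    imperfect outputs at inference time."""
    std = torch.rand(target_embeddings.size(0), 1, 1) * max_noise_std
    return target_embeddings + std * torch.randn_like(target_embeddings)
```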

Updated: 2024-11-27 15:38:20

Fields: cs.LG,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2411.18447v1

Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator

The quality of meeting summaries generated by natural language generation (NLG) systems is hard to measure automatically. Established metrics such as ROUGE and BERTScore have a relatively low correlation with human judgments and fail to capture nuanced errors. Recent studies suggest using large language models (LLMs), which have the benefit of better context understanding and adaption of error definitions without training on a large number of human preference judgments. However, current LLM-based evaluators risk masking errors and can only serve as a weak proxy, leaving human evaluation the gold standard despite being costly and hard to compare across studies. In this work, we present MESA, an LLM-based framework employing a three-step assessment of individual error types, multi-agent discussion for decision refinement, and feedback-based self-training to refine error definition understanding and alignment with human judgment. We show that MESA's components enable thorough error detection, consistent rating, and adaptability to custom error guidelines. Using GPT-4o as its backbone, MESA achieves mid to high Point-Biserial correlation with human judgment in error detection and mid Spearman and Kendall correlation in reflecting error impact on summary quality, on average 0.25 higher than previous methods. The framework's flexibility in adapting to custom error guidelines makes it suitable for various tasks with limited human-labeled data.

Updated: 2024-11-27 15:35:32

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.18444v1

Multiscale Hodge Scattering Networks for Data Analysis

We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs use a layered structure analogous to a convolutional neural network (CNN) to cascade the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation that is akin to local pooling in CNNs, and which may be performed either locally or per-scale. These pooling operations are harder to define in both traditional scattering networks based on Morlet wavelets, and geometric scattering networks based on Diffusion Wavelets. As a result, we are able to extract a rich set of descriptive yet robust features that can be used along with very simple machine learning methods (i.e., logistic regression or support vector machines) to achieve high-accuracy classification systems with far fewer parameters to train than most modern graph neural networks. Finally, we demonstrate the usefulness of our MHSNs in three distinct types of problems: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.

Updated: 2024-11-27 15:32:51

Fields: cs.LG,cs.NA,cs.SI,eess.SP,math.NA,stat.ML

Download: http://arxiv.org/abs/2311.10270v5

Metric-DST: Mitigating Selection Bias Through Diversity-Guided Semi-Supervised Metric Learning

Selection bias poses a critical challenge for fairness in machine learning, as models trained on data that is less representative of the population might exhibit undesirable behavior for underrepresented profiles. Semi-supervised learning strategies like self-training can mitigate selection bias by incorporating unlabeled data into model training to gain further insight into the distribution of the population. However, conventional self-training seeks to include high-confidence data samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias through the inclusion of more diverse samples. Metric-DST learned more robust models in the presence of selection bias for generated and real-world datasets with induced bias, as well as a molecular biology prediction task with intrinsic bias. The Metric-DST learning strategy offers a flexible and widely applicable solution to mitigate selection bias and enhance fairness of machine learning models.
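
Diversity-guided selection could be instantiated as greedy farthest-point sampling in the learned metric space, as in this sketch (the paper's exact selection rule may differ):

```python
import torch

def diverse_pseudo_selection(embeddings, confidences, k, conf_floor=0.5):
    """Greedily pick k unlabeled samples that are reasonably confident yet
    mutually far apart in the learned metric space, instead of simply
    taking the k most confident ones (which tends to reinforce bias)."""
    mask = confidences > conf_floor
    candidates = embeddings[mask]
    original_idx = torch.nonzero(mask).squeeze(1)
    chosen = [int(confidences[mask].argmax())]      # seed: most confident
    for _ in range(k - 1):
        dist = torch.cdist(candidates, candidates[chosen]).min(dim=1).values
        dist[chosen] = -1.0                         # never re-pick a sample
        chosen.append(int(dist.argmax()))           # farthest from chosen set
    return original_idx[chosen]
```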

Updated: 2024-11-27 15:29:42

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.18442v1

MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Image Segmentation

Pretrained vision-language models (VLMs), e.g., CLIP, are increasingly used to bridge the gap between open- and close-vocabulary recognition in open-vocabulary image segmentation. As VLMs are generally pretrained with low-resolution images (e.g. $224\times224$), most previous methods operate only on downscaled images. We question this design, as low-resolution features often fail to preserve fine details. A typical solution is to employ additional image backbones for high-resolution inputs, but this also introduces significant computational overhead. Therefore, we propose MROVSeg, a multi-resolution training framework for open-vocabulary image segmentation with a single pretrained CLIP backbone, that uses sliding windows to slice the high-resolution input into uniform patches, each matching the input size of the well-trained image encoder. Its key components include a Multi-Res Adapter, which restores the spatial geometry and grasps local-global correspondences across patches by interacting with multi-resolution features. To achieve accurate segmentation, we introduce a Multi-grained Masked Attention scheme to aggregate multi-grained semantics from multi-resolution CLIP features to object queries. Through comprehensive experiments, we demonstrate the superiority of MROVSeg on well-established open-vocabulary image segmentation benchmarks, establishing new standards for open-vocabulary image segmentation.
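
The sliding-window slicing step is straightforward to sketch with tensor unfolding (window and stride values are illustrative):

```python
import torch

def slice_into_windows(image, window=224, stride=224):
    """image: (C, H, W) high-resolution input, with H and W assumed to be
    multiples of the stride for simplicity. Returns
    (n_windows, C, window, window) patches, each matching the input size
    of the pretrained image encoder."""
    patches = image.unfold(1, window, stride).unfold(2, window, stride)
    c, n_h, n_w = patches.shape[:3]     # (C, n_h, n_w, window, window)
    return patches.permute(1, 2, 0, 3, 4).reshape(n_h * n_w, c, window, window)
```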

Updated: 2024-11-27 15:26:41

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.14776v2

Creativity in AI: Progresses and Challenges

Creativity is the ability to produce novel, useful, and surprising ideas, and has been widely studied as a crucial aspect of human cognition. Machine creativity, on the other hand, has been a long-standing challenge. With the rise of advanced generative AI, there has been renewed interest and debate regarding AI's creative capabilities. Therefore, it is imperative to revisit the state of creativity in AI and identify key progresses and remaining challenges. In this work, we survey leading works studying the creative capabilities of AI systems, focusing on creative problem-solving, linguistic, artistic, and scientific creativity. Our review suggests that while the latest AI models are largely capable of producing linguistically and artistically creative outputs such as poems, images, and musical pieces, they struggle with tasks that require creative problem-solving, abstract thinking and compositionality, and their generations suffer from a lack of diversity and originality, as well as long-range incoherence and hallucinations. We also discuss key questions concerning copyright and authorship issues with generative models. Furthermore, we highlight the need for a comprehensive evaluation of creativity that is process-driven and considers several dimensions of creativity. Finally, we propose future research directions to improve the creativity of AI outputs, drawing inspiration from cognitive science and psychology.

Updated: 2024-11-27 15:22:29

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.17218v3

EnrichEvent: Enriching Social Data with Contextual Information for Emerging Event Extraction

Social platforms have emerged as crucial channels for disseminating information and discussing real-life social events, offering researchers an excellent opportunity to design and implement novel event detection frameworks. However, most existing approaches only exploit keyword burstiness or network structures to detect unspecified events. Thus, they often struggle to identify unknown events, given the challenging nature of events and social data. Social data, e.g., tweets, is characterized by misspellings, incompleteness, word sense ambiguity, irregular language, and variation in aspects of opinions. Moreover, extracting discriminative features and patterns for evolving events by exploiting the limited structural knowledge is almost infeasible. To address these challenges, in this paper, we propose a novel framework, namely EnrichEvent, that leverages the linguistic and contextual representations of streaming social data. In particular, we leverage contextual and linguistic knowledge to detect semantically related tweets and enhance the effectiveness of the event detection approaches. Eventually, our proposed framework produces cluster chains for each event to show the evolving variation of the event through time. We conducted extensive experiments to evaluate our framework, validating its high performance and effectiveness in detecting and distinguishing unspecified social events.

Updated: 2024-11-27 15:19:51

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2307.16082v5

An End-to-End Smart Predict-then-Optimize Framework for Vehicle Relocation Problems in Large-Scale Vehicle Crowd Sensing

Ubiquitous mobile devices have catalyzed the development of vehicle crowd sensing (VCS). In particular, vehicle sensing systems show great potential in the flexible acquisition of spatio-temporal urban data through built-in sensors under diverse sensing scenarios. However, vehicle systems often exhibit biased coverage due to the heterogeneous nature of trip requests and routes. To achieve a high sensing coverage, a critical challenge lies in optimally relocating vehicles to minimize the divergence between vehicle distributions and target sensing distributions. Conventional approaches typically employ a two-stage predict-then-optimize (PTO) process: first predicting real-time vehicle distributions and subsequently generating an optimal relocation strategy based on the predictions. However, this approach can lead to suboptimal decision-making due to the propagation of errors from upstream prediction. To this end, we develop an end-to-end Smart Predict-then-Optimize (SPO) framework by integrating optimization into prediction within the deep learning architecture, and the entire framework is trained by minimizing the task-specific matching divergence rather than the upstream prediction error. Methodologically, we formulate the vehicle relocation problem by quadratic programming (QP) and incorporate a novel unrolling approach based on the Alternating Direction Method of Multipliers (ADMM) within the SPO framework to compute gradients of the QP layer, facilitating backpropagation and gradient-based optimization for end-to-end learning. The effectiveness of the proposed framework is validated by real-world taxi datasets in Hong Kong. Utilizing the alternating differentiation method, the general SPO framework presents a novel concept of addressing decision-making problems with uncertainty, demonstrating significant potential for advancing applications in intelligent transportation systems.

Updated: 2024-11-27 15:16:22

Fields: cs.LG,math.OC

Download: http://arxiv.org/abs/2411.18432v1

How Does Variance Shape the Regret in Contextual Bandits?

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension $d_\text{elu}$$-$a complexity measure of the function class$-$plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of $\Omega(\sqrt{\min\{A,d_\text{elu}\}\Lambda}+d_\text{elu})$ is unavoidable when $d_{\text{elu}}\leq\sqrt{AT}$, where $A$ is the number of actions, $T$ is the total number of rounds, and $\Lambda$ is the total variance over $T$ rounds. For the $A\leq d_\text{elu}$ regime, we derive a nearly matching upper bound $\tilde{O}(\sqrt{A\Lambda}+d_\text{elu})$ for the special case where the variance is revealed at the beginning of each round. (2) Strong adversary: The adversary sets the reward variance after observing the learner's action. We show that a regret of $\Omega(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ is unavoidable when $\sqrt{d_\text{elu}\Lambda}+d_\text{elu}\leq\sqrt{AT}$. In this setting, we provide an upper bound of order $\tilde{O}(d_\text{elu}\sqrt{\Lambda}+d_\text{elu})$. Furthermore, we examine the setting where the function class additionally provides distributional information of the reward, as studied by Wang et al. (2024). We demonstrate that the regret bound $\tilde{O}(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ established in their work is unimprovable when $\sqrt{d_{\text{elu}}\Lambda}+d_\text{elu}\leq\sqrt{AT}$. However, with a slightly different definition of the total variance and with the assumption that the reward follows a Gaussian distribution, one can achieve a regret of $\tilde{O}(\sqrt{A\Lambda}+d_\text{elu})$.

Updated: 2024-11-27 15:14:24

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.12713v2

Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study

Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach for mitigating the hallucination of large language models (LLMs) through the integration of external knowledge. Despite numerous efforts, most studies focus on a single type of external knowledge source. However, in real-world applications, most situations involve diverse knowledge from various sources, yet this area has been less explored. The main obstacle is the lack of a suitable dataset containing multiple knowledge sources, together with prior exploration of the associated issues. To address these challenges, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains. Based on this dataset, we further develop a plug-and-play RAG framework, PruningRAG, whose main characteristic is to employ multi-granularity pruning strategies for optimizing the integration of relevant information and minimizing misleading context. Building upon the standardized dataset and PruningRAG, we also report a series of experimental results, as well as insightful findings. Our dataset and code are publicly available (https://github.com/USTCAGI/PruningRAG), with the aim of advancing future research in the RAG community.

Updated: 2024-11-27 15:13:00

Fields: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2409.13694v2

MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version

Developing effective path representations has become increasingly essential across various fields within intelligent transportation. Although pre-trained path representation learning models have shown improved performance, they predominantly focus on the topological structures from single modality data, i.e., road networks, overlooking the geometric and contextual features associated with path-related images, e.g., remote sensing images. Similar to human understanding, integrating information from multiple modalities can provide a more comprehensive view, enhancing both representation accuracy and generalization. However, variations in information granularity impede the semantic alignment of road network-based paths (road paths) and image-based paths (image paths), while the heterogeneity of multi-modal data poses substantial challenges for effective fusion and utilization. In this paper, we propose a novel Multi-modal, Multi-granularity Path Representation Learning Framework (MM-Path), which can learn a generic path representation by integrating modalities from both road paths and image paths. To enhance the alignment of multi-modal data, we develop a multi-granularity alignment strategy that systematically associates nodes, road sub-paths, and road paths with their corresponding image patches, ensuring the synchronization of both detailed local information and broader global contexts. To address the heterogeneity of multi-modal data effectively, we introduce a graph-based cross-modal residual fusion component designed to comprehensively fuse information across different modalities and granularities. Finally, we conduct extensive experiments on two large-scale real-world datasets under two downstream tasks, validating the effectiveness of the proposed MM-Path. This is an extended version of the paper accepted by KDD 2025.

Updated: 2024-11-27 15:10:22

标题: MM-Path:多模态、多粒度路径表示学习 -- 扩展版本

摘要: 发展有效的路径表示在智能交通的各个领域中变得越来越重要。虽然预训练的路径表示学习模型表现出了改进的性能,但它们主要关注单一模态数据的拓扑结构,即道路网络,忽视了与路径相关的图像的几何和上下文特征,例如遥感图像。与人类理解类似,整合多模态信息可以提供更全面的视角,增强表示的准确性和泛化能力。然而,信息粒度的差异阻碍了基于道路网络的路径(道路路径)和基于图像的路径(图像路径)的语义对齐,而多模态数据的异质性对有效融合和利用提出了重大挑战。在本文中,我们提出了一种新颖的多模态、多粒度路径表示学习框架(MM-Path),通过整合道路路径和图像路径的模态学习通用路径表示。为了增强多模态数据的对齐,我们开发了一个多粒度对齐策略,系统地将节点、道路子路径和道路路径与它们对应的图像块关联起来,确保详细的局部信息和更广泛的全局上下文的同步。为了有效应对多模态数据的异质性,我们引入了一个基于图的跨模态残差融合组件,旨在全面融合不同模态和粒度的信息。最后,我们在两个大规模实际数据集上进行了大量实验,验证了所提出的MM-Path的有效性。这是一篇被KDD 2025接受的论文的扩展版本。

更新时间: 2024-11-27 15:10:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18428v1

Improved Noise Schedule for Diffusion Training

Diffusion models have emerged as the de facto choice for generating high-quality visual signals across various domains. However, training a single model to predict noise across various levels poses significant challenges, necessitating numerous iterations and incurring significant computational costs. Various approaches, such as loss weighting strategy design and architectural refinements, have been introduced to expedite convergence and improve model performance. In this study, we propose a novel approach to design the noise schedule for enhancing the training of diffusion models. Our key insight is that the importance sampling of the logarithm of the Signal-to-Noise ratio ($\log \text{SNR}$), theoretically equivalent to a modified noise schedule, is particularly beneficial for training efficiency when increasing the sample frequency around $\log \text{SNR}=0$. This strategic sampling allows the model to focus on the critical transition point between signal dominance and noise dominance, potentially leading to more robust and accurate predictions. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule. Furthermore, we highlight the advantages of our noise schedule design on the ImageNet benchmark, showing that the designed schedule consistently benefits different prediction targets. Our findings contribute to the ongoing efforts to optimize diffusion models, potentially paving the way for more efficient and effective training paradigms in the field of generative AI.
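
As a hedged illustration of the stated insight, the sketch below samples $\log \text{SNR}$ from a distribution concentrated around 0 (a Laplace is assumed here purely for illustration) and converts each sample into variance-preserving diffusion coefficients; this is a reading of the abstract, not the authors' code.

```python
# Importance-sampling logSNR near 0 for a variance-preserving (VP) diffusion
# process. The Laplace location/scale are illustrative assumptions.
import torch

def sample_logsnr(batch, loc=0.0, scale=1.0):
    # Inverse-CDF sampling of a Laplace centred at logSNR = 0, the
    # transition point between signal dominance and noise dominance.
    u = torch.rand(batch) - 0.5
    return loc - scale * torch.sign(u) * torch.log1p(-2 * u.abs())

def vp_coefficients(logsnr):
    # For a VP process: alpha^2 = sigmoid(logsnr), sigma^2 = sigmoid(-logsnr),
    # so alpha^2 / sigma^2 = exp(logsnr) as required by the SNR definition.
    return torch.sigmoid(logsnr).sqrt(), torch.sigmoid(-logsnr).sqrt()

logsnr = sample_logsnr(4)
alpha, sigma = vp_coefficients(logsnr)
x0, eps = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
x_t = alpha.view(-1, 1, 1, 1) * x0 + sigma.view(-1, 1, 1, 1) * eps  # noised input
```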

Updated: 2024-11-27 15:10:12

标题: 改进的噪声时间表用于扩散训练

摘要: 扩散模型已成为在各个领域生成高质量视觉信号的事实选择。然而,训练单个模型以预测各个级别的噪声存在重大挑战,需要大量迭代,并产生显著的计算成本。各种方法,如损失加权策略设计和结构改进,已被引入以加快收敛速度并提高模型性能。在本研究中,我们提出了一种新颖的方法来设计扩散模型的噪声计划以增强训练效果。我们的关键洞察是,信噪比($\log \text{SNR}$)的重要性采样,在理论上等效于修改后的噪声计划,特别有利于在$\log \text{SNR}=0$周围增加样本频率时的训练效率。这种战略采样使模型能够专注于信号优势和噪声优势之间的关键转变点,潜在地导致更稳健和准确的预测。我们在实证上证明了我们的噪声计划优于标准余弦计划。此外,我们强调了我们的噪声计划设计在ImageNet基准上的优势,显示出设计的计划始终有益于不同的预测目标。我们的发现有助于优化扩散模型的持续努力,可能为生成式人工智能领域更高效和有效的训练范式铺平道路。

更新时间: 2024-11-27 15:10:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.03297v2

Streamlining Prediction in Bayesian Deep Learning

The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. For this we use local linearisation on activation functions and local Gaussian approximations at linear layers. Thus allowing us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLP and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.
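
A minimal sketch of the general construction, assuming a linearised-Laplace setup: with a Gaussian posterior N(theta_MAP, Sigma) over weights and the local linearisation f(theta) ≈ f(theta_MAP) + J (theta - theta_MAP), the posterior predictive is Gaussian with mean f(theta_MAP) and covariance J Sigma J^T, so one forward pass plus one Jacobian replaces Monte Carlo sampling. The toy model and covariance below are assumptions, not the paper's exact recipe.

```python
# Sampling-free Gaussian predictive via local linearisation (illustrative).
import torch
from torch.func import functional_call, jacrev

model = torch.nn.Linear(5, 2)
theta_map = dict(model.named_parameters())
n_params = sum(p.numel() for p in theta_map.values())
Sigma = 0.1 * torch.eye(n_params)  # toy posterior covariance over weights

def f(flat_theta, x):
    # Rebuild the parameter dict from one flat vector so a single
    # Jacobian w.r.t. all weights can be taken.
    params, i = {}, 0
    for name, p in theta_map.items():
        params[name] = flat_theta[i:i + p.numel()].view_as(p)
        i += p.numel()
    return functional_call(model, params, (x,))

x_star = torch.randn(5)
flat = torch.cat([p.detach().flatten() for p in theta_map.values()])
mean = f(flat, x_star)        # single forward pass, no sampling
J = jacrev(f)(flat, x_star)   # (out_dim, n_params) Jacobian at theta_MAP
cov = J @ Sigma @ J.T         # analytic predictive covariance
print(mean, cov.diagonal())
```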

Updated: 2024-11-27 15:07:44

标题: 简化贝叶斯深度学习中的预测

摘要: 对贝叶斯深度学习(BDL)的兴趣日益增加,导致了大量用于估计后验分布的方法。然而,对于诸如预测等推断的高效计算在很大程度上被忽视,而蒙特卡洛积分仍然是标准方法。在这项工作中,我们通过单次前向传递而不进行抽样来简化BDL中的预测。为此,我们在激活函数上使用局部线性化以及在线性层上使用局部高斯近似。这使我们能够分析计算后验预测分布的近似值。我们展示了我们的方法适用于MLP和transformers,如ViT和GPT-2,并评估了其在回归和分类任务中的性能。

更新时间: 2024-11-27 15:07:44

领域: cs.LG

下载: http://arxiv.org/abs/2411.18425v1

FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving

Serving numerous users and requests concurrently requires good fairness in Large Language Model (LLM) serving systems. This ensures that, at the same cost, the system can meet the Service Level Objectives (SLOs) of more users, such as time to first token (TTFT) and time between tokens (TBT), rather than allowing a few users to experience performance far exceeding the SLOs. To achieve better fairness, the preemption-based scheduling policy dynamically adjusts the priority of each request to maintain balance during runtime. However, existing systems tend to overly prioritize throughput, overlooking the overhead caused by preemption-induced context switching, which is crucial for maintaining fairness through priority adjustments. In this work, we identify three main challenges that result in this overhead: 1) inadequate I/O utilization; 2) GPU idleness; 3) unnecessary I/O transmission during multi-turn conversations. Our key insight is that the block-based KV cache memory policy in existing systems, while achieving near-zero memory waste, leads to discontinuity and insufficient granularity in the KV cache memory. In response, we introduce FastSwitch, a fairness-aware serving system that not only aligns with the existing KV cache memory allocation policy but also mitigates context switching overhead. Our evaluation shows that FastSwitch outperforms the state-of-the-art LLM serving system vLLM with speedups of 1.4-11.2x across different tail TTFT and TBT.

Updated: 2024-11-27 15:07:28

标题: FastSwitch:在考虑公平性的大型语言模型服务中优化上下文切换效率

摘要: 为了同时为众多用户和请求提供服务,大型语言模型(LLMs)服务系统需要良好的公平性。这确保系统在相同成本的情况下,能够满足更多用户的服务水平目标(SLOs),如首次令牌时间(TTFT)和令牌之间的时间(TBT),而不是允许少数用户体验远远超出SLOs的性能。为了实现更好的公平性,基于抢占的调度策略动态调整每个请求的优先级,以在运行时保持平衡。然而,现有系统往往过分优先考虑吞吐量,忽视了抢占引起的上下文切换带来的开销,这对通过优先级调整来维持公平性至关重要。在这项工作中,我们确定导致这种开销的三个主要挑战。1)I/O利用不足。2)GPU闲置。3)多轮对话期间不必要的I/O传输。我们的关键观点是,现有系统中基于块的KV缓存内存策略虽然实现了接近零的内存浪费,但导致KV缓存内存中的不连续性和不足的粒度。为了应对这一问题,我们引入了FastSwitch,一个具有公平性意识的服务系统,不仅与现有的KV缓存内存分配策略保持一致,而且减轻了上下文切换的开销。我们的评估结果显示,FastSwitch在不同的尾部TTFT和TBT上胜过了最先进的LLM服务系统vLLM,速度提升了1.4-11.2倍。

更新时间: 2024-11-27 15:07:28

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2411.18424v1

When does a bridge become an aeroplane?

Despite recent advances in population-based structural health monitoring (PBSHM), knowledge transfer between highly-disparate structures (i.e., heterogeneous populations) remains a challenge. It has been proposed that heterogeneous transfer may be accomplished via intermediate structures that bridge the gap in information between the structures of interest. A key aspect of the technique is the idea that by varying parameters such as material properties and geometry, one structure can be continuously morphed into another. The current work demonstrates the development of these interpolating structures, via case studies involving the parameterisation of (and transfer between) a simple, simulated 'bridge' and 'aeroplane'. The facetious question 'When is a bridge not an aeroplane?' has been previously asked in the context of predicting positive transfer based on structural similarity. While the obvious answer to this question is 'Always,' the current work demonstrates that in some cases positive transfer can be achieved between highly-disparate systems.

Updated: 2024-11-27 14:49:49

标题: 什么时候桥梁变成飞机?

摘要: 尽管基于人口的结构健康监测(PBSHM)取得了最近的进展,但高度不同结构(即异质人口)之间的知识转移仍然是一个挑战。有人提出,异质转移可以通过中间结构来实现,这些结构可以弥补感兴趣结构之间信息差距。该技术的关键方面是通过改变诸如材料特性和几何形状等参数,一个结构可以连续地变形成另一个结构。目前的工作通过涉及简单的模拟“桥梁”和“飞机”的参数化(和转移)的案例研究,展示了这些插值结构的发展。在预测基于结构相似性的正向转移时,一个戏谑的问题是“何时桥梁不是飞机?”此问题的显而易见答案是“总是”,但当前的工作证明,在某些情况下,可以在高度不同的系统之间实现正向转移。

更新时间: 2024-11-27 14:49:49

领域: cs.LG

下载: http://arxiv.org/abs/2411.18406v1

Proving and Rewarding Client Diversity to Strengthen Resilience of Blockchain Networks

Client diversity in the Ethereum blockchain refers to the use of multiple independent implementations of the Ethereum protocol. This effectively enhances network resilience by reducing reliance on any single software client implementation. With client diversity, a single bug cannot tear the whole network down. However, despite multiple production-grade client implementations being available, there is still a heavily skewed distribution of clients in Ethereum. This is a concern for the community. In this paper, we introduce a novel conceptual framework for client diversity. The core goal is to improve the network resilience as a systemic property. Our key insight is to leverage economic incentives and verifiable execution to encourage the adoption of minority clients, thereby fostering a more robust blockchain ecosystem. Concretely, we propose to unambiguously and provably identify the client implementation used by any protocol participant, and to use this information to incentivize the usage of minority clients by offering higher participation rewards. We outline a detailed blueprint for our conceptual framework, in the realm of Ethereum. Our proposal is a game changer for improving client diversity of blockchains. Ultimately, it applies to strengthening the resilience of any decentralized distributed systems.

Updated: 2024-11-27 14:44:43

标题: 证明和奖励客户多样性以增强区块链网络的弹性

摘要: 以太坊区块链中的客户端多样性指的是使用多个独立实现以太坊协议的客户端。通过这种方式,有效地增强了网络的弹性,减少了对任何单一软件客户端实现的依赖。有了客户端的多样性,一个单一的bug不能将整个网络拖垮。然而,尽管有多个生产级客户端实现可用,以太坊中客户端的分布仍然存在严重倾斜,这是社区的一个关注点。在本文中,我们介绍了一个关于客户端多样性的新概念框架。核心目标是改善网络的弹性作为一个系统属性。我们的关键见解是利用经济激励和可验证的执行来鼓励少数客户端的采用,从而促进更健壮的区块链生态系统。具体而言,我们建议明确和可证明地识别任何协议参与者使用的客户端实现,并利用这些信息通过提供更高的参与奖励来激励少数客户端的使用。我们在以太坊领域提出了我们概念框架的详细蓝图。我们的提议是改善区块链客户端多样性的一个颠覆性创举。最终,这适用于增强任何分散式分布系统的弹性。

更新时间: 2024-11-27 14:44:43

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2411.18401v1

A Novel Approach to Image Steganography Using Generative Adversarial Networks

The field of steganography has long been focused on developing methods to securely embed information within various digital media while ensuring imperceptibility and robustness. However, the growing sophistication of detection tools and the demand for increased data hiding capacity have revealed limitations in traditional techniques. In this paper, we propose a novel approach to image steganography that leverages the power of generative adversarial networks (GANs) to address these challenges. By employing a carefully designed GAN architecture, our method ensures the creation of stego-images that are visually indistinguishable from their original counterparts, effectively thwarting detection by advanced steganalysis tools. Additionally, the adversarial training paradigm optimizes the balance between embedding capacity, imperceptibility, and robustness, enabling more efficient and secure data hiding. We evaluate our proposed method through a series of experiments on benchmark datasets and compare its performance against baseline techniques, including least significant bit (LSB) substitution and discrete cosine transform (DCT)-based methods. Our results demonstrate significant improvements in metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and robustness against detection. This work not only contributes to the advancement of image steganography but also provides a foundation for exploring GAN-based approaches for secure digital communication.
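
For context, the imperceptibility metric named above is easy to make concrete; the worked example below applies the standard PSNR definition to a one-bit LSB embedding (generic metric code, not the paper's pipeline).

```python
# Standard PSNR between a cover image and its stego version.
import numpy as np

def psnr(cover, stego, peak=255.0):
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

cover = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
stego = cover.copy()
stego[..., 0] ^= 1  # flip the LSB of one channel, as in LSB substitution
print(f"PSNR after LSB embedding: {psnr(cover, stego):.1f} dB")  # about 53 dB
```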

Updated: 2024-11-27 14:34:41

标题: 使用生成对抗网络的图像隐写术的新方法

摘要: 隐写术领域长期以来一直致力于开发方法,在确保信息嵌入各种数字媒体的同时,确保不可察觉性和鲁棒性。然而,检测工具的日益复杂化以及对增加数据隐藏容量的需求已经揭示了传统技术的局限性。在本文中,我们提出了一种利用生成对抗网络(GANs)的力量来应对这些挑战的新方法。通过采用精心设计的GAN架构,我们的方法确保生成的隐写图像在视觉上与其原始对应物无法区分,有效地阻止了高级隐写分析工具的检测。此外,对抗式训练范式优化了嵌入容量、不可察觉性和鲁棒性之间的平衡,实现了更高效、更安全的数据隐藏。我们通过一系列对基准数据集的实验来评估我们提出的方法,并将其性能与基线技术(包括最低有效位(LSB)替换和基于离散余弦变换(DCT)的方法)进行比较。我们的结果显示,峰值信噪比(PSNR)、结构相似性指数(SSIM)和对检测的鲁棒性等指标均有显著改进。这项工作不仅推动了图像隐写术的发展,还为探索基于GAN的安全数字通信方法奠定了基础。

更新时间: 2024-11-27 14:34:41

领域: cs.CR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2412.00094v1

Federated Learning with Uncertainty and Personalization via Efficient Second-order Optimization

Federated Learning (FL) has emerged as a promising method to collaboratively learn from decentralized and heterogeneous data available at different clients without the requirement of data ever leaving the clients. Recent works on FL have advocated taking a Bayesian approach to FL as it offers a principled way to account for the model and predictive uncertainty by learning a posterior distribution for the client and/or server models. Moreover, Bayesian FL also naturally enables personalization in FL to handle data heterogeneity across the different clients by having each client learn its own distinct personalized model. In particular, the hierarchical Bayesian approach enables all the clients to learn their personalized models while also taking into account the commonalities via a prior distribution provided by the server. However, despite their promise, Bayesian approaches for FL can be computationally expensive and can have high communication costs as well because of the requirement of computing and sending the posterior distributions. We present a novel Bayesian FL method using an efficient second-order optimization approach, with a computational cost that is similar to first-order optimization methods like Adam, but also provides the various benefits of the Bayesian approach for FL (e.g., uncertainty, personalization), while also being significantly more efficient and accurate than SOTA Bayesian FL methods (both for standard as well as personalized FL settings). Our method achieves improved predictive accuracies as well as better uncertainty estimates as compared to the baselines which include both optimization based as well as Bayesian FL methods.

Updated: 2024-11-27 14:30:02

标题: 使用高效的二阶优化实现带有不确定性和个性化的联邦学习

摘要: 联邦学习(FL)已经成为一种有前景的方法,可以在不要求数据离开客户端的情况下,从分散和异质数据中协作学习。最近对FL的研究倡导采用贝叶斯方法,因为贝叶斯方法提供了一种原则性的方式,通过学习客户端和/或服务器模型的后验分布来考虑模型和预测的不确定性。此外,贝叶斯FL自然地实现了个性化,通过让每个客户端学习其自己独特的个性化模型来处理不同客户端之间的数据异构性。特别是,分层贝叶斯方法使得所有客户端在考虑服务器提供的先验分布的基础上学习其个性化模型成为可能。然而,尽管有前景,贝叶斯方法对于FL而言可能计算成本高昂,并且由于需要计算和发送后验分布,通信成本也很高。我们提出了一种使用高效的二阶优化方法的新颖的贝叶斯FL方法,其计算成本类似于Adam等一阶优化方法,但同时提供了贝叶斯方法在FL中的各种好处(如不确定性、个性化),同时比SOTA贝叶斯FL方法更有效和准确(无论是标准还是个性化FL设置)。我们的方法在预测准确性和不确定性估计方面均优于包括基于优化的方法和贝叶斯FL方法在内的基准方法。

更新时间: 2024-11-27 14:30:02

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2411.18385v1

Optimal In-Network Distribution of Learning Functions for a Secure-by-Design Programmable Data Plane of Next-Generation Networks

The rise of programmable data plane (PDP) and in-network computing (INC) paradigms paves the way for the development of network devices (switches, network interface cards, etc.) capable of performing advanced computing tasks. This allows to execute algorithms of various nature, including machine learning ones, within the network itself to support user and network services. In particular, this paper delves into the issue of implementing in-network learning models to support distributed intrusion detection systems (IDS). It proposes a model that optimally distributes the IDS workload, resulting from the subdivision of a "Strong Learner" (SL) model into lighter distributed "Weak Learner" (WL) models, among data plane devices; the objective is to ensure complete network security without excessively burdening their normal operations. Furthermore, a meta-heuristic approach is proposed to reduce the long computational time required by the exact solution provided by the mathematical model, and its performance is evaluated. The analysis conducted and the results obtained demonstrate the enormous potential of the proposed new approach to the creation of intelligent data planes that effectively act as a first line of defense against cyber attacks, with minimal additional workload on network devices.

Updated: 2024-11-27 14:29:53

标题: 下一代网络安全设计可编程数据平面中学习函数的最佳网络内分发

摘要: 可编程数据平面(PDP)和网络计算(INC)范式的兴起为发展能够执行高级计算任务的网络设备(交换机、网络接口卡等)铺平了道路。这使得可以在网络内部执行各种性质的算法,包括机器学习算法,以支持用户和网络服务。具体而言,本文深入探讨了实施网络学习模型以支持分布式入侵检测系统(IDS)的问题。它提出了一个模型,该模型将“强学习器”(SL)模型分割为更轻的分布式“弱学习器”(WL)模型,最优地分配到数据平面设备中;其目标是确保完整的网络安全,同时不过分加重它们的正常操作负担。此外,提出了一种元启发式方法来减少数学模型提供的精确解所需的长时间计算,并对其性能进行了评估。进行的分析和获得的结果证明了提出的新方法在创造智能数据平面方面具有巨大潜力,有效地作为针对网络攻击的第一道防线,对网络设备的额外工作负担最小。

更新时间: 2024-11-27 14:29:53

领域: cs.NI,cs.AI,math.OC

下载: http://arxiv.org/abs/2411.18384v1

ChatGPT as speechwriter for the French presidents

Generative AI offers several large language models (LLMs) that automatically generate a message in response to users' requests. Such scientific breakthroughs promote new writing assistants but also raise some fears. The main focus of this study is to analyze the written style of one LLM called ChatGPT by comparing its generated messages with those of the recent French presidents. To achieve this, we compare end-of-the-year addresses written by Chirac, Sarkozy, Hollande, and Macron with those automatically produced by ChatGPT. We found that ChatGPT tends to overuse nouns, possessive determiners, and numbers. On the other hand, the generated speeches employ fewer verbs, pronouns, and adverbs and include, on average, overly standardized sentences. At the word level, one can observe that ChatGPT tends to overuse "to must" (devoir), "to continue", and the lemma "we" (nous). Moreover, GPT underuses the auxiliary verb "to be" (être) and the modal verbs "to will" (vouloir) and "to have to" (falloir). In addition, when a short text is provided as an example to ChatGPT, the machine can generate a short message with a style close to the original wording. Finally, we reveal that ChatGPT's style exposes distinct features compared to real presidential speeches.

Updated: 2024-11-27 14:29:10

标题: ChatGPT作为法国总统的演讲撰稿人

摘要: 生成式人工智能提出了几种大型语言模型(LLMs),可以自动地生成回应用户请求的消息。这种科学突破推动了新的写作助手的诞生,但也带来了一些担忧。本研究的主要焦点是通过比较一个名为ChatGPT的LLM生成的消息与最近的法国总统的消息,分析其书写风格。为了实现这一目标,我们将希拉克、萨科齐、奥朗德和马克龙撰写的年终讲话与ChatGPT自动生成的内容进行了比较。我们发现ChatGPT倾向于过度使用名词、所有格冠词和数字。另一方面,生成的演讲使用的动词、代词和副词较少,并且平均包含过于标准化的句子。考虑到一些词语,可以观察到ChatGPT倾向于过度使用"to must"(devoir)、"to continue"或词元"we"(nous)。此外,GPT很少使用助动词"to be"(être)或情态动词"to will"(vouloir)或"to have to"(falloir)。此外,当向ChatGPT提供一个短文本作为示例时,机器可以生成一个风格接近原始措辞的短消息。最后,我们揭示了ChatGPT的风格与真实总统演讲相比具有明显的特征。

更新时间: 2024-11-27 14:29:10

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2411.18382v1

XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration

Tracking the full body motions of users in XR (AR/VR) devices is a fundamental challenge to bring a sense of authentic social presence. Due to the absence of dedicated leg sensors, currently available body tracking methods adopt a synthesis approach to generate plausible motions given a 3-point signal from the head and controller tracking. In order to enable mixed reality features, modern XR devices are capable of estimating depth information of the headset surroundings using available sensors combined with dedicated machine learning models. Such egocentric depth sensing cannot drive the body directly, as it is not registered and is incomplete due to limited field-of-view and body self-occlusions. For the first time, we propose to leverage the available depth sensing signal combined with self-supervision to learn a multi-modal pose estimation model capable of tracking full body motions in real time on XR devices. We demonstrate how current 3-point motion synthesis models can be extended to point cloud modalities using a semantic point cloud encoder network combined with a residual network for multi-modal pose estimation. These modules are trained jointly in a self-supervised way, leveraging a combination of real unregistered point clouds and simulated data obtained from motion capture. We compare our approach against several state-of-the-art systems for XR body tracking and show that our method accurately tracks a diverse range of body motions. XR-MBT tracks legs in XR for the first time, whereas traditional synthesis approaches based on partial body tracking are blind.

Updated: 2024-11-27 14:25:32

标题: XR-MBT:通过自我监督学习深度点云配准实现XR的多模态全身跟踪

摘要: 追踪用户在XR(AR/VR)设备中的全身动作是带来真实社交存在感的基本挑战。由于缺乏专用的腿部传感器,目前可用的身体追踪方法采用综合方法,根据头部和控制器追踪的3点信号生成可信的动作。为了实现混合现实功能,现代XR设备能够利用可用传感器结合专用机器学习模型估算头戴设备周围的深度信息。这种自我中心的深度感知不能直接驱动身体,因为它没有注册并且由于有限的视野和身体自遮挡而不完整。我们首次提出利用可用深度感知信号结合自监督学习,学习一种能够在XR设备上实时追踪全身动作的多模态姿势估计模型。我们演示了如何将当前的3点运动合成模型扩展到点云模态,使用语义点云编码器网络结合残差网络进行多模态姿势估计。这些模块以自监督方式联合训练,利用真实未注册点云和从动作捕捉获得的模拟数据的组合。我们将我们的方法与几种最先进的XR身体追踪系统进行比较,并展示我们的方法准确追踪各种身体动作。XR-MBT首次在XR中追踪腿部,而基于部分身体追踪的传统合成方法是盲目的。

更新时间: 2024-11-27 14:25:32

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18377v1

Preserving Deep Representations In One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework

We present SNOWS, a one-shot post-training pruning framework aimed at reducing the cost of vision network inference without retraining. Current leading one-shot pruning methods minimize layer-wise least squares reconstruction error which does not take into account deeper network representations. We propose to optimize a more global reconstruction objective. This objective accounts for nonlinear activations deep in the network to obtain a better proxy for the network loss. This nonlinear objective leads to a more challenging optimization problem -- we demonstrate it can be solved efficiently using a specialized second-order optimization framework. A key innovation of our framework is the use of Hessian-free optimization to compute exact Newton descent steps without needing to compute or store the full Hessian matrix. A distinct advantage of SNOWS is that it can be readily applied on top of any sparse mask derived from prior methods, readjusting their weights to exploit nonlinearities in deep feature representations. SNOWS obtains state-of-the-art results on various one-shot pruning benchmarks including residual networks and Vision Transformers (ViT/B-16 and ViT/L-16, 86m and 304m parameters respectively).
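
The Hessian-free ingredient can be sketched generically: the Newton direction H^{-1} g is obtained with conjugate gradients, where each Hessian-vector product comes from a double backward pass, so the full Hessian is never computed or stored. The toy layer, loss, and damping below are illustrative assumptions, not SNOWS itself.

```python
# Hessian-free Newton step: conjugate gradients over Hessian-vector products.
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
params = list(model.parameters())
loss_fn = lambda: torch.nn.functional.mse_loss(model(x), y)

def hvp(vecs):
    # Hessian-vector product via double backward: never forms H explicitly.
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vecs))
    return torch.autograd.grad(dot, params)

def cg(b, iters=20, damping=1e-3):
    # Solve (H + damping * I) d = b with conjugate gradients.
    d = [torch.zeros_like(t) for t in b]
    r = [t.clone() for t in b]
    p = [t.clone() for t in b]
    rs = sum((t * t).sum() for t in r)
    for _ in range(iters):
        Hp = [h + damping * v for h, v in zip(hvp(p), p)]
        alpha = rs / sum((t * h).sum() for t, h in zip(p, Hp))
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = sum((t * t).sum() for t in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

step = cg(torch.autograd.grad(loss_fn(), params))  # approx. H^{-1} g
with torch.no_grad():
    for param, s in zip(params, step):
        param -= s  # exact-Newton-style descent step
```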

Updated: 2024-11-27 14:25:00

标题: 在一次性剪枝中保留深度表示:一种不依赖Hessian的二阶优化框架

摘要: 我们提出了SNOWS,这是一个一次性的后训练修剪框架,旨在减少视觉网络推断的成本,而无需重新训练。当前领先的一次性修剪方法最小化逐层最小二乘重建误差,这并未考虑到更深层次网络表示。我们建议优化更全局的重建目标。该目标考虑了网络深处的非线性激活,以获得更好的网络损失代理。这个非线性目标导致了一个更具挑战性的优化问题 - 我们展示了可以使用专门的二阶优化框架有效地解决它。我们框架的一个关键创新是使用无Hessian优化来计算精确的牛顿下降步骤,而无需计算或存储完整的Hessian矩阵。SNOWS的一个明显优势是它可以轻松应用于从先前方法派生的任何稀疏掩模之上,重新调整它们的权重以利用深层特征表示中的非线性。SNOWS在各种一次性修剪基准上取得了最先进的结果,包括残差网络和Vision Transformers(ViT/B-16和ViT/L-16,分别为86m和304m参数)。

更新时间: 2024-11-27 14:25:00

领域: cs.LG

下载: http://arxiv.org/abs/2411.18376v1

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

Recent advances in imitation learning for 3D robotic manipulation have shown promising results with diffusion-based policies. However, achieving human-level dexterity requires seamless integration of geometric precision and semantic understanding. We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D semantic representation by leveraging foundation models. Our approach uniquely combines 3D generative models for digital twin creation, vision foundation models for semantic feature extraction, and robust pose tracking for continuous semantic flow updates. This integration enables complete semantic understanding even under occlusions while eliminating manual annotation requirements. By incorporating semantic flow into diffusion policies, we demonstrate significant improvements in both terminal-constrained manipulation and cross-object generalization. Extensive experiments across five simulation tasks show that G3Flow consistently outperforms existing approaches, achieving up to 68.3% and 50.1% average success rates on terminal-constrained manipulation and cross-object generalization tasks respectively. Our results demonstrate the effectiveness of G3Flow in enhancing real-time dynamic semantic feature understanding for robotic manipulation policies.

Updated: 2024-11-27 14:17:43

标题: G3Flow:生成式3D语义流用于姿势感知和可推广的物体操纵

摘要: 最近在3D机器人操作的模仿学习方面取得了进展,Diffusion-based policies表现出有希望的结果。然而,实现人类级别的灵巧需要无缝集成几何精度和语义理解。我们提出了G3Flow,这是一个新颖的框架,通过利用基础模型构建实时语义流,这是一种动态的、以对象为中心的3D语义表示。我们的方法独特地结合了用于数字孪生创建的3D生成模型、用于语义特征提取的视觉基础模型,以及用于连续语义流更新的强大姿态跟踪。这种集成使得即使在遮挡情况下也能实现完整的语义理解,同时消除了手动标注的需求。通过将语义流纳入扩散策略,我们展示了在末端约束操作和跨对象泛化方面取得的显著改进。在五个模拟任务中进行的大量实验表明,G3Flow始终优于现有方法,在末端约束操作和跨对象泛化任务上分别实现了高达68.3%和50.1%的平均成功率。我们的结果表明了G3Flow在增强机器人操作策略的实时动态语义特征理解方面的有效性。

更新时间: 2024-11-27 14:17:43

领域: cs.RO,cs.AI,cs.CV,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18369v1

AMPS: ASR with Multimodal Paraphrase Supervision

Spontaneous or conversational multilingual speech presents many challenges for state-of-the-art automatic speech recognition (ASR) systems. In this work, we present a new technique AMPS that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR in multiple languages, including Hindi, Marathi, Malayalam, Kannada, and Nyanja. We use paraphrases of the reference transcriptions as additional supervision while training the multimodal ASR model and selectively invoke this paraphrase objective for utterances with poor ASR performance. Using AMPS with a state-of-the-art multimodal model SeamlessM4T, we obtain significant relative reductions in word error rates (WERs) of up to 5%. We present detailed analyses of our system using both objective and human evaluation metrics.
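
A hedged sketch of how the selective paraphrase objective could look inside a training step, assuming a placeholder seq2seq model interface and loss threshold; the actual AMPS/SeamlessM4T integration is more involved.

```python
# Selective paraphrase supervision: add the paraphrase loss only when the
# utterance's ASR loss is high. `model`, `tau`, and `lam` are placeholders.
def training_step(model, batch, tau=2.0, lam=0.5):
    asr_loss = model(batch["audio"], labels=batch["transcript"]).loss
    loss = asr_loss
    if asr_loss.item() > tau:  # poor ASR performance on this utterance
        para_loss = model(batch["audio"], labels=batch["paraphrase"]).loss
        loss = asr_loss + lam * para_loss  # paraphrase-based extra supervision
    return loss
```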

Updated: 2024-11-27 14:16:51

标题: AMPS:多模态释义监督下的ASR

摘要: 自发性或对话式多语言语音对最先进的自动语音识别(ASR)系统提出了许多挑战。在这项工作中,我们提出了一种名为AMPS的新技术,通过基于释义的监督增强多语言多模态ASR系统,以改进包括印地语、马拉地语、马拉雅拉姆语、卡纳达语和尼扬贾语在内的多种语言的对话式ASR。我们在训练多模态ASR模型时使用参考转录的释义作为额外监督,并针对ASR性能较差的话语有选择地调用这一释义目标。通过将AMPS与最先进的多模态模型SeamlessM4T结合使用,我们获得了高达5%的词错误率(WER)相对降低。我们使用客观和人工评估指标对我们的系统进行了详细分析。

更新时间: 2024-11-27 14:16:51

领域: cs.CL,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2411.18368v1

Goetterfunke: Creativity in Machinae Sapiens. About the Qualitative Shift in Generative AI with a Focus on Text-To-Image

The year 2022 marks a watershed in technology, and arguably in human history, with the release of powerful generative AIs capable of convincingly performing creative tasks. With the help of these systems, anyone can create something that would previously have been considered a remarkable work of art. In human-AI collaboration, the computer seems to have become more than a tool. Many who have made their first contact with current generative AIs see them as "creativity machines" while for others the term "machine creativity" remains an oxymoron. This article is about (the possibility of) creativity in computers within the current Machine Learning paradigm. It outlines some of the key concepts behind the technologies and the innovations that have contributed to this qualitative shift, with a focus on text-to-image systems. The nature of Artificial Creativity as such is discussed, as well as what this might mean for art. AI may become a responsible collaborator with elements of independent machine authorship in the artistic process.

Updated: 2024-11-27 14:16:41

标题: 神灵之火:机械智能中的创造力。关于生成AI中的定性转变,重点关注文字到图像的转换。

摘要: 2022年标志着技术的一个分水岭,也可以说是人类历史的一个分水岭,因为强大的生成人工智能的发布使其能够令人信服地执行创意任务。在这些系统的帮助下,任何人都可以创造出以前被认为是杰出艺术品的东西。在人工智能与人类的合作中,计算机似乎已经超越了工具的范畴。许多第一次接触当前生成人工智能的人将其视为“创造性机器”,而对其他人来说,“机器创造力”这个词仍然是一个矛盾修饰语。本文讨论了当前机器学习范式内计算机中(可能的)创造力。它概述了一些技术背后的关键概念和促成这一质的转变的创新,重点放在文本到图像系统上。讨论了人工创造力的本质,以及这可能对艺术意味着什么。人工智能可能会成为艺术过程中具有独立机器作者元素的负责合作者。

更新时间: 2024-11-27 14:16:41

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2411.10448v2

GPT as ghostwriter at the White House

Recently, several large language models (LLMs) have demonstrated their capability to generate a message in response to a user request. Such scientific breakthroughs promote new perspectives but also some fears. The main focus of this study is to analyze the written style of one LLM called ChatGPT 3.5 by comparing its generated messages with those of the recent US presidents. To achieve this objective, we compare the State of the Union addresses, from Reagan to Obama, with those automatically produced by ChatGPT. We found that ChatGPT tends to overuse the lemma "we" as well as nouns and commas. On the other hand, the generated speeches employ fewer verbs and include, on average, longer sentences. Even when imposing a given style on ChatGPT, the resulting speech remains distinct from messages written by the target author. Moreover, ChatGPT opts for a neutral tone with mainly positive emotional expressions and symbolic terms (e.g., freedom, nation). Finally, we show that GPT's style exposes distinct features compared to real presidential addresses.

Updated: 2024-11-27 14:12:36

标题: GPT在白宫担任代笔人

摘要: 最近几个大型语言模型(LLMs)展示了它们生成回应用户请求的消息的能力。这种科学突破促进了新的视角,但也带来了一些恐惧。本研究的主要重点是通过比较一个名为ChatGPT 3.5的LLM生成的消息与最近美国总统的消息,分析其书面风格。为了实现这一目标,我们比较了里根和奥巴马写的国情咨文与ChatGPT自动生成的国情咨文。我们发现ChatGPT倾向于过度使用"我们"这个词形和名词以及逗号。另一方面,生成的演讲中使用的动词较少,并且平均句子更长。即使对ChatGPT施加了特定的风格,生成的演讲仍然与目标作者写的消息有所不同。此外,ChatGPT选择中性语调,主要是积极的情感表达和象征性术语(例如,自由,国家)。最后,我们展示了GPT的风格与真实总统讲话相比具有不同的特征。

更新时间: 2024-11-27 14:12:36

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2411.18365v1

Differentiable Weightless Neural Networks

We introduce the Differentiable Weightless Neural Network (DWN), a model based on interconnected lookup tables. Training of DWNs is enabled by a novel Extended Finite Difference technique for approximate differentiation of binary values. We propose Learnable Mapping, Learnable Reduction, and Spectral Regularization to further improve the accuracy and efficiency of these models. We evaluate DWNs in three edge computing contexts: (1) an FPGA-based hardware accelerator, where they demonstrate superior latency, throughput, energy efficiency, and model area compared to state-of-the-art solutions, (2) a low-power microcontroller, where they achieve preferable accuracy to XGBoost while subject to stringent memory constraints, and (3) ultra-low-cost chips, where they consistently outperform small models in both accuracy and projected hardware area. DWNs also compare favorably against leading approaches for tabular datasets, with higher average rank. Overall, our work positions DWNs as a pioneering solution for edge-compatible high-throughput neural networks.

Updated: 2024-11-27 13:59:05

标题: 可微的无权重神经网络

摘要: 我们介绍了可微无权重神经网络(DWN),这是一种基于互连查找表的模型。DWN的训练采用了一种新颖的扩展有限差分技术,用于对二进制值进行近似微分。我们提出了可学习映射、可学习降维和谱正则化方法,进一步提高了这些模型的准确性和效率。我们在三种边缘计算环境中评估了DWN:(1)基于FPGA的硬件加速器,在这里它们在延迟、吞吐量、能效和模型面积方面表现优于最先进的解决方案,(2)低功耗微控制器,在这里它们在严格的内存约束下实现了优越的准确性,胜过XGBoost,(3)超低成本芯片,在这里它们在准确性和预期硬件面积方面始终优于小型模型。DWN与用于表格数据集的主流方法相比也表现出色,平均排名更高。总的来说,我们的工作将DWN定位为适用于边缘计算的高吞吐神经网络的开拓性解决方案。

更新时间: 2024-11-27 13:59:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.11112v3

Referential communication in heterogeneous communities of pre-trained visual deep networks

As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of referential communication in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.

Updated: 2024-11-27 13:54:59

标题: 在预先训练的视觉深度网络异质社区中的指称性交流

摘要: 随着大型预训练图像处理神经网络被嵌入到自动驾驶汽车或机器人等自主代理中,一个问题出现了,即这些系统如何能够相互交流关于周围世界的信息,尽管它们具有不同的架构和训练方式。作为朝着这个方向迈出的第一步,我们系统地探索了在一个由异构最先进的预训练视觉网络组成的社区中进行指称性沟通的任务,显示它们可以以自监督的方式发展出一个共享的协议,用于在一组候选目标中指称目标对象。这个共享协议也可以在一定程度上用于沟通有关之前未见过的不同粒度的目标对象类别。此外,一个最初不是现有社区的一部分的视觉网络可以以惊人的轻松学习社区的协议。最后,我们定性和定量地研究了新兴协议的特性,提供了一些证据表明它正在捕捉对象的高级语义特征。

更新时间: 2024-11-27 13:54:59

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2302.08913v5

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

This paper introduces Virtual Try-Off (VTOFF), a novel task focused on generating standardized garment images from single photos of clothed individuals. Unlike traditional Virtual Try-On (VTON), which digitally dresses models, VTOFF aims to extract a canonical garment image, posing unique challenges in capturing garment shape, texture, and intricate patterns. This well-defined target makes VTOFF particularly effective for evaluating reconstruction fidelity in generative models. We present TryOffDiff, a model that adapts Stable Diffusion with SigLIP-based visual conditioning to ensure high fidelity and detail retention. Experiments on a modified VITON-HD dataset show that our approach outperforms baseline methods based on pose transfer and virtual try-on with fewer pre- and post-processing steps. Our analysis reveals that traditional image generation metrics inadequately assess reconstruction quality, prompting us to rely on DISTS for more accurate evaluation. Our results highlight the potential of VTOFF to enhance product imagery in e-commerce applications, advance generative model evaluation, and inspire future work on high-fidelity reconstruction. Demo, code, and models are available at: https://rizavelioglu.github.io/tryoffdiff/

Updated: 2024-11-27 13:53:09

标题: TryOffDiff:使用扩散模型进行高保真服装重建的虚拟试穿

摘要: 本文介绍了Virtual Try-Off(VTOFF),这是一个新颖的任务,专注于从穿着衣物的个人的单张照片中生成标准化的服装图像。与传统的Virtual Try-On(VTON)不同,后者是为模特穿上数字服装,VTOFF旨在提取一个规范的服装图像,这在捕捉服装形状、质地和复杂图案方面提出了独特挑战。这个清晰定义的目标使得VTOFF在评估生成模型中的重建保真度方面尤为有效。我们提出了TryOffDiff,这是一个利用基于SigLIP的视觉调节来适应稳定扩散,以确保高保真度和细节保留的模型。在修改后的VITON-HD数据集上的实验表明,我们的方法在基于姿势转移和虚拟试穿的基线方法上表现更好,且需要更少的预处理和后处理步骤。我们的分析表明,传统的图像生成度量不足以充分评估重建质量,促使我们依赖于DISTS进行更准确的评估。我们的结果突显了VTOFF在增强电子商务应用中的产品形象、推进生成模型评估以及激发未来高保真度重建工作的潜力。演示、代码和模型可在以下网址找到:https://rizavelioglu.github.io/tryoffdiff/

更新时间: 2024-11-27 13:53:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18350v1

Benchmarking Counterfactual Image Generation

Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. However, not all edits are equal. To perform realistic edits in domains such as natural image or medical imaging, modifications must respect causal relationships inherent to the data generation process. Such image editing falls into the counterfactual image generation regime. Evaluating counterfactual image generation is substantially complex: not only it lacks observable ground truths, but also requires adherence to causal constraints. Although several counterfactual image generation methods and evaluation metrics exist, a comprehensive comparison within a unified setting is lacking. We present a comparison framework to thoroughly benchmark counterfactual image generation methods. We integrate all models that have been used for the task at hand and expand them to novel datasets and causal graphs, demonstrating the superiority of Hierarchical VAEs across most datasets and metrics. Our framework is implemented in a user-friendly Python package that can be extended to incorporate additional SCMs, causal methods, generative models, and datasets for the community to build on. Code: https://github.com/gulnazaki/counterfactual-benchmark.

Updated: 2024-11-27 13:49:27

标题: 基准对照图像生成

摘要: 生成式人工智能已经彻底改变了视觉内容编辑,使用户能够轻松修改图像和视频。然而,并非所有的编辑都是相等的。为了在自然图像或医学成像等领域执行逼真的编辑,修改必须遵守数据生成过程中固有的因果关系。这种图像编辑属于反事实图像生成范畴。评估反事实图像生成具有相当复杂性:不仅缺乏可观察到的基本事实,还需要遵守因果约束。尽管存在多种反事实图像生成方法和评估指标,但缺乏在统一设置中进行全面比较。我们提出了一个比较框架,彻底评估反事实图像生成方法。我们整合了所有已用于此任务的模型,并将它们扩展到新领域和因果图,展示了在大多数数据集和指标上Hierarchical VAEs的优越性。我们的框架以用户友好的Python包实现,可扩展以包含额外的SCMs、因果方法、生成模型和数据集,供社区进一步建设。代码:https://github.com/gulnazaki/counterfactual-benchmark。

更新时间: 2024-11-27 13:49:27

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.20287v4

Biometric Authentication Based on Enhanced Remote Photoplethysmography Signal Morphology

Remote photoplethysmography (rPPG) is a non-contact method for measuring cardiac signals from facial videos, offering a convenient alternative to contact photoplethysmography (cPPG) obtained from contact sensors. Recent studies have shown that each individual possesses a unique cPPG signal morphology that can be utilized as a biometric identifier, which has inspired us to utilize the morphology of rPPG signals extracted from facial videos for person authentication. Since the facial appearance and rPPG are mixed in the facial videos, we first de-identify facial videos to remove facial appearance while preserving the rPPG information, which protects facial privacy and guarantees that only rPPG is used for authentication. The de-identified videos are fed into an rPPG model to get the rPPG signal morphology for authentication. In the first training stage, unsupervised rPPG training is performed to get coarse rPPG signals. In the second training stage, an rPPG-cPPG hybrid training is performed by incorporating external cPPG datasets to achieve rPPG biometric authentication and enhance rPPG signal morphology. Our approach needs only de-identified facial videos with subject IDs to train rPPG authentication models. The experimental results demonstrate that rPPG signal morphology hidden in facial videos can be used for biometric authentication. The code is available at https://github.com/zhaodongsun/rppg_biometrics.

Updated: 2024-11-27 13:47:03

标题: 基于增强远程光电容积脉搏信号形态的生物特征认证

摘要: 远程光电脉动图(rPPG)是一种用于通过面部视频测量心脏信号的非接触方法,为接触式光电脉动图(cPPG)提供了一种方便的替代方法。最近的研究表明,每个个体都具有可以用作生物特征识别器的独特cPPG信号形态,这启发了我们利用从面部视频中提取的rPPG信号形态进行人员认证。由于面部外观和rPPG混合在面部视频中,我们首先对面部视频进行去识别处理,以去除面部外观同时保留rPPG信息,从而保护面部隐私并确保仅使用rPPG进行认证。去识别的视频被输入到rPPG模型中,以获得用于认证的rPPG信号形态。在第一训练阶段,进行无监督rPPG训练以获得粗糙的rPPG信号。在第二训练阶段,通过整合外部cPPG数据集进行rPPG-cPPG混合训练,以实现rPPG生物特征认证并增强rPPG信号形态。我们的方法只需要具有主体ID的去识别面部视频来训练rPPG认证模型。实验结果表明,隐藏在面部视频中的rPPG信号形态可以用于生物特征认证。代码可在https://github.com/zhaodongsun/rppg_biometrics找到。

更新时间: 2024-11-27 13:47:03

领域: cs.CV,cs.AI,eess.IV,eess.SP

下载: http://arxiv.org/abs/2407.04127v3

FreqX: What neural networks learn is what network designers say

Personalized Federated Learning (PFL) allows clients to cooperatively train a personalized model without disclosing their private datasets. However, PFL suffers from non-IID data, heterogeneous devices, lack of fairness, and unclear contributions, challenges that urgently require interpretable deep learning models to overcome. These challenges impose new demands on interpretability: low cost, privacy preservation, and detailed information. No current interpretability method satisfies all of them. In this paper, we propose a novel interpretability method, FreqX, by introducing signal processing and information theory. Our experiments show that the explanation results of FreqX contain both attribution information and concept information. FreqX runs at least 10 times faster than the baselines that provide concept information.

Updated: 2024-11-27 13:41:24

标题: FreqX:神经网络学习的是网络设计者所说的内容

摘要: 个性化联邦学习(PFL)允许客户合作训练个性化模型,而不需要披露他们的私人数据集。然而,PFL面临非独立同分布、异构设备、缺乏公平性和贡献不清晰等问题,迫切需要深度学习模型的可解释性来克服这些挑战。这些挑战提出了对解释性的新需求,包括低成本、隐私和详细信息。目前还没有解释性方法能够满足这些需求。在本文中,我们通过引入信号处理和信息理论提出了一种新颖的解释性方法FreqX。我们的实验表明,FreqX的解释结果包含属性信息和概念信息。与包含概念信息的基线相比,FreqX至少运行速度提高了10倍。

更新时间: 2024-11-27 13:41:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18343v1

Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation

Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data. We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes with diverse lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with a significantly increased label density by using depth completion. We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform decently, a significant challenge persists in accurately estimating depth in omnidirectional imaging. To address this, we introduce necessary adaptations to stereo models, achieving improved performance.
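
The depth-label construction can be illustrated with the standard spherical projection of LiDAR points onto an equirectangular image; the resolution and coordinate conventions below are assumptions, not the dataset's exact pipeline.

```python
# Project 3D LiDAR points onto an equirectangular depth map (illustrative).
import numpy as np

def project_equirectangular(points, width=1920, height=960):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    theta = np.arctan2(x, z)                      # azimuth in [-pi, pi]
    phi = np.arcsin(y / np.maximum(depth, 1e-9))  # elevation in [-pi/2, pi/2]
    u = ((theta / np.pi + 1.0) / 2.0 * width).astype(int) % width
    v = ((0.5 - phi / np.pi) * height).astype(int).clip(0, height - 1)
    depth_map = np.full((height, width), np.inf)
    np.minimum.at(depth_map, (v, u), depth)       # keep nearest point per pixel
    return depth_map

pts = 5.0 * np.random.randn(10000, 3)
print(np.isfinite(project_equirectangular(pts)).sum(), "pixels labelled")
```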

Updated: 2024-11-27 13:34:41

标题: Helvipad:用于全向立体深度估计的实际数据集

摘要: 尽管立体深度估计取得了相当大的进展,但全向成像仍未得到充分探索,主要是由于缺乏适当的数据。我们引入了Helvipad,这是一个用于全向立体深度估计的真实世界数据集,包括来自不同环境的视频序列中的40K帧,包括拥挤的室内和室外场景,具有多样化的光照条件。该数据集使用两个360度相机在顶部和底部设置和LiDAR传感器收集,通过将3D点云投影到等距图像上,数据集包含准确的深度和视差标签。此外,我们通过使用深度完成提供了一个标签密度显著增加的增强训练集。我们对领先的立体深度估计模型进行了标准和全向图像的基准测试。结果显示,虽然最近的立体方法表现不错,但在全向成像中准确估计深度仍然存在显著挑战。为了解决这个问题,我们对立体模型进行了必要的调整,实现了改进的性能。

更新时间: 2024-11-27 13:34:41

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2411.18335v1

Using Malware Detection Techniques for HPC Application Classification

HPC systems face security and compliance challenges, particularly in preventing waste and misuse of computational resources by unauthorized or malicious software that deviates from allocation purpose. Existing methods to classify applications based on job names or resource usage are often unreliable or fail to capture applications that have different behavior due to different inputs or system noise. This research proposes an approach that uses similarity-preserving fuzzy hashes to classify HPC application executables. By comparing the similarity of SSDeep fuzzy hashes, a Random Forest Classifier can accurately label applications executing on HPC systems including unknown samples. We evaluate the Fuzzy Hash Classifier on a dataset of 92 application classes and 5333 distinct application samples. The proposed method achieved a macro f1-score of 90% (micro f1-score: 89%, weighted f1-score: 90%). Our approach addresses the critical need for more effective application classification in HPC environments, minimizing resource waste, and enhancing security and compliance.
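
A minimal sketch of the described pipeline using the ssdeep Python bindings and scikit-learn: each executable is featurised by its fuzzy-hash similarity to a set of reference samples, and a random forest does the labelling. File paths and labels are placeholders, and the paper's feature design may differ.

```python
# Fuzzy-hash features + random forest for HPC application classification.
import ssdeep                                    # pip install ssdeep
from sklearn.ensemble import RandomForestClassifier

ref_hashes = [ssdeep.hash_from_file(p) for p in ["app_a.bin", "app_b.bin"]]

def features(path):
    h = ssdeep.hash_from_file(path)
    # One SSDeep similarity score (0-100) against every reference sample.
    return [ssdeep.compare(h, r) for r in ref_hashes]

X = [features(p) for p in ["job1.bin", "job2.bin", "job3.bin"]]
y = ["app_a", "app_b", "app_a"]                  # known labels for training jobs
clf = RandomForestClassifier(n_estimators=200).fit(X, y)
print(clf.predict([features("unknown_job.bin")]))
```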

Updated: 2024-11-27 13:28:43

标题: 使用恶意软件检测技术进行高性能计算应用分类

摘要: HPC系统面临安全和合规挑战,尤其是在防止未经授权或恶意软件浪费和滥用计算资源方面。现有的基于作业名称或资源使用情况对应用程序进行分类的方法通常不可靠,或者无法捕捉由于不同输入或系统噪音而表现出不同行为的应用程序。本研究提出了一种使用保持相似性的模糊哈希来对HPC应用程序可执行文件进行分类的方法。通过比较SSDeep模糊哈希的相似性,随机森林分类器可以准确地标记在HPC系统上执行的应用程序,包括未知样本。我们在一个包含92个应用程序类和5333个不同应用程序样本的数据集上评估了模糊哈希分类器。所提出的方法实现了90%的宏 f1分数(微 f1分数:89%,加权 f1分数:90%)。我们的方法解决了HPC环境中更有效应用程序分类的迫切需求,减少资源浪费,并增强安全性和合规性。

更新时间: 2024-11-27 13:28:43

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2411.18327v1

MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint

Hierarchical reinforcement learning (HRL) provides a promising solution for intelligent agents facing complex tasks with sparse rewards, using a hierarchical framework that divides tasks into subgoals and completes them sequentially. However, current methods struggle to find suitable subgoals for ensuring a stable learning process. Without additional guidance, it is impractical to rely solely on exploration or heuristic methods to determine subgoals in a large goal space. To address this issue, we propose MENTOR, a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints. MENTOR acts as a "mentor", incorporating human feedback into high-level policy learning to find better subgoals. For the low-level policy, MENTOR designs a dual policy that decouples exploration and exploitation to stabilize training. Furthermore, although humans can simply break down tasks into subgoals to guide the right learning direction, subgoals that are too difficult or too easy can still hinder downstream learning efficiency. We propose the Dynamic Distance Constraint (DDC) mechanism, which dynamically adjusts the space of optional subgoals. Thus MENTOR can generate subgoals matching the low-level policy learning process, from easy to hard. Extensive experiments demonstrate that MENTOR uses a small amount of human feedback to achieve significant improvement in complex tasks with sparse rewards.

Updated: 2024-11-27 13:27:41

标题: 导师:利用人类反馈和动态距离约束引导分层强化学习

摘要: 层次强化学习(HRL)为智能代理的稀疏奖励提供了一个有前途的解决方案,它使用将任务分解为子目标并按顺序完成的层次框架。然而,当前方法在寻找适当的子目标以确保稳定的学习过程方面存在困难。在没有额外指导的情况下,仅依靠探索或启发式方法在庞大的目标空间中确定子目标是不切实际的。为了解决这个问题,我们提出了一个综合人类反馈和动态距离约束(MENTOR)的一般层次强化学习框架。MENTOR充当“导师”,将人类反馈融入高层策略学习中,以找到更好的子目标。至于低层策略,MENTOR设计了一个双策略进行探索-利用解耦,以稳定训练。此外,虽然人类可以简单地将任务分解为子目标来指导正确的学习方向,但过于困难或过于简单的子目标仍可能阻碍下游学习效率。我们提出了动态距离约束(DDC)机制,动态调整可选子目标的空间。因此,MENTOR可以生成与低层策略学习过程匹配的子目标,从简单到困难。大量实验证明,MENTOR使用少量人类反馈能够在稀疏奖励的复杂任务中取得显著改进。

更新时间: 2024-11-27 13:27:41

领域: cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2402.14244v2

RITA: Automatic Framework for Designing of Resilient IoT Applications

Designing resilient Internet of Things (IoT) systems requires i) identification of IoT Critical Objects (ICOs) such as services, devices, and resources, ii) threat analysis, and iii) mitigation strategy selection. However, the traditional process for designing resilient IoT systems is still manual, leading to inefficiencies and increased risks. In addition, while tools such as ChatGPT could support this manual and highly error-prone process, their use raises concerns over data privacy, inconsistent outputs, and internet dependence. Therefore, we propose RITA, an automated, open-source framework that uses a fine-tuned RoBERTa-based Named Entity Recognition (NER) model to identify ICOs from IoT requirement documents, correlate threats, and recommend countermeasures. RITA operates entirely offline and can be deployed on-site, safeguarding sensitive information and delivering consistent outputs that enhance standardization. In our empirical evaluation, RITA outperformed ChatGPT in four of seven ICO categories, particularly in actuator, sensor, network resource, and service identification, using both human-annotated and ChatGPT-generated test data. These findings indicate that RITA can improve resilient IoT design by effectively supporting key security operations, offering a practical solution for developing robust IoT architectures.
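
A hedged sketch of the offline ICO-extraction step with a Hugging Face token-classification pipeline; the checkpoint path is a placeholder standing in for RITA's fine-tuned RoBERTa weights.

```python
# Offline named-entity extraction of IoT Critical Objects (illustrative).
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="path/to/finetuned-roberta-ico",  # local checkpoint: runs offline
    aggregation_strategy="simple",
)
requirement = ("The gateway shall forward temperature readings from the "
               "LoRa sensors to the MQTT broker every 30 seconds.")
for ent in ner(requirement):
    print(ent["entity_group"], "->", ent["word"])  # e.g. sensor -> LoRa sensors
```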

Updated: 2024-11-27 13:24:52

标题: RITA:用于设计具有弹性的物联网应用程序的自动化框架

摘要: 设计具有弹性的物联网(IoT)系统需要i)识别IoT关键对象(ICOs),如服务、设备和资源,ii)威胁分析,以及iii)选择缓解策略。然而,传统的设计具有弹性的IoT系统的流程仍然是手动的,导致低效和增加风险。此外,虽然像ChatGPT这样的工具可以支持这种手动且高度容易出错的过程,但它们的使用引发了关于数据隐私、不一致的输出和对互联网依赖性的担忧。因此,我们提出了RITA,一个自动化、开源的框架,使用经过优化的基于RoBERTa的命名实体识别(NER)模型,从物联网需求文档中识别ICO,并相关威胁,并推荐对策。RITA完全脱机运行,并可以部署在现场,保护敏感信息并提供增强标准化的一致输出。在我们的实证评估中,RITA在七个ICO类别中的四个中表现优于ChatGPT,特别是在执行器、传感器、网络资源和服务识别方面,使用人工注释和ChatGPT生成的测试数据。这些发现表明,RITA可以通过有效支持关键安全操作来改善具有弹性的IoT设计,为开发健壮的IoT架构提供实用解决方案。

更新时间: 2024-11-27 13:24:52

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18324v1

Mixture of Experts in Image Classification: What's the Sweet Spot?

Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across various domains. However, the implementation in computer vision remains limited, and often requires large-scale datasets comprising billions of samples. In this study, we investigate the integration of MoE within computer vision models and explore various MoE configurations on open datasets. When introducing MoE layers in image classification, the best results are obtained for models with a moderate number of activated parameters per sample. However, such improvements gradually vanish when the number of parameters per sample increases.
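
For reference, a generic top-k gated MoE layer of the kind studied here; the expert count and k control the "activated parameters per sample" knob the abstract refers to (illustrative code, not the paper's configurations).

```python
# Minimal top-k gated Mixture-of-Experts layer.
import torch
import torch.nn as nn

class MoE(nn.Module):
    def __init__(self, dim, hidden, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts))
        self.k = k  # experts activated per token: sets active params per sample

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():                     # route tokens to expert e
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

y = MoE(dim=192, hidden=768)(torch.randn(16, 192))  # -> (16, 192)
```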

Updated: 2024-11-27 13:23:11

标题: 图像分类中的专家混合:最佳点在哪里?

摘要: 混合专家(MoE)模型已显示出在各个领域中实现参数高效扩展的潜力。然而,在计算机视觉中的实施仍然有限,通常需要包含数十亿样本的大规模数据集。在本研究中,我们调查了在计算机视觉模型中集成MoE,并在开放数据集上探索各种MoE配置。当将MoE层引入图像分类时,对于每个样本具有适度数量的激活参数的模型获得了最佳结果。然而,当每个样本的参数数量增加时,这种改进逐渐消失。

更新时间: 2024-11-27 13:23:11

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18322v1

Learning optimal objective values for MILP

Modern Mixed Integer Linear Programming (MILP) solvers use the Branch-and-Bound algorithm together with a plethora of auxiliary components that speed up the search. In recent years, there has been an explosive development in the use of machine learning for enhancing and supporting these algorithmic components. Within this line, we propose a methodology for predicting the optimal objective value, or, equivalently, predicting if the current incumbent is optimal. For this task, we introduce a predictor based on a graph neural network (GNN) architecture, together with a set of dynamic features. Experimental results on diverse benchmarks demonstrate the efficacy of our approach, achieving high accuracy in the prediction task and outperforming existing methods. These findings suggest new opportunities for integrating ML-driven predictions into MILP solvers, enabling smarter decision-making and improved performance.

Updated: 2024-11-27 13:22:31

标题: 学习混合整数线性规划的最优目标值

摘要: 现代混合整数线性规划(MILP)求解器使用分支定界算法以及大量辅助组件加快搜索速度。近年来,机器学习在增强和支持这些算法组件方面取得了爆炸性的发展。在这一领域内,我们提出了一种方法论,用于预测最优目标值,或者等效地预测当前的最优解是否已找到。为了完成这一任务,我们引入了基于图神经网络(GNN)架构的预测器,以及一组动态特征。在各种基准测试中的实验结果表明,我们的方法效果显著,预测任务具有高准确性,并且优于现有方法。这些发现为将基于机器学习的预测整合到MILP求解器中提供了新的机会,实现更智能的决策和改善性能。

更新时间: 2024-11-27 13:22:31

领域: math.OC,cs.AI,cs.LG,cs.MS

下载: http://arxiv.org/abs/2411.18321v1

Continual Learning in Machine Speech Chain Using Gradient Episodic Memory

Continual learning for automatic speech recognition (ASR) systems poses a challenge, especially with the need to avoid catastrophic forgetting while maintaining performance on previously learned tasks. This paper introduces a novel approach leveraging the machine speech chain framework to enable continual learning in ASR using gradient episodic memory (GEM). By incorporating a text-to-speech (TTS) component within the machine speech chain, we support the replay mechanism essential for GEM, allowing the ASR model to learn new tasks sequentially without significant performance degradation on earlier tasks. Our experiments, conducted on the LJ Speech dataset, demonstrate that our method outperforms traditional fine-tuning and multitask learning approaches, achieving a substantial error rate reduction while maintaining high performance across varying noise conditions. We showed the potential of our semi-supervised machine speech chain approach for effective and efficient continual learning in speech recognition.
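
The GEM constraint itself reduces to a gradient projection, shown below for a single memory gradient (the machine-speech-chain/TTS replay machinery is omitted): if the new-task gradient conflicts with the gradient computed on replayed past-task data, it is projected so the conflict vanishes.

```python
# GEM-style projection of the new-task gradient (single-memory case).
import torch

def gem_project(g, g_mem):
    """Return g if it does not increase the memory loss (g . g_mem >= 0);
    otherwise project it onto the feasible half-space."""
    dot = torch.dot(g, g_mem)
    if dot >= 0:
        return g
    return g - (dot / torch.dot(g_mem, g_mem)) * g_mem

g_new = torch.tensor([1.0, -2.0])   # gradient on the new task
g_old = torch.tensor([0.0, 1.0])    # gradient on a replayed past-task batch
print(gem_project(g_new, g_old))    # tensor([1., 0.]): conflict removed
```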

Updated: 2024-11-27 13:19:20

标题: 使用梯度记忆的机器语音链中的持续学习

摘要: 语音识别(ASR)系统的持续学习面临着挑战,特别是需要避免灾难性遗忘同时保持在先前学习任务上的性能。本文介绍了一种新颖的方法,利用机器语音链框架实现ASR中的持续学习,使用梯度记忆(GEM)。通过在机器语音链中加入文本到语音(TTS)组件,我们支持对GEM至关重要的重播机制,使ASR模型能够顺序学习新任务,而不会在早期任务上显著降低性能。我们在LJ Speech数据集上进行的实验表明,我们的方法优于传统的微调和多任务学习方法,在保持高性能的同时实现了大幅度的错误率降低,跨不同噪音条件保持高性能。我们展示了我们的半监督机器语音链方法在语音识别中实现有效和高效的持续学习的潜力。

更新时间: 2024-11-27 13:19:20

领域: cs.CL,cs.AI,eess.AS

下载: http://arxiv.org/abs/2411.18320v1

LLMEasyQuant -- An Easy to Use Toolkit for LLM Quantization

Many quantization methods have appeared for LLM quantization, yet few are user-friendly or easy to deploy locally. Packages like TensorRT and Quanto rely on many underlying structures and self-invoking internal functions, which are not conducive to developers' personalized development and learning for deployment. Therefore, we develop LLMEasyQuant, a package aiming at easy quantization deployment that is user-friendly and suitable for beginners' learning.
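
As a worked example of the simplest operation such a toolkit wraps, here is symmetric per-tensor int8 quantization; this is generic quantization math, not LLMEasyQuant's actual API.

```python
# Symmetric per-tensor int8 quantization and dequantization.
import torch

def quantize_int8(w):
    scale = w.abs().max() / 127.0             # map [-max, max] to [-127, 127]
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
print("max abs error:", (w - dequantize(q, s)).abs().max().item())  # ~ scale/2
```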

Updated: 2024-11-27 12:59:03

标题: LLMEasyQuant - 一款易于使用的LLM量化工具包

摘要: 目前,针对LLM量化出现了许多量化方法,但很少有用户友好且易于本地部署的方法。像TensorRT和Quanto这样的软件包具有许多底层结构和自我调用的内部功能,这不利于开发人员的个性化开发和学习部署。因此,我们开发了LLMEasyQuant,这是一个旨在实现简单量化部署的软件包,用户友好且适合初学者学习。

更新时间: 2024-11-27 12:59:03

领域: cs.LG

下载: http://arxiv.org/abs/2406.19657v3

MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement

CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced Transformer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention effectively synthesizes diagnostic information from multiple anatomical views. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) retrieves the most similar reports based on the query volume to incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) with learnable nonlinear activation functions as the fundamental building blocks of both modules to better capture intricate diagnostic patterns in CT interpretation. Extensive experiments on the public CTRG-Chest-548K dataset demonstrate that our method outpaces prior state-of-the-art models across all metrics.

Updated: 2024-11-27 12:58:23

标题: MvKeTR: 多视角感知和知识增强的胸部CT报告生成

摘要: CT报告生成(CTRG)旨在自动生成针对3D体积的诊断报告,减轻临床医生的工作量并改善患者护理。尽管具有临床价值,现有研究未能有效整合来自多个解剖视图的诊断信息,并缺乏准确可靠诊断所必需的相关临床专业知识。为了解决这些局限性,我们提出了一种新颖的多视角知识增强变换器(MvKeTR)来模拟临床医生的诊断工作流程。就像放射科医生首先从多个平面检查CT扫描一样,具有视图感知的多视图感知聚合器(MVPA)有效地综合来自多个解剖视图的诊断信息。然后,受到放射科医生如何进一步参考相关临床记录以指导诊断决策的启发,交叉模态知识增强器(CMKE)基于查询体积检索最相似的报告,将领域知识纳入诊断过程。此外,我们不再使用传统的MLP,而是采用具有可学习非线性激活函数的Kolmogorov-Arnold网络(KAN)作为两个模块的基本构建模块,以更好地捕捉CT解释中复杂的诊断模式。对公共CTRG-Chest-548K数据集的大量实验表明,我们的方法在所有指标上超过了先前的最先进模型。

更新时间: 2024-11-27 12:58:23

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18309v1

Application of Soft Actor-Critic Algorithms in Optimizing Wastewater Treatment with Time Delays Integration

Wastewater treatment plants face unique challenges for process control due to their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative controllers, suboptimal for achieving efficient phosphorus removal, a critical component of wastewater treatment to ensure environmental sustainability. This study addresses these challenges using a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator incorporates Long Short-Term Memory networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, the delay-aware agent achieved 36% reduction in phosphorus emissions, 55% higher reward, 77% lower target deviation from the regulatory limit, and 9% lower total costs than traditional control methods in the simulated environment. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.

Updated: 2024-11-27 12:52:48

标题: 软性演员-评论家算法在优化带有时间延迟集成的废水处理中的应用

摘要: 废水处理厂在过程控制方面面临着独特的挑战,因为其复杂的动态特性、缓慢的时间常数以及观测和行动中的随机延迟。这些特性使得传统的控制方法,如比例-积分-微分控制器,在实现高效磷去除方面表现不佳,这是废水处理的关键组成部分,以确保环境可持续性。本研究利用基于Soft Actor-Critic算法的新型深度强化学习方法,结合设计用于模拟废水处理厂固有延迟反馈的自定义模拟器来解决这些挑战。该模拟器采用长短期记忆网络进行准确的多步状态预测,实现了逼真的训练场景。为了考虑延迟的随机性,代理人在三种延迟情景下接受培训:无延迟、恒定延迟和随机延迟。结果表明,将随机延迟纳入强化学习框架显著提高了磷去除效率,同时降低了运营成本。具体而言,具有延迟感知的代理实现了磷排放减少36%,奖励增加55%,目标偏离监管限制降低77%,总成本降低9%,比在模拟环境中传统控制方法更优。这些发现强调了强化学习在克服废水处理中传统控制策略的局限性方面的潜力,为磷去除提供了一种适应性和经济有效的解决方案。

更新时间: 2024-11-27 12:52:48

领域: eess.SY,cs.AI,cs.LG,cs.SY

下载: http://arxiv.org/abs/2411.18305v1

A Comprehensive Study of Structural Pruning for Vision Models

Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving the progress in this area not fully comprehended. To fill this gap, we present the first comprehensive benchmark, termed PruningBench, for structural pruning. PruningBench showcases the following three characteristics: 1) PruningBench employs a unified and consistent framework for evaluating the effectiveness of diverse structural pruning techniques; 2) PruningBench systematically evaluates 16 existing pruning methods, encompassing a wide array of models (e.g., CNNs and ViTs) and tasks (e.g., classification and detection); 3) PruningBench provides easily implementable interfaces to facilitate the implementation of future pruning methods, and enables the subsequent researchers to incorporate their work into our leaderboards. We provide an online pruning platform http://pruning.vipazoo.cn for customizing pruning tasks and reproducing all results in this paper. Leaderboard results can be available on https://github.com/HollyLee2000/PruningBench.

Updated: 2024-11-27 12:32:56

标题: 一个关于视觉模型结构修剪的综合研究

摘要: 结构剪枝已经成为生产更高效模型的一种有前途的方法。然而,社区缺乏标准化的基准和度量标准,这使得这一领域的进展尚未完全被理解。为了填补这一空白,我们提出了第一个全面的基准,称为PruningBench,用于结构剪枝。PruningBench展示了以下三个特点:1)PruningBench采用统一和一致的框架来评估各种结构剪枝技术的有效性;2)PruningBench系统地评估了16种现有的剪枝方法,涵盖了各种模型(例如CNN和ViTs)和任务(例如分类和检测);3)PruningBench提供了易于实现的接口,以促进未来剪枝方法的实现,并使后续研究人员将他们的工作纳入我们的排行榜。我们提供一个在线剪枝平台http://pruning.vipazoo.cn,用于定制剪枝任务并重现本文中的所有结果。排行榜结果可以在https://github.com/HollyLee2000/PruningBench上查看。

更新时间: 2024-11-27 12:32:56

领域: cs.AI

下载: http://arxiv.org/abs/2406.12315v4

Aligning Pre-trained Models for Spoken Language Translation

This paper investigates a novel approach to end-to-end speech translation (ST) based on aligning frozen pre-trained automatic speech recognition (ASR) and machine translation (MT) models via a small connector module (Q-Former, our Subsampler-Transformer Encoder). This connector bridges the gap between the speech and text modalities, transforming ASR encoder embeddings into the latent representation space of the MT encoder while being the only part of the system optimized during training. Experiments are conducted on the How2 English-Portuguese dataset as we investigate the alignment approach in a small-scale scenario focusing on ST. While keeping the size of the connector module constant and small in comparison ( < 5% of the size of the larger aligned models), increasing the size and capability of the foundation ASR and MT models universally improves translation results. We also find that the connectors can serve as domain adapters for the foundation MT models, significantly improving translation performance in the aligned ST setting. We conclude that this approach represents a viable and scalable approach to training end-to-end ST systems.
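
A rough PyTorch sketch of such a connector under stated assumptions (the dimensions, subsampling stride, and layer count are invented; the actual Q-Former/Subsampler-Transformer hyperparameters are not given in the abstract):

import torch
import torch.nn as nn

class SubsamplerTransformerConnector(nn.Module):
    # Maps frozen ASR encoder states into the MT encoder's latent space;
    # only this module would be trained.
    def __init__(self, asr_dim=768, mt_dim=512, stride=4, n_layers=2):
        super().__init__()
        # Strided 1D convolution shortens the long acoustic sequence.
        self.subsample = nn.Conv1d(asr_dim, mt_dim, kernel_size=stride, stride=stride)
        layer = nn.TransformerEncoderLayer(d_model=mt_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, asr_states):               # (batch, time, asr_dim)
        x = self.subsample(asr_states.transpose(1, 2)).transpose(1, 2)
        return self.encoder(x)                   # (batch, time/stride, mt_dim)

bridge = SubsamplerTransformerConnector()
print(bridge(torch.randn(2, 100, 768)).shape)    # torch.Size([2, 25, 512])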

Updated: 2024-11-27 12:32:41

标题: 对话语言翻译的预训练模型对齐

摘要: 本文研究了一种新颖的端到端语音翻译(ST)方法,该方法基于通过一个小的连接器模块(Q-Former,我们的Subsampler-Transformer编码器)对冻结的预训练自动语音识别(ASR)和机器翻译(MT)模型进行对齐。该连接器桥接了语音和文本模态之间的差距,将ASR编码器嵌入转换为MT编码器的潜在表示空间,同时是训练期间优化的系统中唯一的部分。我们在How2英葡语数据集上进行实验,研究了在小规模场景下专注于ST的对齐方法。在保持连接器模块的尺寸恒定且较小的情况下( < 5% 大型对齐模型尺寸的大小),增加基础ASR和MT模型的尺寸和能力普遍改善了翻译结果。我们还发现,连接器可以作为基础MT模型的领域适配器,在对齐的ST设置中显著提高了翻译性能。我们得出结论,该方法代表了一种可行且可扩展的训练端到端ST系统的方法。

更新时间: 2024-11-27 12:32:41

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18294v1

SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding

In complex auditory environments, the human auditory system possesses the remarkable ability to focus on a specific speaker while disregarding others. In this study, a new model named SWIM, a short-window convolution neural network (CNN) integrated with Mamba, is proposed for identifying the locus of auditory attention (left or right) from electroencephalography (EEG) signals without relying on speech envelopes. SWIM consists of two parts. The first is a short-window CNN (SW$_\text{CNN}$), which acts as a short-term EEG feature extractor and achieves a final accuracy of 84.9% in the leave-one-speaker-out setup on the widely used KUL dataset. This improvement is due to the use of an improved CNN structure, data augmentation, multitask training, and model combination. The second part, Mamba, is a sequence model first applied to auditory spatial attention decoding to leverage the long-term dependency from previous SW$_\text{CNN}$ time steps. By joint training SW$_\text{CNN}$ and Mamba, the proposed SWIM structure uses both short-term and long-term information and achieves an accuracy of 86.2%, which reduces the classification errors by a relative 31.0% compared to the previous state-of-the-art result. The source code is available at https://github.com/windowso/SWIM-ASAD.
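
A toy PyTorch sketch of the two-part design described above, with a GRU standing in for Mamba since the abstract does not specify the block internals; channel counts, window sizes, and all names are assumptions:

import torch
import torch.nn as nn

class ShortWindowCNN(nn.Module):
    # Per-window feature extractor over (channels, window_samples) EEG slices.
    def __init__(self, n_channels=64, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, feat_dim, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, x):                        # (batch, channels, samples)
        return self.net(x).squeeze(-1)           # (batch, feat_dim)

class SWIMLike(nn.Module):
    # CNN features per short window, then a sequence model over the windows.
    def __init__(self, n_channels=64, feat_dim=128):
        super().__init__()
        self.cnn = ShortWindowCNN(n_channels, feat_dim)
        self.seq = nn.GRU(feat_dim, feat_dim, batch_first=True)  # Mamba stand-in
        self.head = nn.Linear(feat_dim, 2)       # left vs. right attention

    def forward(self, windows):                  # (batch, n_windows, channels, samples)
        b, w, c, s = windows.shape
        feats = self.cnn(windows.reshape(b * w, c, s)).reshape(b, w, -1)
        out, _ = self.seq(feats)
        return self.head(out[:, -1])             # logits from the last window

model = SWIMLike()
print(model(torch.randn(2, 10, 64, 128)).shape)  # torch.Size([2, 2])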

Updated: 2024-11-27 12:30:24

标题: SWIM:短窗口CNN与Mamba集成用于基于EEG的听觉空间注意力解码

摘要: 在复杂的听觉环境中,人类听觉系统具有显著的能力,可以聚焦在特定的说话者身上,而忽略其他人。本研究提出了一种名为SWIM的新模型,它是一个短时窗卷积神经网络(CNN),集成了Mamba,用于从脑电图(EEG)信号中识别听觉注意的位置(左侧或右侧),而无需依赖语音包络。SWIM由两部分组成。第一部分是短时窗CNN(SWCNN),它充当短期EEG特征提取器,在广泛使用的KUL数据集的留一说话者外设置中实现了最终准确率84.9%。这一改进是由于改进的CNN结构、数据增强、多任务训练和模型组合的使用。第二部分,Mamba,是首次应用于听觉空间注意力解码的序列模型,以利用先前SWCNN时间步的长期依赖性。通过联合训练SWCNN和Mamba,所提出的SWIM结构使用短期和长期信息,实现了86.2%的准确率,相对于先前的最新结果,减少了31.0%的分类错误。源代码可在https://github.com/windowso/SWIM-ASAD找到。

更新时间: 2024-11-27 12:30:24

领域: eess.AS,cs.AI,cs.SD,eess.SP

下载: http://arxiv.org/abs/2409.19884v2

ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design

Nature creates diverse proteins through a 'divide and assembly' strategy. Inspired by this idea, we introduce ProteinWeaver, a two-stage framework for protein backbone design. Our method first generates individual protein domains and then employs an SE(3) diffusion model to flexibly assemble these domains. A key challenge lies in the assembling step, given the complex and rugged nature of the inter-domain interaction landscape. To address this challenge, we employ preference alignment to discern complex relationships between structure and interaction landscapes through comparative analysis of generated samples. Comprehensive experiments demonstrate that ProteinWeaver: (1) generates high-quality, novel protein backbones through versatile domain assembly; (2) outperforms RFdiffusion, the current state-of-the-art in backbone design, by 13\% and 39\% for long-chain proteins; (3) shows the potential for cooperative function design through illustrative case studies. To sum up, by introducing a `divide-and-assembly' paradigm, ProteinWeaver advances protein engineering and opens new avenues for functional protein design.

Updated: 2024-11-27 12:18:46

标题: 蛋白质编织器:一种蛋白质主干设计的分割和组装方法

摘要: 大自然通过“分割和组装”策略创造多样化蛋白质。受此理念启发,我们引入了ProteinWeaver,这是一个用于蛋白质骨架设计的两阶段框架。我们的方法首先生成单个蛋白质结构域,然后利用SE(3)扩散模型灵活地组装这些结构域。一个关键挑战在于组装步骤,鉴于结构域间相互作用景观的复杂和崎岖性质。为了解决这一挑战,我们采用了偏好对齐方法,通过对生成样本进行比较分析来识别结构和相互作用景观之间的复杂关系。广泛的实验表明,ProteinWeaver:(1)通过多功能结构域组装生成高质量、新颖的蛋白质骨架;(2)在长链蛋白质设计方面优于当前骨架设计领域的最新技术RFdiffusion,分别提高了13\%和39\%;(3)显示了通过例证案例研究实现合作功能设计的潜力。总而言之,通过引入“分割和组装”范式,ProteinWeaver推动了蛋白工程领域,并为功能蛋白设计开辟了新途径。

更新时间: 2024-11-27 12:18:46

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2411.16686v2

DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model

Traffic forecasting is an important problem in the operation and optimisation of transportation systems. State-of-the-art solutions train machine learning models by minimising the mean forecasting errors on the training data. The trained models often favour periodic events instead of aperiodic ones in their prediction results, as periodic events often prevail in the training data. While offering critical optimisation opportunities, aperiodic events such as traffic incidents may be missed by the existing models. To address this issue, we propose DualCast -- a model framework to enhance the learning capability of traffic forecasting models, especially for aperiodic events. DualCast adopts a dual-branch architecture to disentangle traffic signals into two types, one reflecting intrinsic spatial-temporal patterns and the other reflecting external environment contexts including aperiodic events. We further propose a cross-time attention mechanism to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile. We integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets.
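
A schematic PyTorch sketch of a dual-branch forecaster in the spirit described above; the branch backbones, the attention wiring, and all sizes are illustrative guesses rather than DualCast's actual design:

import torch
import torch.nn as nn

class DualBranch(nn.Module):
    # One branch models intrinsic periodic structure, the other aperiodic
    # context; a cross-time attention lets the latter attend over the former.
    def __init__(self, n_sensors=32, hidden=64):
        super().__init__()
        self.periodic = nn.GRU(n_sensors, hidden, batch_first=True)
        self.aperiodic = nn.GRU(n_sensors, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_sensors)

    def forward(self, x):                         # (batch, time, sensors)
        p, _ = self.periodic(x)
        a, _ = self.aperiodic(x)
        a, _ = self.attn(a, p, p)                 # cross-time attention
        return self.out(torch.cat([p, a], dim=-1))[:, -1]  # next-step forecast

model = DualBranch()
print(model(torch.randn(8, 12, 32)).shape)        # torch.Size([8, 32])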

Updated: 2024-11-27 12:17:50

标题: DualCast:使用双分支模型从交通序列中分离非周期性事件

摘要: 交通预测是交通系统运营和优化中的一个重要问题。最先进的解决方案通过最小化训练数据上的平均预测误差来训练机器学习模型。训练的模型通常更偏向周期性事件而不是非周期性事件在其预测结果中,因为周期性事件在训练数据中往往占主导地位。虽然提供了关键的优化机会,但交通事件等非周期性事件可能会被现有模型所忽略。为解决这一问题,我们提出了DualCast -- 一个模型框架,以增强交通预测模型的学习能力,特别是针对非周期性事件。DualCast采用双分支架构,将交通信号分解为两种类型,一种反映内在的{时空}模式,另一种反映包括非周期性事件在内的外部环境上下文。我们进一步提出了一个跨时间的注意机制,以捕捉来自周期性和非周期性模式的高阶时空关系。DualCast具有通用性。我们将其与最近的交通预测模型集成在一起,在多个真实数据集上一致地减少了其预测误差高达9.6%。

更新时间: 2024-11-27 12:17:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18286v1

CaT-GNN: Enhancing Credit Card Fraud Detection via Causal Temporal Graph Neural Networks

Credit card fraud poses a significant threat to the economy. While Graph Neural Network (GNN)-based fraud detection methods perform well, they often overlook the causal effect of a node's local structure on predictions. This paper introduces a novel method for credit card fraud detection, the \textbf{\underline{Ca}}usal \textbf{\underline{T}}emporal \textbf{\underline{G}}raph \textbf{\underline{N}}eural \textbf{N}etwork (CaT-GNN), which leverages causal invariant learning to reveal inherent correlations within transaction data. By decomposing the problem into discovery and intervention phases, CaT-GNN identifies causal nodes within the transaction graph and applies a causal mixup strategy to enhance the model's robustness and interpretability. CaT-GNN consists of two key components: Causal-Inspector and Causal-Intervener. The Causal-Inspector utilizes attention weights in the temporal attention mechanism to identify causal and environment nodes without introducing additional parameters. Subsequently, the Causal-Intervener performs a causal mixup enhancement on environment nodes based on the set of nodes. Evaluated on three datasets, including a private financial dataset and two public datasets, CaT-GNN demonstrates superior performance over existing state-of-the-art methods. Our findings highlight the potential of integrating causal reasoning with graph neural networks to improve fraud detection capabilities in financial transactions.
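
A simplified reading of the discovery/intervention split in plain PyTorch; the top-k attention rule and the Beta-distributed mixing below are assumptions standing in for the paper's Causal-Inspector and Causal-Intervener:

import torch

def causal_mixup(node_feats, attn_weights, causal_ratio=0.3, alpha=0.5):
    # Discovery: treat the top-attention nodes as causal, the rest as
    # environment. Intervention: mix environment node features together.
    n = node_feats.size(0)
    k = max(1, int(n * causal_ratio))
    order = torch.argsort(attn_weights, descending=True)
    env = order[k:]                               # environment nodes
    if env.numel() > 1:
        perm = env[torch.randperm(env.numel())]
        lam = torch.distributions.Beta(alpha, alpha).sample()
        node_feats = node_feats.clone()
        node_feats[env] = lam * node_feats[env] + (1 - lam) * node_feats[perm]
    return node_feats

feats = torch.randn(10, 16)                       # toy transaction-node features
attn = torch.rand(10)                             # attention weights from the model
print(causal_mixup(feats, attn).shape)            # torch.Size([10, 16])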

Updated: 2024-11-27 12:15:06

标题: CaT-GNN:通过因果时间图神经网络增强信用卡欺诈检测

摘要: 信用卡欺诈对经济构成重大威胁。虽然基于图神经网络(GNN)的欺诈检测方法表现良好,但它们经常忽视节点的局部结构对预测的因果影响。本文介绍了一种新颖的信用卡欺诈检测方法,即\textbf{\underline{Ca}}usal \textbf{\underline{T}}emporal \textbf{\underline{G}}raph \textbf{\underline{N}}eural \textbf{N}etwork(CaT-GNN),它利用因果不变学习来揭示交易数据中的内在相关性。通过将问题分解为发现和干预阶段,CaT-GNN确定交易图中的因果节点,并应用因果混合策略来增强模型的鲁棒性和可解释性。CaT-GNN由两个关键组件组成:因果检视器和因果干预器。因果检视器利用时间注意机制中的注意权重来识别因果和环境节点,而无需引入额外的参数。随后,因果干预器根据节点集对环境节点执行因果混合增强。在三个数据集上进行评估,包括一个私人金融数据集和两个公共数据集,CaT-GNN表现出优于现有最先进方法的性能。我们的发现突显了将因果推理与图神经网络相结合以提高金融交易中欺诈检测能力的潜力。

更新时间: 2024-11-27 12:15:06

领域: cs.LG,cs.AI,q-fin.ST

下载: http://arxiv.org/abs/2402.14708v2

Large Language Model-Brained GUI Agents: A Survey

GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This emerging field is rapidly advancing, with significant progress in both research and industry. To provide a structured understanding of this trend, this paper presents a comprehensive survey of LLM-brained GUI agents, exploring their historical evolution, core components, and advanced techniques. We address research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of large action models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. Additionally, we examine emerging applications powered by these agents. Through a detailed analysis, this survey identifies key research gaps and outlines a roadmap for future advancements in the field. By consolidating foundational knowledge and state-of-the-art developments, this work aims to guide both researchers and practitioners in overcoming challenges and unlocking the full potential of LLM-brained GUI agents.

Updated: 2024-11-27 12:13:39

标题: 大型语言模型驱动的GUI代理:一项调查

摘要: GUIs一直是人机交互的核心,提供了一种直观和视觉驱动的方式来访问和与数字系统进行交互。LLM的出现,特别是多模型,开启了GUI自动化的新时代。它们在自然语言理解、代码生成和视觉处理方面表现出色。这为一代新的LLM大脑GUI代理铺平了道路,这些代理能够解释复杂的GUI元素,并根据自然语言指令自主执行操作。这些代理代表了一种范式转变,使用户能够通过简单的对话命令执行复杂的多步任务。它们的应用涵盖了网页导航、移动应用程序交互和桌面自动化,提供了一种革命性的用户体验,改变了个人与软件互动的方式。这个新兴领域正在迅速发展,研究和行业都取得了重大进展。 为了提供对这一趋势的结构化理解,本文提供了LLM大脑GUI代理的全面调查,探讨了它们的历史演变、核心组件和高级技术。我们探讨了一些研究问题,如现有的GUI代理框架、用于训练专门GUI代理的数据的收集和利用、为GUI任务量身定制的大型操作模型的开发,以及评估其有效性所必需的评估指标和基准。此外,我们还研究了由这些代理驱动的新兴应用。通过详细分析,这项调查确定了关键的研究空白,并勾画了未来在该领域取得进展的路线图。通过整合基础知识和最新发展,本研究旨在指导研究人员和从业者克服挑战,释放LLM大脑GUI代理的全部潜力。

更新时间: 2024-11-27 12:13:39

领域: cs.AI,cs.CL,cs.HC

下载: http://arxiv.org/abs/2411.18279v1

Large Models Enabled Ubiquitous Wireless Sensing

In the era of 5G communication, the knowledge of channel state information (CSI) is crucial for enhancing network performance. This paper explores the utilization of language models for spatial CSI prediction within MIMO-OFDM systems. We begin by outlining the significance of accurate CSI in enabling advanced functionalities such as adaptive modulation. We review existing methodologies for CSI estimation, emphasizing the shift from traditional to data-driven approaches. A novel framework for spatial CSI prediction using realistic environment information is then proposed, and experimental results demonstrate its effectiveness. This research paves the way for innovative strategies in managing wireless networks.

Updated: 2024-11-27 12:11:35

标题: 大模型实现无处不在的无线传感

摘要: 在5G通信时代,对信道状态信息(CSI)的了解对于提升网络性能至关重要。本文探讨了在MIMO-OFDM系统中利用语言模型进行空间CSI预测的方法。我们首先概述了准确CSI在实现自适应调制等先进功能方面的重要性。我们回顾了现有的CSI估计方法,强调了从传统到数据驱动方法的转变。随后提出了一个利用真实环境信息进行空间CSI预测的新框架,实验结果表明了其有效性。这项研究为无线网络管理的创新策略铺平了道路。

更新时间: 2024-11-27 12:11:35

领域: cs.LG

下载: http://arxiv.org/abs/2411.18277v1

GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation

Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomizations and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluated the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we proposed a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios.

Updated: 2024-11-27 12:11:23

标题: GAPartManip:一个用于材料无关关节对象操作的大规模部件中心数据集

摘要: 在家庭场景中有效地操纵关节式物体是实现普遍具有体现智能的关键步骤。三维视觉领域的主流研究主要集中在通过深度感知和姿势检测进行操作。然而,在现实环境中,这些方法常常面临由于深度感知不完善而带来的挑战,例如透明盖子和反光手柄。此外,它们通常缺乏灵活和适应性操纵所需的基于部件的交互多样性。为了解决这些挑战,我们引入了一个大规模的基于部件的数据集,用于关节式物体操纵,该数据集具有逼真的材料随机化和详细的基于部件的、场景级别的可操作交互姿势注释。我们通过将其与几种最先进的深度估计和交互姿势预测方法集成,评估了我们数据集的有效性。此外,我们提出了一个新颖的模块化框架,为通用的关节式物体操纵提供了卓越和稳健的性能。我们广泛的实验表明,我们的数据集显著提高了深度感知和可操作交互姿势预测在模拟和真实场景中的性能。

更新时间: 2024-11-27 12:11:23

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.18276v1

Hidden Data Privacy Breaches in Federated Learning

Federated Learning (FL) emerged as a paradigm for conducting machine learning across broad and decentralized datasets, promising enhanced privacy by obviating the need for direct data sharing. However, recent studies show that attackers can steal private data through model manipulation or gradient analysis. Existing attacks are constrained by low theft quantity or low-resolution data, and they are often detected through anomaly monitoring in gradients or weights. In this paper, we propose a novel data-reconstruction attack leveraging malicious code injection, supported by two key techniques, i.e., distinctive and sparse encoding design and block partitioning. Unlike conventional methods that require detectable changes to the model, our method stealthily embeds a hidden model using parameter sharing to systematically extract sensitive data. The Fibonacci-based index design ensures efficient, structured retrieval of memorized data, while the block partitioning method enhances our method's capability to handle high-resolution images by dividing them into smaller, manageable units. Extensive experiments on 4 datasets confirmed that our method is superior to the five state-of-the-art data-reconstruction attacks under the five respective detection methods. Our method can handle large-scale and high-resolution data without being detected or mitigated by state-of-the-art data reconstruction defense methods. In contrast to baselines, our method can be directly applied to both FedAVG and FedSGD scenarios, underscoring the need for developers to devise new defenses against such vulnerabilities. We will open-source our code upon acceptance.

Updated: 2024-11-27 12:04:37

标题: 联邦学习中的隐私数据泄露

摘要: 联邦学习(FL)作为一种在广泛和分散数据集上进行机器学习的范式出现,通过避免直接数据共享,承诺提供增强隐私保护。然而,最近的研究表明,攻击者可以通过模型操纵或梯度分析窃取私人数据。现有的攻击受到盗窃数量或低分辨率数据的限制,并且它们通常通过异常监测梯度或权重来检测。在本文中,我们提出了一种利用恶意代码注入的新颖数据重建攻击,支持两个关键技术,即独特稀疏编码设计和块分区。与需要对模型进行可检测更改的传统方法不同,我们的方法通过参数共享隐蔽地嵌入隐藏模型,以系统地提取敏感数据。基于斐波那契索引设计确保了记忆数据的高效结构化检索,而块分区方法通过将高分辨率图像划分为较小的可管理单元,增强了我们方法处理高分辨率图像的能力。对4个数据集进行的大量实验证实,我们的方法优于五种最先进的数据重建攻击在五种各自的检测方法下。我们的方法可以处理大规模和高分辨率数据,而不会被最先进的数据重建防御方法检测或缓解。与基线相比,我们的方法可以直接应用于FedAVG和FedSGD场景,强调开发人员需要设计新的防御措施来应对这些漏洞。我们将在接受后开源我们的代码。

更新时间: 2024-11-27 12:04:37

领域: cs.CL,cs.CR

下载: http://arxiv.org/abs/2411.18269v1

Wearable intelligent throat enables natural speech in stroke patients with dysarthria

Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to enable fluent, emotionally expressive communication. The system utilizes ultrasensitive textile strain sensors to capture high-quality signals from the neck area and supports token-level processing for real-time, continuous speech decoding, enabling seamless, delay-free communication. In tests with five stroke patients with dysarthria, IT's LLM agents intelligently corrected token errors and enriched sentence-level emotional and logical coherence, achieving low error rates (4.2% word error rate, 2.9% sentence error rate) and a 55% increase in user satisfaction. This work establishes a portable, intuitive communication platform for patients with dysarthria with the potential to be applied broadly across different neurological conditions and in multi-language support systems.

Updated: 2024-11-27 12:03:52

标题: 可穿戴智能喉咙使患有构音障碍的中风患者能够自然言语

摘要: 穿戴式无声语音系统在恢复语言受损患者的沟通方面具有巨大潜力。然而,无缝、连贯的语音仍然难以实现,临床疗效仍未得到证实。在这里,我们介绍了一种由人工智能驱动的智能喉咙(IT)系统,该系统将喉部肌肉振动和颈动脉脉搏信号传感器与大型语言模型(LLM)处理集成,以实现流畅、富有表达力的沟通。该系统利用超灵敏的纺织应变传感器捕捉颈部区域的高质量信号,并支持标记级处理,实现实时、连续的语音解码,实现无缝、无延迟的沟通。在与五名患有运动障碍的中风患者进行测试时,IT的LLM代理智能地纠正了标记错误,并丰富了句子级情感和逻辑连贯性,实现了低错误率(4.2%字错误率,2.9%句错误率)和用户满意度提高了55%。这项工作为患有运动障碍的患者建立了一个便携、直观的沟通平台,具有在不同神经病症和多语言支持系统中广泛应用的潜力。

更新时间: 2024-11-27 12:03:52

领域: eess.AS,cs.AI,cs.SD,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18266v1

G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks

Recent advancements in large language model (LLM)-based agents have demonstrated that collective intelligence can significantly surpass the capabilities of individual agents, primarily due to well-crafted inter-agent communication topologies. Despite the diverse and high-performing designs available, practitioners often face confusion when selecting the most effective pipeline for their specific task: \textit{Which topology is the best choice for my task, avoiding unnecessary communication token overhead while ensuring high-quality solution?} In response to this dilemma, we introduce G-Designer, an adaptive, efficient, and robust solution for multi-agent deployment, which dynamically designs task-aware, customized communication topologies. Specifically, G-Designer models the multi-agent system as a multi-agent network, leveraging a variational graph auto-encoder to encode both the nodes (agents) and a task-specific virtual node, and decodes a task-adaptive and high-performing communication topology. Extensive experiments on six benchmarks showcase that G-Designer is: \textbf{(1) high-performing}, achieving superior results on MMLU with accuracy at $84.50\%$ and on HumanEval with pass@1 at $89.90\%$; \textbf{(2) task-adaptive}, architecting communication protocols tailored to task difficulty, reducing token consumption by up to $95.33\%$ on HumanEval; and \textbf{(3) adversarially robust}, defending against agent adversarial attacks with merely $0.3\%$ accuracy drop.

Updated: 2024-11-27 12:03:27

标题: G-Designer:通过图神经网络设计多智能体通信拓扑结构

摘要: 最近大型语言模型(LLM)的进展表明,基于群体智能的代理可以显著超越个体代理的能力,主要是由于精心设计的代理间通信拓扑结构。尽管有各种多样且高性能的设计可供选择,但从业者在选择特定任务的最有效流程时常常感到困惑:“对于我的任务来说,哪种拓扑结构是最佳选择,避免不必要的通信令牌开销同时确保高质量的解决方案?”为了解决这一困境,我们引入了G-Designer,这是一个适应性强、高效且稳健的多代理部署解决方案,可以动态设计任务感知的定制通信拓扑结构。具体而言,G-Designer将多代理系统建模为一个多代理网络,利用变分图自动编码器来对节点(代理)和一个任务特定的虚拟节点进行编码,并解码出一个任务自适应且高性能的通信拓扑结构。对六个基准测试进行的大量实验表明,G-Designer具有以下特点:(1)性能优异,在MMLU上的准确率达到84.50%,在HumanEval上的pass@1达到89.90%;(2)任务自适应,为任务难度量身定制通信协议,在HumanEval上可将令牌消耗减少高达95.33%;(3)对抗性稳健,仅有0.3%的准确率下降即可抵御代理对抗攻击。

更新时间: 2024-11-27 12:03:27

领域: cs.MA,cs.LG

下载: http://arxiv.org/abs/2410.11782v2

Break the ID-Language Barrier: An Adaption Framework for Sequential Recommendation

The recent breakthrough of large language models (LLMs) in natural language processing has sparked exploration in recommendation systems, however, their limited domain-specific knowledge remains a critical bottleneck. Specifically, LLMs lack key pieces of information crucial for sequential recommendations, such as user behavior patterns. To address this critical gap, we propose IDLE-Adapter, a novel framework that integrates pre-trained ID embeddings, rich in domain-specific knowledge, into LLMs to improve recommendation accuracy. IDLE-Adapter acts as a bridge, transforming sparse user-item interaction data into dense, LLM-compatible representations through a Pre-trained ID Sequential Model, Dimensionality Alignment, Layer-wise Embedding Refinement, and Layer-wise Distribution Alignment. Furthermore, IDLE-Adapter demonstrates remarkable flexibility by seamlessly integrating ID embeddings from diverse ID-based sequential models and LLM architectures. Extensive experiments across various datasets demonstrate the superiority of IDLE-Adapter, achieving over 10\% and 20\% improvements in HitRate@5 and NDCG@5 metrics, respectively, compared to state-of-the-art methods.
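
A minimal sketch of the alignment idea, with hypothetical dimensions; the real IDLE-Adapter's refinement and distribution-alignment losses are not reproduced here:

import torch
import torch.nn as nn

class IDAdapter(nn.Module):
    # Aligns pre-trained ID-model embeddings with an LLM's hidden size and
    # refines them per LLM layer (names and sizes are illustrative).
    def __init__(self, id_dim=64, llm_dim=4096, n_llm_layers=4):
        super().__init__()
        self.align = nn.Linear(id_dim, llm_dim)       # dimensionality alignment
        self.refine = nn.ModuleList(                  # layer-wise refinement
            nn.Sequential(nn.LayerNorm(llm_dim), nn.Linear(llm_dim, llm_dim))
            for _ in range(n_llm_layers)
        )

    def forward(self, id_emb):                        # (batch, seq, id_dim)
        h = self.align(id_emb)
        return [layer(h) for layer in self.refine]    # one tensor per LLM layer

adapter = IDAdapter()
outs = adapter(torch.randn(2, 20, 64))
print(len(outs), outs[0].shape)                       # 4 torch.Size([2, 20, 4096])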

Updated: 2024-11-27 11:59:44

标题: 突破ID-语言障碍:顺序推荐的适应框架

摘要: 最近大型语言模型(LLMs)在自然语言处理领域取得的突破激发了对推荐系统的探索,然而,它们有限的领域特定知识仍然是一个关键瓶颈。具体来说,LLMs缺乏对于顺序推荐至关重要的信息片段,比如用户行为模式。为了解决这一关键缺口,我们提出了IDLE-Adapter,一个新颖的框架,将预训练的ID嵌入,富含领域特定知识,集成到LLMs中以提高推荐准确性。IDLE-Adapter充当一个桥梁,通过预训练的ID顺序模型、维度对齐、层次嵌入细化和层次分布对齐,将稀疏的用户-项目交互数据转化为密集的、LLM兼容的表示。此外,IDLE-Adapter展示了非常灵活的特性,可以无缝地集成来自不同ID基础的顺序模型和LLM架构的ID嵌入。通过在各种数据集上进行大量实验,IDLE-Adapter展现出了卓越的灵活性,与最先进的方法相比,分别在HitRate@5和NDCG@5指标上实现了超过10%和20%的提升。

更新时间: 2024-11-27 11:59:44

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2411.18262v1

Dynamic Retail Pricing via Q-Learning -- A Reinforcement Learning Framework for Enhanced Revenue Management

This paper explores the application of a reinforcement learning (RL) framework using the Q-Learning algorithm to enhance dynamic pricing strategies in the retail sector. Unlike traditional pricing methods, which often rely on static demand models, our RL approach continuously adapts to evolving market dynamics, offering a more flexible and responsive pricing strategy. By creating a simulated retail environment, we demonstrate how RL effectively addresses real-time changes in consumer behavior and market conditions, leading to improved revenue outcomes. Our results illustrate that the RL model not only surpasses traditional methods in terms of revenue generation but also provides insights into the complex interplay of price elasticity and consumer demand. This research underlines the significant potential of applying artificial intelligence in economic decision-making, paving the way for more sophisticated, data-driven pricing models in various commercial domains.
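
A toy tabular Q-learning loop for price selection against a simulated elastic demand curve; the state space, demand model, and all constants are invented for illustration:

import random

prices = [8.0, 9.0, 10.0, 11.0, 12.0]           # candidate price points
inventory_levels = range(3)                     # toy state: low/mid/high stock
Q = {(s, a): 0.0 for s in inventory_levels for a in range(len(prices))}
alpha, gamma, eps = 0.1, 0.9, 0.1

def demand(price):                              # simulated elastic demand
    return max(0.0, 20.0 - 1.5 * price + random.gauss(0, 1))

state = 1
for step in range(10_000):
    a = (random.randrange(len(prices)) if random.random() < eps
         else max(range(len(prices)), key=lambda i: Q[(state, i)]))
    reward = prices[a] * demand(prices[a])      # revenue for this step
    next_state = random.choice(list(inventory_levels))   # toy transition
    best_next = max(Q[(next_state, i)] for i in range(len(prices)))
    Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
    state = next_state

best = max(range(len(prices)), key=lambda i: Q[(1, i)])
print("learned price for mid inventory:", prices[best])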

Updated: 2024-11-27 11:59:06

标题: 通过Q学习的动态零售定价——增强收入管理的强化学习框架

摘要: 本文探讨了在零售行业中运用强化学习(RL)框架和Q学习算法来增强动态定价策略的应用。与通常依赖静态需求模型的传统定价方法不同,我们的RL方法不断适应不断发展的市场动态,提供更灵活和响应迅速的定价策略。通过创建一个模拟零售环境,我们展示了RL如何有效地应对消费者行为和市场条件的实时变化,从而带来改善的收入结果。我们的结果表明,RL模型不仅在收入生成方面超越了传统方法,还提供了有关价格弹性和消费者需求复杂相互作用的见解。这项研究强调了在经济决策中应用人工智能的重要潜力,为各种商业领域的更复杂、数据驱动的定价模型铺平了道路。

更新时间: 2024-11-27 11:59:06

领域: cs.LG

下载: http://arxiv.org/abs/2411.18261v1

On Designing Effective RL Reward at Training Time for LLM Reasoning

Reward models have been increasingly critical for improving the reasoning capability of LLMs. Existing research has shown that a well-trained reward model can substantially improve model performances at inference time via search. However, the potential of reward models during RL training time still remains largely under-explored. It is currently unclear whether these reward models can provide additional training signals to enhance the reasoning capabilities of LLMs in RL training that uses sparse success rewards, which verify the correctness of solutions. In this work, we evaluate popular reward models for RL training, including the Outcome-supervised Reward Model (ORM) and the Process-supervised Reward Model (PRM), and train a collection of LLMs for math problems using RL by combining these learned rewards with success rewards. Surprisingly, even though these learned reward models have strong inference-time performances, they may NOT help or even hurt RL training, producing worse performances than LLMs trained with the success reward only. Our analysis reveals that an LLM can receive high rewards from some of these reward models by repeating correct but unnecessary reasoning steps, leading to a severe reward hacking issue. Therefore, we introduce two novel reward refinement techniques, including Clipping and Delta. The key idea is to ensure the accumulative reward of any reasoning trajectory is upper-bounded to keep a learned reward model effective without being exploited. We evaluate our techniques with multiple reward models over a set of 1.5B and 7B LLMs on MATH and GSM8K benchmarks and demonstrate that with a carefully designed reward function, RL training without any additional supervised tuning can improve all the evaluated LLMs, including the state-of-the-art 7B LLM Qwen2.5-Math-7B-Instruct on MATH and GSM8K benchmarks.
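
One plausible reading of the two refinements, sketched in plain Python: Clipping caps each step's process reward so repeated steps cannot accumulate credit, and Delta scores step t by r_t - r_{t+1} so the trajectory sum telescopes to a bounded value. This is a sketch of the stated bounding idea, not the paper's exact specification:

def clip_refine(step_rewards, c=0.8):
    # Shift-and-cap: each step contributes min(r, c) - c <= 0, so no
    # trajectory can inflate its return by repeating high-reward steps.
    return [min(r, c) - c for r in step_rewards]

def delta_refine(step_rewards):
    # Score step t by r_t - r_{t+1}; the sum telescopes to r_1, which is
    # bounded no matter how many steps the trajectory repeats.
    diffs = [r - r_next for r, r_next in zip(step_rewards, step_rewards[1:])]
    return diffs + [step_rewards[-1]]

rs = [0.9, 0.8, 0.9, 0.7]
print(clip_refine(rs))                               # [0.0, 0.0, 0.0, -0.1]
print(delta_refine(rs), sum(delta_refine(rs)))       # telescopes to rs[0] = 0.9

Either refined process reward would then be added to the sparse success reward during RL training.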

Updated: 2024-11-27 11:58:50

标题: 关于在LLM推理训练中设计有效的强化学习奖励

摘要: 奖励模型对于提高LLMs的推理能力变得越来越关键。现有研究表明,经过良好训练的奖励模型可以通过搜索在推理时显着提高模型性能。然而,在RL训练期间奖励模型的潜力仍然大部分未被开发。目前尚不清楚这些奖励模型是否能提供额外的训练信号,以增强LLMs在使用稀疏成功奖励的RL训练中的推理能力,这些奖励验证了解决方案的正确性。在这项工作中,我们评估了用于RL训练的流行奖励模型,包括Outcome-supervised Reward Model(ORM)和Process-supervised Reward Model(PRM),并通过将这些学习到的奖励与成功奖励结合,训练了一组LLMs解决数学问题。令人惊讶的是,尽管这些学习到的奖励模型在推理时表现强劲,但它们可能不会帮助甚至会损害RL训练,导致LLMs的表现比仅使用成功奖励训练的LLMs更差。我们的分析揭示了LLM可以通过重复正确但不必要的推理步骤从这些奖励模型中获得高奖励,导致严重的奖励欺骗问题。因此,我们引入了两种新颖的奖励细化技术,包括Clipping和Delta。关键思想是确保任何推理轨迹的累积奖励上限,以确保学习到的奖励模型有效而不被利用。我们在MATH和GSM8K基准上的1.5B和7B LLMs集合上评估了我们的技术,并证明了通过精心设计的奖励函数,即使没有任何额外的监督调整,RL训练也可以改善所有评估的LLMs,包括最先进的7B LLM Qwen2.5-Math-7B-Instruct 在MATH和GSM8K基准上。

更新时间: 2024-11-27 11:58:50

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.15115v3

Transfer Learning for Deep Learning-based Prediction of Lattice Thermal Conductivity

Machine learning promises to accelerate material discovery by enabling high-throughput prediction of desirable macro-properties from atomic-level descriptors or structures. However, the limited data available about precise values of these properties have been a barrier, leading to predictive models with limited precision or the ability to generalize. This is particularly true of lattice thermal conductivity (LTC): existing datasets of precise (ab initio, DFT-based) computed values are limited to a few dozen materials with little variability. Based on such datasets, we study the impact of transfer learning on both the precision and generalizability of a deep learning model (ParAIsite). We start from an existing model (MEGNet~\cite{Chen2019}) and show that improvements are obtained by fine-tuning a pre-trained version on different tasks. Interestingly, we also show that a much greater improvement is obtained when first fine-tuning it on a large dataset of low-quality approximations of LTC (based on the AGL model) and then applying a second phase of fine-tuning with our high-quality, smaller-scale datasets. The promising results obtained pave the way not only towards a greater ability to explore large databases in search of low thermal conductivity materials but also towards methods enabling increasingly precise predictions in areas where quality data are rare.

Updated: 2024-11-27 11:57:58

标题: 基于深度学习的晶格热导率预测的迁移学习

摘要: 机器学习承诺通过从原子级描述符或结构中实现高通量预测,加速材料发现。然而,有关这些性质精确值的有限数据是一个障碍,导致预测模型具有有限的精度或泛化能力。这在晶格热导率(LTC)方面尤为明显:现有的准确(从头计算,基于DFT)计算数值的数据集仅限于少数几十种材料,变化很小。基于这样的数据集,我们研究了迁移学习对深度学习模型(ParAIsite)精度和泛化能力的影响。我们从现有模型(MEGNet~\cite{Chen2019})开始,并展示通过在不同任务上微调预训练版本可以获得改进。有趣的是,我们还展示了通过首先在大量低质量LTC近似数据集(基于AGL模型)上微调,然后再应用二次微调到我们的高质量、小规模数据集时,可以获得更大的改进。获得的有希望的结果不仅为探索大型数据库以寻找低热导率材料的能力铺平了道路,还为在质量数据稀缺的领域提供越来越精确的预测方法。

更新时间: 2024-11-27 11:57:58

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2411.18259v1

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were originally designed for audio compression, which may lead to suboptimal performance in the context of audio LLM. Our research aims to address the shortcomings of current audio LLM codecs, particularly their challenges in maintaining semantic integrity in generated audio. For instance, existing methods like VALL-E, which condition acoustic token generation on text transcriptions, often suffer from content inaccuracies and elevated word error rates (WER) due to semantic misinterpretations of acoustic tokens, resulting in word skipping and errors. To overcome these issues, we propose a straightforward yet effective approach called X-Codec. X-Codec incorporates semantic features from a pre-trained semantic encoder before the Residual Vector Quantization (RVQ) stage and introduces a semantic reconstruction loss after RVQ. By enhancing the semantic ability of the codec, X-Codec significantly reduces WER in speech synthesis tasks and extends these benefits to non-speech applications, including music and sound generation. Our experiments in text-to-speech, music continuation, and text-to-sound tasks demonstrate that integrating semantic information substantially improves the overall performance of language models in audio generation. Our code and demo are available (Demo: https://x-codec-audio.github.io Code: https://github.com/zhenye234/xcodec)
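
A toy sketch of the stated recipe (concatenate frozen semantic-encoder features with acoustic features before RVQ, then supervise a semantic reconstruction after RVQ); the quantizer, dimensions, and loss below are simplified stand-ins for X-Codec's actual modules:

import torch
import torch.nn as nn

class TinyRVQ(nn.Module):
    # Minimal two-stage residual vector quantizer with a straight-through pass.
    def __init__(self, dim, codebook_size=64, n_stages=2):
        super().__init__()
        self.books = nn.ParameterList(
            nn.Parameter(torch.randn(codebook_size, dim)) for _ in range(n_stages))

    def forward(self, x):                                 # (batch, time, dim)
        residual, quantized = x, torch.zeros_like(x)
        for book in self.books:
            dists = torch.cdist(residual, book.unsqueeze(0).expand(x.size(0), -1, -1))
            q = book[dists.argmin(-1)]
            quantized, residual = quantized + q, residual - q
        return x + (quantized - x).detach()               # straight-through estimator

acoustic = torch.randn(2, 50, 256)    # acoustic encoder output (invented sizes)
semantic = torch.randn(2, 50, 256)    # frozen semantic encoder output
fused = torch.cat([acoustic, semantic], dim=-1)           # fuse before RVQ
tokens = TinyRVQ(dim=512)(fused)
sem_head = nn.Linear(512, 256)                            # semantic reconstruction
sem_loss = nn.functional.mse_loss(sem_head(tokens), semantic)
print(tokens.shape, sem_loss.item() >= 0)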

Updated: 2024-11-27 11:47:45

标题: 编解码器的选择很重要:探讨编解码器在音频语言模型中的语义缺陷

摘要: 最近音频生成领域的进展受到大型语言模型(LLMs)的能力推动。现有研究主要集中在增强音频语言模型的架构和规模,利用更大的数据集,通常使用声学编解码器(如EnCodec)进行音频标记化。然而,这些编解码器最初是为音频压缩而设计的,这可能导致在音频LLM环境下性能不佳。我们的研究旨在解决当前音频LLM编解码器的缺点,特别是在生成音频时维持语义完整性方面的挑战。例如,现有方法如VALL-E,在文本转录上调节声学标记生成,通常由于声学标记的语义误解而导致内容不准确和词错误率(WER)升高,从而导致跳字和错误。为了克服这些问题,我们提出了一种简单而有效的方法,称为X-Codec。X-Codec在Residual Vector Quantization(RVQ)阶段之前引入了来自预训练语义编码器的语义特征,并在RVQ之后引入语义重构损失。通过增强编解码器的语义能力,X-Codec在语音合成任务中显著减少了WER,并将这些好处扩展到非语音应用,包括音乐和声音生成。我们在文本到语音、音乐延续和文本到声音任务中的实验表明,整合语义信息显著提高了语言模型在音频生成中的整体性能。我们的代码和演示可用(演示:https://x-codec-audio.github.io 代码:https://github.com/zhenye234/xcodec)

更新时间: 2024-11-27 11:47:45

领域: eess.AS,cs.AI,cs.CL,cs.SD

下载: http://arxiv.org/abs/2408.17175v3

Active partitioning: inverting the paradigm of active learning

Datasets often incorporate various functional patterns related to different aspects or regimes, which are typically not equally present throughout the dataset. We propose a novel, general-purpose partitioning algorithm that utilizes competition between models to detect and separate these functional patterns. This competition is induced by multiple models iteratively submitting their predictions for the dataset, with the best prediction for each data point being rewarded with training on that data point. This reward mechanism amplifies each model's strengths and encourages specialization in different patterns. The specializations can then be translated into a partitioning scheme. The amplification of each model's strengths inverts the active learning paradigm: while active learning typically focuses the training of models on their weaknesses to minimize the number of required training data points, our concept reinforces the strengths of each model, thus specializing them. We validate our concept -- called active partitioning -- with various datasets with clearly distinct functional patterns, such as mechanical stress and strain data in a porous structure. The active partitioning algorithm produces valuable insights into the datasets' structure, which can serve various further applications. As a demonstration of one exemplary usage, we set up modular models consisting of multiple expert models, each learning a single partition, and compare their performance on more than twenty popular regression problems with single models learning all partitions simultaneously. Our results show significant improvements, with up to 54% loss reduction, confirming our partitioning algorithm's utility.
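
The competition loop is concrete enough to sketch directly; below, two linear models compete for the points of a two-regime function, and the final winner assignment becomes the partition (NumPy, all sizes illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 1))
y = np.where(X[:, 0] < 0, np.sin(4 * X[:, 0]), 2 * X[:, 0])  # two regimes

n_models = 2
W = rng.normal(size=(n_models, 2))               # linear models: y ~ w0 + w1*x
Xb = np.hstack([np.ones((len(X), 1)), X])        # bias feature

for _ in range(200):
    preds = Xb @ W.T                              # (points, models)
    winner = np.abs(preds - y[:, None]).argmin(axis=1)
    for m in range(n_models):                     # each model trains on its wins
        mask = winner == m
        if mask.sum() > 2:
            W[m], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)

# The winner assignment is the learned partition of the dataset.
print("partition sizes:", np.bincount(winner, minlength=n_models))

Each model is rewarded only with the points it predicts best, which is exactly the strength-amplifying inversion of active learning that the abstract describes.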

Updated: 2024-11-27 11:47:07

标题: 主动分区:颠覆主动学习的范式

摘要: 数据集通常包含与不同方面或领域相关的各种功能模式,这些功能模式通常不会在整个数据集中均等存在。我们提出了一种新颖的通用分区算法,利用模型之间的竞争来检测和分离这些功能模式。这种竞争是通过多个模型迭代地提交其对数据集的预测来引起的,每个数据点的最佳预测都会通过在该数据点上进行训练来获得奖励。这种奖励机制放大了每个模型的优势,并鼓励不同模式的专业化。然后可以将这些专业化转化为分区方案。每个模型的优势的放大颠倒了主动学习范式:虽然主动学习通常将模型的训练集中在它们的弱点上以最小化所需的训练数据点数量,但我们的概念则增强了每个模型的优势,从而使它们专业化。我们用具有明显不同功能模式的各种数据集验证了我们的概念--称为主动分区--,例如多孔结构中的机械应力和应变数据。主动分区算法为数据集的结构提供了有价值的见解,可以用于各种进一步的应用。作为一个示范性用途的演示,我们建立了由多个专家模型组成的模块化模型,每个模型学习一个单独的分区,并将它们在超过二十个流行的回归问题上的性能与同时学习所有分区的单个模型进行比较。我们的结果显示出显著的改进,最多可减少54%的损失,验证了我们的分区算法的实用性。

更新时间: 2024-11-27 11:47:07

领域: cs.LG

下载: http://arxiv.org/abs/2411.18254v1

Human Motion Instruction Tuning

This paper presents LLaMo (Large Language and Human Motion Assistant), a multimodal framework for human motion instruction tuning. In contrast to conventional instruction-tuning approaches that convert non-linguistic inputs, such as video or motion sequences, into language tokens, LLaMo retains motion in its native form for instruction tuning. This method preserves motion-specific details that are often diminished in tokenization, thereby improving the model's ability to interpret complex human behaviors. By processing both video and motion data alongside textual inputs, LLaMo enables a flexible, human-centric analysis. Experimental evaluations across high-complexity domains, including human behaviors and professional activities, indicate that LLaMo effectively captures domain-specific knowledge, enhancing comprehension and prediction in motion-intensive scenarios. We hope LLaMo offers a foundation for future multimodal AI systems with broad applications, from sports analytics to behavioral prediction. Our code and models are available on the project website: https://github.com/ILGLJ/LLaMo.

Updated: 2024-11-27 11:45:29

标题: 人类动作指导调整

摘要: 这篇论文介绍了LLaMo(大型语言和人体运动助手),这是一个用于人体运动指导调整的多模态框架。与传统的指导调整方法不同,传统方法将非语言输入(如视频或运动序列)转换为语言标记,而LLaMo保留了运动的原始形式用于指导调整。这种方法保留了在标记化过程中经常减少的运动特定细节,从而提高了模型解释复杂人类行为的能力。通过同时处理视频、运动数据和文本输入,LLaMo实现了灵活的、以人为中心的分析。在包括人类行为和专业活动在内的高复杂度领域进行的实验评估表明,LLaMo有效地捕获了领域特定知识,增强了运动密集场景中的理解和预测能力。我们希望LLaMo为未来具有广泛应用的多模态人工智能系统奠定基础,从体育分析到行为预测。我们的代码和模型可在项目网站上获得:https://github.com/ILGLJ/LLaMo。

更新时间: 2024-11-27 11:45:29

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.16805v2

Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning

Purpose: Analyzing noninvasive longitudinal and multimodal data using artificial intelligence could potentially transform immunotherapy for cancer patients, paving the way towards precision medicine. Methods: In this study, we integrated pre- and on-treatment blood measurements, prescribed medications and CT-based volumes of organs from a large pan-cancer cohort of 694 patients treated with immunotherapy to predict short and long-term overall survival. By leveraging a combination of recent developments, different variants of our extended multimodal transformer-based simple temporal attention (MMTSimTA) network were trained end-to-end to predict mortality at three, six, nine and twelve months. These models were also compared to baseline methods incorporating intermediate and late fusion based integration methods. Results: The strongest prognostic performance was demonstrated using the extended transformer-based multimodal model with areas under the curve (AUCs) of $0.84 \pm 0.04$, $0.83 \pm 0.02$, $0.82 \pm 0.02$, and $0.81 \pm 0.03$ for 3-, 6-, 9-, and 12-month survival prediction, respectively. Conclusion: Our findings suggest that analyzing integrated early treatment data has potential for predicting survival of immunotherapy patients. Integrating complementary noninvasive modalities into a jointly trained model, using our extended transformer-based architecture, demonstrated an improved multimodal prognostic performance, especially in short term survival prediction.

Updated: 2024-11-27 11:44:06

标题: 利用深度学习对纵向非侵入性诊断进行多模态整合以预测免疫疗法中的生存期

摘要: 目的:利用人工智能分析非侵入性纵向和多模态数据可能会改变癌症患者的免疫疗法,为精准医学铺平道路。方法:在这项研究中,我们整合了大型全癌症患者队列的预治疗和治疗期间的血液测量、处方药物和基于CT的器官体积,以预测接受免疫疗法治疗的694名患者的短期和长期总生存。通过利用最近发展的组合,我们训练了不同版本的扩展多模态变压器基于简单时间注意力(MMTSimTA)网络,以在三个、六个、九个和十二个月预测死亡率。这些模型也与基线方法进行了比较,包括基于中间和后期融合的整合方法。结果:最强的预后表现是使用扩展变压器基础多模态模型展示的,对于3、6、9和12个月的存活预测,曲线下面积(AUC)分别为0.84±0.04、0.83±0.02、0.82±0.02、0.81±0.03。结论:我们的研究结果表明,分析整合的早期治疗数据有潜力预测免疫疗法患者的生存。将互补的非侵入性模态整合到一个共同训练的模型中,使用我们的扩展变压器基础架构,展示了改进的多模态预测表现,特别是在短期生存预测方面。

更新时间: 2024-11-27 11:44:06

领域: cs.LG,cs.AI,q-bio.QM

下载: http://arxiv.org/abs/2411.18253v1

IKUN: Initialization to Keep snn training and generalization great with sUrrogate-stable variaNce

Weight initialization significantly impacts the convergence and performance of neural networks. While traditional methods like Xavier and Kaiming initialization are widely used, they often fall short for spiking neural networks (SNNs), which have distinct requirements compared to artificial neural networks (ANNs). To address this, we introduce \textbf{IKUN}, a variance-stabilizing initialization method integrated with surrogate gradient functions, specifically designed for SNNs. \textbf{IKUN} stabilizes signal propagation, accelerates convergence, and enhances generalization. Experiments show \textbf{IKUN} improves training efficiency by up to \textbf{50\%}, achieving \textbf{95\%} training accuracy and \textbf{91\%} generalization accuracy. Hessian analysis reveals that \textbf{IKUN}-trained models converge to flatter minima, characterized by Hessian eigenvalues near zero on the positive side, promoting better generalization. The method is open-sourced for further exploration: \href{https://github.com/MaeChd/SurrogateVarStabe}{https://github.com/MaeChd/SurrogateVarStabe}.

Updated: 2024-11-27 11:41:11

标题: IKUN:使用稳定替代方差的初始化方法来保持snn训练和泛化效果优秀

摘要: 权重初始化显著影响神经网络的收敛和性能。虽然传统方法如Xavier和Kaiming初始化被广泛使用,但它们常常不能满足脉冲神经网络(SNNs)的需求,后者与人工神经网络(ANNs)有明显的区别。 为了解决这个问题,我们引入了\textbf{IKUN},这是一个与替代梯度函数集成的方差稳定初始化方法,专门为SNNs设计。 \textbf{IKUN} 稳定了信号传播,加速了收敛,并增强了泛化能力。实验证明,\textbf{IKUN} 提高了训练效率,最高可达50\%,实现了95%的训练准确度和91%的泛化准确度。 Hessian分析显示,\textbf{IKUN} 训练的模型收敛到更平缓的最小值,以Hessian特征值接近零的正值为特征,促进更好的泛化。这一方法已开源供进一步探索:\href{https://github.com/MaeChd/SurrogateVarStabe}{https://github.com/MaeChd/SurrogateVarStabe}。

更新时间: 2024-11-27 11:41:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18250v1

A gentle push funziona benissimo: making instructed models in Italian via contrastive activation steering

Adapting models to a language that was only partially present in the pre-training data requires fine-tuning, which is expensive in terms of both data and computational resources. As an alternative to fine-tuning, we explore the potential of activation steering-based techniques to enhance model performance on Italian tasks. Through our experiments we show that Italian steering (i) can be successfully applied to different models, (ii) achieves performances comparable to, or even better than, fine-tuned models for Italian, and (iii) yields higher quality and consistency in Italian generations. We also discuss the utility of steering and fine-tuning in the contemporary LLM landscape, where models often achieve strong Italian performance even when not explicitly trained in this language.
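
A rough sketch of contrastive activation steering with a forward hook, using GPT-2 as a stand-in since the paper's models are not named in the abstract; the layer index, prompts, and steering strength are arbitrary assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                    # small stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
layer = model.transformer.h[6]                   # a mid-depth GPT-2 block

def mean_activation(text):
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"), output_hidden_states=True)
    return out.hidden_states[7].mean(dim=1)      # states right after block 6

# Contrastive direction: target-language minus source-language activations.
steer = mean_activation("Questa è una frase in italiano.") - \
        mean_activation("This is a sentence in English.")

def add_steer(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 4.0 * steer                # strength is a tunable guess
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = layer.register_forward_hook(add_steer)
ids = model.generate(**tok("The weather today", return_tensors="pt"),
                     max_new_tokens=20, do_sample=False)
print(tok.decode(ids[0]))
handle.remove()

In practice the direction would be averaged over many contrastive prompt pairs rather than a single pair.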

Updated: 2024-11-27 11:38:09

标题: 温和的推动效果很好:通过对比激活引导构建意大利语指令模型

摘要: 将模型调整到预训练数据中仅部分存在的语言需要微调,这在数据和计算资源方面都是昂贵的。作为微调的替代方案,我们探讨了基于激活引导技术的潜力,以增强模型在意大利任务上的性能。通过我们的实验,我们展示了意大利引导技术可以成功应用于不同模型,达到与或甚至优于意大利微调模型的性能,并且在意大利生成中产生更高质量和一致性。我们还讨论了在当代LLM景观中引导和微调的实用性,在这个景观中,即使没有明确在这种语言中进行训练,模型仍然在意大利表现出色。

更新时间: 2024-11-27 11:38:09

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.18247v1

BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

In embodied intelligence systems, a key component is 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point cloud, which, despite offering precise geometric information, still constrain perception performance due to inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding, and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current state-of-the-art results on the EmbodiedScan benchmark, achieving improvements of 5.69% in the 3D detection task and 15.25% in the 3D visual grounding task.

Updated: 2024-11-27 11:31:05

标题: BIP3D:跨越二维图像和三维感知的体现智能

摘要: 在具有体现智能的系统中,一个关键组成部分是3D感知算法,它使代理能够理解其周围环境。先前的算法主要依赖于点云,尽管提供了精确的几何信息,但由于固有的稀疏性、噪声和数据稀缺性,仍然限制了感知性能。在这项工作中,我们引入了一种新颖的以图像为中心的3D感知模型BIP3D,它利用具有显式3D位置编码的表现力强大的图像特征,以克服以点为中心的方法的局限性。具体来说,我们利用预训练的2D视觉基础模型来增强语义理解,并引入一个空间增强模块来改善空间理解。这些模块共同使BIP3D能够实现多视角、多模态特征融合和端到端的3D感知。在我们的实验中,BIP3D在EmbodiedScan基准测试中表现优于当前最先进的结果,在3D检测任务中取得了5.69%的改进,在3D视觉对齐任务中取得了15.25%的改进。

更新时间: 2024-11-27 11:31:05

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.14869v2

Thai Financial Domain Adaptation of THaLLE -- Technical Report

Large Language Models (LLMs) excel in general tasks but struggle with domain-specific challenges, such as specialized terminology and localized regulations. Existing financial LLMs, like FinGPT and BloombergGPT, lack support for the Thai financial domain. We developed a Thai Financial LLM using the Investment Consultant (IC) exam dataset from the Stock Exchange of Thailand. To address dataset limitations, we applied data augmentation, ReLoRA for efficient training, Continued Pretraining (CPT) for domain knowledge, and Rank-Stabilized LoRA (rsLoRA) for fine-tuning. Supervised Fine-Tuning (SFT) simulated exam scenarios, while Direct Preference Optimization (DPO) refined the model using feedback. The model achieved scores of 72%, 72%, and 84% on IC exam levels P1, P2, and P3, respectively, demonstrating its effectiveness in Thai financial advisory tasks and its potential for specialized applications.
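
For reference, rank-stabilized LoRA is exposed in recent versions of the Hugging Face peft library via a single flag; the base model and target modules below are placeholders, not the THaLLE setup:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in base model
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["c_attn"],      # attention projection in GPT-2
    use_rslora=True,                # rank-stabilized scaling: alpha / sqrt(r)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()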

Updated: 2024-11-27 11:30:00

标题: 泰国金融领域THaLLE的领域自适应--技术报告

摘要: 大型语言模型(LLMs)在一般任务中表现出色,但在领域特定挑战中表现出困难,如专业术语和本地化法规。现有的金融LLMs,如FinGPT和BloombergGPT,缺乏对泰国金融领域的支持。我们利用泰国证券交易所的投资顾问(IC)考试数据集开发了一种泰国金融LLM。为了解决数据集限制,我们应用了数据增强、用于高效训练的ReLoRA、用于领域知识的持续预训练(CPT)和用于微调的Rank-Stabilized LoRA(rsLoRA)。监督微调(SFT)模拟考试场景,而直接偏好优化(DPO)利用反馈改进模型。该模型在IC考试级别P1、P2和P3上分别达到了72%、72%和84%的分数,展示了其在泰国金融咨询任务中的有效性以及在专业应用中的潜力。

更新时间: 2024-11-27 11:30:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.18242v1

On the role of Artificial Intelligence methods in modern force-controlled manufacturing robotic tasks

This position paper explores the integration of Artificial Intelligence (AI) into force-controlled robotic tasks within the scope of advanced manufacturing, a cornerstone of Industry 4.0. AI's role in enhancing robotic manipulators - key drivers in the Fourth Industrial Revolution - is rapidly leading to significant innovations in smart manufacturing. The objective of this article is to frame these innovations in practical force-controlled applications - e.g. deburring, polishing, and assembly tasks like peg-in-hole (PiH) - highlighting their necessity for maintaining high-quality production standards. By reporting on recent AI-based methodologies, this article contrasts them and identifies current challenges to be addressed in future research. The analysis concludes with a perspective on future research directions, emphasizing the need for common performance metrics to validate AI techniques, integration of various enhancements for performance optimization, and the importance of validating them in relevant scenarios. These future directions aim to provide consistency with already adopted approaches, so as to be compatible with manufacturing standards, increasing the relevance of AI-driven methods in both academic and industrial contexts.

Updated: 2024-11-27 11:29:59

标题: 关于人工智能方法在现代力控制制造机器人任务中的作用

摘要: 这篇立场文件探讨了人工智能(AI)在先进制造范围内的力控机器人任务中的整合,这是工业4.0的基石。AI在增强机器人操作器的作用 - 第四次工业革命的关键驱动因素 - 正迅速导致智能制造领域的重大创新。本文的目标是将这些创新应用于实际的力控应用 - 例如去毛刺、抛光和组装任务,如销钉孔(PiH) - 强调它们在保持高质量生产标准方面的必要性。通过报道最近基于AI的方法论,本文对它们进行了对比,并确定了未来研究需要解决的当前挑战。分析最后以对未来研究方向的展望结束,强调了验证AI技术的共同性能指标的必要性,集成各种增强功能以优化性能的重要性,以及验证它们在相关场景中的重要性。这些未来方向旨在与已采用的方法保持一致,以便与制造标准兼容,增加AI驱动方法在学术和工业环境中的相关性。

更新时间: 2024-11-27 11:29:59

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.16828v2

Exploration of LLM Multi-Agent Application Implementation Based on LangGraph+CrewAI

With the rapid development of large model technology, the application of agent technology in various fields is becoming increasingly widespread, profoundly changing people's work and lifestyles. In complex and dynamic systems, multi-agents achieve complex tasks that are difficult for a single agent to complete through division of labor and collaboration among agents. This paper discusses the integrated application of LangGraph and CrewAI. LangGraph improves the efficiency of information transmission through graph architecture, while CrewAI enhances team collaboration capabilities and system performance through intelligent task allocation and resource management. The main research contents of this paper are: (1) designing the architecture of agents based on LangGraph for precise control; (2) enhancing the capabilities of agents based on CrewAI to complete a variety of tasks. This study aims to delve into the application of LangGraph and CrewAI in multi-agent systems, providing new perspectives for the future development of agent technology, and promoting technological progress and application innovation in the field of large model intelligent agents.
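
A minimal sketch of wiring the two libraries together, assuming the public LangGraph/CrewAI APIs current at the time of writing and a configured LLM backend (API keys, model defaults); the agents and tasks are placeholders:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from crewai import Agent, Task, Crew

class PipelineState(TypedDict):
    topic: str
    report: str

def run_crew(state: PipelineState) -> PipelineState:
    # A CrewAI team executes inside one LangGraph node.
    researcher = Agent(role="Researcher", goal="Collect facts about the topic",
                       backstory="A meticulous analyst.")
    writer = Agent(role="Writer", goal="Summarize findings into a short report",
                   backstory="A concise technical writer.")
    research = Task(description=f"Research: {state['topic']}",
                    expected_output="Bullet-point facts", agent=researcher)
    write = Task(description="Write a three-sentence summary of the research.",
                 expected_output="A short report", agent=writer)
    crew = Crew(agents=[researcher, writer], tasks=[research, write])
    return {"topic": state["topic"], "report": str(crew.kickoff())}

builder = StateGraph(PipelineState)          # LangGraph controls the flow
builder.add_node("crew_step", run_crew)
builder.set_entry_point("crew_step")
builder.add_edge("crew_step", END)
graph = builder.compile()
# result = graph.invoke({"topic": "multi-agent scheduling", "report": ""})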

Updated: 2024-11-27 11:29:17

标题: 基于LangGraph+CrewAI的LLM多智能体应用实现的探索

摘要: 随着大型模型技术的快速发展,代理技术在各个领域的应用越来越广泛,深刻地改变了人们的工作和生活方式。在复杂和动态的系统中,多代理通过分工合作实现了单个代理难以完成的复杂任务。本文讨论了LangGraph和CrewAI的集成应用。LangGraph通过图形架构提高了信息传输的效率,而CrewAI通过智能任务分配和资源管理增强了团队协作能力和系统性能。本文的主要研究内容包括:(1)基于LangGraph为精确控制设计代理的架构;(2)基于CrewAI增强代理完成各种任务的能力。这项研究旨在深入探讨LangGraph和CrewAI在多代理系统中的应用,为代理技术未来发展提供新的视角,推动大型模型智能代理领域的技术进步和应用创新。

更新时间: 2024-11-27 11:29:17

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2411.18241v1

Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models

In this report, we propose a novel black-box jailbreak attacking framework that incorporates various LLM-as-Attacker methods to deliver transferable and powerful jailbreak attacks. Our method is designed based on three key observations from existing jailbreaking studies and practices. First, we consider an ensemble approach should be more effective in exposing the vulnerabilities of an aligned LLM compared to individual attacks. Second, different malicious instructions inherently vary in their jailbreaking difficulty, necessitating differentiated treatment to ensure more efficient attacks. Finally, the semantic coherence of a malicious instruction is crucial for triggering the defenses of an aligned LLM; therefore, it must be carefully disrupted to manipulate its embedding representation, thereby increasing the jailbreak success rate. We validated our approach by participating in the Competition for LLM and Agent Safety 2024, where our team achieved top performance in the Jailbreaking Attack Track.

Updated: 2024-11-27 11:28:00

标题: 大型语言模型上可转移的集成黑匣子越狱攻击

摘要: 在这份报告中,我们提出了一种新颖的黑盒越狱攻击框架,该框架结合了各种LLM作为攻击者的方法,以实现可转移且强大的越狱攻击。我们的方法是基于现有的越狱研究和实践中的三个关键观察而设计的。首先,我们认为与单独攻击相比,集成方法能更有效地揭示对齐LLM的漏洞。其次,不同的恶意指令在其越狱难度上本质上存在差异,需要差异化对待以确保更高效的攻击。最后,恶意指令的语义一致性对于触发对齐LLM的防御至关重要;因此,必须对其进行谨慎的破坏以操纵其嵌入表示,从而提高越狱成功率。我们通过参加2024年LLM和Agent安全竞赛验证了我们的方法,我们的团队在越狱攻击赛道中取得了最佳表现。

更新时间: 2024-11-27 11:28:00

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.23558v2

Bayesian Hierarchical Probabilistic Forecasting of Intraday Electricity Prices

We address the need for forecasting methodologies that handle large uncertainties in electricity prices for continuous intraday markets by incorporating parameter uncertainty and using a broad set of covariables. This study presents the first Bayesian forecasting of electricity prices traded on the German intraday market. Endogenous and exogenous covariables are handled via Orthogonal Matching Pursuit (OMP) and regularising priors. The target variable is the IDFull price index, with forecasts given as posterior predictive distributions. Validation uses the highly volatile 2022 electricity prices, which have seldom been studied. As a benchmark, we use all intraday transactions at the time of forecast to compute a live IDFull value. According to market efficiency, it should not be possible to improve on this last-price benchmark. However, we observe significant improvements in point measures and probability scores, including an average reduction of $5.9\,\%$ in absolute errors and an average increase of $1.7\,\%$ in accuracy when forecasting whether the IDFull exceeds the day-ahead price. Finally, we challenge the use of LASSO in electricity price forecasting, showing that OMP results in superior performance, specifically an average reduction of $22.7\,\%$ in absolute error and $20.2\,\%$ in the continuous ranked probability score.
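
The OMP-versus-LASSO covariate selection contrasted above can be illustrated with scikit-learn on synthetic data (the IDFull covariables themselves are not public here):

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))                   # 60 candidate covariables
true_coef = np.zeros(60)
true_coef[[3, 17, 42]] = [2.0, -1.5, 0.8]        # only three matter
y = X @ true_coef + 0.1 * rng.normal(size=500)   # stand-in for the target index

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(X, y)
lasso = Lasso(alpha=0.05).fit(X, y)
print("OMP support:  ", np.flatnonzero(omp.coef_))
print("LASSO support:", np.flatnonzero(np.abs(lasso.coef_) > 1e-6))

OMP greedily picks exactly the requested number of covariables, whereas LASSO's support depends on the regularization strength, which is one practical reason the two can behave differently.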

Updated: 2024-11-27 11:19:40

标题: 贝叶斯层次概率预测日内电力价格

摘要: 我们针对连续日内市场中处理大量不确定性的电价预测方法的需求,通过纳入参数不确定性和使用广泛的协变量来解决这一问题。本研究首次提出了对德国日内市场电价进行贝叶斯预测的方法。内生和外生协变量通过正交匹配追踪(OMP)和正则化先验进行处理。目标变量是IDFull价格指数,预测结果以后验预测分布给出。验证使用高度波动的2022年电价,这些电价很少被研究。作为基准,我们使用预测时的所有日内交易来计算实时的IDFull值。根据市场效率,不应该可能改善这个最后价格基准。然而,我们观察到点度量和概率分数方面的显著改善,包括绝对误差平均减少$5.9\,\%$和准确性平均增加$1.7\,\%$,当预测IDFull是否超过日前价格时。最后,我们质疑在电价预测中使用LASSO,表明OMP表现出更好的性能,特别是绝对误差平均减少$22.7\,\%$和连续排名概率分数减少$20.2\,\%$。

更新时间: 2024-11-27 11:19:40

领域: stat.AP,cs.LG,G.3; I.2.6; I.6.3; I.6.5

下载: http://arxiv.org/abs/2403.05441v3

Certified Training with Branch-and-Bound: A Case Study on Lyapunov-stable Neural Control

We study the problem of learning Lyapunov-stable neural controllers which provably satisfy the Lyapunov asymptotic stability condition within a region-of-attraction. Compared to previous works which commonly used counterexample guided training on this task, we develop a new and generally formulated certified training framework named CT-BaB, and we optimize for differentiable verified bounds, to produce verification-friendly models. In order to handle the relatively large region-of-interest, we propose a novel framework of training-time branch-and-bound to dynamically maintain a training dataset of subregions throughout training, such that the hardest subregions are iteratively split into smaller ones whose verified bounds can be computed more tightly to ease the training. We demonstrate that our new training framework can produce models which can be more efficiently verified at test time. On the largest 2D quadrotor dynamical system, verification for our model is more than 5X faster compared to the baseline, while our size of region-of-attraction is 16X larger than the baseline.

Updated: 2024-11-27 11:12:46

标题: 使用分支定界的认证培训:以Lyapunov稳定神经控制为例的案例研究

摘要: 我们研究了学习具有李雅普诺夫稳定性的神经控制器的问题,这些神经控制器在吸引域内可以确定满足李雅普诺夫渐近稳定性条件。与以往通常在这一任务上使用反例引导训练相比,我们开发了一个新的并且一般性地制定了认证训练框架,命名为CT-BaB,并优化可微的验证边界,以生成友好于验证的模型。为了处理相对较大的感兴趣区域,我们提出了一个新的训练时分支与界限框架,动态地在整个训练过程中维护一个子区域的训练数据集,这样最难的子区域会被迭代地分割成更小的区域,其验证边界可以更紧密地计算,以便于训练。我们展示了我们的新训练框架可以产生在测试时更有效地验证的模型。在最大的2D四旋翼动力系统上,与基线相比,我们的模型验证速度提高了5倍以上,而我们的吸引域大小比基线大16倍。

更新时间: 2024-11-27 11:12:46

领域: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18235v1

ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

Large language models (LLMs) have brought significant changes to many aspects of our lives. However, assessing and ensuring their chronological knowledge remains challenging. Existing approaches fall short in addressing the temporal adaptability of knowledge, often relying on a fixed time-point view. To overcome this, we introduce ChroKnowBench, a benchmark dataset designed to evaluate chronologically accumulated knowledge across three key aspects: multiple domains, time dependency, and temporal state. Our benchmark distinguishes between knowledge that evolves (e.g., personal history, scientific discoveries, amended laws) and knowledge that remains constant (e.g., mathematical truths, commonsense facts). Building on this benchmark, we present ChroKnowledge (Chronological Categorization of Knowledge), a novel sampling-based framework for evaluating LLMs' non-parametric chronological knowledge. Our evaluation led to the following observations: (1) The ability to elicit temporal knowledge varies depending on the data format the model was trained on. (2) LLMs partially recall knowledge or show a cut-off at temporal boundaries rather than recalling all aspects of knowledge correctly. Thus, we apply our ChroKnowPrompt, an in-depth prompting approach that elicits chronological knowledge by traversing step-by-step through the surrounding time spans. We observe that it successfully recalls objects across both open-source and proprietary LLMs, demonstrating versatility, though it faces challenges with dynamic datasets and unstructured formats.

Updated: 2024-11-27 11:11:00

标题: ChroKnowledge: 揭示多领域语言模型的时间知识

摘要: 大型语言模型(LLMs)已经给我们生活的许多方面带来了重大变化。然而,评估和确保它们的时间知识仍然是具有挑战性的。现有方法在处理知识的时间适应性方面存在不足,通常依赖于固定的时间点视图。为了克服这一问题,我们引入了ChroKnowBench,一个旨在评估跨三个关键方面(多个领域、时间依赖性、时间状态)积累的知识的基准数据集。我们的基准数据集区分了不断发展的知识(例如个人历史、科学发现、修订的法律)和保持不变的知识(例如数学真理、常识事实)。基于这个基准数据集,我们提出了ChroKnowledge(知识的时间分类),这是一个用于评估LLMs非参数化时间知识的新颖基于抽样的框架。我们的评估得出以下观察结果:(1)引出时间知识的能力取决于模型训练的数据格式。 (2)LLMs部分地回忆知识或在时间边界处截断,而不是正确地回忆所有知识的各个方面。因此,我们应用了我们的ChroKnowPrompt,一种深入提示,通过逐步穿越周围的时间跨度来引出时间知识。我们观察到,它成功地回忆了开源和专有LLMs中的对象,展示了其灵活性,尽管它在处理动态数据集和非结构化格式时面临挑战。

更新时间: 2024-11-27 11:11:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.09870v2

Randomized-Grid Search for Hyperparameter Tuning in Decision Tree Model to Improve Performance of Cardiovascular Disease Classification

Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening, researchers are focusing on designing smart systems to accurately diagnose them based on electronic health data, with the aid of machine learning algorithms. Heart disease classification using machine learning (ML) algorithms such as Support Vector Machines (SVM), Na\"ive Bayes (NB), Decision Trees (DTs) and Random Forests (RFs) is often hindered by overfitting. These ML algorithms need extensive hyperparameter tuning. Random Search offers a faster and more efficient exploration of the hyperparameter space, but it may overlook optimal regions. Grid Search, though exhaustive, is computationally expensive and inefficient, particularly with high-dimensional data. To address these limitations, Randomized-Grid Search, a novel hybrid optimization method, is proposed that combines the global exploration strengths of Random Search with the focused, exhaustive search of Grid Search in the most promising regions. This hybrid approach efficiently balances exploration and exploitation. The proposed model optimizes the hyperparameters of the Decision Tree model and is applied to the UCI heart disease dataset for classification. It enhances model performance, providing improved accuracy, generalization, and computational efficiency. Experimental results demonstrate that Randomized-Grid Search outperforms traditional methods by significant margins. The proposed model provides a more effective solution for machine learning applications in healthcare diagnosis.
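
A hedged sketch of the hybrid idea with scikit-learn: a randomized pass locates a promising region, then an exhaustive grid refines around it. The bundled breast-cancer dataset stands in for the UCI heart-disease data, and the neighbourhood width is an arbitrary choice:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)       # stand-in tabular dataset

# Stage 1: random search explores the space globally.
coarse = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": range(2, 21), "min_samples_leaf": range(1, 31)},
    n_iter=20, cv=5, random_state=0,
).fit(X, y)

# Stage 2: exhaustive grid search around the best random-search point.
d = coarse.best_params_["max_depth"]
leaf = coarse.best_params_["min_samples_leaf"]
fine = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [max(2, d - 2), d, d + 2],
     "min_samples_leaf": [max(1, leaf - 2), leaf, leaf + 2]},
    cv=5,
).fit(X, y)
print(fine.best_params_, round(fine.best_score_, 3))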

Updated: 2024-11-27 11:10:28

标题: 随机网格搜索用于调整决策树模型的超参数以提高心血管疾病分类性能

摘要: 心血管疾病指的是影响心脏的任何严重疾病。由于心脏疾病可能危及生命,研究人员正专注于设计智能系统,基于电子健康数据并借助机器学习算法准确诊断这些疾病。使用机器学习(ML)算法如支持向量机(SVM)、朴素贝叶斯(NB)、决策树(DTs)和随机森林(RFs)对心脏疾病进行分类经常受到过拟合的阻碍。这些ML算法需要进行广泛的超参数调整。随机搜索提供了更快速、更高效的超参数空间探索,但可能会忽视最优区域。网格搜索虽然穷尽,但在处理高维数据时计算成本高、效率低。为了解决这些限制,提出了一种新颖的混合优化方法——随机网格搜索,它结合了随机搜索的全局探索优势和网格搜索在最有希望区域内的集中、穷尽搜索优势。该混合方法有效地平衡了探索和开发。提出的模型对决策树模型进行了超参数优化。该模型应用于UCI心脏疾病数据集进行分类。它提升了模型性能,提供了更高的准确性、泛化性和计算效率。实验结果表明,随机网格搜索在性能上大幅优于传统方法。提出的模型为医疗诊断中的机器学习应用提供了更有效的解决方案。

更新时间: 2024-11-27 11:10:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18234v1

Machine learning-based classification for Single Photon Space Debris Light Curves

The growing number of man-made debris in Earth's orbit poses a threat to active satellite missions due to the risk of collision. Characterizing unknown debris is, therefore, of high interest. Light Curves (LCs) are temporal variations of object brightness and have been shown to contain information such as shape, attitude, and rotational state. Since 2015, the Satellite Laser Ranging (SLR) group of the Space Research Institute (IWF) Graz has been building a space debris LC catalogue. The LCs are captured on a Single Photon basis, which sets them apart from CCD-based measurements. In recent years, Machine Learning (ML) models have emerged as a viable technique for analyzing LCs. This work aims to classify Single Photon Space Debris using the ML framework. We have explored LC classification using k-Nearest Neighbour (k-NN), Random Forest (RDF), XGBoost (XGB), and Convolutional Neural Network (CNN) classifiers in order to assess the difference in performance between traditional and deep models. Instead of performing classification directly on the LC data, we first extracted features from the data using an automated pipeline. We apply our models to three tasks: classifying individual objects, objects grouped into families according to origin (e.g., GLONASS satellites), and objects grouped into general types (e.g., rocket bodies). We successfully classified Space Debris LCs captured on a Single Photon basis, obtaining accuracies as high as 90.7%. Further, our experiments show that the classifiers achieve better classification accuracy with automatically extracted features than with other methods.
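
As a toy illustration of the feature-then-classify pipeline, the sketch below (assuming scikit-learn and NumPy) reduces each light curve to hand-crafted summary statistics standing in for the paper's automated feature extraction, then cross-validates two of the evaluated classifier families on synthetic curves.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def extract_features(lc):
    # Fixed-length summary of a brightness time series.
    diffs = np.diff(lc)
    spectrum = np.abs(np.fft.rfft(lc))
    return np.array([lc.mean(), lc.std(), lc.min(), lc.max(),
                     np.median(lc), diffs.std(), spectrum[1:6].mean()])

rng = np.random.default_rng(0)
periods = rng.choice([3.0, 9.0], size=120)   # two toy "object types"
t = np.linspace(0, 2 * np.pi, 200)
curves = [np.sin(p * t) + 0.2 * rng.standard_normal(t.size) for p in periods]
labels = (periods == 9.0).astype(int)

X = np.stack([extract_features(c) for c in curves])
for clf in (KNeighborsClassifier(n_neighbors=5),
            RandomForestClassifier(n_estimators=200, random_state=0)):
    acc = cross_val_score(clf, X, labels, cv=5).mean()
    print(type(clf).__name__, round(acc, 3))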

Updated: 2024-11-27 11:08:06

标题: 基于机器学习的单光子空间碎片光变曲线分类

摘要: 地球轨道中人造碎片数量不断增加,对活跃卫星任务构成威胁,因为存在碰撞风险。因此,表征未知碎片具有极大的兴趣。光曲线(LCs)是物体亮度的时间变化,已被证明包含形状、姿态和旋转状态等信息。自2015年以来,格拉茨空间研究所(IWF)的卫星激光测距(SLR)小组一直在建立一个太空碎片光曲线目录。LCs是基于单光子捕获的,这使它们与基于CCD的测量方法有所不同。近年来,机器学习(ML)模型已成为分析LCs的一种可行技术。本研究旨在利用ML框架对单光子太空碎片进行分类。我们探索了使用k-最近邻(k-NN)、随机森林(RDF)、XGBoost(XGB)和卷积神经网络(CNN)分类器进行LC分类,以评估传统和深度模型之间性能差异。我们没有直接对LCs数据进行分类,而是首先使用自动化流程从数据中提取特征。我们将我们的模型应用于三个任务,即对个体物体进行分类、根据来源(例如GLONASS卫星)将物体分组成家族,以及将物体分组成一般类型(例如火箭主体)。我们成功地对基于单光子捕获的太空碎片LCs进行了分类,获得高达90.7%的准确性。此外,我们的实验表明,分类器在自动提取特征时提供更好的分类准确性。

更新时间: 2024-11-27 11:08:06

领域: astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2411.18231v1

Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning

In this paper, we propose a novel dependency-aware task scheduling strategy for dynamic unmanned aerial vehicle-assisted connected autonomous vehicles (CAVs). Specifically, different computation tasks of CAVs consisting of multiple dependency subtasks are judiciously assigned to nearby CAVs or the base station to complete tasks promptly. Therefore, we formulate a joint scheduling priority and subtask assignment optimization problem with the objective of minimizing the average task completion time. The problem aims at improving the long-term system performance and is reformulated as a Markov decision process. To solve the problem, we further propose a diffusion-based reinforcement learning algorithm, named Synthetic DDQN based Subtasks Scheduling, which can make adaptive task scheduling decisions in real time. A diffusion model-based synthetic experience replay is integrated into the reinforcement learning framework, which can generate sufficient synthetic data in the experience replay buffer, thereby significantly accelerating convergence and improving sample efficiency. Simulation results demonstrate the effectiveness of the proposed algorithm in reducing task completion time compared to benchmark schemes.
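
The synthetic replay idea can be sketched independently of the full scheduler. Below is a schematic buffer in plain Python, with the diffusion generator stubbed out as a placeholder callable; only the real/synthetic mixing logic is shown.

import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer whose batches mix real and model-generated transitions."""

    def __init__(self, capacity=10_000, synthetic_ratio=0.5):
        self.real = deque(maxlen=capacity)
        self.synthetic = deque(maxlen=capacity)
        self.synthetic_ratio = synthetic_ratio

    def add_real(self, transition):
        self.real.append(transition)

    def refill_synthetic(self, generator, n):
        # 'generator' stands in for the trained diffusion model sampling
        # plausible (state, action, reward, next_state) tuples.
        self.synthetic.extend(generator() for _ in range(n))

    def sample(self, batch_size):
        n_syn = min(int(batch_size * self.synthetic_ratio), len(self.synthetic))
        n_real = min(batch_size - n_syn, len(self.real))
        return (random.sample(list(self.real), n_real)
                + random.sample(list(self.synthetic), n_syn))

# Toy usage with a trivial stand-in generator.
buf = MixedReplayBuffer()
for i in range(100):
    buf.add_real(("s%d" % i, 0, 0.0, "s%d" % (i + 1)))
buf.refill_synthetic(lambda: ("s?", 0, 0.0, "s?"), 100)
print(len(buf.sample(32)))   # 32 transitions, half real and half synthetic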

Updated: 2024-11-27 11:07:31

标题: 基于扩散强化学习的依赖感知CAV任务调度

摘要: 在本文中,我们提出了一种新颖的依赖关系感知任务调度策略,用于动态的无人机辅助连接的自动驾驶车辆(CAVs)。具体来说,由多个依赖子任务组成的CAVs的不同计算任务被明智地分配给附近的CAVs或基站,以便及时完成任务。因此,我们制定了一个联合调度优先级和子任务分配优化问题,其目标是最小化平均任务完成时间。该问题旨在提高长期系统性能,被重新表述为马尔可夫决策过程。为了解决这个问题,我们进一步提出了一种基于扩散的强化学习算法,名为合成DDQN基于子任务调度,可以实时做出自适应的任务调度决策。基于扩散模型的合成经验重放被整合到强化学习框架中,可以在经验重放缓冲区生成足够的合成数据,从而显着加快收敛速度并提高样本效率。仿真结果表明,与基准方案相比,所提出的算法在减少任务完成时间方面是有效的。

更新时间: 2024-11-27 11:07:31

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2411.18230v1

Feature-Factory: Automating Software Feature Integration Using Generative AI

Integrating new features into existing software projects can be a complex and time-consuming process. Feature-Factory leverages Generative AI with WatsonX.ai to automate the analysis, planning, and implementation of feature requests. By combining advanced project parsing, dependency resolution, and AI-generated code, the program ensures seamless integration of features into software systems while maintaining structural integrity. This paper presents the methodology, mathematical model, and results of the Feature-Factory framework.

Updated: 2024-11-27 11:03:47

标题: 特征工厂:利用生成型人工智能自动化软件特征集成

摘要: 将新功能集成到现有软件项目中可能是一个复杂和耗时的过程。Feature-Factory利用WatsonX.ai的生成式人工智能来自动化特性请求的分析、规划和实施。通过结合先进的项目解析、依赖关系解决和人工智能生成的代码,该程序确保将功能无缝集成到软件系统中,同时保持结构完整性。本文介绍了Feature-Factory框架的方法论、数学模型和结果。

更新时间: 2024-11-27 11:03:47

领域: cs.SE,cs.AI,cs.LG,cs.MA,68T05, 68N01, 68N30, 68Q25,D.2.3; I.2.2; D.2.7; D.2.9; I.2.7

下载: http://arxiv.org/abs/2411.18226v1

PATHS: A Hierarchical Transformer for Efficient Whole Slide Image Analysis

Computational analysis of whole slide images (WSIs) has seen significant research progress in recent years, with applications ranging across important diagnostic and prognostic tasks such as survival or cancer subtype prediction. Many state-of-the-art models process the entire slide - which may be as large as $150,000 \times 150,000$ pixels - as a bag of many patches, the size of which necessitates computationally cheap feature aggregation methods. However, a large proportion of these patches are uninformative, such as those containing only healthy or adipose tissue, adding significant noise and size to the bag. We propose Pathology Transformer with Hierarchical Selection (PATHS), a novel top-down method for hierarchical weakly supervised representation learning on slide-level tasks in computational pathology. PATHS is inspired by the cross-magnification manner in which a human pathologist examines a slide, recursively filtering patches at each magnification level to a small subset relevant to the diagnosis. Our method overcomes the complications of processing the entire slide, enabling quadratic self-attention and providing a simple interpretable measure of region importance. We apply PATHS to five datasets of The Cancer Genome Atlas (TCGA), and achieve superior performance on slide-level prediction tasks when compared to previous methods, despite processing only a small proportion of the slide.
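
A runnable toy version of the top-down selection loop is sketched below (plain NumPy); local variance stands in for the learned patch scorer, and a random array stands in for the gigapixel slide.

import numpy as np

def tiles(img, n=4):
    """Split a square array into n x n equal tiles, with their offsets."""
    h, w = img.shape
    th, tw = h // n, w // n
    for i in range(n):
        for j in range(n):
            yield (i * th, j * tw), img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]

def select(img, depth, k=3, offset=(0, 0)):
    """Keep the k highest-scoring tiles per level and recurse into them."""
    if depth == 0:
        return [offset]
    scored = sorted(tiles(img), key=lambda t: t[1].var(), reverse=True)[:k]
    keep = []
    for (dy, dx), tile in scored:
        keep.extend(select(tile, depth - 1, k, (offset[0] + dy, offset[1] + dx)))
    return keep

rng = np.random.default_rng(0)
slide = rng.random((256, 256))        # toy stand-in for a whole slide image
slide[64:128, 64:128] *= 5.0          # a high-variance, "informative" region
print(select(slide, depth=2))         # the top-scoring offsets fall in that region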

Updated: 2024-11-27 11:03:38

标题: PATHS:一种用于高效全切片图像分析的分层Transformer

摘要: 最近几年来,对整张切片图像(WSIs)进行计算分析取得了显著的研究进展,应用范围涵盖重要的诊断和预后任务,如生存或癌症亚型预测。许多最先进的模型处理整张幻灯片 - 可能达到$150,000 \times 150,000$像素 - 作为许多补丁的集合,其大小需要计算廉价的特征聚合方法。然而,这些补丁中有很大一部分是无信息的,例如只包含健康或脂肪组织的补丁,这些补丁给集合增加了显著的噪音和大小。我们提出了Pathology Transformer with Hierarchical Selection(PATHS),一种新颖的自顶向下方法,用于计算病理学中幻灯片级任务的分层弱监督表示学习。PATHS受到人类病理学家检查幻灯片的交叉放大方式的启发,逐级在每个放大级别递归地过滤补丁,将其减少到与诊断相关的小子集。我们的方法克服了处理整张幻灯片的复杂性,实现了二次自注意力并提供了区域重要性的简单可解释度量。我们将PATHS应用于五个The Cancer Genome Atlas(TCGA)数据集,并与先前方法相比,在幻灯片级预测任务上实现了卓越的性能,尽管只处理了幻灯片的一小部分。

更新时间: 2024-11-27 11:03:38

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18225v1

LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces

Unsupervised domain adaptation remains a critical challenge in enabling the knowledge transfer of models across unseen domains. Existing methods struggle to balance the need for domain-invariant representations with preserving domain-specific features, which is often due to alignment approaches that impose the projection of samples with similar semantics close in the latent space despite their drastic domain differences. We introduce LAGUNA - LAnguage Guided UNsupervised Adaptation with structured spaces, a novel approach that shifts the focus from aligning representations in absolute coordinates to aligning the relative positioning of equivalent concepts in latent spaces. LAGUNA defines a domain-agnostic structure upon the semantic/geometric relationships between class labels in language space and guides adaptation, ensuring that the organization of samples in visual space reflects reference inter-class relationships while preserving domain-specific characteristics. We empirically demonstrate LAGUNA's superiority in domain adaptation tasks across four diverse image and video datasets. Remarkably, LAGUNA surpasses previous works in 18 different adaptation scenarios, with average accuracy improvements of +3.32% on DomainNet, +5.75% on GeoPlaces, +4.77% on GeoImnet, and a +1.94% mean class accuracy improvement on EgoExo4D.

Updated: 2024-11-27 11:01:33

标题: LAGUNA: 结构空间下的语言引导无监督适应

摘要: Unsupervised domain adaptation (UDA) is a challenging task for transferring knowledge of models to new domains. Existing methods struggle to find a balance between creating domain-invariant representations and preserving domain-specific features. This is often due to alignment techniques that force samples with similar semantics to be close in latent space, even if they come from vastly different domains. In this study, we propose LAGUNA - Language Guided Unsupervised Adaptation with structured spaces, a novel approach that focuses on aligning the relative positioning of equivalent concepts in latent spaces rather than absolute coordinates. LAGUNA establishes a domain-agnostic structure based on semantic and geometric relationships between class labels in language space, guiding adaptation to ensure that the organization of samples in visual space reflects inter-class relationships while still preserving domain-specific characteristics. Our empirical results demonstrate that LAGUNA outperforms previous methods in 18 different adaptation scenarios across four diverse image and video datasets. Specifically, LAGUNA achieves average accuracy improvements of +3.32% on DomainNet, +5.75% on GeoPlaces, +4.77% on GeoImnet, and a mean class accuracy improvement of +1.94% on EgoExo4D.

更新时间: 2024-11-27 11:01:33

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.15557v2

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.

Updated: 2024-11-27 10:59:10

标题: ChatGPT能否获得工程学位?评估高等教育对AI助手的脆弱性

摘要: AI 助手在高等教育机构的学生中越来越被广泛使用。虽然这些工具提供了改善教学和教育的机会,但它们也对评估和学习成果提出了重大挑战。我们通过脆弱性的视角来概念化这些挑战,即学生使用生成式AI可能会影响大学评估和学习成果。我们通过衡量AI助手在标准大学水平STEM课程中能够完成评估问题的程度,来调查这种脆弱性的潜在规模。具体来说,我们编制了来自EPFL的50门课程的文本评估问题的新数据集,并评估了两个AI助手GPT-3.5和GPT-4是否能够充分回答这些问题。我们使用八种提示策略来产生答案,并发现GPT-4平均正确回答65.8%的问题,甚至可以通过至少一种提示策略正确回答85.1%的问题。当根据我们数据集中的学位计划将课程分组时,这些系统已经通过了各种学位计划中大量核心课程的非项目评估,这对高等教育认证构成了风险,随着这些模型的进步,这种风险将会加剧。鉴于生成式AI的进步,我们的研究结果呼吁重新设计高等教育项目层面的评估。

更新时间: 2024-11-27 10:59:10

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2408.11841v2

R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently. However, training MTLLMs is complex and exhaustive, particularly when tasks are subject to change. Recently, the concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM. In this paper, the problem of enabling edge users to collaboratively craft such MTLLMs via task vectors is studied, under the assumption of worst-case adversarial attacks. To this end, first the influence of adversarial noise on multi-task model fusion is investigated and a relationship between the so-called weight disentanglement error and the mean squared error (MSE) is derived. Using hypothesis testing, it is directly shown that the MSE increases interference between task vectors, thereby rendering model fusion ineffective. Then, a novel resilient MTLLM fusion (R-MTLLMF) is proposed, which leverages insights about the LLM architecture and fine-tuning process to safeguard task vector aggregation under adversarial noise by realigning the MTLLM. The proposed R-MTLLMF is then compared for both worst-case and ideal transmission scenarios to study the impact of the wireless channel. Extensive model fusion experiments with vision LLMs demonstrate R-MTLLMF's effectiveness, achieving close-to-baseline performance across eight different tasks in ideal noise scenarios and significantly outperforming unprotected model fusion in worst-case scenarios. The results further advocate for additional physical layer protection for a holistic approach to resilience, from both a wireless and LLM perspective.
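
The underlying task-vector arithmetic (without the paper's resilience and realignment machinery) can be sketched in a few lines. A sketch assuming PyTorch, with random tensors standing in for real LLM state dicts:

import torch

def task_vector(base: dict, finetuned: dict) -> dict:
    # A task vector is the elementwise difference between fine-tuned and base weights.
    return {k: finetuned[k] - base[k] for k in base}

def fuse(base: dict, vectors: list, alpha: float = 0.3) -> dict:
    # The multi-task model adds the scaled sum of task vectors back onto the base.
    fused = {k: v.clone() for k, v in base.items()}
    for tv in vectors:
        for k in fused:
            fused[k] += alpha * tv[k]
    return fused

# Toy usage: a "base model" and two task fine-tunes, each a single tensor.
base = {"w": torch.randn(4, 4)}
ft_a = {"w": base["w"] + 0.1 * torch.randn(4, 4)}
ft_b = {"w": base["w"] + 0.1 * torch.randn(4, 4)}
mtllm = fuse(base, [task_vector(base, ft_a), task_vector(base, ft_b)])
print(torch.allclose(
    mtllm["w"],
    base["w"] + 0.3 * ((ft_a["w"] - base["w"]) + (ft_b["w"] - base["w"]))))  # True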

Updated: 2024-11-27 10:57:06

标题: R-MTLLMF:无线边缘的具弹性多任务大语言模型融合

摘要: 多任务大语言模型(MTLLMs)对于许多在无线边缘的应用非常重要,用户需要专门的模型来高效处理多个任务。然而,训练MTLLMs是复杂且繁琐的,特别是当任务会发生变化时。最近,通过任务向量进行模型融合的概念已经被提出作为一种有效的方法,用于结合微调参数以产生MTLLM。本文研究了在最坏情况下进行对抗攻击的假设下,使边缘用户通过任务向量共同打造这样的MTLLMs的问题。为此,首先研究了对抗性噪声对多任务模型融合的影响,并推导了所谓的权重解缠错误与均方误差(MSE)之间的关系。通过假设检验,直接显示MSE增加了任务向量之间的干扰,从而使模型融合失效。然后,提出了一种新颖的弹性MTLLM融合(R-MTLLMF),它利用了关于LLM架构和微调过程的见解,通过重新调整MTLLM来保护任务向量在对抗性噪声下的聚合。提出的R-MTLLMF然后在最坏情况和理想传输情景下进行比较,以研究无线信道的影响。通过使用视觉LLMs进行广泛的模型融合实验,证明了R-MTLLMF的有效性,在理想噪音情况下在八个不同任务中实现了接近基准性能,并在最坏情况下显著优于未受保护的模型融合。结果进一步倡导对物理层的额外保护,以实现从无线和LLM两个角度的全面弹性方法。

更新时间: 2024-11-27 10:57:06

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18220v1

Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs

Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. However, LLMs struggle to generate accurate code, resulting, e.g., in attack detectors that miss well-known attacks when used in practice. This is most likely due to the LLM lacking knowledge about some existing attacks and to the generated code not being evaluated in real usage scenarios. We propose a novel approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while the Self-Ranking technique, inspired by the concept of Self-Consistency, generates multiple reasoning paths and creates ranks to select the most robust detector. Our extensive empirical study targets code generated by LLMs to detect two prevalent injection attacks in web security: Cross-Site Scripting (XSS) and SQL injection (SQLi). Results show a significant improvement in detection performance compared to baselines, with an increase of up to 71%pt and 37%pt in the F2-Score for XSS and SQLi detection, respectively.
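
The Self-Ranking step can be illustrated independently of the RAG pipeline: sample several candidate detectors, score each on a small labeled validation set, and keep the top-ranked one. In this runnable toy, two hand-written regex detectors stand in for LLM-generated candidates, and the tiny validation set is illustrative.

import re

def detector_a(s):  # naive candidate: flags <script> tags only
    return "<script" in s.lower()

def detector_b(s):  # broader candidate: common XSS vectors too
    return bool(re.search(r"<script|onerror\s*=|javascript:", s, re.I))

candidates = [detector_a, detector_b]  # in practice: N detectors sampled from the LLM

val = [("<script>alert(1)</script>", True),
       ("<img src=x onerror=alert(1)>", True),
       ("hello world", False),
       ("<b>bold</b>", False)]

def f2(det):
    # F2 weighs recall higher than precision: F_beta = (1+b^2)PR / (b^2 P + R).
    tp = sum(det(x) and y for x, y in val)
    fp = sum(det(x) and not y for x, y in val)
    fn = sum((not det(x)) and y for x, y in val)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 5 * p * r / (4 * p + r) if p + r else 0.0

best = max(candidates, key=f2)
print(best.__name__, round(f2(best), 3))   # detector_b 1.0 on this toy set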

Updated: 2024-11-27 10:48:37

标题: 评估和改进由LLMs生成的安全攻击检测器的鲁棒性

摘要: 大型语言模型(LLMs)越来越多地被用于软件开发中,用于生成实现安全需求的功能,如攻击检测器。然而,LLMs往往难以生成准确的代码,导致在实际应用中使用时,例如,攻击检测器会错过众所周知的攻击。这很可能是因为LLMs缺乏对一些现有攻击的了解,以及生成的代码未在实际使用场景中进行评估。我们提出了一种新颖的方法,将检索增强生成(RAG)和自我排名集成到LLM管道中。RAG通过整合外部知识来源来增强输出的鲁棒性,而灵感来自于自一致性概念的自我排名技术生成多个推理路径,并创建排名以选择最稳健的检测器。我们广泛的经验研究针对LLMs生成的代码,用于检测Web安全中的两种常见注入攻击:跨站脚本(XSS)和SQL注入(SQLi)。结果显示,与基线相比,检测性能有显著改善,分别在XSS和SQLi检测的F2分数上增加了高达71%和37%。

更新时间: 2024-11-27 10:48:37

领域: cs.SE,cs.CR,cs.LG

下载: http://arxiv.org/abs/2411.18216v1

SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought

Path planning is a complex problem for many practical applications, particularly in robotics. Existing algorithms, however, are exhaustive in nature and become increasingly complex when additional side constraints are incorporated alongside distance minimization. In this paper, a novel approach using vision language models (VLMs) is proposed for enabling path planning in complex wireless-aware environments. To this end, insights from a digital twin (DT) with real-world wireless ray tracing data are explored in order to guarantee an average path gain threshold while minimizing the trajectory length. First, traditional approaches such as A* are compared to several wireless-aware extensions, and an optimal iterative dynamic programming approach (DP-WA*) is derived, which fully takes into account all path gains and distance metrics within the DT. On the basis of these baselines, the role of VLMs as an alternative assistant for path planning is investigated, and a strategic chain-of-thought tasking (SCoTT) approach is proposed. SCoTT divides the complex planning task into several subproblems and solves each with advanced CoT prompting. Results show that SCoTT achieves very close average path gains compared to DP-WA* while at the same time yielding consistently shorter path lengths. The results also show that VLMs can be used to accelerate DP-WA* by efficiently reducing the algorithm's search space and thus saving up to 62\% in execution time. This work underscores the potential of VLMs in future digital systems as capable assistants for solving complex tasks, while enhancing user interaction and accelerating rapid prototyping under diverse wireless constraints.

Updated: 2024-11-27 10:45:49

标题: SCoTT:利用视觉语言模型和战略思维链的无线感知路径规划

摘要: 路径规划对许多实际应用而言都是一个复杂的问题,特别是在机器人领域。然而,现有的算法具有穷举性质,当额外的侧面约束与距离最小化结合时,算法变得越来越复杂。本文提出了一种新颖的方法,利用视觉语言模型(VLMs)来实现在复杂的无线感知环境中进行路径规划。为此,利用数字孪生(DT)与真实世界的无线射线跟踪数据的见解,以确保在最小化轨迹长度的同时保证平均路径增益阈值。首先,将传统方法如A*与几种无线感知扩展进行比较,并推导出一种最优的迭代动态规划方法(DP-WA*),该方法充分考虑DT内的所有路径增益和距离度量。基于这些基线,研究了VLMs作为路径规划的替代助手的作用,并提出了策略性的思维链任务(SCoTT)方法。SCoTT将复杂的规划任务分解为几个子问题,并利用先进的CoT提示来解决每个问题。结果表明,SCoTT实现了与DP-WA*非常接近的平均路径增益,同时产生了一致较短的路径长度。结果还表明,VLMs可以用于加速DP-WA*,通过有效地减少算法的搜索空间,从而节省高达62%的执行时间。这项工作突显了VLMs在未来数字系统中作为解决复杂任务的能力助手的潜力,同时在不同的无线约束下增强用户交互并加速快速原型设计。

更新时间: 2024-11-27 10:45:49

领域: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18212v1

TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability

Rapid development of large language models (LLMs) has significantly advanced multimodal large language models (LMMs), particularly in vision-language tasks. However, existing video-language models often overlook precise temporal localization and struggle with videos of varying lengths. We introduce TimeMarker, a versatile Video-LLM designed for high-quality dialogue based on video content, emphasizing temporal localization. TimeMarker integrates Temporal Separator Tokens to enhance temporal awareness, accurately marking specific moments within videos. It employs the AnyLength mechanism for dynamic frame sampling and adaptive token merging, enabling effective handling of both short and long videos. Additionally, TimeMarker utilizes diverse datasets, including further transformed temporal-related video QA datasets, to bolster its temporal understanding capabilities. Image and interleaved data are also employed to further enhance the model's semantic perception ability. Evaluations demonstrate that TimeMarker achieves state-of-the-art performance across multiple benchmarks, excelling in both short and long video categories. Our project page is at \url{https://github.com/TimeMarker-LLM/TimeMarker/}.

Updated: 2024-11-27 10:45:40

标题: 时间标记:一种多功能的视频LLM,用于长视频和短视频理解,具有出色的时间定位能力

摘要: 大型语言模型(LLM)的快速发展显著推动了多模态大型语言模型(LMM),特别是在视觉语言任务中。然而,现有的视频语言模型经常忽略精确的时间定位,并且在处理长度不同的视频时存在困难。我们引入了TimeMarker,一个专为基于视频内容的高质量对话设计的多功能视频-LLM,强调时间定位。TimeMarker集成了时间分隔符令牌以增强时间意识,准确标记视频中特定时刻。它采用了AnyLength机制进行动态帧采样和自适应令牌合并,从而实现对短视频和长视频的有效处理。此外,TimeMarker利用各种数据集,包括进一步转换的与时间相关的视频问答数据集,以增强其时间理解能力。还使用图像和交错数据进一步增强模型的语义感知能力。评估结果表明,TimeMarker在多个基准测试中实现了最先进的性能,在短视频和长视频类别中表现出色。我们的项目页面位于\url{https://github.com/TimeMarker-LLM/TimeMarker/}。

更新时间: 2024-11-27 10:45:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18211v1

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Traditional object detection methods operate under the closed-set assumption, where models can only detect a fixed number of objects predefined in the training set. Recent works on open vocabulary object detection (OVD) enable the detection of objects defined by an unbounded vocabulary, which reduces the cost of training models for specific tasks. However, OVD heavily relies on accurate prompts provided by an ''oracle'', which limits their use in critical applications such as driving scene perception. OVD models tend to misclassify near-out-of-distribution (NOOD) objects that have similar semantics to known classes, and ignore far-out-of-distribution (FOOD) objects. To address these limitations, we propose a framework that enables OVD models to operate in open world settings, by identifying and incrementally learning novel objects. To detect FOOD objects, we propose Open World Embedding Learning (OWEL) and introduce the concept of Pseudo Unknown Embedding which infers the location of unknown classes in a continuous semantic space based on the information of known classes. We also propose Multi-Scale Contrastive Anchor Learning (MSCAL), which enables the identification of misclassified unknown objects by promoting the intra-class consistency of object embeddings at different scales. The proposed method achieves state-of-the-art performance in common open world object detection and autonomous driving benchmarks.

Updated: 2024-11-27 10:33:51

标题: 从开放词汇到开放世界:教视觉语言模型检测新颖对象

摘要: 传统的物体检测方法基于封闭集假设,模型只能检测训练集中预定义的固定数量的对象。最近关于开放词汇物体检测(OVD)的研究使得能够检测由无限词汇定义的对象,从而降低了针对特定任务训练模型的成本。然而,OVD严重依赖于“神谕”提供的精确提示,这限制了它们在关键应用中如驾驶场景感知的使用。OVD模型往往会误分类与已知类别具有相似语义的接近分布之外(NOOD)的对象,并忽略远离分布之外(FOOD)的对象。为了解决这些限制,我们提出了一个框架,使得OVD模型能够在开放世界环境中运作,通过识别并逐步学习新对象。为了检测FOOD对象,我们提出了开放世界嵌入学习(OWEL)并引入了伪未知嵌入的概念,根据已知类别的信息在连续语义空间中推断未知类别的位置。我们还提出了多尺度对比锚学习(MSCAL),通过促进不同尺度的对象嵌入的内部一致性来实现对误分类未知对象的识别。提出的方法在常见的开放世界物体检测和自动驾驶基准测试中取得了最先进的性能。

更新时间: 2024-11-27 10:33:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18207v1

Atlas-Based Interpretable Age Prediction In Whole-Body MR Images

Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting potential discrepancies between chronological and biological age. To improve understanding of age-related changes in various body parts, we investigate the ageing of the human body on a large scale by using whole-body 3D images. We utilise the Grad-CAM method to determine the body areas most predictive of a person's age. In order to expand our analysis beyond individual subjects, we employ registration techniques to generate population-wide importance maps that show the most predictive areas in the body for a whole cohort of subjects. We show that the investigation of the full 3D volume of the whole body and the population-wide analysis can give important insights into which body parts play the most important roles in predicting a person's age. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance. Finally, we investigate differences between subjects that show accelerated and decelerated ageing.
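
A minimal Grad-CAM sketch is shown below (assuming PyTorch); the tiny CNN and random input stand in for the paper's age-regression model and whole-body MR volumes.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),      # model[2] is the Grad-CAM target
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),  # age regression head
)
target = model[2]

# Hooks capture the target layer's activations and gradients.
acts, grads = {}, {}
target.register_forward_hook(lambda m, i, o: acts.update(v=o))
target.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 1, 64, 64)        # toy stand-in for a body MR slice
model(x).sum().backward()            # backprop the predicted age

weights = grads["v"].mean(dim=(2, 3), keepdim=True)       # per-channel importance
cam = torch.relu((weights * acts["v"]).sum(dim=1)).squeeze(0)
cam = cam / (cam.max() + 1e-8)       # normalized importance map over the slice
print(cam.shape)                     # torch.Size([64, 64])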

Updated: 2024-11-27 10:26:18

标题: 基于图谱的全身MR图像可解释年龄预测

摘要: 年龄预测是医学评估和研究的重要组成部分。它可以帮助检测疾病以及异常老化,通过突出实际年龄和生物年龄之间潜在的差异。为了改善对各个身体部位的年龄相关变化的理解,我们使用全身3D图像对人体的衰老进行大规模研究。我们利用Grad-CAM方法确定对于人的年龄最具预测性的身体区域。为了将分析扩展到个体之外,我们采用注册技术生成全体受试者重要性地图,显示出对于整个队列受试者最具预测性的身体区域。我们展示了对于预测人的年龄起着最重要作用的身体部位,全身3D体积的调查和全体人群范围的分析可以提供重要见解。我们的研究结果揭示了三个主要感兴趣的区域:脊柱、本地背部肌肉和心脏区域,其中心脏区域表现出最高的重要性。最后,我们研究了表现出加速和减缓老化的个体之间的差异。

更新时间: 2024-11-27 10:26:18

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2307.07439v5

Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation

Recent learning-to-imitation methods have shown promising results in planning via imitating within the observation-action space. However, their ability in open environments remains constrained, particularly in long-horizon tasks. In contrast, traditional symbolic planning excels in long-horizon tasks through logical reasoning over human-defined symbolic spaces but struggles to handle observations beyond symbolic states, such as high-dimensional visual inputs encountered in real-world scenarios. In this work, we draw inspiration from abductive learning and introduce a novel framework \textbf{AB}ductive \textbf{I}mitation \textbf{L}earning (ABIL) that integrates the benefits of data-driven learning and symbolic-based reasoning, enabling long-horizon planning. Specifically, we employ abductive reasoning to understand the demonstrations in symbolic space and design the principles of sequential consistency to resolve the conflicts between perception and reasoning. ABIL generates predicate candidates to facilitate the perception from raw observations to symbolic space without laborious predicate annotations, providing a groundwork for symbolic planning. With the symbolic understanding, we further develop a policy ensemble whose base policies are built with different logical objectives and managed through symbolic reasoning. Experiments show that our proposal successfully understands the observations with the task-relevant symbolics to assist the imitation learning. Importantly, ABIL demonstrates significantly improved data efficiency and generalization across various long-horizon tasks, highlighting it as a promising solution for long-horizon planning. Project website: \url{https://www.lamda.nju.edu.cn/shaojj/KDD25_ABIL/}.

Updated: 2024-11-27 10:26:14

标题: 通过神经符号溯因模仿学习实现长时程规划

摘要: 最近的学习-模仿方法在规划中通过在观察-动作空间内进行模仿展现出了令人满意的结果。然而,在开放环境中,它们的能力仍然受到限制,特别是在长期任务中。相比之下,传统的符号规划通过在人类定义的符号空间上进行逻辑推理在长期任务中表现出色,但在处理超出符号状态的观察方面遇到困难,比如在真实场景中遇到的高维视觉输入。在这项工作中,我们从溯因学习(abductive learning)中汲取灵感,引入了一个新颖的框架ABductive Imitation Learning(ABIL),它整合了数据驱动学习和基于符号的推理的优势,实现了长期规划。具体来说,我们采用溯因推理来理解符号空间中的演示,并设计了顺序一致性原则来解决感知和推理之间的冲突。ABIL生成谓词候选项,以促进从原始观察到符号空间的感知,而无需繁琐的谓词注释,为符号规划奠定基础。通过符号理解,我们进一步发展了一个策略集合,其基本策略是根据不同的逻辑目标构建的,并通过符号推理进行管理。实验证明,我们的提议成功地理解了与任务相关的符号观察,以帮助模仿学习。重要的是,ABIL在各种长期任务中展示了明显改善的数据效率和泛化能力,突显其作为长期规划的有希望解决方案。项目网站:https://www.lamda.nju.edu.cn/shaojj/KDD25_ABIL/。

更新时间: 2024-11-27 10:26:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18201v1

Semantic Edge Computing and Semantic Communications in 6G Networks: A Unifying Survey and Research Challenges

Semantic Edge Computing (SEC) and Semantic Communications (SemComs) have been proposed as viable approaches to achieve real-time edge-enabled intelligence in sixth-generation (6G) wireless networks. On one hand, SemCom leverages the strength of Deep Neural Networks (DNNs) to encode and communicate the semantic information only, while making it robust to channel distortions by compensating for wireless effects. Ultimately, this leads to an improvement in the communication efficiency. On the other hand, SEC has leveraged distributed DNNs to divide the computation of a DNN across different devices based on their computational and networking constraints. Although significant progress has been made in both fields, the literature lacks a systematic view to connect both fields. In this work, we fulfill the current gap by unifying the SEC and SemCom fields. We summarize the research problems in these two fields and provide a comprehensive review of the state of the art with a focus on their technical strengths and challenges.

Updated: 2024-11-27 10:21:10

标题: 第六代网络中的语义边缘计算和语义通信:一项统一调查及研究挑战

摘要: 语义边缘计算(SEC)和语义通信(SemComs)已被提出作为实现第六代(6G)无线网络中实时边缘智能的可行方法。一方面,SemCom利用深度神经网络(DNNs)的优势仅编码和传输语义信息,同时通过补偿无线效应使其对信道扭曲具有鲁棒性。最终,这导致通信效率的提高。另一方面,SEC利用分布式DNN在不同设备之间分配DNN的计算,根据它们的计算和网络约束。尽管在这两个领域取得了显着进展,但文献缺乏一个系统性观点来连接这两个领域。在这项工作中,我们通过统一SEC和SemCom领域来填补当前的空白。我们总结了这两个领域的研究问题,并重点介绍了其技术优势和挑战的现状综述。

更新时间: 2024-11-27 10:21:10

领域: cs.LG,cs.NI,eess.SP

下载: http://arxiv.org/abs/2411.18199v1

Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance

Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives. MORL is computationally more complex than single-objective RL, particularly as the number of objectives increases. Additionally, when objectives involve the preferences of agents or groups, ensuring fairness is socially desirable. This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems. We propose using Lorenz dominance to identify policies with equitable reward distributions and introduce $\lambda$-Lorenz dominance to enable flexible fairness preferences. We release a new, large-scale real-world transport planning environment and demonstrate that our method encourages the discovery of fair policies, showing improved scalability in two large cities (Xi'an and Amsterdam). Our methods outperform common multi-objective approaches, particularly in high-dimensional objective spaces.
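
Lorenz dominance itself is easy to state in code: compare the cumulative sums of the ascending-sorted reward vectors (their Lorenz curves) via Pareto dominance. A minimal NumPy sketch (the $\lambda$-weighted variant is omitted):

import numpy as np

def lorenz_curve(v):
    # Cumulative sums of the components sorted ascending (worst-off first).
    return np.cumsum(np.sort(np.asarray(v, dtype=float)))

def lorenz_dominates(a, b):
    # a Lorenz-dominates b iff a's curve Pareto-dominates b's curve.
    la, lb = lorenz_curve(a), lorenz_curve(b)
    return bool(np.all(la >= lb) and np.any(la > lb))

# (2, 2) is fairer than (0, 4) with the same total, so it Lorenz-dominates it.
print(lorenz_dominates([2, 2], [0, 4]))   # True
print(lorenz_dominates([0, 4], [2, 2]))   # False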

Updated: 2024-11-27 10:16:25

标题: 使用Lorenz支配的公平保证实现可扩展的多目标强化学习

摘要: 多目标强化学习(MORL)旨在学习一组策略,优化多个常常矛盾的目标之间的权衡。与单一目标强化学习相比,MORL在计算上更为复杂,特别是当目标数量增加时。此外,当目标涉及代理或群体的偏好时,确保公平性是社会上可取的。本文介绍了一种将公平性融入MORL并提高对多目标问题的可扩展性的原则算法。我们建议使用Lorenz支配来识别具有公平奖励分布的策略,并引入$\lambda$-Lorenz支配来实现灵活的公平偏好。我们发布了一个新的、大规模的真实世界交通规划环境,并展示了我们的方法鼓励发现公平策略,在两个大城市(西安和阿姆斯特丹)中展现出改善的可扩展性。我们的方法在高维目标空间中优于常见的多目标方法。

更新时间: 2024-11-27 10:16:25

领域: cs.LG

下载: http://arxiv.org/abs/2411.18195v1

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and closed-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.

Updated: 2024-11-27 10:14:54

标题: 主动型代理:将LLM代理从被动响应转变为主动协助

摘要: 由大型语言模型驱动的代理展示了在解决复杂任务方面的显著能力。然而,大多数代理系统仍然是反应性的,限制了它们在需要前瞻性和自主决策的场景中的有效性。在本文中,我们致力于开发能够预测和发起任务而无需明确人类指令的主动代理的挑战。我们提出了一种新颖的数据驱动方法来解决这个问题。首先,我们收集真实世界的人类活动以生成主动任务预测。然后,这些预测由人类标注者标记为接受或拒绝。标记的数据用于训练一个模拟人类判断并作为LLM代理主动性的自动评估器的奖励模型。在此基础上,我们开发了一个全面的数据生成流水线来创建一个包含6,790个事件的多样化数据集ProactiveBench。最后,我们展示了用提出的ProactiveBench对模型进行微调可以显著引发LLM代理的主动性。实验结果显示,我们的微调模型在主动提供帮助方面实现了66.47%的F1分数,优于所有开源和闭源模型。这些结果突显了我们的方法在创建更主动和有效的代理系统方面的潜力,为人类-代理协作的未来进展铺平了道路。

更新时间: 2024-11-27 10:14:54

领域: cs.AI,cs.CL,I.2.7

下载: http://arxiv.org/abs/2410.12361v2

InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks

Large language models (LLMs) possess extensive knowledge and question-answering capabilities, having been widely deployed in privacy-sensitive domains like finance and medical consultation. During LLM inferences, cache-sharing methods are commonly employed to enhance efficiency by reusing cached states or responses for the same or similar inference requests. However, we identify that these cache mechanisms pose a risk of private input leakage, as the caching can result in observable variations in response times, making them a strong candidate for a timing-based attack hint. In this study, we propose a novel timing-based side-channel attack to execute input theft in LLMs inference. The cache-based attack faces the challenge of constructing candidate inputs in a large search space to hit and steal cached user queries. To address these challenges, we propose two primary components. The input constructor employs machine learning techniques and LLM-based approaches for vocabulary correlation learning while implementing optimized search mechanisms for generalized input construction. The time analyzer implements statistical time fitting with outlier elimination to identify cache hit patterns, continuously providing feedback to refine the constructor's search strategy. We conduct experiments across two cache mechanisms and the results demonstrate that our approach consistently attains high attack success rates in various applications. Our work highlights the security vulnerabilities associated with performance optimizations, underscoring the necessity of prioritizing privacy and security alongside enhancements in LLM inference.
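
The time-analyzer component can be caricatured in a few lines: gather latency samples for a probe input, drop outliers with an IQR filter, and flag a probable cache hit when the trimmed mean falls well below the cold-cache baseline. The simulated latencies and the 0.7 margin below are illustrative, not the paper's calibration.

import numpy as np

def trimmed_mean(samples):
    # Standard IQR outlier elimination before averaging.
    q1, q3 = np.percentile(samples, [25, 75])
    iqr = q3 - q1
    keep = [s for s in samples if q1 - 1.5 * iqr <= s <= q3 + 1.5 * iqr]
    return float(np.mean(keep))

def is_cache_hit(latencies, miss_baseline, margin=0.7):
    # Hit if the trimmed mean latency is well below the cache-miss baseline.
    return trimmed_mean(latencies) < margin * miss_baseline

rng = np.random.default_rng(0)
miss = rng.normal(120.0, 5.0, 50)    # ms, simulated cold-cache timings
probe = rng.normal(60.0, 5.0, 50)    # ms, simulated cached-response timings
print(is_cache_hit(probe, miss_baseline=trimmed_mean(miss)))   # True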

Updated: 2024-11-27 10:14:38

标题: InputSnatch:通过时间侧信道攻击在LLM服务中窃取输入

摘要: 大型语言模型(LLMs)具有广泛的知识和问答能力,在金融和医疗咨询等隐私敏感领域被广泛部署。在LLM推理过程中,常常采用缓存共享方法来提高效率,通过重复使用缓存状态或响应来处理相同或相似的推理请求。然而,我们发现这些缓存机制存在私人输入泄露的风险,因为缓存可能导致响应时间的可观变化,使其成为基于时间的攻击线索的强有力候选。在这项研究中,我们提出了一种新颖的基于时间的侧信道攻击方法,以执行LLMs推理中的输入窃取。基于缓存的攻击面临着在庞大的搜索空间中构造候选输入以命中并窃取缓存用户查询的挑战。为了解决这些挑战,我们提出了两个主要组件。输入构造器采用机器学习技术和基于LLM的方法进行词汇相关性学习,同时实施优化搜索机制以进行通用输入构造。时间分析器实施统计时间拟合和异常值排除,以识别缓存命中模式,并持续提供反馈以完善构造器的搜索策略。我们在两种缓存机制上进行了实验,结果表明我们的方法在各种应用中始终实现高攻击成功率。我们的工作突出了与性能优化相关的安全漏洞,强调在LLM推理的增强方面优先考虑隐私和安全的必要性。

更新时间: 2024-11-27 10:14:38

领域: cs.CR

下载: http://arxiv.org/abs/2411.18191v1

CASCRNet: An Atrous Spatial Pyramid Pooling and Shared Channel Residual based Network for Capsule Endoscopy

This manuscript summarizes work on the Capsule Vision Challenge 2024 by MISAHUB. To address the multi-class disease classification task, which is challenging due to the complexity and imbalance in the Capsule Vision challenge dataset, this paper proposes CASCRNet (Capsule endoscopy-Aspp-SCR-Network), a parameter-efficient and novel model that uses Shared Channel Residual (SCR) blocks and Atrous Spatial Pyramid Pooling (ASPP) blocks. Further, the performance of the proposed model is compared with other well-known approaches. The experimental results show that the proposed model provides better disease classification results. The proposed model was successful in classifying diseases with an F1 Score of 78.5% and a Mean AUC of 98.3%, which is promising given its compact architecture.
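
For reference, an ASPP block of the kind the model builds on can be written compactly. A sketch assuming PyTorch, with illustrative channel counts and dilation rates rather than CASCRNet's actual configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel dilated 3x3 convs (1x1 for rate 1) capture multi-scale context.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates
        ])
        # Global-pooling branch adds image-level context.
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        # 1x1 conv fuses the concatenated branches.
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.global_pool(x), size=(h, w),
                          mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [g], dim=1))

out = ASPP(16, 32)(torch.randn(2, 16, 64, 64))
print(out.shape)  # torch.Size([2, 32, 64, 64])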

Updated: 2024-11-27 10:03:24

标题: CASCRNet:一种基于空洞空间金字塔池化和共享通道残差的胶囊内窥镜网络

摘要: 这篇手稿总结了MISAHUB在2024年Capsule Vision Challenge上的工作。为了解决Capsule Vision挑战数据集中复杂性和不平衡造成的多类疾病分类任务,本文提出了CASCRNet(Capsule内窥镜-ASPP-SCR网络),这是一个参数高效且新颖的模型,使用了共享通道残差(SCR)块和空洞空间金字塔池(ASPP)块。此外,将所提出的模型的性能与其他知名方法进行了比较。实验结果表明,所提出的模型提供了更好的疾病分类结果。所提出的模型在疾病分类上表现出色,F1分数为78.5%,平均AUC为98.3%,这在其紧凑的架构下是令人充满希望的。

更新时间: 2024-11-27 10:03:24

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.17863v2

Multi-Source Temporal Attention Network for Precipitation Nowcasting

Precipitation nowcasting is crucial across various industries and plays a significant role in mitigating and adapting to climate change. We introduce an efficient deep learning model for precipitation nowcasting, capable of predicting rainfall up to 8 hours in advance with greater accuracy than existing operational physics-based and extrapolation-based models. Our model leverages multi-source meteorological data and physics-based forecasts to deliver high-resolution predictions in both time and space. It captures complex spatio-temporal dynamics through temporal attention networks and is optimized using data quality maps and dynamic thresholds. Experiments demonstrate that our model outperforms the state of the art and highlight its potential for fast, reliable responses to evolving weather conditions.

Updated: 2024-11-27 09:57:35

标题: 多源时态注意力网络用于降水临近预报

摘要: 降水临近预报在各行各业中至关重要,并在缓解和适应气候变化中起着重要作用。我们引入了一种高效的深度学习模型,用于降水临近预报,能够比现有的基于物理和外推的模型更准确地预测未来8小时的降雨。我们的模型利用多源气象数据和基于物理的预测,提供了时间和空间上的高分辨率预测。通过时间注意力网络捕捉复杂的时空动态,并使用数据质量图和动态阈值进行优化。实验证明,我们的模型优于最先进的模型,并突显了其在迅速可靠应对不断变化的天气条件方面的潜力。

更新时间: 2024-11-27 09:57:35

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.08641v2

CLUE-MARK: Watermarking Diffusion Models using CLWE

As AI-generated images become widespread, reliable watermarking is essential for content verification, copyright enforcement, and combating disinformation. Existing techniques rely on heuristic approaches and lack formal guarantees of undetectability, making them vulnerable to steganographic attacks that can expose or erase the watermark. Additionally, these techniques often degrade output quality by introducing perceptible changes, which is not only undesirable but an important barrier to adoption in practice. In this work, we introduce CLUE-Mark, the first provably undetectable watermarking scheme for diffusion models. CLUE-Mark requires no changes to the model being watermarked, is computationally efficient, and because it is provably undetectable is guaranteed to have no impact on model output quality. Our approach leverages the Continuous Learning With Errors (CLWE) problem -- a cryptographically hard lattice problem -- to embed watermarks in the latent noise vectors used by diffusion models. By proving undetectability via reduction from a cryptographically hard problem we ensure not only that the watermark is imperceptible to human observers or adhoc heuristics, but to \emph{any} efficient detector that does not have the secret key. CLUE-Mark allows multiple keys to be embedded, enabling traceability of images to specific users without altering model parameters. Empirical evaluations on state-of-the-art diffusion models confirm that CLUE-Mark achieves high recoverability, preserves image quality, and is robust to minor perturbations such as JPEG compression and brightness adjustments. Uniquely, CLUE-Mark cannot be detected nor removed by recent steganographic attacks.

Updated: 2024-11-27 09:57:02

标题: CLUE-MARK: 使用CLWE进行扩散模型水印处理

摘要: 随着人工智能生成的图像变得普及,可靠的水印技术对于内容验证、版权执行和打击虚假信息至关重要。现有技术依赖于启发式方法,缺乏对不可检测性的形式保证,使其容易受到可以暴露或擦除水印的隐写攻击的威胁。此外,这些技术往往会通过引入可感知的变化来降低输出质量,这不仅是不希望的,而且是实践中采用的一个重要障碍。 在这项工作中,我们介绍了CLUE-Mark,这是扩散模型的第一个具有可证明不可检测性的水印方案。CLUE-Mark无需对被贴水印的模型进行任何更改,计算效率高,并且因为它具有可证明的不可检测性,保证不会对模型输出质量产生影响。我们的方法利用了连续误差学习(CLWE)问题--一种密码学难题--将水印嵌入扩散模型使用的潜在噪声向量中。通过从一个密码学难题的简化证明不可检测性,我们确保水印不仅对人类观察者或启发式方法不可感知,而且对于任何没有秘钥的高效检测器也是如此。CLUE-Mark允许嵌入多个密钥,实现图像追踪到特定用户而不改变模型参数。对最先进的扩散模型进行的经验评估证实,CLUE-Mark实现了高度的可恢复性,保留了图像质量,并且对JPEG压缩和亮度调整等轻微扰动具有鲁棒性。独特的是,CLUE-Mark无法被最近的隐写攻击检测或移除。

更新时间: 2024-11-27 09:57:02

领域: cs.CR

下载: http://arxiv.org/abs/2411.11434v2

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm. To address the issue, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. We leverage an image captioning model to measure image-to-text alignment and guide the diffusion model to revisit ignored tokens. A novel attribute concentration module is also proposed to address the attribute binding problem. Without any image or human preference data, we use only 20K text prompts to fine-tune SDXL to obtain CoMat-SDXL. Extensive experiments show that CoMat-SDXL significantly outperforms the baseline model SDXL in two text-to-image alignment benchmarks and achieves start-of-the-art performance.

Updated: 2024-11-27 09:55:41

标题: CoMat:将文本到图像扩散模型与图像到文本概念匹配对齐

摘要: 扩散模型在文本到图像生成领域取得了巨大成功。然而,缓解文本提示和图像之间的不对齐仍然具有挑战性。不对齐的根本原因尚未得到广泛研究。我们观察到,不对齐是由于令牌注意力激活不足引起的。我们进一步将这一现象归因于扩散模型条件利用不足,这是由于其训练范式引起的。为了解决这个问题,我们提出了CoMat,一种具有图像到文本概念匹配机制的端到端扩散模型微调策略。我们利用图像字幕模型来衡量图像到文本的对齐,并引导扩散模型重新审视被忽略的令牌。还提出了一种新颖的属性集中模块来解决属性绑定问题。在没有任何图像或人类偏好数据的情况下,我们仅使用20K个文本提示对SDXL进行微调,获得CoMat-SDXL。大量实验表明,CoMat-SDXL在两个文本到图像对齐基准测试中显著优于基线模型SDXL,并实现了最先进的性能。

更新时间: 2024-11-27 09:55:41

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.03653v3

Prediction with Action: Visual Policy Learning via Joint Denoising Process

Diffusion models have demonstrated remarkable capabilities in image generation tasks, including image editing and video creation, representing a good understanding of the physical world. On the other hand, diffusion models have also shown promise in robotic control tasks by denoising actions, known as diffusion policy. Although the diffusion generative model and diffusion policy exhibit distinct capabilities--image prediction and robotic action, respectively--they technically follow a similar denoising process. In robotic tasks, the ability to predict future images and generate actions is highly correlated since they share the same underlying dynamics of the physical world. Building on this insight, we introduce PAD, a novel visual policy learning framework that unifies image Prediction and robot Action within a joint Denoising process. Specifically, PAD utilizes Diffusion Transformers (DiT) to seamlessly integrate images and robot states, enabling the simultaneous prediction of future images and robot actions. Additionally, PAD supports co-training on both robotic demonstrations and large-scale video datasets and can be easily extended to other robotic modalities, such as depth images. PAD outperforms previous methods, achieving a significant 26.3% relative improvement on the full Metaworld benchmark, by utilizing a single text-conditioned visual policy within a data-efficient imitation learning setting. Furthermore, PAD demonstrates superior generalization to unseen tasks in real-world robot manipulation settings with 28.0% success rate increase compared to the strongest baseline. Project page at https://sites.google.com/view/pad-paper

Updated: 2024-11-27 09:54:58

标题: 通过动作预测:通过联合去噪过程进行视觉策略学习

摘要: 扩散模型在图像生成任务中展示了显着的能力,包括图像编辑和视频创建,表现出对物理世界的良好理解。另一方面,扩散模型还在机器人控制任务中展现出潜力,通过去噪行为来实现,被称为扩散策略。尽管扩散生成模型和扩散策略展示了不同的能力--图像预测和机器人动作,但它们在技术上遵循相似的去噪过程。在机器人任务中,预测未来图像和生成动作的能力高度相关,因为它们共享物理世界的相同基本动态。基于这一见解,我们引入了PAD,一个将图像预测和机器人动作统一在一个联合去噪过程中的新型视觉策略学习框架。具体而言,PAD利用扩散变换器(DiT)将图像和机器人状态无缝集成,实现了未来图像和机器人动作的同时预测。此外,PAD支持在机器人演示和大规模视频数据集上进行联合训练,并且可以轻松扩展到其他机器人模式,如深度图像。通过在数据有效的模仿学习设置中利用单一的文本条件视觉策略,PAD优于先前的方法,在完整的Metaworld基准测试中取得了显著的26.3%相对改进。此外,与最强基准相比,PAD在真实世界机器人操纵环境中对未见任务展示了卓越的泛化能力,成功率提高了28.0%。项目页面网址为https://sites.google.com/view/pad-paper。

更新时间: 2024-11-27 09:54:58

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.18179v1

Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Electric vehicle charging demand prediction is important for vacant charging pile recommendation and charging infrastructure planning, thus facilitating vehicle electrification and green energy development. The performance of previous spatio-temporal studies is still far from satisfactory nowadays because urban region attributes and multivariate temporal influences are not adequately taken into account. To tackle these issues, we propose a learning approach for citywide electric vehicle charging demand prediction, named CityEVCP. To learn non-pairwise relationships in urban areas, we cluster service areas by the types and numbers of points of interest in the areas and develop attentive hypergraph networks accordingly. Graph attention mechanisms are employed for information propagation between neighboring areas. Additionally, we propose a variable selection network to adaptively learn dynamic auxiliary information and improve the Transformer encoder utilizing gated mechanisms for fluctuating charging time-series data. Experiments on a citywide electric vehicle charging dataset demonstrate the performances of our proposed approach compared with a broad range of competing baselines. Furthermore, we demonstrate the impact of dynamic influences on prediction results in different areas of the city and the effectiveness of our area clustering method.

Updated: 2024-11-27 09:54:34

标题: 考虑城市区域和动态影响的城市范围电动车充电需求预测方法

摘要: 电动汽车充电需求预测对于空置充电桩推荐和充电基础设施规划至关重要,从而促进车辆电气化和绿色能源发展。目前,以前的时空研究表现仍然远未令人满意,因为城市区域属性和多变量时间影响没有充分考虑进去。为了解决这些问题,我们提出了一种学习方法用于城市范围内的电动汽车充电需求预测,名为CityEVCP。为了学习城市区域中的非配对关系,我们通过区域中的兴趣点类型和数量对服务区进行聚类,并相应地开发关注力超图网络。图注意机制被用于在相邻区域之间传播信息。此外,我们提出了一个变量选择网络,以自适应地学习动态辅助信息,并利用门控机制改进Transformer编码器,利用波动的充电时间序列数据。对一个城市范围内的电动汽车充电数据集进行的实验展示了我们提出的方法与广泛的竞争基线相比的性能。此外,我们展示了动态影响对不同城市地区预测结果的影响,以及我们区域聚类方法的有效性。

更新时间: 2024-11-27 09:54:34

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2410.18766v2

Machine Unlearning reveals that the Gender-based Violence Victim Condition can be detected from Speech in a Speaker-Agnostic Setting

This study addresses the critical issue of gender-based violence's (GBV) impact on women's mental health. GBV, encompassing physical and sexual aggression, often results in long-lasting adverse effects for the victims, including anxiety, depression, post-traumatic stress disorder (PTSD), and substance abuse. Artificial Intelligence (AI)-based speech technologies have proven valuable for mental health assessments. However, these technologies experience performance challenges when confronted with speakers whose data has not been used for training. Our research presents a novel approach to speaker-agnostic detection of the gender-based violence victim condition (GBVVC), focusing on the development of robust AI models capable of generalization across diverse speakers. Leveraging advanced deep learning models and domain-adversarial training techniques, we minimize speaker identity's influence, achieving a 26.95% relative reduction in speaker identification ability while enhancing the GBVVC detection by a 6.37% relative improvement in the accuracy. This shows that models can focus on discriminative paralinguistic biomarkers that enhance the GBVVC prediction, and reduce the subject-specific traits' impact. Additionally, our model's predictions moderately correlate with pre-clinical PTSD symptoms, emphasizing the link between GBV and mental health. This work paves the way for AI-powered tools to aid mental health professionals in addressing this societal issue, offering a promising baseline for further research.
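
The domain-adversarial ingredient used to suppress speaker identity typically hinges on a gradient reversal layer. A minimal PyTorch sketch of a generic DANN-style layer follows (not necessarily the authors' exact implementation): features pass through unchanged on the forward pass, while the gradient flips sign on the backward pass, so the feature extractor learns to remove speaker identity while the main head keeps the victim-condition signal.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)          # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # sign-flipped, scaled gradient

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Toy check: gradients through the layer come back sign-flipped and scaled.
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lam=0.5).sum()
y.backward()
print(x.grad)   # tensor([-0.5000, -0.5000, -0.5000])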

Updated: 2024-11-27 09:53:53

标题: 机器遗忘揭示了在与说话者无关的环境中可以通过语音检测基于性别的暴力受害者情况

摘要: 这项研究探讨了基于性别暴力(GBV)对妇女心理健康的重要影响。GBV包括身体和性侵犯,通常会对受害者造成长期不良影响,包括焦虑、抑郁、创伤后应激障碍(PTSD)和物质滥用。基于人工智能(AI)的语音技术已被证明对心理健康评估非常有价值。然而,当面对未被用于训练的说话者时,这些技术会遇到性能挑战。 我们的研究提出了一种新颖的方法,用于无关说话者检测基于性别暴力受害者状况(GBVVC),重点是开发能够在不同说话者之间泛化的强大AI模型。利用先进的深度学习模型和领域对抗训练技术,我们最小化说话者身份的影响,实现了26.95%的相对减少说话者识别能力,同时将GBVVC检测的准确性提高了6.37%的相对改善。这表明模型可以专注于增强GBVVC预测的辨别性语言生物标志,并减少主观特征的影响。 此外,我们模型的预测与临床前PTSD症状有适度相关,强调了GBV与心理健康之间的联系。这项工作为AI支持的工具帮助心理健康专业人士解决这一社会问题铺平了道路,并为进一步研究提供了有希望的基准。

更新时间: 2024-11-27 09:53:53

领域: cs.LG

下载: http://arxiv.org/abs/2411.18177v1

Empowering ChatGPT-Like Large-Scale Language Models with Local Knowledge Base for Industrial Prognostics and Health Management

Prognostics and health management (PHM) is essential for industrial operation and maintenance, focusing on predicting, diagnosing, and managing the health status of industrial systems. The emergence of the ChatGPT-Like large-scale language model (LLM) has begun to lead a new round of innovation in the AI field. It has extensively promoted the level of intelligence in various fields. Therefore, it is also expected to further change the application paradigm in industrial PHM and help PHM become intelligent. Although ChatGPT-Like LLMs have rich knowledge reserves and powerful language understanding and generation capabilities, they lack domain-specific expertise, significantly limiting their practicability in PHM applications. To this end, this study explores the ChatGPT-Like LLM empowered by the local knowledge base (LKB) in industrial PHM to solve the above limitations. In addition, we introduce the method and steps of combining the LKB with LLMs, including LKB preparation, LKB vectorization, prompt engineering, etc. Experimental analysis of real cases shows that combining the LKB with ChatGPT-Like LLM can significantly improve its performance and make ChatGPT-Like LLMs more accurate, relevant, and able to provide more insightful information. This can promote the development of ChatGPT-Like LLMs in industrial PHM and promote their efficiency and quality.
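
The LKB steps mentioned above (vectorization, retrieval, prompt assembly) reduce to a short pipeline. A sketch using TF-IDF as a stand-in for whatever embedding model is used in practice, with invented PHM knowledge snippets for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

lkb = [
    "Bearing fault frequencies scale with shaft speed and bearing geometry.",
    "Rising oil temperature under stable load can indicate cooler fouling.",
    "Spectral kurtosis highlights impulsive vibration from early gear damage.",
]

vectorizer = TfidfVectorizer()
lkb_vectors = vectorizer.fit_transform(lkb)          # LKB vectorization step

def build_prompt(query, k=2):
    # Retrieve the k most similar LKB entries and splice them into the prompt.
    sims = cosine_similarity(vectorizer.transform([query]), lkb_vectors)[0]
    context = [lkb[i] for i in sims.argsort()[::-1][:k]]
    return ("Use the maintenance knowledge below to answer.\n"
            + "\n".join("- " + c for c in context)
            + "\nQuestion: " + query)

print(build_prompt("Why does gearbox vibration show impulsive spikes?"))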

Updated: 2024-11-27 09:43:20

标题: 为工业预测和健康管理赋能ChatGPT-Like大规模语言模型,结合本地知识库

摘要: 预测和健康管理(PHM)对于工业运营和维护至关重要,重点是预测、诊断和管理工业系统的健康状态。ChatGPT-Like大规模语言模型(LLM)的出现开始引领人工智能领域的新一轮创新。它广泛提升了各个领域的智能水平。因此,人们也期望进一步改变工业PHM的应用范式,推动PHM变得更加智能化。虽然ChatGPT-Like LLM拥有丰富的知识储备和强大的语言理解和生成能力,但缺乏特定领域的专业知识,极大限制了它们在PHM应用中的实用性。因此,本研究探讨了在工业PHM中利用本地知识库(LKB)赋能的ChatGPT-Like LLM来解决上述限制。此外,我们介绍了将LKB与LLM结合的方法和步骤,包括LKB准备、LKB向量化、提示工程等。实际案例的实验分析表明,将LKB与ChatGPT-Like LLM结合可以显著提高其性能,使ChatGPT-Like LLM更加准确、相关并且能够提供更多见解性信息。这可以促进ChatGPT-Like LLM在工业PHM中的发展,提高其效率和质量。

更新时间: 2024-11-27 09:43:20

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2312.14945v3

DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models

Large vision-language models (LVLMs) have demonstrated exceptional performance on complex multimodal tasks. However, they continue to suffer from significant hallucination issues, including object, attribute, and relational hallucinations. To accurately detect these hallucinations, we investigated the variations in cross-modal attention patterns between hallucination and non-hallucination states. Leveraging these distinctions, we developed a lightweight detector capable of identifying hallucinations. Our proposed method, Detecting Hallucinations by Cross-modal Attention Patterns (DHCP), is straightforward and does not require additional LVLM training or extra LVLM inference steps. Experimental results show that DHCP achieves remarkable performance in hallucination detection. By offering novel insights into the identification and analysis of hallucinations in LVLMs, DHCP contributes to advancing the reliability and trustworthiness of these models.

Updated: 2024-11-27 09:43:09

标题: DHCP:在大型视觉-语言模型中通过跨模态注意力模式检测幻觉

摘要: 大型视觉语言模型(LVLMs)已经在复杂的多模态任务上展现出卓越的性能。然而,它们仍然存在显著的幻觉问题,包括对象、属性和关系幻觉。为了准确检测这些幻觉,我们研究了幻觉和非幻觉状态之间的跨模态注意力模式的变化。利用这些区别,我们开发了一种轻量级检测器,能够识别幻觉。我们提出的方法,通过跨模态注意力模式检测幻觉(DHCP),简单直接,不需要额外的LVLM训练或额外的LVLM推断步骤。实验证明,DHCP在幻觉检测方面取得了显著的性能。通过为LVLMs中的幻觉的识别和分析提供新的见解,DHCP有助于提高这些模型的可靠性和信任度。

更新时间: 2024-11-27 09:43:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18659v1

CHORDONOMICON: A Dataset of 666,000 Songs and their Chord Progressions

Chord progressions encapsulate important information about music, pertaining to its structure and conveyed emotions. They serve as the backbone of musical composition, and in many cases, they are the sole information required for a musician to play along and follow the music. Despite their importance, chord progressions as a data domain remain underexplored. There is a lack of large-scale datasets suitable for deep learning applications, and limited research exploring chord progressions as an input modality. In this work, we present Chordonomicon, a dataset of over 666,000 songs and their chord progressions, annotated with structural parts, genre, and release date - created by scraping various sources of user-generated progressions and associated metadata. We demonstrate the practical utility of the Chordonomicon dataset for classification and generation tasks, and discuss its potential to provide valuable insights to the research community. Chord progressions are unique in their ability to be represented in multiple formats (e.g. text, graph) and the wealth of information chords convey in given contexts, such as their harmonic function. These characteristics make the Chordonomicon an ideal testbed for exploring advanced machine learning techniques, including transformers, graph machine learning, and hybrid systems that combine knowledge representation and machine learning.

Updated: 2024-11-27 09:33:18

标题: 《和弦宝典:一份包含66万首歌曲及其和弦进行的数据集》

摘要: 和弦进行包含有关音乐的重要信息,涉及其结构和传达的情感。它们是音乐作品的支柱,在许多情况下,它们是音乐家演奏和跟随音乐所需的唯一信息。尽管它们的重要性,作为数据领域的和弦进行仍未得到充分探索。缺乏适用于深度学习应用的大规模数据集,以及有限的研究探索和弦进行作为输入模态。在这项工作中,我们介绍了Chordonomicon,这是一个包含超过666,000首歌曲及其和弦进行的数据集,附有结构部分、流派和发行日期的注释 - 通过抓取各种用户生成的进行和相关元数据而创建。我们展示了Chordonomicon数据集在分类和生成任务中的实际效用,并讨论了它为研究界提供有价值见解的潜力。和弦进行在其能够以多种格式(例如文本、图表)表示和在特定背景中传达的和弦所包含的丰富信息方面是独特的,比如它们的和声功能。这些特性使Chordonomicon成为探索先进机器学习技术的理想试验平台,包括变压器、图机器学习和结合知识表示和机器学习的混合系统。

更新时间: 2024-11-27 09:33:18

领域: cs.SD,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2410.22046v2

PDZSeg: Adapting the Foundation Model for Dissection Zone Segmentation with Visual Prompts in Robot-assisted Endoscopic Submucosal Dissection

Purpose: Endoscopic surgical environments present challenges for dissection zone segmentation due to unclear boundaries between tissue types, leading to segmentation errors where models misidentify or overlook edges. This study aims to provide precise dissection zone suggestions during endoscopic submucosal dissection (ESD) procedures, enhancing ESD safety. Methods: We propose the Prompted-based Dissection Zone Segmentation (PDZSeg) model, designed to leverage diverse visual prompts such as scribbles and bounding boxes. By overlaying these prompts onto images and fine-tuning a foundational model on a specialized dataset, our approach improves segmentation performance and user experience through flexible input methods. Results: The PDZSeg model was validated using three experimental setups: in-domain evaluation, variability in visual prompt availability, and robustness assessment. Using the ESD-DZSeg dataset, results show that our method outperforms state-of-the-art segmentation approaches. This is the first study to integrate visual prompt design into dissection zone segmentation. Conclusion: The PDZSeg model effectively utilizes visual prompts to enhance segmentation performance and user experience, supported by the novel ESD-DZSeg dataset as a benchmark for dissection zone segmentation in ESD. Our work establishes a foundation for future research.

Updated: 2024-11-27 09:28:50

标题: PDZSeg:利用视觉提示调整基础模型进行机器人辅助内窥镜黏膜下剥除中的解剖区域分割

摘要: 目的:内窥镜手术环境对于解剖区分割提出了挑战,因为组织类型之间的边界不清晰,导致模型误识别或忽视边缘而产生分割错误。本研究旨在提供精确的内窥镜黏膜下剥离(ESD)过程中的解剖区建议,增强ESD的安全性。 方法:我们提出了基于提示的解剖区分割(PDZSeg)模型,旨在利用各种视觉提示,如涂鸦和边界框。通过将这些提示叠加到图像上,并在专门数据集上对基础模型进行微调,我们的方法通过灵活的输入方法提高了分割性能和用户体验。 结果:PDZSeg模型通过三种实验设置进行了验证:领域内评估、视觉提示可用性的变化以及鲁棒性评估。使用ESD-DZSeg数据集,结果显示我们的方法优于最先进的分割方法。这是首个将视觉提示设计整合到解剖区分割中的研究。 结论:PDZSeg模型有效利用视觉提示来增强分割性能和用户体验,同时借助ESD-DZSeg数据集作为ESD中解剖区分割的基准。我们的工作为未来研究奠定了基础。

更新时间: 2024-11-27 09:28:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18169v1

Latent Neural Operator Pretraining for Solving Time-Dependent PDEs

Pretraining methods have recently gained increasing attention for solving PDEs with neural operators. They alleviate the data scarcity problem encountered in neural operator learning when solving a single PDE, by training on large-scale datasets consisting of various PDEs and utilizing shared patterns among different PDEs to improve the solution precision. In this work, we propose the Latent Neural Operator Pretraining (LNOP) framework based on the Latent Neural Operator (LNO) backbone. We achieve universal transformation through pretraining on hybrid time-dependent PDE dataset to extract representations of different physical systems and solve various time-dependent PDEs in the latent space through finetuning on single PDE dataset. Our proposed LNOP framework reduces the solution error by 31.7% on four problems and can be further improved to 57.1% after finetuning. On out-of-distribution dataset, our LNOP model achieves roughly 50% lower error and 3$\times$ data efficiency on average across different dataset sizes. These results show that our method is more competitive in terms of solution precision, transfer capability and data efficiency compared to non-pretrained neural operators.

Updated: 2024-11-27 09:25:42

标题: 潜在神经算子预训练用于求解时间相关的偏微分方程

摘要: 最近,预训练方法在解决带神经算子的PDEs方面越来越受到关注。当解决单个PDE时,通过在包含各种PDEs的大规模数据集上进行训练,并利用不同PDEs之间的共享模式来提高解决方案的精度,可以缓解神经算子学习时遇到的数据稀缺问题。在这项工作中,我们提出了基于潜在神经算子(LNO)骨干的潜在神经算子预训练(LNOP)框架。通过在混合时间相关PDE数据集上进行预训练,提取不同物理系统的表示,并通过在单个PDE数据集上微调,在潜在空间中解决各种时间相关PDEs。我们提出的LNOP框架将四个问题的解决误差减少了31.7%,并在微调后进一步提高至57.1%。在分布外数据集上,我们的LNOP模型平均实现了大约50%的较低误差和3倍的数据效率。这些结果表明,与未经预训练的神经算子相比,我们的方法在解决精度、迁移能力和数据效率方面更具竞争力。

更新时间: 2024-11-27 09:25:42

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2410.20100v2

Latent Neural Operator for Solving Forward and Inverse PDE Problems

Neural operators effectively solve PDE problems from data without knowing the explicit equations, which learn the map from the input sequences of observed samples to the predicted values. Most existing works build the model in the original geometric space, leading to high computational costs when the number of sample points is large. We present the Latent Neural Operator (LNO) solving PDEs in the latent space. In particular, we first propose Physics-Cross-Attention (PhCA) transforming representation from the geometric space to the latent space, then learn the operator in the latent space, and finally recover the real-world geometric space via the inverse PhCA map. Our model retains flexibility that can decode values in any position not limited to locations defined in the training set, and therefore can naturally perform interpolation and extrapolation tasks particularly useful for inverse problems. Moreover, the proposed LNO improves both prediction accuracy and computational efficiency. Experiments show that LNO reduces the GPU memory by 50%, speeds up training 1.8 times, and reaches state-of-the-art accuracy on four out of six benchmarks for forward problems and a benchmark for inverse problem. Code is available at https://github.com/L-I-M-I-T/LatentNeuralOperator.

Updated: 2024-11-27 09:22:42

标题: 潜在神经算子用于解决前向和反向PDE问题

摘要: 神经算子能够有效地从数据中解决偏微分方程问题,而无需知道显式方程,它学习从观测样本输入序列到预测值的映射。大多数现有的工作在原始几何空间中构建模型,当样本点数量庞大时导致计算成本高昂。我们提出了在潜在空间中解决PDE的Latent Neural Operator(LNO)。具体而言,我们首先提出了Physics-Cross-Attention(PhCA),将表示从几何空间转换到潜在空间,然后在潜在空间中学习算子,最后通过逆PhCA映射恢复真实世界的几何空间。我们的模型保留了灵活性,可以解码任何位置的值,不仅限于训练集中定义的位置,因此可以自然地执行插值和外推任务,特别适用于逆问题。此外,所提出的LNO提高了预测准确性和计算效率。实验表明,LNO将GPU内存减少了50%,训练速度提高了1.8倍,并在六个前向问题基准测试中的四个以及一个逆问题基准测试中达到了最新的准确性水平。代码可在https://github.com/L-I-M-I-T/LatentNeuralOperator找到。

更新时间: 2024-11-27 09:22:42

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2406.03923v4
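
The Physics-Cross-Attention idea of mapping a variable number of observation points onto a fixed set of latent tokens can be sketched with plain single-head cross-attention; the shapes and random features below are illustrative stand-ins for learned encodings, not the LNO implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (M, N) attention logits
    return softmax(scores) @ values          # (M, d) latent summary

rng = np.random.default_rng(0)
N, M, d = 500, 32, 16                        # N sample points -> M latent tokens
point_feats = rng.normal(size=(N, d))        # encoded (coordinate, value) features
latent_queries = rng.normal(size=(M, d))     # learnable latent tokens
latent = cross_attention(latent_queries, point_feats, point_feats)
print(latent.shape)                          # (32, 16), independent of N
```

Because the latent size M is fixed, the cost of the operator learned in latent space no longer grows with the number of sample points N, which is the efficiency argument the abstract makes.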

Efficient Hardware Implementation of Constant Time Sampling for HQC

HQC is one of the code-based finalists in the last round of the NIST post quantum cryptography standardization process. In this process, security and implementation efficiency are key metrics for the selection of the candidates. A critical compute kernel with respect to efficient hardware implementations and security in HQC is the sampling method used to derive random numbers. Due to its security criticality, recently an updated sampling algorithm was presented to increase its robustness against side-channel attacks. In this paper, we pursue a cross layer approach to optimize this new sampling algorithm to enable an efficient hardware implementation without comprising the original algorithmic security and side-channel attack robustness. We compare our cross layer based implementation to a direct hardware implementation of the original algorithm and to optimized implementations of the previous sampler version. All implementations are evaluated using the Xilinx Artix 7 FPGA. Our results show that our approach reduces the latency by a factor of 24 compared to the original algorithm and by a factor of 28 compared to the previously used sampler with significantly less resources.

Updated: 2024-11-27 09:21:29

标题: HQC常量时间采样的高效硬件实现

摘要: HQC是NIST后量子密码标准化过程中最后一轮的基于代码的入围者之一。在这一过程中,安全性和实现效率是候选者选择的关键指标。在HQC中,关于高效硬件实现和安全性的一个关键计算内核是用于生成随机数的采样方法。鉴于其安全性的关键性,最近提出了一种更新的采样算法,以增强其对侧信道攻击的鲁棒性。 本文采用跨层方法优化这种新的采样算法,以实现有效的硬件实现,同时不影响原始算法的安全性和对侧信道攻击的鲁棒性。 我们将基于跨层的实现与原始算法的直接硬件实现以及之前采样器版本的优化实现进行比较。所有实现均使用Xilinx Artix 7 FPGA进行评估。我们的结果显示,与原始算法相比,我们的方法将延迟降低了24倍;与之前使用的采样器相比,延迟降低了28倍,且所需资源显著更少。

更新时间: 2024-11-27 09:21:29

领域: cs.CR

下载: http://arxiv.org/abs/2309.16493v2
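
One well-known building block for constant-time samplers of this kind is replacing the data-dependent rejection loop of modulo reduction with a fixed multiply-and-shift; the Python sketch below illustrates that trick only, not the specific algorithm of the paper, and the code length is an illustrative HQC-like value.

```python
def reduce_to_range(x32: int, n: int) -> int:
    """Map a uniform 32-bit word into [0, n) with one multiply and one shift.
    Unlike `x32 % n` with rejection, the control flow never depends on the
    random data; real constant-time samplers bound or correct the small bias
    this introduces when n does not divide 2**32."""
    return (x32 * n) >> 32

prng_words = [0x12345678, 0x9ABCDEF0, 0x0F1E2D3C]            # fixed-length PRNG output
positions = [reduce_to_range(w, 17669) for w in prng_words]  # n = 17669 is illustrative
print(positions)
```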

RPEE-HEADS: A Novel Benchmark for Pedestrian Head Detection in Crowd Videos

The automatic detection of pedestrian heads in crowded environments is essential for crowd analysis and management tasks, particularly in high-risk settings such as railway platforms and event entrances. These environments, characterized by dense crowds and dynamic movements, are underrepresented in public datasets, posing challenges for existing deep learning models. To address this gap, we introduce the Railway Platforms and Event Entrances-Heads (RPEE-Heads) dataset, a novel, diverse, high-resolution, and accurately annotated resource. It includes 109,913 annotated pedestrian heads across 1,886 images from 66 video recordings, with an average of 56.2 heads per image. Annotations include bounding boxes for visible head regions. In addition to introducing the RPEE-Heads dataset, this paper evaluates eight state-of-the-art object detection algorithms using the RPEE-Heads dataset and analyzes the impact of head size on detection accuracy. The experimental results show that You Only Look Once v9 and Real-Time Detection Transformer outperform the other algorithms, achieving mean average precisions of 90.7% and 90.8%, with inference times of 11 and 14 milliseconds, respectively. Moreover, the findings underscore the need for specialized datasets like RPEE-Heads for training and evaluating accurate models for head detection in railway platforms and event entrances. The dataset and pretrained models are available at https://doi.org/10.34735/ped.2024.2.

Updated: 2024-11-27 09:20:26

标题: RPEE-HEADS:一个新颖的人群视频行人头部检测基准

摘要: 在拥挤环境中自动检测行人头部对于人群分析和管理任务至关重要,特别是在铁路站台和活动入口等高风险环境中。这些环境以密集人群和动态移动为特征,在公共数据集中代表性不足,给现有深度学习模型带来挑战。为填补这一空白,我们引入了铁路站台和活动入口-头部(RPEE-Heads)数据集,这是一个新颖、多样、高分辨率且标注准确的资源。它包含来自66个视频录像的1,886张图像中的109,913个标注行人头部,每张图像平均有56.2个头部。标注包括可见头部区域的边界框。除了介绍RPEE-Heads数据集外,本文还使用该数据集评估了八种最先进的目标检测算法,并分析了头部大小对检测准确性的影响。实验结果表明,You Only Look Once v9和Real-Time Detection Transformer优于其他算法,平均精度(mAP)分别达到90.7%和90.8%,推理时间分别为11和14毫秒。此外,研究结果强调了像RPEE-Heads这样的专门数据集对于训练和评估铁路站台和活动入口头部检测模型的必要性。数据集和预训练模型可在https://doi.org/10.34735/ped.2024.2 上获得。

更新时间: 2024-11-27 09:20:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18164v1

Abductive Symbolic Solver on Abstraction and Reasoning Corpus

This paper addresses the challenge of enhancing artificial intelligence reasoning capabilities, focusing on logicality within the Abstraction and Reasoning Corpus (ARC). Humans solve such visual reasoning tasks based on their observations and hypotheses, and they can explain their solutions with a proper reason. However, many previous approaches focused only on the grid transition and it is not enough for AI to provide reasonable and human-like solutions. By considering the human process of solving visual reasoning tasks, we have concluded that the thinking process is likely the abductive reasoning process. Thus, we propose a novel framework that symbolically represents the observed data into a knowledge graph and extracts core knowledge that can be used for solution generation. This information limits the solution search space and helps provide a reasonable mid-process. Our approach holds promise for improving AI performance on ARC tasks by effectively narrowing the solution space and providing logical solutions grounded in core knowledge extraction.

Updated: 2024-11-27 09:09:00

标题: 基于抽象与推理语料库的溯因符号求解器

摘要: 本文讨论了增强人工智能推理能力的挑战,重点关注抽象与推理语料库(ARC)中的逻辑性。人类通过观察和假设来解决这类视觉推理任务,并且能够用恰当的理由解释他们的解决方案。然而,许多先前的方法只关注网格转换,这不足以让人工智能给出合理且类似人类的解决方案。通过考察人类解决视觉推理任务的过程,我们得出结论:这一思维过程很可能是溯因推理过程。因此,我们提出了一个新颖的框架,将观察到的数据符号化地表示为知识图,并提取可用于生成解决方案的核心知识。这些信息限制了解的搜索空间,并有助于提供合理的中间过程。我们的方法通过有效缩小解空间并提供基于核心知识提取的逻辑解决方案,有望改善人工智能在ARC任务上的表现。

更新时间: 2024-11-27 09:09:00

领域: cs.AI

下载: http://arxiv.org/abs/2411.18158v1

A survey on cutting-edge relation extraction techniques based on language models

This comprehensive survey delves into the latest advancements in Relation Extraction (RE), a pivotal task in natural language processing essential for applications across biomedical, financial, and legal sectors. This study highlights the evolution and current state of RE techniques by analyzing 137 papers presented at the Association for Computational Linguistics (ACL) conferences over the past four years, focusing on models that leverage language models. Our findings underscore the dominance of BERT-based methods in achieving state-of-the-art results for RE while also noting the promising capabilities of emerging large language models (LLMs) like T5, especially in few-shot relation extraction scenarios where they excel in identifying previously unseen relations.

Updated: 2024-11-27 09:04:47

标题: 基于语言模型的尖端关系抽取技术调查

摘要: 这份全面的调查深入探讨了关系抽取(RE)的最新进展,这是自然语言处理中的一个关键任务,对于生物医学、金融和法律领域的应用至关重要。本研究通过分析过去四年在计算语言学协会(ACL)会议上呈现的137篇论文,突出了关系抽取技术的演变和当前状态,重点关注利用语言模型的模型。我们的研究结果强调了基于BERT的方法在实现关系抽取的最新成果方面的主导地位,同时也注意到新兴大型语言模型(LLMs)如T5在少样本关系抽取场景中具有很强的潜力,特别是在识别以前未见过的关系方面表现出色。

更新时间: 2024-11-27 09:04:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.18157v1

A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs

Transformer neural networks (TNN) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some custom accelerators exist with no runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR enhances the utilization of processing elements and on-chip memory, enhancing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2$\times$ and 2.87$\times$ more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU respectively. Additionally, it achieves a speedup of 1.7 to 2.25$\times$ compared to some state-of-the-art FPGA-based accelerators.

Updated: 2024-11-27 08:53:19

标题: 基于FPGA的运行时自适应Transformer神经网络加速器

摘要: Transformer神经网络(TNN)在自然语言处理(NLP)、机器翻译和计算机视觉(CV)方面表现出色,而无需依赖循环或卷积层。然而,它们对计算和内存的需求较高,特别是在资源受限的设备(如FPGAs)上。此外,transformer模型在不同应用中的处理时间各不相同,需要具有特定参数的定制模型。为每个模型设计定制加速器是复杂且耗时的。一些定制加速器存在,但缺乏运行时适应性,通常依赖于稀疏矩阵以减少延迟。然而,由于需要特定于应用的稀疏模式,硬件设计变得更具挑战性。本文介绍了ADAPTOR,这是一种针对FPGA上transformer编码器和解码器中的密集矩阵计算的运行时自适应加速器。ADAPTOR增强了处理元素和芯片内存的利用率,提高了并行性并降低了延迟。它结合了有效的矩阵分块技术,将资源分配到FPGA平台上,并且完全量化以实现计算效率和可移植性。在Xilinx Alveo U55C数据中心卡和嵌入式平台(如VC707和ZCU102)上的评估显示,我们的设计分别比NVIDIA K80 GPU和i7-8700K CPU更节能1.2倍和2.87倍。此外,与一些最先进的基于FPGA的加速器相比,它实现了1.7到2.25倍的加速。

更新时间: 2024-11-27 08:53:19

领域: cs.AR,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18148v1
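
The matrix-tiling idea at the heart of such accelerators can be shown in a few lines: each tile of work maps onto a fixed array of processing elements and on-chip buffers. The NumPy sketch below is a software analogue of blocked dense matrix multiplication, not the ADAPTOR design itself.

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matrix multiplication over (tile x tile) sub-problems."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):  # accumulate partial products per tile
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 64).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-4)
```

On hardware, the tile size is chosen so that one tile each of A, B, and C fits in on-chip memory, which is what lets a runtime-adaptive design reuse the same processing array across differently sized transformer layers.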

Online Knowledge Integration for 3D Semantic Mapping: A Survey

Semantic mapping is a key component of robots operating in and interacting with objects in structured environments. Traditionally, geometric and knowledge representations within a semantic map have only been loosely integrated. However, recent advances in deep learning now allow full integration of prior knowledge, represented as knowledge graphs or language concepts, into sensor data processing and semantic mapping pipelines. Semantic scene graphs and language models enable modern semantic mapping approaches to incorporate graph-based prior knowledge or to leverage the rich information in human language both during and after the mapping process. This has sparked substantial advances in semantic mapping, leading to previously impossible novel applications. This survey reviews these recent developments comprehensively, with a focus on online integration of knowledge into semantic mapping. We specifically focus on methods using semantic scene graphs for integrating symbolic prior knowledge and language models for the respective capture of implicit common-sense knowledge and natural language concepts.

Updated: 2024-11-27 08:53:16

标题: 面向3D语义映射的在线知识集成:综述

摘要: 语义映射是机器人在结构化环境中运行并与物体交互的关键组成部分。传统上,语义地图中的几何表示与知识表示仅被松散地集成。然而,深度学习的最新进展使得以知识图或语言概念形式表示的先验知识能够完全集成到传感器数据处理和语义映射流程中。语义场景图和语言模型使现代语义映射方法能够在建图过程中及之后整合基于图的先验知识,或利用人类语言中的丰富信息。这推动了语义映射的重大进展,催生了以前不可能实现的新应用。本综述全面回顾了这些最新进展,重点关注知识在语义映射中的在线集成。我们特别关注使用语义场景图集成符号先验知识的方法,以及使用语言模型分别捕获隐含常识知识和自然语言概念的方法。

更新时间: 2024-11-27 08:53:16

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18147v1

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

The advent of Large Language Models (LLMs) has paved the way for AI search engines, e.g., SearchGPT, showcasing a new paradigm in human-internet interaction. However, most current AI search engines are limited to text-only settings, neglecting the multimodal user queries and the text-image interleaved nature of website information. Recently, Large Multimodal Models (LMMs) have made impressive strides. Yet, whether they can function as AI search engines remains under-explored, leaving the potential of LMMs in multimodal search an open question. To this end, we first design a delicate pipeline, MMSearch-Engine, to empower any LMMs with multimodal search capabilities. On top of this, we introduce MMSearch, a comprehensive evaluation benchmark to assess the multimodal search performance of LMMs. The curated dataset contains 300 manually collected instances spanning 14 subfields, which involves no overlap with the current LMMs' training data, ensuring the correct answer can only be obtained within searching. By using MMSearch-Engine, the LMMs are evaluated by performing three individual tasks (requery, rerank, and summarization), and one challenging end-to-end task with a complete searching process. We conduct extensive experiments on closed-source and open-source LMMs. Among all tested models, GPT-4o with MMSearch-Engine achieves the best results, which surpasses the commercial product, Perplexity Pro, in the end-to-end task, demonstrating the effectiveness of our proposed pipeline. We further present error analysis to unveil current LMMs still struggle to fully grasp the multimodal search tasks, and conduct ablation study to indicate the potential of scaling test-time computation for AI search engine. We hope MMSearch may provide unique insights to guide the future development of multimodal AI search engine. Project Page: https://mmsearch.github.io

Updated: 2024-11-27 08:49:12

标题: MMSearch:将大型模型作为多模态搜索引擎的潜力进行基准测试

摘要: 大型语言模型(LLMs)的出现为AI搜索引擎(例如SearchGPT)铺平了道路,展示了人与互联网交互的新范式。然而,目前大多数AI搜索引擎仅限于纯文本设置,忽略了多模态用户查询以及网站信息图文交错的特性。最近,大型多模态模型(LMMs)取得了令人瞩目的进展。然而,它们能否胜任AI搜索引擎仍未得到充分探讨,LMMs在多模态搜索中的潜力仍是一个悬而未决的问题。为此,我们首先设计了一个精巧的管道MMSearch-Engine,赋予任何LMMs多模态搜索能力。在此基础上,我们引入了MMSearch,一个全面评估LMMs多模态搜索性能的基准。精心策划的数据集包含300个人工收集的实例,涵盖14个子领域,与当前LMMs的训练数据没有重叠,确保只有通过搜索才能获得正确答案。通过使用MMSearch-Engine,LMMs需要执行三个单独的任务(重新查询、重新排序和摘要),以及一个包含完整搜索过程、具有挑战性的端到端任务来接受评估。我们对闭源和开源LMMs进行了广泛的实验。在所有测试的模型中,搭配MMSearch-Engine的GPT-4o取得了最佳结果,在端到端任务中超过了商业产品Perplexity Pro,展示了我们所提出管道的有效性。我们进一步进行错误分析,揭示当前的LMMs仍难以完全掌握多模态搜索任务,并通过消融研究表明扩展测试时计算对AI搜索引擎的潜力。我们希望MMSearch能够为未来多模态AI搜索引擎的发展提供独特的见解。项目页面:https://mmsearch.github.io

更新时间: 2024-11-27 08:49:12

领域: cs.CV,cs.AI,cs.CL,cs.IR

下载: http://arxiv.org/abs/2409.12959v2

Harnessing Large Language Models for Seed Generation in Greybox Fuzzing

Greybox fuzzing has emerged as a preferred technique for discovering software bugs, striking a balance between efficiency and depth of exploration. While research has focused on improving fuzzing techniques, the importance of high-quality initial seeds remains critical yet often overlooked. Existing methods for seed generation are limited, especially for programs with non-standard or custom input formats. Large Language Models (LLMs) has revolutionized numerous domains, showcasing unprecedented capabilities in understanding and generating complex patterns across various fields of knowledge. This paper introduces SeedMind, a novel system that leverages LLMs to boost greybox fuzzing through intelligent seed generation. Unlike previous approaches, SeedMind employs LLMs to create test case generators rather than directly producing test cases. Our approach implements an iterative, feedback-driven process that guides the LLM to progressively refine test case generation, aiming for increased code coverage depth and breadth. In developing SeedMind, we addressed key challenges including input format limitations, context window constraints, and ensuring consistent, progress-aware behavior. Intensive evaluations with real-world applications show that SeedMind effectively harnesses LLMs to generate high-quality test cases and facilitate fuzzing in bug finding, presenting utility comparable to human-created seeds and significantly outperforming the existing LLM-based solutions.

Updated: 2024-11-27 08:44:41

标题: 利用大型语言模型在灰盒模糊测试中进行种子生成

摘要: 灰盒模糊测试已成为发现软件漏洞的首选技术,平衡了效率和探索深度。虽然研究已经着重改进模糊测试技术,但高质量的初始种子仍然至关重要,但往往被忽视。现有的种子生成方法受限,特别是对于具有非标准或自定义输入格式的程序。大型语言模型(LLMs)在许多领域引起了革命,展示了在各种知识领域中理解和生成复杂模式的前所未有的能力。本文介绍了SeedMind,这是一个利用LLMs通过智能种子生成来增强灰盒模糊测试的新系统。与先前的方法不同,SeedMind利用LLMs创建测试案例生成器,而不是直接生成测试案例。我们的方法实现了一个迭代的、反馈驱动的过程,指导LLM逐渐完善测试案例生成,旨在增加代码覆盖深度和广度。在开发SeedMind时,我们解决了包括输入格式限制、上下文窗口约束以及确保一致、进步感知行为在内的关键挑战。通过与真实应用程序的密集评估,表明SeedMind有效地利用LLMs生成高质量的测试案例,并促进了在漏洞发现中的模糊测试,提供了与人类创建种子相当的效用,并明显优于现有的基于LLMs的解决方案。

更新时间: 2024-11-27 08:44:41

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2411.18143v1

Predicting Water Quality using Quantum Machine Learning: The Case of the Umgeni Catchment (U20A) Study Region

In this study, we consider a real-world application of QML techniques to study water quality in the U20A region in Durban, South Africa. Specifically, we applied the quantum support vector classifier (QSVC) and quantum neural network (QNN), and we showed that the QSVC is easier to implement and yields a higher accuracy. The QSVC models were applied for three kernels: linear, polynomial, and radial basis function (RBF), and it was shown that the polynomial and RBF kernels had exactly the same performance. The QNN model was run with different optimizers, learning rates, noise levels on the circuit components, and weight initializations, but it persistently ran into the dead neuron problem. Thus, the QNN was compared only on accuracy and loss, and it was shown that with the Adam optimizer the model achieves its best performance, which is still below that of the QSVC.

Updated: 2024-11-27 08:43:07

标题: 使用量子机器学习预测水质:以Umgeni集水区(U20A)研究区为例

摘要: 在这项研究中,我们考虑了将QML技术应用于南非德班U20A地区水质研究的实际应用。具体而言,我们应用了量子支持向量分类器(QSVC)和量子神经网络(QNN),并且我们展示了QSVC更容易实现且具有更高的准确率。QSVC模型应用了三种核函数:线性、多项式和径向基函数(RBF),结果显示多项式和RBF核函数的性能完全相同。QNN模型尝试了不同的优化器、学习率、电路组件噪声和权重初始化,但始终遇到死神经元问题。因此,我们仅通过准确率和损失对QNN进行比较,结果显示使用Adam优化器时模型性能最佳,但仍低于QSVC。

更新时间: 2024-11-27 08:43:07

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18141v1
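
The three-kernel comparison in the abstract has a direct classical analogue; the sketch below trains a classical SVC with linear, polynomial, and RBF kernels on synthetic stand-in data (the U20A measurements are not reproduced here). In the quantum version, the kernel entries would instead come from fidelity estimates on a quantum circuit.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for tabular water-quality features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf"]:   # the three kernels compared
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, round(clf.score(X_te, y_te), 3))
```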

ScaleViz: Scaling Visualization Recommendation Models on Large Data

Automated visualization recommendations (vis-rec) help users to derive crucial insights from new datasets. Typically, such automated vis-rec models first calculate a large number of statistics from the datasets and then use machine-learning models to score or classify multiple visualization choices to recommend the most effective ones, as per the statistics. However, state-of-the-art models rely on a very large number of expensive statistics, and therefore using such models on large datasets becomes infeasible due to prohibitively large computational time, limiting the effectiveness of such techniques on most real-world complex and large datasets. In this paper, we propose a novel reinforcement-learning (RL) based framework that takes a given vis-rec model and a time-budget from the user and identifies the best set of input statistics that would be most effective while generating the visual insights within the given time budget, using the given model. Using two state-of-the-art vis-rec models applied on three large real-world datasets, we show the effectiveness of our technique in significantly reducing time-to-visualize with a very small amount of introduced error. Our approach is about 10X faster compared to baseline approaches that introduce similar amounts of error.

Updated: 2024-11-27 08:43:06

标题: ScaleViz:在大数据上扩展可视化推荐模型

摘要: 自动化可视化推荐(vis-rec)帮助用户从新数据集中获取关键见解。通常,这类自动化vis-rec模型首先从数据集中计算大量统计量,然后使用机器学习模型对多个可视化选项进行评分或分类,以根据这些统计量推荐最有效的可视化。然而,最先进的模型依赖大量计算代价高昂的统计量,因此在大型数据集上使用这些模型变得不可行,过长的计算时间限制了这些技术在大多数现实世界复杂大型数据集上的有效性。在本文中,我们提出了一种基于强化学习(RL)的新框架,它接受用户给定的vis-rec模型和时间预算,并确定一组最有效的输入统计量,以便在给定时间预算内使用给定模型生成视觉见解。通过在三个大型真实数据集上应用两种最先进的vis-rec模型,我们展示了我们的技术能在仅引入极小误差的情况下显著缩短可视化时间。在引入相近误差量的情况下,我们的方法比基线方法快约10倍。

更新时间: 2024-11-27 08:43:06

领域: cs.AI,cs.HC,stat.ML

下载: http://arxiv.org/abs/2411.18657v1
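
The budgeted-selection problem the abstract describes can be sketched with a greedy utility-per-cost heuristic; the paper learns this selection with RL, and the statistic names, utilities, and costs below are placeholders.

```python
def select_statistics(stats, budget_s):
    """Pick statistics with the best utility-per-cost until the budget runs out."""
    chosen, spent = [], 0.0
    for name, utility, cost in sorted(stats, key=lambda s: s[1] / s[2], reverse=True):
        if spent + cost <= budget_s:
            chosen.append(name)
            spent += cost
    return chosen, spent

stats = [("column_entropy", 0.9, 2.0), ("pairwise_corr", 0.8, 5.0),
         ("num_outliers", 0.4, 0.5), ("cardinality", 0.3, 0.1)]
print(select_statistics(stats, budget_s=3.0))
```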

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

Full-duplex multimodal large language models (LLMs) provide a unified framework for addressing diverse speech understanding and generation tasks, enabling more natural and seamless human-machine conversations. Unlike traditional modularised conversational AI systems, which separate speech recognition, understanding, and text-to-speech generation into distinct components, multimodal LLMs operate as single end-to-end models. This streamlined design eliminates error propagation across components and fully leverages the rich non-verbal information embedded in input speech signals. We introduce SALMONN-omni, a codec-free, full-duplex speech understanding and generation model capable of simultaneously listening to its own generated speech and background sounds while speaking. To support this capability, we propose a novel duplex spoken dialogue framework incorporating a ``thinking'' mechanism that facilitates asynchronous text and speech generation relying on embeddings instead of codecs (quantized speech and audio tokens). Experimental results demonstrate SALMONN-omni's versatility across a broad range of streaming speech tasks, including speech recognition, speech enhancement, and spoken question answering. Additionally, SALMONN-omni excels at managing turn-taking, barge-in, and echo cancellation scenarios, establishing its potential as a robust prototype for full-duplex conversational AI systems. To the best of our knowledge, SALMONN-omni is the first codec-free model of its kind. A full technical report along with model checkpoints will be released soon.

Updated: 2024-11-27 08:38:57

标题: SALMONN-omni: 一种无编解码器的全双工语音理解和生成LLM

摘要: 全双工多模态大型语言模型(LLMs)为各种语音理解和生成任务提供了统一框架,实现了更自然、更流畅的人机对话。与将语音识别、理解和文本转语音生成分为独立组件的传统模块化对话AI系统不同,多模态LLMs作为单一的端到端模型运行。这种精简设计消除了组件之间的错误传播,并充分利用了输入语音信号中蕴含的丰富非语言信息。我们介绍了SALMONN-omni,一种无编解码器的全双工语音理解和生成模型,能够在说话的同时聆听自己生成的语音和背景声音。为支持这一能力,我们提出了一种新颖的双工口语对话框架,其中包含一个"思考"机制,依靠嵌入而非编解码器(量化的语音和音频令牌)实现异步的文本和语音生成。实验结果表明,SALMONN-omni在广泛的流式语音任务中表现出多面性,包括语音识别、语音增强和口语问答。此外,SALMONN-omni擅长处理话轮转换、插话和回声消除等场景,确立了其作为全双工对话AI系统稳健原型的潜力。据我们所知,SALMONN-omni是同类模型中第一个无编解码器的模型。完整的技术报告和模型检查点将很快发布。

更新时间: 2024-11-27 08:38:57

领域: eess.AS,cs.AI,cs.CL,cs.SD

下载: http://arxiv.org/abs/2411.18138v1

The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?

In today's world, AI programs powered by Machine Learning are ubiquitous, and have achieved seemingly exceptional performance across a broad range of tasks, from medical diagnosis and credit rating in banking, to theft detection via video analysis, and even predicting political or sexual orientation from facial images. These predominantly deep learning methods excel due to their extraordinary capacity to process vast amounts of complex data to extract complex correlations and relationship from different levels of features. In this paper, we contend that the designers and final users of these ML methods have forgotten a fundamental lesson from statistics: correlation does not imply causation. Not only do most state-of-the-art methods neglect this crucial principle, but by doing so they often produce nonsensical or flawed causal models, akin to social astrology or physiognomy. Consequently, we argue that current efforts to make AI models more ethical by merely reducing biases in the training data are insufficient. Through examples, we will demonstrate that the potential for harm posed by these methods can only be mitigated by a complete rethinking of their core models, improved quality assessment metrics and policies, and by maintaining humans oversight throughout the process.

Updated: 2024-11-27 08:23:23

标题: 人工智能中伪科学的回归:机器学习和深度学习是否忘记了统计学和历史的教训?

摘要: 在当今世界,由机器学习驱动的人工智能程序无处不在,并在广泛的任务上取得了看似卓越的性能,从医学诊断和银行信用评级,到通过视频分析检测盗窃,甚至从面部图像预测政治或性取向。这些以深度学习为主的方法之所以表现出色,是因为它们具有处理海量复杂数据、从不同层次的特征中提取复杂相关性和关系的非凡能力。 在本文中,我们认为这些机器学习方法的设计者和最终用户忘记了统计学的一个基本教训:相关性不意味着因果关系。大多数最先进的方法不仅忽视了这一关键原则,而且正因如此,它们经常产生荒谬或有缺陷的因果模型,类似于社会占星术或面相术。因此,我们认为,当前仅通过减少训练数据中的偏见来使人工智能模型更合乎伦理的努力是不够的。我们将通过实例证明,这些方法带来的潜在危害只能通过彻底重新思考其核心模型、改进质量评估指标和政策,并在整个过程中保持人类监督来减轻。

更新时间: 2024-11-27 08:23:23

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18656v1

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space, becoming a key component of most Latent Video Diffusion Models (LVDMs) to reduce model training costs. However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs. Moreover, the block-wise inference method adopted by most LVDMs can lead to discontinuities of latent space when processing long-duration videos. The key to addressing the computational bottleneck lies in decomposing videos into distinct components and efficiently encoding the critical information. Wavelet transform can decompose videos into multiple frequency-domain components and improve the efficiency significantly, we thus propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages multi-level wavelet transform to facilitate low-frequency energy flow into latent representation. Furthermore, we introduce a method called Causal Cache, which maintains the integrity of latent space during block-wise inference. Compared to state-of-the-art video VAEs, WF-VAE demonstrates superior performance in both PSNR and LPIPS metrics, achieving 2x higher throughput and 4x lower memory consumption while maintaining competitive reconstruction quality. Our code and models are available at https://github.com/PKU-YuanGroup/WF-VAE.

Updated: 2024-11-27 08:21:47

标题: WF-VAE:通过小波驱动的能量流增强视频VAE以用于潜在视频扩散模型

摘要: 视频变分自动编码器(VAE)将视频编码为低维潜在空间,成为大多数潜在视频扩散模型(LVDMs)的关键组件,以减少模型训练成本。然而,随着生成视频的分辨率和持续时间的增加,视频VAE的编码成本成为训练LVDMs的限制瓶颈。此外,大多数LVDMs采用的分块推理方法在处理长时间视频时可能导致潜在空间的不连续性。解决计算瓶颈的关键在于将视频分解为不同的组件并有效地编码关键信息。小波变换可以将视频分解为多个频域组件并显著提高效率,因此我们提出了Wavelet Flow VAE(WF-VAE),这是一种自动编码器,利用多级小波变换促进低频能量流入潜在表示。此外,我们引入了一种称为因果缓存的方法,可以在分块推理过程中保持潜在空间的完整性。与最先进的视频VAE相比,WF-VAE在PSNR和LPIPS指标上表现出优越性能,实现了2倍的吞吐量提升和4倍的内存消耗降低,同时保持了有竞争力的重建质量。我们的代码和模型可在https://github.com/PKU-YuanGroup/WF-VAE 上找到。

更新时间: 2024-11-27 08:21:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.17459v2
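
The wavelet intuition is easy to verify numerically: one Haar level splits a signal into a low-frequency half that carries most of the energy and a high-frequency detail half, with total energy preserved. The sketch below applies this along the time axis of a toy "video"; it illustrates the transform only, not the WF-VAE architecture.

```python
import numpy as np

def haar_time(x):
    """One Haar level along axis 0: low-pass and high-pass halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

frames = np.random.rand(16, 64)     # toy video: 16 frames x 64 features
low, high = haar_time(frames)
print(low.shape, high.shape)        # (8, 64) (8, 64): halved temporal resolution
# The transform is orthogonal, so energy is preserved exactly.
assert np.isclose((frames**2).sum(), (low**2).sum() + (high**2).sum())
```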

A Machine Learning-based Framework towards Assessment of Decision-Makers' Biases

Biased human decisions have consequential impacts across various domains, yielding unfair treatment of individuals and resulting in suboptimal outcomes for organizations and society. In recognition of this fact, organizations regularly design and deploy interventions aimed at mitigating these biases. However, measuring human decision biases remains an important but elusive task. Organizations are frequently concerned with mistaken decisions disproportionately affecting one group. In practice, however, this is typically not possible to assess due to the scarcity of a gold standard: a label that indicates what the correct decision would have been. In this work, we propose a machine learning-based framework to assess bias in human-generated decisions when gold standard labels are scarce. We provide theoretical guarantees and empirical evidence demonstrating the superiority of our method over existing alternatives. This proposed methodology establishes a foundation for transparency in human decision-making, carrying substantial implications for managerial duties, and offering potential for alleviating algorithmic biases when human decisions are used as labels to train algorithms.

Updated: 2024-11-27 08:02:31

标题: 一个基于机器学习的框架,用于评估决策者的偏见

摘要: 偏见的人类决策在各个领域产生重要影响,导致对个人的不公平对待,并导致组织和社会的次优结果。鉴于这一事实,组织定期设计和部署旨在减轻这些偏见的干预措施。然而,衡量人类决策偏见仍然是一项重要但难以实现的任务。组织经常担心错误决策对某一群体造成不成比例的影响。然而,在实践中,由于缺乏黄金标准:一个指示正确决策应该是什么的标签,通常无法评估这一点。在这项工作中,我们提出了一个基于机器学习的框架,用于在黄金标准标签稀缺时评估人类生成的决策中的偏见。我们提供理论保证和经验证据,展示了我们的方法优于现有替代方法的优越性。这种提出的方法论为人类决策透明性奠定了基础,对管理职责具有重要意义,并为减轻算法偏见提供潜力,当人类决策被用作标签来训练算法时。

更新时间: 2024-11-27 08:02:31

领域: cs.LG

下载: http://arxiv.org/abs/2411.18122v1

The Bigger the Better? Accurate Molecular Potential Energy Surfaces from Minimalist Neural Networks

Atomistic simulations are a powerful tool for studying the dynamics of molecules, proteins, and materials on wide time and length scales. Their reliability and predictiveness, however, depend directly on the accuracy of the underlying potential energy surface (PES). Guided by the principle of parsimony this work introduces KerNN, a combined kernel/neural network-based approach to represent molecular PESs. Compared to state-of-the-art neural network PESs the number of learnable parameters of KerNN is significantly reduced. This speeds up training and evaluation times by several orders of magnitude while retaining high prediction accuracy. Importantly, using kernels as the features also improves the extrapolation capabilities of KerNN far beyond the coverage provided by the training data which solves a general problem of NN-based PESs. KerNN applied to spectroscopy and reaction dynamics shows excellent performance on test set statistics and observables including vibrational bands computed from classical and quantum simulations.

Updated: 2024-11-27 08:01:21

标题: 越大越好?基于极简神经网络的精确分子势能面

摘要: 原子级模拟是在宽广时间和长度尺度上研究分子、蛋白质和材料动力学的强大工具。然而,它们的可靠性和预测性直接取决于底层势能面(PES)的准确性。在简约性原则的指导下,本文引入了KerNN,一种结合核方法与神经网络来表示分子势能面的方法。与最先进的神经网络PES相比,KerNN的可学习参数数量显著减少。这使训练和评估时间加快了几个数量级,同时保持高预测精度。重要的是,使用核函数作为特征还使KerNN的外推能力远超训练数据的覆盖范围,解决了基于神经网络的PES的一个普遍问题。将KerNN应用于光谱学和反应动力学,在测试集统计量以及由经典与量子模拟计算的振动谱带等可观测量上均表现出色。

更新时间: 2024-11-27 08:01:21

领域: physics.chem-ph,cs.LG

下载: http://arxiv.org/abs/2411.18121v1
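
The "kernels as features" idea can be sketched in a few lines: instead of feeding raw descriptors to a large network, one feeds similarities to a small set of reference geometries into a very small model. Everything below (dimensions, RBF kernel, single linear readout) is an illustrative stand-in, not the KerNN architecture.

```python
import numpy as np

def rbf_features(x, refs, gamma=1.0):
    """Kernel features: similarity of each input to each reference structure."""
    d2 = ((x[:, None, :] - refs[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)                  # (batch, n_refs)

rng = np.random.default_rng(1)
refs = rng.normal(size=(20, 6))                 # 20 reference geometries, 6-d coords
W = rng.normal(size=(20, 1)) * 0.1              # tiny learnable readout
x = rng.normal(size=(5, 6))                     # batch of 5 query geometries
energy = rbf_features(x, refs) @ W              # predicted energies, shape (5, 1)
print(energy.ravel())
```

With only a reference set and a small readout to learn, the parameter count stays tiny, which is the source of the speedups the abstract claims.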

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Estimating video depth in open-world scenarios is challenging due to the diversity of videos in appearance, content motion, camera movement, and length. We present DepthCrafter, an innovative method for generating temporally consistent long depth sequences with intricate details for open-world videos, without requiring any supplementary information such as camera poses or optical flow. The generalization ability to open-world videos is achieved by training the video-to-depth model from a pre-trained image-to-video diffusion model, through our meticulously designed three-stage training strategy. Our training approach enables the model to generate depth sequences with variable lengths at one time, up to 110 frames, and harvest both precise depth details and rich content diversity from realistic and synthetic datasets. We also propose an inference strategy that can process extremely long videos through segment-wise estimation and seamless stitching. Comprehensive evaluations on multiple datasets reveal that DepthCrafter achieves state-of-the-art performance in open-world video depth estimation under zero-shot settings. Furthermore, DepthCrafter facilitates various downstream applications, including depth-based visual effects and conditional video generation.

Updated: 2024-11-27 07:59:25

标题: DepthCrafter:为开放世界视频生成一致的长深度序列

摘要: 在开放世界场景中估计视频深度是具有挑战性的,因为视频在外观、内容动态、摄像机移动和长度等方面具有多样性。我们提出了DepthCrafter,一种创新方法,用于在开放世界视频中生成具有细节的时间一致的长深度序列,而无需任何额外信息,如摄像机姿势或光流。通过我们精心设计的三阶段训练策略,从预先训练的图像到视频扩散模型训练视频到深度模型,实现了对开放世界视频的泛化能力。我们的训练方法使模型能够一次生成长度可变的深度序列,最多达到110帧,并从真实和合成数据集中获取精确的深度细节和丰富的内容多样性。我们还提出了一种推理策略,通过分段估计和无缝拼接可以处理极长的视频。在多个数据集上进行的全面评估表明,DepthCrafter在零-shot设置下实现了开放世界视频深度估计的最新性能。此外,DepthCrafter促进了各种下游应用,包括基于深度的视觉效果和条件视频生成。

更新时间: 2024-11-27 07:59:25

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2409.02095v2
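
The segment-wise estimation with seamless stitching mentioned in the abstract can be sketched as overlapping windows blended with a linear cross-fade; the window sizes and the stand-in predictor below are illustrative, not the paper's procedure.

```python
import numpy as np

def stitch_segments(total_len, predict, win=110, overlap=20):
    """Run a per-window predictor and cross-fade the overlaps into one sequence."""
    assert total_len >= win
    out, weight = np.zeros(total_len), np.zeros(total_len)
    step = win - overlap
    starts = list(range(0, total_len - win + 1, step))
    if starts[-1] + win < total_len:            # make sure the tail is covered
        starts.append(total_len - win)
    for start in starts:
        seg = predict(start, start + win)       # (win,) depth values per frame
        ramp = np.ones(win)
        if start > 0:                           # fade in over the overlap region
            ramp[:overlap] = np.linspace(0.0, 1.0, overlap)
        out[start:start + win] += seg * ramp
        weight[start:start + win] += ramp
    return out / weight

fake_model = lambda a, b: np.linspace(a, b, b - a)   # stand-in depth predictor
print(stitch_segments(300, fake_model).shape)        # (300,)
```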

Pathways on the Image Manifold: Image Editing via Video Generation

Recent advances in image editing, driven by image diffusion models, have shown remarkable progress. However, significant challenges remain, as these models often struggle to follow complex edit instructions accurately and frequently compromise fidelity by altering key elements of the original image. Simultaneously, video generation has made remarkable strides, with models that effectively function as consistent and continuous world simulators. In this paper, we propose merging these two fields by utilizing image-to-video models for image editing. We reformulate image editing as a temporal process, using pretrained video models to create smooth transitions from the original image to the desired edit. This approach traverses the image manifold continuously, ensuring consistent edits while preserving the original image's key aspects. Our approach achieves state-of-the-art results on text-based image editing, demonstrating significant improvements in both edit accuracy and image preservation.

Updated: 2024-11-27 07:59:06

标题: 图像流形上的路径:通过视频生成进行图像编辑

摘要: 最近,由图像扩散模型推动的图像编辑取得了显著进展。然而,重大挑战依然存在:这些模型往往难以准确遵循复杂的编辑指令,并经常因改变原始图像的关键元素而损害保真度。与此同时,视频生成取得了长足进步,相关模型已能有效充当一致且连续的世界模拟器。在本文中,我们提出通过利用图像到视频模型进行图像编辑,将这两个领域结合起来。我们将图像编辑重新表述为一个时间过程,利用预训练的视频模型生成从原始图像到目标编辑的平滑过渡。这种方法在图像流形上连续移动,在保留原始图像关键特征的同时确保编辑的一致性。我们的方法在基于文本的图像编辑上取得了最先进的结果,在编辑准确性和图像保持方面均有显著改进。

更新时间: 2024-11-27 07:59:06

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.16819v2

Combinational Backdoor Attack against Customized Text-to-Image Models

Recently, Text-to-Image (T2I) synthesis technology has made tremendous strides. Numerous representative T2I models have emerged and achieved promising application outcomes, such as DALL-E, Stable Diffusion, Imagen, etc. In practice, it has become increasingly popular for model developers to selectively adopt various pre-trained text encoders and conditional diffusion models from third-party platforms, integrating them to build customized (personalized) T2I models. However, such an adoption approach is vulnerable to backdoor attacks. In this work, we propose a Combinational Backdoor Attack against Customized T2I models (CBACT2I) targeting this application scenario. Different from previous backdoor attacks against T2I models, CBACT2I embeds the backdoor into the text encoder and the conditional diffusion model separately. The customized T2I model exhibits backdoor behaviors only when the backdoor text encoder is used in combination with the backdoor conditional diffusion model. These properties make CBACT2I more stealthy and flexible than prior backdoor attacks against T2I models. Extensive experiments demonstrate the effectiveness of CBACT2I with different backdoor triggers and different backdoor targets on the open-sourced Stable Diffusion model. This work reveals the backdoor vulnerabilities of customized T2I models and urges countermeasures to mitigate backdoor threats in this scenario.

Updated: 2024-11-27 07:41:57

标题: 针对定制文本到图像模型的组合后门攻击

摘要: 最近,文本到图像(T2I)合成技术取得了巨大进展。许多代表性的T2I模型已经出现并取得了令人期待的应用成果,如DALL-E、Stable Diffusion、Imagen等。在实践中,模型开发者越来越倾向于有选择地采用第三方平台上的各种预训练文本编码器和条件扩散模型,将它们集成在一起构建定制(个性化)的T2I模型。然而,这种采用方式容易受到后门攻击。在本文中,我们针对这一应用场景提出了一种面向定制T2I模型的组合后门攻击(CBACT2I)。与先前针对T2I模型的后门攻击不同,CBACT2I将后门分别嵌入文本编码器和条件扩散模型中。只有当后门文本编码器与后门条件扩散模型结合使用时,定制T2I模型才会表现出后门行为。这些特性使CBACT2I比先前针对T2I模型的后门攻击更具隐蔽性和灵活性。大量实验表明,CBACT2I在开源的Stable Diffusion模型上对不同的后门触发器和后门目标均有效。这项工作揭示了定制T2I模型的后门漏洞,并呼吁采取对策以减轻这一场景下的后门威胁。

更新时间: 2024-11-27 07:41:57

领域: cs.CR

下载: http://arxiv.org/abs/2411.12389v2

Playing Language Game with LLMs Leads to Jailbreaking

The advent of large language models (LLMs) has spurred the development of numerous jailbreak techniques aimed at circumventing their security defenses against malicious attacks. An effective jailbreak approach is to identify a domain where safety generalization fails, a phenomenon known as mismatched generalization. In this paper, we introduce two novel jailbreak methods based on mismatched generalization: natural language games and custom language games, both of which effectively bypass the safety mechanisms of LLMs, with various kinds and different variants, making them hard to defend and leading to high attack rates. Natural language games involve the use of synthetic linguistic constructs and the actions intertwined with these constructs, such as the Ubbi Dubbi language. Building on this phenomenon, we propose the custom language games method: by engaging with LLMs using a variety of custom rules, we successfully execute jailbreak attacks across multiple LLM platforms. Extensive experiments demonstrate the effectiveness of our methods, achieving success rates of 93% on GPT-4o, 89% on GPT-4o-mini and 83% on Claude-3.5-Sonnet. Furthermore, to investigate the generalizability of safety alignments, we fine-tuned Llama-3.1-70B with the custom language games to achieve safety alignment within our datasets and found that when interacting through other language games, the fine-tuned models still failed to identify harmful content. This finding indicates that the safety alignment knowledge embedded in LLMs fails to generalize across different linguistic formats, thus opening new avenues for future research in this area.

Updated: 2024-11-27 07:41:35

标题: 与LLMs玩语言游戏会导致越狱

摘要: 大语言模型(LLMs)的出现催生了大量越狱技术,旨在规避其针对恶意攻击的安全防御。一种有效的越狱方法是找出安全泛化失效的领域,这一现象称为不匹配泛化(mismatched generalization)。在本文中,我们介绍了两种基于不匹配泛化的新型越狱方法:自然语言游戏和自定义语言游戏。两者都能有效绕过LLMs的安全机制,且存在多种类型和不同变体,使其难以防御并导致很高的攻击成功率。自然语言游戏涉及使用合成的语言结构以及与这些结构交织的行为,例如Ubbi Dubbi语言。基于这一现象,我们提出了自定义语言游戏方法:通过使用各种自定义规则与LLMs交互,我们成功地在多个LLM平台上执行了越狱攻击。大量实验证明了我们方法的有效性,在GPT-4o上成功率达到93%,在GPT-4o-mini上为89%,在Claude-3.5-Sonnet上为83%。此外,为了研究安全对齐的泛化性,我们使用自定义语言游戏对Llama-3.1-70B进行了微调,使其在我们的数据集内实现安全对齐,并发现当通过其他语言游戏进行交互时,微调后的模型仍无法识别有害内容。这一发现表明,嵌入在LLMs中的安全对齐知识无法跨不同语言形式泛化,从而为该领域的未来研究开辟了新的途径。

更新时间: 2024-11-27 07:41:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.12762v2

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Current diffusion models for human image animation struggle to ensure identity (ID) consistency. This paper presents StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference image and a sequence of poses. Building upon a video diffusion model, StableAnimator contains carefully designed modules for both training and inference striving for identity consistency. In particular, StableAnimator begins by computing image and face embeddings with off-the-shelf extractors, respectively and face embeddings are further refined by interacting with image embeddings using a global content-aware Face Encoder. Then, StableAnimator introduces a novel distribution-aware ID Adapter that prevents interference caused by temporal layers while preserving ID via alignment. During inference, we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to further enhance the face quality. We demonstrate that solving the HJB equation can be integrated into the diffusion denoising process, and the resulting solution constrains the denoising path and thus benefits ID preservation. Experiments on multiple benchmarks show the effectiveness of StableAnimator both qualitatively and quantitatively.

Updated: 2024-11-27 07:39:20

标题: 稳定动画师:高质量保持人类形象的图像动画

摘要: 目前用于人体图像动画的扩散模型难以确保身份(ID)一致性。本文提出了StableAnimator,这是第一个端到端保持身份的视频扩散框架,它以参考图像和姿势序列为条件,无需任何后期处理即可合成高质量视频。StableAnimator建立在视频扩散模型之上,包含为训练和推理精心设计的模块,力求身份一致性。具体而言,StableAnimator首先分别使用现成的提取器计算图像嵌入和人脸嵌入,并通过全局内容感知人脸编码器使人脸嵌入与图像嵌入交互,从而进一步细化人脸嵌入。随后,StableAnimator引入了一种新颖的分布感知ID适配器,在通过对齐保持身份的同时,防止时间层带来的干扰。在推理阶段,我们提出了一种基于Hamilton-Jacobi-Bellman(HJB)方程的新型优化方法,以进一步提高人脸质量。我们证明了HJB方程的求解可以整合到扩散去噪过程中,所得到的解会约束去噪路径,从而有利于身份保持。在多个基准上的实验从定性和定量两方面展示了StableAnimator的有效性。

更新时间: 2024-11-27 07:39:20

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.17697v2

Training and Evaluating Language Models with Template-based Data Generation

The rapid advancement of large language models (LLMs) such as GPT-3, PaLM, and Llama has significantly transformed natural language processing, showcasing remarkable capabilities in understanding and generating language. However, these models often struggle with tasks requiring complex reasoning, particularly in mathematical problem-solving, due in part to the scarcity of large-scale, high-quality, domain-specific datasets necessary for training sophisticated reasoning abilities. To address this limitation, we introduce Template-based Data Generation (TDG), a novel approach that leverages LLMs (GPT-4) to automatically generate parameterized meta-templates, which are then used to synthesize a vast array of high-quality problems and solutions. Leveraging TDG, we create TemplateMath Part I: TemplateGSM, a dataset comprising over 7 million synthetically generated grade school math problems--each accompanied by code-based and natural language solutions--with the potential to generate an effectively unlimited number more. This dataset alleviates the scarcity of large-scale mathematical datasets and serves as a valuable resource for pre-training, fine-tuning, and evaluating LLMs in mathematical reasoning. Our method not only enables the generation of virtually infinite data but also elevates data augmentation to a new level by using GPT-4 for meta-template generation, ensuring diverse and high-quality problem structures. The TemplateMath Part I: TemplateGSM dataset is publicly available at https://huggingface.co/datasets/math-ai/TemplateGSM. The code is available at https://github.com/iiis-ai/TemplateMath.

Updated: 2024-11-27 07:32:56

标题: 使用基于模板的数据生成来训练和评估语言模型

摘要: 大型语言模型(LLMs)如GPT-3、PaLM和Llama的迅速发展显著改变了自然语言处理,展示出在理解和生成语言方面的杰出能力。然而,这些模型通常在需要复杂推理的任务中遇到困难,特别是在数学问题解决方面,部分原因是由于缺乏用于训练复杂推理能力所需的大规模、高质量、特定领域的数据集。为了解决这一限制,我们引入了基于模板的数据生成(TDG),这是一种利用LLMs(GPT-4)自动生成参数化元模板的新方法,然后用这些元模板来合成大量高质量的问题和解决方案。利用TDG,我们创建了TemplateMath Part I: TemplateGSM,这是一个包含超过700万个合成生成的小学数学问题的数据集,每个问题都附带基于代码和自然语言的解决方案,并且实际上可以继续生成几乎无限数量的新问题。这个数据集缓解了大规模数学数据集的稀缺情况,并成为了在数学推理中对LLMs进行预训练、微调和评估的宝贵资源。我们的方法不仅能够生成几乎无限的数据,还通过使用GPT-4进行元模板生成,将数据增强提升到了一个新水平,确保多样化和高质量的问题结构。TemplateMath Part I: TemplateGSM数据集可以在https://huggingface.co/datasets/math-ai/TemplateGSM上公开获取。代码可以在https://github.com/iiis-ai/TemplateMath上找到。

更新时间: 2024-11-27 07:32:56

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.18104v1
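
A single hand-written meta-template makes the TDG recipe concrete: sample parameters, render the problem text, and derive the label with code so it is correct by construction. The template below is a toy example in the spirit of the method; in the paper, the meta-templates themselves are generated by GPT-4.

```python
import random

def instantiate():
    """Instantiate one parameterized meta-template into a (problem, solution) pair."""
    name = random.choice(["Ava", "Ben", "Mia"])
    a, b = random.randint(2, 9), random.randint(2, 9)
    problem = f"{name} buys {a} boxes with {b} pencils each. How many pencils in total?"
    answer = a * b                              # code-based solution
    solution = f"{name} has {a} * {b} = {answer} pencils."
    return {"problem": problem, "solution": solution, "answer": answer}

random.seed(0)
for _ in range(2):
    print(instantiate())
```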

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions is rapidly increasing. However, on the one hand, there is only a certain amount of complex instruction evaluation data; on the other hand, there are no dedicated algorithms to improve the ability to follow complex instructions. To this end, this paper introduces TRACE, a benchmark for improving and evaluating the complex instruction-following ability, which consists of 120K training data and 1K evaluation data. Furthermore, we propose the IOPO (Input-Output Preference Optimization) alignment method, which takes both input and output preference pairs into consideration, so that LLMs not only rapidly align with response preferences but also meticulously explore the instruction preferences. Extensive experiments on both in-domain and out-of-domain datasets confirm the effectiveness of IOPO, showing 8.15% and 2.18% improvements on in-domain data and 6.29% and 3.13% on out-of-domain data compared to SFT and DPO, respectively.

Updated: 2024-11-27 07:29:59

标题: IOPO:通过输入输出偏好优化赋予LLMs复杂指令跟随的能力

摘要: 在大型语言模型(LLMs)领域,随着越来越多的智能体和应用利用LLMs进行构建,指令的复杂性迅速增加,模型准确遵循指令的能力变得至关重要。然而,一方面,复杂指令的评估数据数量有限;另一方面,目前还没有专门用于提升复杂指令遵循能力的算法。为此,本文介绍了TRACE,一个用于改进和评估复杂指令遵循能力的基准,包括12万条训练数据和1千条评估数据。此外,我们提出了IOPO(输入-输出偏好优化)对齐方法,它同时考虑输入和输出偏好对,使LLMs不仅能快速对齐响应偏好,还能细致地探索指令偏好。在领域内和领域外数据集上的大量实验验证了IOPO的有效性:与SFT和DPO相比,在领域内数据上分别提升8.15%和2.18%,在领域外数据上分别提升6.29%和3.13%。

更新时间: 2024-11-27 07:29:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.06208v2
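
For reference, the DPO baseline named in the abstract optimizes a pairwise objective over output preferences (a preferred response y_w against a dispreferred y_l for the same input x); IOPO, as described, additionally constructs preference pairs on the input side. Shown here as context rather than as the IOPO objective itself, the standard DPO loss is

```latex
\mathcal{L}_{\mathrm{DPO}}
 = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
   \log \sigma\!\Big(
     \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
   - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
   \Big)\right],
```

where \pi_\theta is the policy being trained, \pi_ref is a frozen reference model, and \beta is a temperature parameter.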

Aligning Knowledge Concepts to Whole Slide Images for Precise Histopathology Image Analysis

Due to the large size and lack of fine-grained annotation, Whole Slide Images (WSIs) analysis is commonly approached as a Multiple Instance Learning (MIL) problem. However, previous studies only learn from training data, posing a stark contrast to how human clinicians teach each other and reason about histopathologic entities and factors. Here we present a novel knowledge concept-based MIL framework, named ConcepPath to fill this gap. Specifically, ConcepPath utilizes GPT-4 to induce reliable diseasespecific human expert concepts from medical literature, and incorporate them with a group of purely learnable concepts to extract complementary knowledge from training data. In ConcepPath, WSIs are aligned to these linguistic knowledge concepts by utilizing pathology vision-language model as the basic building component. In the application of lung cancer subtyping, breast cancer HER2 scoring, and gastric cancer immunotherapy-sensitive subtyping task, ConcepPath significantly outperformed previous SOTA methods which lack the guidance of human expert knowledge.

Updated: 2024-11-27 07:27:52

标题: 将知识概念与整张切片图像对齐,以进行精准的组织病理学图像分析。

摘要: 由于大尺寸和缺乏细粒度注释,整张幻灯片图像(WSIs)分析通常被视为多实例学习(MIL)问题。然而,先前的研究仅从训练数据中学习,与人类临床医生相互教导和推理组织病理学实体和因素的方式形成鲜明对比。在这里,我们提出了一个新颖的基于知识概念的MIL框架,名为ConcepPath,以填补这一差距。具体而言,ConcepPath利用GPT-4从医学文献中诱导出可靠的疾病特定的人类专家概念,并将其与一组纯学习概念结合起来,从训练数据中提取互补知识。在ConcepPath中,通过利用病理学视觉语言模型作为基本构建组件,将WSIs与这些语言知识概念对齐。在肺癌亚型、乳腺癌HER2评分和胃癌免疫治疗敏感亚型任务的应用中,ConcepPath明显优于以往缺乏人类专家知识指导的SOTA方法。

更新时间: 2024-11-27 07:27:52

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18101v1

A First Look at GPT Apps: Landscape and Vulnerability

Following OpenAI's introduction of GPTs, a surge in GPT apps has led to the launch of dedicated LLM app stores. Nevertheless, given its debut, there is a lack of sufficient understanding of this new ecosystem. To fill this gap, this paper presents a first comprehensive longitudinal (5-month) study of the evolution, landscape, and vulnerability of the emerging LLM app ecosystem, focusing on two GPT app stores: \textit{GPTStore.AI} and the official \textit{OpenAI GPT Store}. Specifically, we develop two automated tools and a TriLevel configuration extraction strategy to efficiently gather metadata (\ie names, creators, descriptions, \etc) and user feedback for all GPT apps across these two stores, as well as configurations (\ie system prompts, knowledge files, and APIs) for the top 10,000 popular apps. Our extensive analysis reveals: (1) the user enthusiasm for GPT apps consistently rises, whereas creator interest plateaus within three months of GPTs' launch; (2) nearly 90\% system prompts can be easily accessed due to widespread failure to secure GPT app configurations, leading to considerable plagiarism and duplication among apps. Our findings highlight the necessity of enhancing the LLM app ecosystem by the app stores, creators, and users.

Updated: 2024-11-27 07:26:34

标题: 首次了解GPT应用:概况和脆弱性

摘要: 随着OpenAI推出GPT,GPT应用程序的激增导致专门的LLM应用程序商店的推出。然而,由于其首次亮相,对这个新生态系统缺乏足够的理解。为了填补这一空白,本文提出了对新兴LLM应用程序生态系统的演变、景观和脆弱性进行首次全面的纵向(5个月)研究,重点关注两个GPT应用商店:GPTStore.AI和官方OpenAI GPT商店。具体来说,我们开发了两个自动化工具和一个TriLevel配置提取策略,以高效地收集这两个商店中所有GPT应用程序的元数据(如名称、创建者、描述等)和用户反馈,以及热门前10000个应用程序的配置(如系统提示、知识文件和API)。我们的广泛分析揭示了:(1)用户对GPT应用程序的热情持续上升,而创建者的兴趣在GPT推出后三个月内趋于平稳;(2)近90%的系统提示因未能保护GPT应用程序配置而容易访问,导致应用程序之间存在大量抄袭和重复。我们的发现突出了增强LLM应用程序生态系统的必要性,这需要应用商店、创建者和用户共同努力。

更新时间: 2024-11-27 07:26:34

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2402.15105v3

Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation

Self-training approach for large language models (LLMs) improves reasoning abilities by training the models on their self-generated rationales. Previous approaches have labeled rationales that produce correct answers for a given question as appropriate for training. However, a single measure risks misjudging rationale quality, leading the models to learn flawed reasoning patterns. To address this issue, we propose CREST (Consistency-driven Rationale Evaluation for Self-Training), a self-training framework that further evaluates each rationale through follow-up questions and leverages this evaluation to guide its training. Specifically, we introduce two methods: (1) filtering out rationales that frequently result in incorrect answers on follow-up questions and (2) preference learning based on mixed preferences from rationale evaluation results of both original and follow-up questions. Experiments on three question-answering datasets using open LLMs show that CREST not only improves the logical robustness and correctness of rationales but also improves reasoning abilities compared to previous self-training approaches.

Updated: 2024-11-27 07:25:02

标题: 自我训练遇上一致性:通过一致性驱动的理由评估改善LLM的推理

摘要: 大型语言模型(LLMs)的自我训练方法通过在模型自生成的推理过程(rationale)上训练模型来提高其推理能力。先前的方法将对给定问题产生正确答案的推理过程标记为适合训练。然而,仅凭单一标准存在误判推理质量的风险,会导致模型学到有缺陷的推理模式。为了解决这一问题,我们提出了CREST(一致性驱动的推理评估自我训练),这是一个自我训练框架,通过后续问题进一步评估每条推理,并利用这种评估来指导训练。具体而言,我们引入了两种方法:(1)过滤掉在后续问题上经常导致错误答案的推理;(2)基于原始问题和后续问题的推理评估结果的混合偏好进行偏好学习。使用开放LLMs在三个问答数据集上的实验表明,与先前的自我训练方法相比,CREST不仅提高了推理的逻辑稳健性和正确性,还提高了推理能力。

更新时间: 2024-11-27 07:25:02

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2411.06387v3

Simple Relative Deviation Bounds for Covariance and Gram Matrices

We provide non-asymptotic, relative deviation bounds for the eigenvalues of empirical covariance and gram matrices in general settings. Unlike typical uniform bounds, which may fail to capture the behavior of smaller eigenvalues, our results provide sharper control across the spectrum. Our analysis is based on a general-purpose theorem that allows one to convert existing uniform bounds into relative ones. The theorems and techniques emphasize simplicity and should be applicable across various settings.

Updated: 2024-11-27 07:22:55

标题: 协方差和格拉姆矩阵的简单相对偏差界限

摘要: 我们在一般情形下为经验协方差矩阵和格拉姆矩阵的特征值给出了非渐近的相对偏差界。与典型的一致界不同——一致界可能无法刻画较小特征值的行为——我们的结果在整个谱上提供了更精确的控制。我们的分析基于一个通用定理,该定理允许将现有的一致界转化为相对界。这些定理和技术强调简单性,应能适用于各种情形。

更新时间: 2024-11-27 07:22:55

领域: math.PR,cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2410.05754v2
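
The distinction the abstract draws can be stated in one line each. A typical uniform bound (Weyl's inequality) is additive,

```latex
\max_i \big|\lambda_i(\hat{\Sigma}) - \lambda_i(\Sigma)\big|
  \;\le\; \|\hat{\Sigma} - \Sigma\|_{\mathrm{op}},
```

so the same slack applies to every eigenvalue and can swamp the small ones. A relative deviation bound instead takes a multiplicative form (written here schematically; the paper's precise statements and conditions differ),

```latex
\big|\lambda_i(\hat{\Sigma}) - \lambda_i(\Sigma)\big|
  \;\le\; \varepsilon\,\lambda_i(\Sigma) \quad \text{for all } i,
```

which keeps the error proportional to each eigenvalue's own scale.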

ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only

3D object detection plays a crucial role in various applications such as autonomous vehicles, robotics and augmented reality. However, training 3D detectors requires a costly precise annotation, which is a hindrance to scaling annotation to large datasets. To address this challenge, we propose a weakly supervised 3D annotator that relies solely on 2D bounding box annotations from images, along with size priors. One major problem is that supervising a 3D detection model using only 2D boxes is not reliable due to ambiguities between different 3D poses and their identical 2D projection. We introduce a simple yet effective and generic solution: we build 3D proxy objects with annotations by construction and add them to the training dataset. Our method requires only size priors to adapt to new classes. To better align 2D supervision with 3D detection, our method ensures depth invariance with a novel expression of the 2D losses. Finally, to detect more challenging instances, our annotator follows an offline pseudo-labelling scheme which gradually improves its 3D pseudo-labels. Extensive experiments on the KITTI dataset demonstrate that our method not only performs on-par or above previous works on the Car category, but also achieves performance close to fully supervised methods on more challenging classes. We further demonstrate the effectiveness and robustness of our method by being the first to experiment on the more challenging nuScenes dataset. We additionally propose a setting where weak labels are obtained from a 2D detector pre-trained on MS-COCO instead of human annotations. The code is available at https://github.com/CEA-LIST/ALPI

Updated: 2024-11-27 07:22:47

标题: ALPI:使用仅2D标签进行3D物体检测的自动标签器和代理注入

摘要: 3D物体检测在自动驾驶车辆、机器人和增强现实等各种应用中起着至关重要的作用。然而,训练3D检测器需要昂贵的精确标注,这阻碍了将标注扩展到大型数据集。为应对这一挑战,我们提出了一种仅依赖图像2D边界框标注和尺寸先验的弱监督3D标注器。一个主要问题是,由于不同3D姿态与其相同2D投影之间存在歧义,仅使用2D框来监督3D检测模型并不可靠。我们引入了一个简单而有效且通用的解决方案:通过构造方式建立带标注的3D代理对象,并将其加入训练数据集。我们的方法只需尺寸先验即可适应新类别。为了更好地将2D监督与3D检测对齐,我们的方法通过对2D损失的一种新颖表达来确保深度不变性。最后,为了检测更具挑战性的实例,我们的标注器采用离线伪标注方案,逐步改进其3D伪标签。在KITTI数据集上的大量实验表明,我们的方法不仅在汽车类别上与之前的工作持平或更优,而且在更具挑战性的类别上也达到了接近全监督方法的性能。我们还率先在更具挑战性的nuScenes数据集上进行实验,进一步证明了方法的有效性和鲁棒性。此外,我们提出了一种设置,其中弱标签来自在MS-COCO上预训练的2D检测器而非人工标注。代码可在https://github.com/CEA-LIST/ALPI 上找到。

更新时间: 2024-11-27 07:22:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.17197v3

Enhancing Signed Graph Neural Networks through Curriculum-Based Training

Signed graphs are powerful models for representing complex relations with both positive and negative connections. Recently, Signed Graph Neural Networks (SGNNs) have emerged as potent tools for analyzing such graphs. To our knowledge, no prior research has been conducted on devising a training plan specifically for SGNNs. The prevailing training approach feeds samples (edges) to models in a random order, resulting in equal contributions from each sample during the training process, but fails to account for varying learning difficulties based on the graph's structure. We contend that SGNNs can benefit from a curriculum that progresses from easy to difficult, similar to human learning. The main challenge is evaluating the difficulty of edges in a signed graph. We address this by theoretically analyzing the difficulty of SGNNs in learning adequate representations for edges in unbalanced cycles and propose a lightweight difficulty measurer. This forms the basis for our innovative Curriculum representation learning framework for Signed Graphs, referred to as CSG. The process involves using the measurer to assign difficulty scores to training samples, adjusting their order using a scheduler and training the SGNN model accordingly. We empirically evaluate our approach on six real-world signed graph datasets. Our method demonstrates remarkable results, enhancing the accuracy of popular SGNN models by up to 23.7% and showing a reduction of 8.4% in standard deviation, enhancing model stability.

Updated: 2024-11-27 07:15:44

标题: 通过基于课程的训练增强带符号图神经网络

摘要: 带符号图是表示同时包含正向和负向连接的复杂关系的强大模型。最近,带符号图神经网络(SGNNs)已成为分析此类图的有力工具。据我们所知,此前尚无研究专门为SGNNs设计训练计划。主流的训练方法以随机顺序向模型提供样本(边),使每个样本在训练过程中贡献相同,但未能考虑由图结构导致的不同学习难度。我们认为,SGNNs可以像人类学习一样,从由易到难的课程中受益。主要挑战在于评估带符号图中边的难度。为此,我们从理论上分析了SGNNs在为不平衡环中的边学习充分表示时的难度,并提出了一种轻量级的难度度量器。这构成了我们为带符号图设计的创新性课程表示学习框架(称为CSG)的基础。该过程包括:使用度量器为训练样本分配难度分数,利用调度器调整样本顺序,并据此训练SGNN模型。我们在六个真实世界的带符号图数据集上对方法进行了实证评估。我们的方法取得了显著的结果,将流行SGNN模型的准确率提高了最多23.7%,并将标准差降低了8.4%,增强了模型的稳定性。

更新时间: 2024-11-27 07:15:44

领域: cs.LG

下载: http://arxiv.org/abs/2310.11083v2
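
The easy-to-hard schedule can be sketched independently of any particular SGNN: rank edges by a difficulty score and reveal progressively larger, harder prefixes of the training set. The difficulty function below is a placeholder for the paper's learned measurer.

```python
def curriculum_stages(samples, difficulty, n_stages=4):
    """Yield progressively larger easy-to-hard subsets of the training samples."""
    ranked = sorted(samples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = int(len(ranked) * stage / n_stages)
        yield ranked[:cutoff]                   # train on this subset, then grow it

edges = [("u1", "v1", +1), ("u2", "v2", -1), ("u3", "v3", -1), ("u4", "v4", +1)]
toy_difficulty = lambda e: abs(hash(e)) % 100   # placeholder difficulty score
for subset in curriculum_stages(edges, toy_difficulty):
    print(len(subset), "edges in this stage")
```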

Derivation of Closed Form of Expected Improvement for Gaussian Process Trained on Log-Transformed Objective

Expected Improvement (EI) is arguably the most widely used acquisition function in Bayesian optimization. However, it is often challenging to enhance the performance with EI due to its sensitivity to numerical precision. Previously, Hutter et al. (2009) tackled this problem by using Gaussian process trained on the log-transformed objective function and it was reported that this trick improves the predictive accuracy of GP, leading to substantially better performance. Although Hutter et al. (2009) offered the closed form of their EI, its intermediate derivation has not been provided so far. In this paper, we give a friendly derivation of their proposition.

Updated: 2024-11-27 07:13:41

标题: 基于对数变换目标函数训练的高斯过程的预期改进闭式解推导

摘要: 预期改进(EI)可以说是贝叶斯优化中使用最广泛的采集函数。然而,由于其对数值精度十分敏感,利用EI提升性能往往具有挑战性。此前,Hutter等人(2009年)通过在经对数变换的目标函数上训练高斯过程来解决这一问题,并报告称这一技巧提高了GP的预测准确性,从而大幅提升了性能。尽管Hutter等人(2009年)给出了其EI的闭式解,但其中间推导此前一直未被提供。在本文中,我们给出了该命题的一个通俗易懂的推导。

更新时间: 2024-11-27 07:13:41

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2411.18095v1
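
For reference, our reading of the closed form (not a quotation of the paper's notation): if the GP posterior at $x$ for $g = \ln y$ is $\mathcal{N}(\mu(x), \sigma^2(x))$ and $y_{\min} > 0$ is the incumbent in a minimization problem, then $y$ is log-normal under the predictive distribution and

    $\mathrm{EI}(x) = y_{\min}\,\Phi(u) - \exp\!\big(\mu(x) + \sigma^2(x)/2\big)\,\Phi\big(u - \sigma(x)\big), \qquad u = \dfrac{\ln y_{\min} - \mu(x)}{\sigma(x)},$

where $\Phi$ is the standard normal CDF. The second term follows from the log-normal partial expectation $\mathbb{E}\big[e^{g}\,\mathbf{1}\{g < a\}\big] = e^{\mu + \sigma^2/2}\,\Phi\big((a - \mu)/\sigma - \sigma\big)$, which is precisely the kind of intermediate step the paper sets out to make explicit.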

Comprehensive Kernel Safety in the Spectre Era: Mitigations and Performance Evaluation (Extended Version)

The efficacy of address space layout randomization has been formally demonstrated in a shared-memory model by Abadi et al., contingent on specific assumptions about victim programs. However, modern operating systems, implementing layout randomization in the kernel, diverge from these assumptions and operate on a separate memory model with communication through system calls. In this work, we relax Abadi et al.'s language assumptions while demonstrating that layout randomization offers a comparable safety guarantee in a system with memory separation. However, in practice, speculative execution and side-channels are recognized threats to layout randomization. We show that kernel safety cannot be restored for attackers capable of using side-channels and speculative execution, and introduce enforcement mechanisms that can guarantee speculative kernel safety for safe system calls in the Spectre era. We implement two suitable mechanisms and we use them to compile the Linux kernel in order to evaluate their performance overhead.

Updated: 2024-11-27 07:06:28

标题: 在“幽灵”时代的全面内核安全:缓解措施和性能评估(扩展版)

摘要: Abadi等人在共享内存模型中正式证明了地址空间布局随机化的有效性,前提是对受害程序作出特定假设。然而,现代操作系统在内核中实现布局随机化时偏离了这些假设,并在通过系统调用进行通信的独立内存模型上运行。在这项工作中,我们放宽了Abadi等人的语言假设,同时证明了在具有内存分离的系统中,布局随机化提供了可比较的安全保证。然而,在实践中,推测执行和侧信道被认为是对布局随机化的威胁。我们证明,对于能够利用侧信道和推测执行的攻击者,内核安全性无法恢复,并引入了可在"幽灵"时代为安全系统调用保证推测内核安全性的强制执行机制。我们实现了两种合适的机制,并使用它们编译Linux内核,以评估其性能开销。

更新时间: 2024-11-27 07:06:28

领域: cs.CR

下载: http://arxiv.org/abs/2411.18094v1

Towards More Accurate US Presidential Election via Multi-step Reasoning with Large Language Models

Can Large Language Models (LLMs) accurately predict election outcomes? While LLMs have demonstrated impressive performance in various domains, including healthcare, legal analysis, and creative tasks, their ability to forecast elections remains unknown. Election prediction poses unique challenges, such as limited voter-level data, rapidly changing political landscapes, and the need to model complex human behavior. To address these challenges, we introduce a multi-step reasoning framework designed for political analysis. Our approach is validated on real-world data from the American National Election Studies (ANES) 2016 and 2020, as well as synthetic personas generated by the leading machine learning framework, offering scalable datasets for voter behavior modeling. To capture temporal dynamics, we incorporate candidates' policy positions and biographical details, ensuring that the model adapts to evolving political contexts. Drawing on Chain of Thought prompting, our multi-step reasoning pipeline systematically integrates demographic, ideological, and time-dependent factors, enhancing the model's predictive power.

Updated: 2024-11-27 07:05:31

标题: 朝着更准确的美国总统选举:通过大型语言模型进行多步推理

摘要: 大型语言模型(LLMs)能够准确预测选举结果吗?虽然LLMs在各个领域展示了出色的表现,包括医疗保健、法律分析和创意任务,但它们对选举结果的预测能力仍然未知。选举预测面临独特的挑战,如有限的选民层次数据、政治格局快速变化以及需要对复杂的人类行为进行建模。为了解决这些挑战,我们引入了一个专为政治分析设计的多步推理框架。我们的方法在美国国家选举研究(ANES)2016年和2020年的真实数据以及由领先的机器学习框架生成的合成角色上得到验证,为选民行为建模提供可扩展的数据集。为了捕捉时间动态,我们将候选人的政策立场和个人资料细节纳入,确保模型适应不断变化的政治背景。借鉴Chain of Thought提示,我们的多步推理流程系统地整合了人口统计、意识形态和时间相关因素,增强了模型的预测能力。

更新时间: 2024-11-27 07:05:31

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.03321v2

MONOPOLY: Learning to Price Public Facilities for Revaluing Private Properties with Large-Scale Urban Data

The value assessment of private properties is an attractive but challenging task which concerns a majority of people around the world. A prolonged topic among us is "how much is my house worth?". To answer this question, most experienced agencies would like to price a property given the factors of its attributes as well as the demographics and the public facilities around it. However, no one knows the exact prices of these factors, especially the values of public facilities which may help assess private properties. In this paper, we introduce our newly launched project "Monopoly" (named after a classic board game) in which we propose a distributed approach for revaluing private properties by learning to price public facilities (such as hospitals etc.) with the large-scale urban data we have accumulated via Baidu Maps. To be specific, our method organizes many points of interest (POIs) into an undirected weighted graph and formulates multiple factors including the virtual prices of surrounding public facilities as adaptive variables to parallelly estimate the housing prices we know. Then the prices of both public facilities and private properties can be iteratively updated according to the loss of prediction until convergence. We have conducted extensive experiments with the large-scale urban data of several metropolises in China. Results show that our approach outperforms several mainstream methods with significant margins. Further insights from more in-depth discussions demonstrate that the "Monopoly" is an innovative application in the interdisciplinary field of business intelligence and urban computing, and it will be beneficial to tens of millions of our users for investments and to the governments for urban planning as well as taxation.

Updated: 2024-11-27 06:44:41

标题: MONOPOLY:利用大规模城市数据学习为公共设施定价以重估私人财产

摘要: 私人财产的价值评估是一项吸引人但具有挑战性的任务,受到世界上大多数人的广泛关注。我们之间一个长久的话题是"我的房子值多少钱?"。为了回答这个问题,大多数经验丰富的机构会根据房产自身的属性以及周边的人口统计数据和公共设施来为其定价。然而,没有人知道这些因素的确切价格,尤其是可能有助于评估私人财产的公共设施的价值。在本文中,我们介绍了新推出的项目"Monopoly"(得名于一款经典桌游):我们提出了一种分布式方法,利用我们通过百度地图积累的大规模城市数据,通过学习为公共设施(如医院等)定价来重估私人财产。具体来说,我们的方法将众多兴趣点(POI)组织成一个无向加权图,并将包括周边公共设施虚拟价格在内的多个因素作为自适应变量,以并行估计已知的房价。随后,公共设施和私人财产的价格可根据预测损失迭代更新,直至收敛。我们在中国多个大都市的大规模城市数据上进行了广泛实验。结果显示,我们的方法以显著优势超越多个主流方法。更深入的讨论进一步表明,"Monopoly"是商业智能与城市计算交叉领域的一项创新应用,将使数千万用户的投资决策受益,也有助于政府进行城市规划和税收管理。

更新时间: 2024-11-27 06:44:41

领域: cs.AI,cs.SI

下载: http://arxiv.org/abs/2411.18085v1
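
A minimal sketch of the iterative price-fitting loop the abstract describes (NumPy, synthetic data, hypothetical variable names; Baidu's actual system is far richer):

    # Jointly estimate facility "virtual prices" and attribute weights by
    # minimizing squared prediction error on known housing prices.
    import numpy as np

    rng = np.random.default_rng(0)
    n_houses, n_facilities, n_attrs = 500, 20, 5
    A = rng.random((n_houses, n_facilities))   # e.g., inverse-distance exposure weights
    X = rng.random((n_houses, n_attrs))        # housing attributes
    y = rng.random(n_houses) * 100             # known transaction prices

    p = np.zeros(n_facilities)                 # facility virtual prices (learned)
    w = np.zeros(n_attrs)                      # attribute weights (learned)
    lr = 1e-3
    for step in range(2000):
        err = A @ p + X @ w - y
        p -= lr * A.T @ err / n_houses         # gradient step on facility prices
        w -= lr * X.T @ err / n_houses         # gradient step on attribute weights
    print("MSE:", float(np.mean((A @ p + X @ w - y) ** 2)))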

Integrating Multi-Modal Input Token Mixer Into Mamba-Based Decision Models: Decision MetaMamba

Sequence modeling with State Space models (SSMs) has demonstrated performance surpassing that of Transformers in various tasks, raising expectations for their potential to outperform the Decision Transformer and its enhanced variants in offline reinforcement learning (RL). However, decision models based on Mamba, a state-of-the-art SSM, failed to achieve superior performance compared to these enhanced Decision Transformers. We hypothesize that this limitation arises from information loss during the selective scanning phase. To address this, we propose the Decision MetaMamba (DMM), which augments Mamba with a token mixer in its input layer. This mixer explicitly accounts for the multimodal nature of offline RL inputs, comprising state, action, and return-to-go. The DMM demonstrates improved performance while significantly reducing parameter count compared to prior models. Notably, similar performance gains were achieved using a simple linear token mixer, emphasizing the importance of preserving information from proximate time steps rather than the specific design of the token mixer itself. This novel modification to Mamba's input layer represents a departure from conventional timestamp-based encoding approaches used in Transformers. By enhancing performance of Mamba in offline RL, characterized by memory efficiency and fast inference, this work opens new avenues for its broader application in future RL research.

Updated: 2024-11-27 06:39:42

标题: 将多模态输入令牌混合器集成到基于Mamba的决策模型中:决策MetaMamba

摘要: 使用状态空间模型(SSM)进行序列建模已在多种任务中表现出超越Transformer的性能,这使人们期待其在离线强化学习(RL)中超越决策Transformer及其增强变体。然而,基于最先进SSM Mamba的决策模型并未取得优于这些增强决策Transformer的性能。我们推测这一局限源于选择性扫描阶段的信息丢失。为了解决这个问题,我们提出了决策MetaMamba(DMM),在Mamba的输入层中加入了一个令牌混合器。该混合器显式考虑了离线RL输入的多模态性质,即状态、动作和回报值(return-to-go)。与先前模型相比,DMM在显著减少参数量的同时表现更优。值得注意的是,使用简单的线性令牌混合器也能获得类似的性能增益,这说明重要的是保留相邻时间步的信息,而非令牌混合器本身的具体设计。这一对Mamba输入层的新颖修改不同于Transformer中常规的基于时间戳的编码方法。通过提升以内存高效和快速推理为特点的Mamba在离线RL中的性能,这项工作为其在未来RL研究中的更广泛应用开辟了新途径。

更新时间: 2024-11-27 06:39:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.10517v3
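
A minimal PyTorch sketch of a linear token mixer over state/action/return-to-go tokens that mixes proximate time steps, in the spirit of (but not identical to) the paper's input-layer modification; all names and dimensions are illustrative:

    import torch
    import torch.nn as nn

    class LinearTokenMixer(nn.Module):
        def __init__(self, state_dim, act_dim, hidden, window=3):
            super().__init__()
            self.embed_s = nn.Linear(state_dim, hidden)
            self.embed_a = nn.Linear(act_dim, hidden)
            self.embed_r = nn.Linear(1, hidden)
            # depthwise 1D convolution mixes each channel across `window`
            # proximate time steps -- a purely linear mixing operation
            self.mix = nn.Conv1d(hidden, hidden, kernel_size=window,
                                 padding=window - 1, groups=hidden)

        def forward(self, states, actions, rtg):
            # states: (B, T, state_dim), actions: (B, T, act_dim), rtg: (B, T, 1)
            x = self.embed_s(states) + self.embed_a(actions) + self.embed_r(rtg)
            x = self.mix(x.transpose(1, 2))[..., : states.size(1)]  # causal crop
            return x.transpose(1, 2)  # (B, T, hidden), fed to the SSM backbone

    tokens = LinearTokenMixer(17, 6, 128)(torch.randn(2, 10, 17),
                                          torch.randn(2, 10, 6),
                                          torch.randn(2, 10, 1))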

From Exploration to Revelation: Detecting Dark Patterns in Mobile Apps

Mobile apps are essential in daily life, yet they often employ dark patterns, such as visual tricks to highlight certain options or linguistic tactics to nag users into making purchases, to manipulate user behavior. Current research mainly uses manual methods to detect dark patterns, a process that is time-consuming and struggles to keep pace with continually updating and emerging apps. While some studies targeted at automated detection, they are constrained to static patterns and still necessitate manual app exploration. To bridge these gaps, we present AppRay, an innovative system that seamlessly blends task-oriented app exploration with automated dark pattern detection, reducing manual efforts. Our approach consists of two steps: First, we harness the commonsense knowledge of large language models for targeted app exploration, supplemented by traditional random exploration to capture a broader range of UI states. Second, we developed a static and dynamic dark pattern detector powered by a contrastive learning-based multi-label classifier and a rule-based refiner to perform detection. We contributed two datasets, AppRay-Dark and AppRay-Light, with 2,185 unique deceptive patterns (including 149 dynamic instances) across 18 types from 876 UIs and 871 benign UIs. These datasets cover both static and dynamic dark patterns while preserving UI relationships. Experimental results confirm that AppRay can efficiently explore the app and identify a wide range of dark patterns with great performance.

Updated: 2024-11-27 06:39:35

标题: 从探索到启示:检测移动应用中的黑暗模式

摘要: 移动应用在日常生活中至关重要,然而它们经常采用黑暗模式,比如用视觉技巧突出某些选项,或用语言策略催促用户购买,以操纵用户行为。目前的研究主要使用人工方法检测黑暗模式,这一过程耗时且难以跟上不断更新和涌现的应用。虽然一些研究致力于自动检测,但它们局限于静态模式,且仍需人工探索应用。为了弥合这些差距,我们提出了AppRay,这一创新系统将面向任务的应用探索与自动黑暗模式检测无缝融合,减少了人工工作。我们的方法包括两个步骤:首先,我们利用大型语言模型的常识知识进行有针对性的应用探索,并辅以传统的随机探索来覆盖更广泛的UI状态。其次,我们开发了一个静态与动态黑暗模式检测器,由基于对比学习的多标签分类器和基于规则的精化器驱动来执行检测。我们贡献了两个数据集AppRay-Dark和AppRay-Light,包含来自876个UI和871个良性UI的2,185个独特的欺骗性模式(其中包括149个动态实例),涵盖18种类型。这些数据集覆盖静态和动态黑暗模式,同时保留了UI之间的关系。实验结果证实,AppRay能够高效地探索应用并以出色的性能识别各类黑暗模式。

更新时间: 2024-11-27 06:39:35

领域: cs.SE,cs.AI,cs.HC,D.2; I.2; H.5

下载: http://arxiv.org/abs/2411.18084v1

Graph Neural Networks for Job Shop Scheduling Problems: A Survey

Job shop scheduling problems (JSSPs) represent a critical and challenging class of combinatorial optimization problems. Recent years have witnessed a rapid increase in the application of graph neural networks (GNNs) to solve JSSPs, albeit lacking a systematic survey of the relevant literature. This paper aims to thoroughly review prevailing GNN methods for different types of JSSPs and the closely related flow-shop scheduling problems (FSPs), especially those leveraging deep reinforcement learning (DRL). We begin by presenting the graph representations of various JSSPs, followed by an introduction to the most commonly used GNN architectures. We then review current GNN-based methods for each problem type, highlighting key technical elements such as graph representations, GNN architectures, GNN tasks, and training algorithms. Finally, we summarize and analyze the advantages and limitations of GNNs in solving JSSPs and provide potential future research opportunities. We hope this survey can motivate and inspire innovative approaches for more powerful GNN-based approaches in tackling JSSPs and other scheduling problems.

Updated: 2024-11-27 06:28:37

标题: 图神经网络用于作业车间调度问题:一项调查

摘要: 作业车间调度问题(JSSPs)代表了一类关键且具有挑战性的组合优化问题。近年来,图神经网络(GNNs)在解决JSSPs方面的应用迅速增加,尽管缺乏相关文献的系统调查。本文旨在全面审查不同类型JSSPs和与之紧密相关的流水车间调度问题(FSPs)的主流GNN方法,特别是利用深度强化学习(DRL)的方法。我们首先介绍各种JSSPs的图表示,然后介绍最常用的GNN架构。接着,我们审查当前针对每种问题类型的基于GNN的方法,重点介绍关键技术要素,如图表示、GNN架构、GNN任务和训练算法。最后,我们总结并分析了GNN在解决JSSPs中的优势和局限性,并提供潜在的未来研究机会。希望这份调查可以激励和启发创新方法,以更强大的基于GNN的方法来应对JSSPs和其他调度问题。

更新时间: 2024-11-27 06:28:37

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.14096v2

Enabling Adoption of Regenerative Agriculture through Soil Carbon Copilots

Mitigating climate change requires transforming agriculture to minimize environmental impact and build climate resilience. Regenerative agricultural practices enhance soil organic carbon (SOC) levels, thus improving soil health and sequestering carbon. A challenge to increasing regenerative agriculture practices is cheaply measuring SOC over time and understanding how SOC is affected by regenerative agricultural practices and other environmental factors and farm management practices. To address this challenge, we introduce an AI-driven Soil Organic Carbon Copilot that automates the ingestion of complex multi-resolution, multi-modal data to provide large-scale insights into soil health and regenerative practices. Our data includes extreme weather event data (e.g., drought and wildfire incidents), farm management data (e.g., cropland information and tillage predictions), and SOC predictions. We find that integrating public data and specialized models enables large-scale, localized analysis for sustainable agriculture. In comparisons of agricultural practices across California counties, we find evidence that diverse agricultural activity may mitigate the negative effects of tillage; and that while extreme weather conditions heavily affect SOC, composting may mitigate SOC loss. Finally, implementing role-specific personas empowers agronomists, farm consultants, policymakers, and other stakeholders to implement evidence-based strategies that promote sustainable agriculture and build climate resilience.

Updated: 2024-11-27 06:25:09

标题: 借助土壤碳Copilot推动再生农业的采用

摘要: 应对气候变化需要转变农业,以减少环境影响并建立气候适应能力。再生农业实践可提高土壤有机碳(SOC)水平,从而改善土壤健康并封存碳。推广再生农业实践的一个挑战在于如何以低成本持续测量SOC,并了解SOC如何受到再生农业实践以及其他环境因素和农场管理实践的影响。为了解决这一挑战,我们引入了一种基于人工智能的土壤有机碳Copilot,自动化地摄取复杂的多分辨率、多模态数据,以提供关于土壤健康和再生实践的大规模洞见。我们的数据包括极端天气事件数据(例如干旱和野火事件)、农场管理数据(例如耕地信息和耕作预测)以及SOC预测。我们发现,整合公共数据和专门模型能够实现面向可持续农业的大规模本地化分析。在加州各县农业实践的比较中,我们发现多样化的农业活动可能减轻耕作的负面影响;极端天气条件虽严重影响SOC,但堆肥可能减轻SOC损失。最后,面向特定角色的人设使农学家、农场顾问、政策制定者和其他利益相关者能够实施基于证据的策略,以促进可持续农业并建立气候适应能力。

更新时间: 2024-11-27 06:25:09

领域: cs.IR,cs.AI,cs.ET

下载: http://arxiv.org/abs/2411.16872v2

Heterophilic Graph Neural Networks Optimization with Causal Message-passing

In this work, we discover that causal inference provides a promising approach to capture heterophilic message-passing in Graph Neural Network (GNN). By leveraging cause-effect analysis, we can discern heterophilic edges based on asymmetric node dependency. The learned causal structure offers more accurate relationships among nodes. To reduce the computational complexity, we introduce intervention-based causal inference in graph learning. We first simplify causal analysis on graphs by formulating it as a structural learning model and define the optimization problem within the Bayesian scheme. We then present an analysis of decomposing the optimization target into a consistency penalty and a structure modification based on cause-effect relations. We then estimate this target by conditional entropy and present insights into how conditional entropy quantifies the heterophily. Accordingly, we propose CausalMP, a causal message-passing discovery network for heterophilic graph learning, that iteratively learns the explicit causal structure of input graphs. We conduct extensive experiments in both heterophilic and homophilic graph settings. The result demonstrates that our model achieves superior link prediction performance. Training on causal structure can also enhance node representation in classification tasks across different base models.

Updated: 2024-11-27 06:12:01

标题: 异质性图神经网络优化及因果消息传递

摘要: 在这项工作中,我们发现因果推断为捕获图神经网络(GNN)中异质信息传递提供了一种有希望的方法。通过利用因果分析,我们可以根据不对称的节点依赖性区分异质边。学习到的因果结构提供了更准确的节点关系。为了降低计算复杂度,我们引入了基于干预的图学习中的因果推断。我们首先将图上的因果分析简化为一个结构学习模型,并在贝叶斯框架内定义了优化问题。然后,我们提出了将优化目标分解为一致性惩罚和基于因果关系的结构修改的分析。我们通过条件熵估计这个目标,并提供了条件熵如何量化异质性的见解。因此,我们提出了CausalMP,一种用于异质图学习的因果信息传递发现网络,该网络迭代地学习输入图的显式因果结构。我们在异质图和同质图设置中进行了大量实验。结果表明,我们的模型实现了更优越的链接预测性能。在因果结构上的训练还可以增强不同基础模型中分类任务中的节点表示。

更新时间: 2024-11-27 06:12:01

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2411.13821v2

Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache

How to efficiently serve LLMs in practice has become exceptionally challenging due to their prohibitive memory and computation requirements. In this study, we investigate optimizing the KV cache, whose memory footprint poses a critical bottleneck in LLM inference, especially when dealing with long context tasks. To tackle the challenge, we introduce MiniKV, a KV cache optimization method that simultaneously preserves long context task accuracy while significantly reducing KV cache size via a novel 2-bit layer-discriminative KV cache. More importantly, we develop specialized CUDA kernels to make MiniKV compatible with FlashAttention. Experiments on a wide range of long context tasks show that MiniKV effectively achieves 86% KV cache compression ratio while recovering over 98.5% of accuracy, outperforming state-of-the-art methods while achieving excellent measured system performance improvements.

Updated: 2024-11-27 06:10:49

标题: 通过2位层区分KV缓存推动LLM推理的极限

摘要: 在实践中如何高效地为LLMs提供服务已变得异常具有挑战性,因为它们对内存和计算资源的需求高得令人望而却步。在本研究中,我们研究了优化KV缓存的方法;其内存占用是LLM推理中的一个关键瓶颈,特别是在处理长上下文任务时。为了解决这一挑战,我们引入了MiniKV,一种KV缓存优化方法,它通过一种新颖的2位层区分KV缓存,在显著减小KV缓存大小的同时保持长上下文任务的准确性。更重要的是,我们开发了专门的CUDA内核,使MiniKV与FlashAttention兼容。对一系列长上下文任务的实验表明,MiniKV有效地实现了86%的KV缓存压缩比,同时恢复了超过98.5%的准确性,优于最先进的方法,并取得了出色的实测系统性能提升。

更新时间: 2024-11-27 06:10:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.18077v1
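
For intuition, here is a sketch of plain 2-bit asymmetric per-channel quantization of a KV tensor (PyTorch). MiniKV's layer-discriminative bit-width selection and fused CUDA kernels are not reproduced here:

    import torch

    def quantize_2bit(x, dim=-1):
        """Quantize to 4 levels per channel; returns codes plus (scale, zero)."""
        xmin = x.amin(dim=dim, keepdim=True)
        xmax = x.amax(dim=dim, keepdim=True)
        scale = (xmax - xmin).clamp(min=1e-8) / 3.0       # 2 bits -> 4 levels
        codes = torch.round((x - xmin) / scale).clamp(0, 3).to(torch.uint8)
        return codes, scale, xmin

    def dequantize_2bit(codes, scale, xmin):
        return codes.float() * scale + xmin

    kv = torch.randn(8, 1024, 64)                         # (heads, seq, head_dim)
    codes, scale, zero = quantize_2bit(kv)
    err = (dequantize_2bit(codes, scale, zero) - kv).abs().mean()
    print(f"mean abs error: {err:.4f}")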

CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting

A crowd density forecasting task aims to predict how the crowd density map will change in the future from observed past crowd density maps. However, the past crowd density maps are often incomplete due to the miss-detection of pedestrians, and it is crucial to develop a robust crowd density forecasting model against the miss-detection. This paper presents a MAsked crowd density Completion framework for crowd density forecasting (CrowdMAC), which is simultaneously trained to forecast future crowd density maps from partially masked past crowd density maps (i.e., forecasting maps from past maps with miss-detection) while reconstructing the masked observation maps (i.e., imputing past maps with miss-detection). Additionally, we propose Temporal-Density-aware Masking (TDM), which non-uniformly masks tokens in the observed crowd density map, considering the sparsity of the crowd density maps and the informativeness of the subsequent frames for the forecasting task. Moreover, we introduce multi-task masking to enhance training efficiency. In the experiments, CrowdMAC achieves state-of-the-art performance on seven large-scale datasets, including SDD, ETH-UCY, inD, JRDB, VSCrowd, FDST, and croHD. We also demonstrate the robustness of the proposed method against both synthetic and realistic miss-detections. The code is released at https://fujiry0.github.io/CrowdMAC-project-page.

Updated: 2024-11-27 06:04:20

标题: CrowdMAC: 用于稳健人群密度预测的掩蔽人群密度补全

摘要: 人群密度预测任务旨在根据观测到的过去人群密度图,预测未来人群密度图的变化。然而,由于行人漏检,过去的人群密度图往往不完整,因此开发一种对漏检具有鲁棒性的人群密度预测模型至关重要。本文提出了一种用于人群密度预测的掩码人群密度补全框架(CrowdMAC),该框架同时训练两项任务:从部分被掩码的过去人群密度图预测未来人群密度图(即从含漏检的过去图进行预测),以及重建被掩码的观测图(即对含漏检的过去图进行插补)。此外,我们提出了时间-密度感知掩码(TDM),考虑到人群密度图的稀疏性以及后续帧对预测任务的信息量,对观测人群密度图中的标记进行非均匀掩码。我们还引入了多任务掩码以提升训练效率。在实验中,CrowdMAC在包括SDD、ETH-UCY、inD、JRDB、VSCrowd、FDST和croHD在内的七个大规模数据集上取得了最先进的性能。我们还展示了所提方法对合成和真实漏检的鲁棒性。代码发布在https://fujiry0.github.io/CrowdMAC-project-page。

更新时间: 2024-11-27 06:04:20

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2407.14725v3
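
One plausible reading of density- and time-aware non-uniform masking, sketched in NumPy (illustrative only; the paper's TDM may weight tokens differently):

    # Tokens with higher crowd density and later frames receive higher
    # masking probability; a fixed fraction is masked without replacement.
    import numpy as np

    def tdm_mask(density, mask_ratio=0.6, time_weight=0.5, rng=None):
        """density: (T, N) per-frame token densities; returns boolean mask (T, N)."""
        rng = rng or np.random.default_rng()
        T, N = density.shape
        recency = np.linspace(0.0, 1.0, T)[:, None]       # later frames weigh more
        score = density / (density.sum() + 1e-8) + time_weight * recency / (T * N)
        prob = (score / score.sum()).ravel()
        k = int(mask_ratio * T * N)
        idx = rng.choice(T * N, size=k, replace=False, p=prob)
        mask = np.zeros(T * N, dtype=bool)
        mask[idx] = True
        return mask.reshape(T, N)

    mask = tdm_mask(np.random.rand(8, 196))               # 8 frames, 14x14 tokens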

DuMapper: Towards Automatic Verification of Large-Scale POIs with Street Views at Baidu Maps

With the increased popularity of mobile devices, Web mapping services have become an indispensable tool in our daily lives. To provide user-satisfied services, such as location searches, the point of interest (POI) database is the fundamental infrastructure, as it archives multimodal information on billions of geographic locations closely related to people's lives, such as a shop or a bank. Therefore, verifying the correctness of a large-scale POI database is vital. To achieve this goal, many industrial companies adopt volunteered geographic information (VGI) platforms that enable thousands of crowdworkers and expert mappers to verify POIs seamlessly; but to do so, they have to spend millions of dollars every year. To save the tremendous labor costs, we devised DuMapper, an automatic system for large-scale POI verification with the multimodal street-view data at Baidu Maps. DuMapper takes the signboard image and the coordinates of a real-world place as input to generate a low-dimensional vector, which can be leveraged by ANN algorithms to conduct a more accurate search through billions of archived POIs in the database for verification within milliseconds. It can significantly increase the throughput of POI verification by 50 times. DuMapper has already been deployed in production, which dramatically improves the productivity and efficiency of POI verification at Baidu Maps. As of December 31, 2021, it has enacted over 405 million iterations of POI verification within a 3.5-year period, representing an approximate workload of 800 high-performance expert mappers.

Updated: 2024-11-27 05:54:33

标题: DuMapper:在百度地图上利用街景自动验证大规模POI

摘要: 随着移动设备的普及,Web地图服务已成为我们日常生活中不可或缺的工具。为了提供让用户满意的服务(如位置搜索),兴趣点(POI)数据库是基础性的基础设施,它存档了与人们生活密切相关的数十亿个地理位置(如商店或银行)的多模态信息。因此,验证大规模POI数据库的正确性至关重要。为实现这一目标,许多工业公司采用志愿地理信息(VGI)平台,让数千名众包工作者和专业制图人员无缝地验证POI;但为此他们每年必须花费数百万美元。为了节省巨大的人力成本,我们设计了DuMapper,一个利用百度地图多模态街景数据进行大规模POI验证的自动系统。DuMapper以招牌图像和真实世界地点的坐标作为输入,生成一个低维向量,供ANN算法在数据库中数十亿条存档POI中进行更准确的搜索,从而在毫秒级完成验证。它可以将POI验证的吞吐量提高50倍。DuMapper已投入生产,显著提高了百度地图POI验证的生产力和效率。截至2021年12月31日,它在3.5年间完成了超过4.05亿次POI验证迭代,约相当于800名高效专业制图人员的工作量。

更新时间: 2024-11-27 05:54:33

领域: cs.AI,cs.IR

下载: http://arxiv.org/abs/2411.18073v1
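
A conceptual sketch of the retrieval step (NumPy, hypothetical fusion scheme): a signboard embedding and encoded coordinates are fused into one normalized vector and matched against archived POI vectors. A production system would use an ANN index (e.g., IVF or HNSW) rather than the brute-force search shown:

    import numpy as np

    def fuse(signboard_vec, coords, coord_weight=0.3):
        v = np.concatenate([signboard_vec, coord_weight * np.asarray(coords)])
        return v / np.linalg.norm(v)

    archive = np.random.randn(100_000, 130)               # stand-in for archived POIs
    archive /= np.linalg.norm(archive, axis=1, keepdims=True)

    query = fuse(np.random.randn(128), (116.40, 39.90))   # lng/lat of the place
    topk = np.argsort(archive @ query)[::-1][:5]          # cosine top-5 candidates
    print("candidate POI ids:", topk)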

Simulating Tabular Datasets through LLMs to Rapidly Explore Hypotheses about Real-World Entities

Do horror writers have worse childhoods than other writers? Though biographical details are known about many writers, quantitatively exploring such a qualitative hypothesis requires significant human effort, e.g. to sift through many biographies and interviews of writers and to iteratively search for quantitative features that reflect what is qualitatively of interest. This paper explores the potential to quickly prototype these kinds of hypotheses through (1) applying LLMs to estimate properties of concrete entities like specific people, companies, books, kinds of animals, and countries; (2) performing off-the-shelf analysis methods to reveal possible relationships among such properties (e.g. linear regression); and towards further automation, (3) applying LLMs to suggest the quantitative properties themselves that could help ground a particular qualitative hypothesis (e.g. number of adverse childhood events, in the context of the running example). The hope is to allow sifting through hypotheses more quickly through collaboration between human and machine. Our experiments highlight that indeed, LLMs can serve as useful estimators of tabular data about specific entities across a range of domains, and that such estimations improve with model scale. Further, initial experiments demonstrate the potential of LLMs to map a qualitative hypothesis of interest to relevant concrete variables that the LLM can then estimate. The conclusion is that LLMs offer intriguing potential to help illuminate scientifically interesting patterns latent within the internet-scale data they are trained upon.

Updated: 2024-11-27 05:48:44

标题: 使用LLMs模拟表格数据集,快速探索有关现实世界实体的假设

摘要: 恐怖作家的童年是否比其他作家更糟糕?尽管许多作家的生平细节已为人知,但要定量地检验这样一个定性假设需要大量人力,例如翻阅众多作家的传记和访谈,并反复寻找能够反映定性关注点的定量特征。本文探讨了通过以下方式快速原型化此类假设的潜力:(1)应用LLMs估计具体实体(如特定人物、公司、书籍、动物种类和国家)的属性;(2)运用现成的分析方法揭示这些属性之间可能的关系(例如线性回归);以及朝着进一步自动化的方向,(3)应用LLMs提出有助于落实特定定性假设的定量属性本身(例如,在前述例子的背景下,不良童年事件的数量)。其目的在于通过人机协作更快地筛选假设。我们的实验表明,LLMs确实可以在多个领域中作为特定实体表格数据的有用估计器,且这种估计随模型规模的增大而改善。此外,初步实验展示了LLMs能将感兴趣的定性假设映射到其随后可以估计的相关具体变量。结论是,LLMs展现出令人着迷的潜力,有助于揭示其所训练的互联网规模数据中潜藏的具有科学意义的模式。

更新时间: 2024-11-27 05:48:44

领域: cs.AI

下载: http://arxiv.org/abs/2411.18071v1
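
A toy version of the pipeline; `llm_estimate` is a hypothetical placeholder for a call to any LLM that returns a numeric property estimate for a named entity, and the analysis step is ordinary least squares:

    import numpy as np

    def llm_estimate(entity: str, prop: str) -> float:
        """Placeholder: in practice, prompt an LLM for a numeric estimate."""
        return float(abs(hash((entity, prop))) % 10)

    writers = ["Writer A", "Writer B", "Writer C", "Writer D"]
    x = np.array([llm_estimate(w, "number of adverse childhood events") for w in writers])
    y = np.array([llm_estimate(w, "fraction of horror works") for w in writers])

    slope, intercept = np.polyfit(x, y, 1)   # simple linear fit over the estimates
    print(f"fitted trend: y = {slope:.2f} * x + {intercept:.2f}")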

Selective Classification Under Distribution Shifts

In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers -- either due to intrinsic statistical noise of data or for robustness issue of the classifier or beyond -- in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus on the ideal statistical setting only, i.e., the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, in this paper, we propose an SC framework that takes into account distribution shifts, termed generalized selective classification, that covers label-shifted (or out-of-distribution) and covariate-shifted samples, in addition to typical in-distribution samples, the first of its kind in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers, and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers. Code is available at https://github.com/sun-umn/sc_with_distshift.

Updated: 2024-11-27 05:48:34

标题: 分布偏移下的选择性分类

摘要: 在选择性分类(SC)中,分类器会避免进行可能错误的预测,以避免过多的错误。在高风险场景中部署不完美的分类器--无论是由于数据固有的统计噪声,还是由于分类器的鲁棒性问题或其他问题--SC似乎是一个吸引人且必要的路径。尽管在SC领域进行了几十年的研究,大多数先前的SC方法仍然只关注理想的统计设置,即部署时的数据分布与训练时相同,尽管实际数据可能来自野外。为了弥合这一差距,在本文中,我们提出了一个考虑分布转移的SC框架,称为广义选择性分类,该框架涵盖了标签转移(或超出分布)和协变量转移样本,除了典型的分布内样本外,这在SC文献中是首次出现的。我们关注基于深度学习(DL)分类器的广义SC的非训练型置信分数函数,并提出了两种新颖的基于间隔(margin)的得分函数。通过广泛的分析和实验,我们展示了我们提出的得分函数在各种分类任务和DL分类器的广义SC中比现有方法更有效和可靠。代码可在https://github.com/sun-umn/sc_with_distshift 上找到。

更新时间: 2024-11-27 05:48:34

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2405.05160v2
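
The basic margin-score recipe, sketched in NumPy (the paper proposes two refined margin-based scores; this shows only the generic top-1 minus top-2 margin with an abstention threshold):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def selective_predict(logits, labels, threshold=0.2):
        p = softmax(logits)
        top2 = np.sort(p, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]          # top-1 minus top-2 probability
        accept = margin >= threshold              # abstain on low-margin inputs
        preds = p.argmax(axis=1)
        coverage = accept.mean()
        sel_acc = (preds[accept] == labels[accept]).mean() if accept.any() else np.nan
        return coverage, sel_acc

    logits = np.random.randn(1000, 10)
    labels = np.random.randint(0, 10, size=1000)
    print(selective_predict(logits, labels))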

Large Scale Evaluation of Deep Learning-based Explainable Solar Flare Forecasting Models with Attribution-based Proximity Analysis

Accurate and reliable predictions of solar flares are essential due to their potentially significant impact on Earth and space-based infrastructure. Although deep learning models have shown notable predictive capabilities in this domain, current evaluations often focus on accuracy while neglecting interpretability and reliability--factors that are especially critical in operational settings. To address this gap, we propose a novel proximity-based framework for analyzing post hoc explanations to assess the interpretability of deep learning models for solar flare prediction. Our study compares two models trained on full-disk line-of-sight (LoS) magnetogram images to predict $\geq$M-class solar flares within a 24-hour window. We employ the Guided Gradient-weighted Class Activation Mapping (Guided Grad-CAM) method to generate attribution maps from these models, which we then analyze to gain insights into their decision-making processes. To support the evaluation of explanations in operational systems, we introduce a proximity-based metric that quantitatively assesses the accuracy and relevance of local explanations when regions of interest are known. Our findings indicate that the models' predictions align with active region characteristics to varying degrees, offering valuable insights into their behavior. This framework enhances the evaluation of model interpretability in solar flare forecasting and supports the development of more transparent and reliable operational systems.

Updated: 2024-11-27 05:43:34

标题: 基于属性接近性分析的深度学习可解释太阳耀斑预测模型的大规模评估

摘要: 太阳耀斑的准确可靠预测对地球和基于空间的基础设施具有潜在重要影响,因此至关重要。尽管深度学习模型在这个领域展现出显著的预测能力,但目前的评估往往侧重于准确性,而忽视了可解释性和可靠性——这些因素在操作设置中尤为关键。为了弥补这一差距,我们提出了一种新颖的基于接近性的框架,用于分析事后解释,以评估太阳耀斑预测的深度学习模型的可解释性。我们的研究比较了两个模型,它们是基于全盘视线(LoS)磁力图像进行训练的,以预测24小时内的≥M级太阳耀斑。我们采用了引导梯度加权类激活映射(Guided Grad-CAM)方法从这些模型生成归因图,然后对其进行分析,以获得对它们的决策过程的洞察。为了支持在操作系统中评估解释,我们引入了一种基于接近性的度量方法,定量评估在已知兴趣区域时本地解释的准确性和相关性。我们的研究结果表明,模型的预测与活动区特征在不同程度上一致,为了解它们的行为提供了宝贵的见解。这一框架增强了太阳耀斑预测模型的解释性评估,并支持更透明和可靠的操作系统的发展。

更新时间: 2024-11-27 05:43:34

领域: cs.LG,astro-ph.SR,cs.CV,stat.ML

下载: http://arxiv.org/abs/2411.18070v1
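
One plausible instantiation of a proximity-based attribution metric (NumPy/SciPy; illustrative, not the paper's exact definition): average the distance from the most salient pixels to the nearest pixel of a known region of interest:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def attribution_proximity(attr_map, roi_mask, top_frac=0.05):
        """Lower is better: salient pixels should lie close to the ROI."""
        k = max(1, int(top_frac * attr_map.size))
        thresh = np.partition(attr_map.ravel(), -k)[-k]
        salient = attr_map >= thresh
        dist_to_roi = distance_transform_edt(~roi_mask)   # 0 inside the ROI
        return float(dist_to_roi[salient].mean())

    attr = np.random.rand(128, 128)                       # e.g., a Guided Grad-CAM map
    roi = np.zeros((128, 128), dtype=bool)
    roi[40:60, 40:60] = True                              # known active region
    print("proximity score:", attribution_proximity(attr, roi))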

OpenMU: Your Swiss Army Knife for Music Understanding

We present OpenMU-Bench, a large-scale benchmark suite for addressing the data scarcity issue in training multimodal language models to understand music. To construct OpenMU-Bench, we leveraged existing datasets and bootstrapped new annotations. OpenMU-Bench also broadens the scope of music understanding by including lyrics understanding and music tool usage. Using OpenMU-Bench, we trained our music understanding model, OpenMU, with extensive ablations, demonstrating that OpenMU outperforms baseline models such as MU-Llama. Both OpenMU and OpenMU-Bench are open-sourced to facilitate future research in music understanding and to enhance creative music production efficiency.

Updated: 2024-11-27 05:43:19

标题: OpenMU:您的音乐理解瑞士军刀

摘要: 我们提出了OpenMU-Bench,这是一个用于解决训练多模态语言模型以理解音乐的数据稀缺问题的大规模基准套件。为了构建OpenMU-Bench,我们利用现有数据集,并以自举方式构建了新的标注。OpenMU-Bench还通过纳入歌词理解和音乐工具使用,扩展了音乐理解的范围。我们使用OpenMU-Bench训练了音乐理解模型OpenMU,并进行了大量消融实验,证明OpenMU优于MU-Llama等基线模型。OpenMU和OpenMU-Bench均已开源,以促进未来的音乐理解研究并提高创意音乐制作效率。

更新时间: 2024-11-27 05:43:19

领域: cs.SD,cs.AI,cs.CL,cs.MM,eess.AS

下载: http://arxiv.org/abs/2410.15573v3

PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion

Personalized image generation has been significantly advanced, enabling the creation of highly realistic and customized images. However, existing methods often struggle with generating images of multiple people due to occlusions and fail to accurately personalize full-body shapes. In this paper, we propose PersonaCraft, a novel approach that combines diffusion models with 3D human modeling to address these limitations. Our method effectively manages occlusions by incorporating 3D-aware pose conditioning with SMPLx-ControlNet and accurately personalizes human full-body shapes through SMPLx fitting. Additionally, PersonaCraft enables user-defined body shape adjustments, adding flexibility for individual body customization. Experimental results demonstrate the superior performance of PersonaCraft in generating high-quality, realistic images of multiple individuals while resolving occlusion issues, thus establishing a new standard for multi-person personalized image synthesis. Project page: https://gwang-kim.github.io/persona_craft

Updated: 2024-11-27 05:41:15

标题: PersonaCraft:使用3D模型条件扩散进行单一参考多个身份的个性化全身图像合成

摘要: 个性化图像生成已经取得了显著进展,使得能够创建高度逼真和定制的图像。然而,现有方法通常在生成多人图像时遇到遮挡问题,并且难以准确个性化全身形状。在本文中,我们提出了一种新方法PersonaCraft,该方法结合了扩散模型和3D人体建模,以解决这些限制。我们的方法通过将3D感知姿势条件与SMPLx-ControlNet结合,有效地处理遮挡,并通过SMPLx拟合准确地个性化人体全身形状。此外,PersonaCraft还允许用户定义身体形状调整,为个体身体定制增加了灵活性。实验结果表明,PersonaCraft在生成高质量、逼真的多个个体图像方面表现出色,同时解决了遮挡问题,从而确立了多人个性化图像合成的新标准。项目页面:https://gwang-kim.github.io/persona_craft

更新时间: 2024-11-27 05:41:15

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18068v1

Measuring Compliance of Consent Revocation on the Web

The GDPR requires websites to facilitate the right to revoke consent from Web users. While numerous studies measured compliance of consent with the various consent requirements, no prior work has studied consent revocation on the Web. Therefore, it remains unclear how difficult it is to revoke consent on the websites' interfaces, nor whether revoked consent is properly stored and communicated behind the user interface. Our work aims to fill this gap by measuring compliance of consent revocation on the Web on the top-200 websites. We found that 19.87% of websites make it difficult for users to revoke consent throughout different interfaces, 20.5% of websites require more effort than acceptance, and 2.48% do not provide consent revocation at all, thus violating legal requirements for valid consent. 57.5% websites do not delete the cookies after consent revocation enabling continuous illegal processing of users' data. Moreover, we analyzed 281 websites implementing the IAB Europe TCF, and found 22 websites that store a positive consent despite user's revocation. Surprisingly, we found that on 101 websites, third parties that have received consent upon user's acceptance, are not informed of user's revocation, leading to the illegal processing of users' data by such third parties. Our findings emphasise the need for improved legal compliance of consent revocation, and proper, consistent, and uniform implementation of revocation communication and data deletion practices.

Updated: 2024-11-27 05:40:51

标题: 在网页上测量同意撤销的合规性

摘要: GDPR要求网站为用户提供撤销同意的权利。虽然许多研究测量了同意与各种同意要求的遵从性,但之前没有研究研究了网络上的同意撤销。因此,目前尚不清楚在网站界面上撤销同意有多困难,撤销的同意是否得到妥善存储和传达。我们的研究旨在填补这一空白,通过测量网络上前200个网站的同意撤销遵从性。我们发现,19.87%的网站使用户通过不同界面难以撤销同意,20.5%的网站需要比接受更多的努力,2.48%根本不提供同意撤销,因此违反了有效同意的法律要求。57.5%的网站在同意撤销后不删除Cookie,从而持续非法处理用户数据。此外,我们分析了实施IAB欧洲TCF的281个网站,发现22个网站存储了积极的同意,尽管用户撤销了同意。令人惊讶的是,我们发现在101个网站上,接受用户同意的第三方并未被告知用户的撤销,导致这些第三方非法处理用户数据。我们的发现强调了改善同意撤销的法律遵从性的需求,以及对撤销通信和数据删除实践进行适当、一致和统一的实施的必要性。

更新时间: 2024-11-27 05:40:51

领域: cs.CR

下载: http://arxiv.org/abs/2411.15414v2

AI-driven inverse design of materials: Past, present and future

The discovery of advanced materials is the cornerstone of human technological development and progress. The structures of materials and their corresponding properties are essentially the result of a complex interplay of multiple degrees of freedom such as lattice, charge, spin, symmetry, and topology. This poses significant challenges for the inverse design methods of materials. Humans have long explored new materials through a large number of experiments and proposed corresponding theoretical systems to predict new material properties and structures. With the improvement of computational power, researchers have gradually developed various electronic structure calculation methods, such as the density functional theory and high-throughput computational methods. Recently, the rapid development of artificial intelligence technology in the field of computer science has enabled the effective characterization of the implicit association between material properties and structures, thus opening up an efficient paradigm for the inverse design of functional materials. A significant progress has been made in inverse design of materials based on generative and discriminative models, attracting widespread attention from researchers. Considering this rapid technological progress, in this survey, we look back on the latest advancements in AI-driven inverse design of materials by introducing the background, key findings, and mainstream technological development routes. In addition, we summarize the remaining issues for future directions. This survey provides the latest overview of AI-driven inverse design of materials, which can serve as a useful resource for researchers.

Updated: 2024-11-27 05:20:40

标题: 人工智能驱动的材料逆向设计:过去、现在和未来

摘要: 先进材料的发现是人类技术发展和进步的基石。材料的结构及其相应的性质基本上是多个自由度(如晶格、电荷、自旋、对称性和拓扑)之间复杂相互作用的结果。这给材料的逆向设计方法提出了重大挑战。人类长期以来通过大量实验探索新材料,并提出相应的理论系统来预测新材料的性质和结构。随着计算能力的提高,研究人员逐渐发展了各种电子结构计算方法,如密度泛函理论和高通量计算方法。最近,在计算机科学领域,人工智能技术的快速发展使得材料性质和结构之间的隐性关联得以有效表征,从而开启了功能材料逆向设计的高效范式。基于生成式和判别式模型的材料逆向设计取得了显著进展,并吸引了广泛的研究关注。鉴于这一快速的技术进步,在本调查中,我们回顾了人工智能驱动的材料逆向设计的最新进展,介绍了背景、关键发现和主流技术发展路线。此外,我们总结了未来方向的剩余问题。本调查提供了人工智能驱动的材料逆向设计的最新概况,可作为研究人员的有用资源。

更新时间: 2024-11-27 05:20:40

领域: cond-mat.mtrl-sci,cond-mat.supr-con,cs.AI

下载: http://arxiv.org/abs/2411.09429v2

Mortality Prediction of Pulmonary Embolism Patients with Deep Learning and XGBoost

Pulmonary Embolism (PE) is a serious cardiovascular condition that remains a leading cause of mortality and critical illness, underscoring the need for enhanced diagnostic strategies. Conventional clinical methods have limited success in predicting 30-day in-hospital mortality of PE patients. In this study, we present a new algorithm, called PEP-Net, for 30-day mortality prediction of PE patients based on the initial imaging data (CT) that opportunistically integrates a 3D Residual Network (3DResNet) with the Extreme Gradient Boosting (XGBoost) algorithm, using patient-level binary labels without annotations of the emboli and their extent. Our proposed system offers a comprehensive prediction strategy by handling class imbalance problems, reducing overfitting via regularization, and reducing the prediction variance for more stable predictions. PEP-Net was tested in a cohort of 193 volumetric CT scans diagnosed with Acute PE, and it demonstrated a superior performance by significantly outperforming baseline models (76-78%) with an accuracy of 94.5% (+/-0.3) and 94.0% (+/-0.7) when the input image is either lung region (Lung-ROI) or heart region (Cardiac-ROI). Our results advance PE prognostics by using only initial imaging data, setting a new benchmark in the field. While purely deep learning models have become the go-to for many medical classification (diagnostic) tasks, the combined ResNet and XGBoost model herein outperforms purely deep learning models, likely because the available data are insufficient for deep models alone.

Updated: 2024-11-27 05:15:55

标题: 使用深度学习和XGBoost对肺栓塞患者的死亡预测

摘要: 肺栓塞(PE)是一种严重的心血管疾病,仍是死亡和危重症的主要原因之一,凸显了增强诊断策略的必要性。传统临床方法在预测PE患者30天院内死亡率方面成效有限。在这项研究中,我们提出了一种名为PEP-Net的新算法,基于初始影像数据(CT)预测PE患者的30天死亡率。该算法将3D残差网络(3DResNet)与极限梯度提升(XGBoost)算法有机结合,仅使用患者级别的二元标签,而无需对栓子及其范围进行标注。我们提出的系统通过处理类别不平衡问题、利用正则化减少过拟合并降低预测方差以获得更稳定的预测,提供了一种全面的预测策略。PEP-Net在193例确诊急性PE的容积CT扫描队列中进行了测试,当输入图像为肺部区域(Lung-ROI)或心脏区域(Cardiac-ROI)时,准确率分别达到94.5%(+/-0.3)和94.0%(+/-0.7),显著优于基线模型(76-78%)。我们的结果表明,仅使用初始影像数据即可推进PE预后评估,为该领域设立了新的基准。尽管纯深度学习模型已成为许多医学分类(诊断)任务的首选,但本文中ResNet与XGBoost的组合模型优于单一深度学习模型,很可能是因为可用数据量不足。

更新时间: 2024-11-27 05:15:55

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.18063v1
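
A schematic of the hybrid recipe (PyTorch + XGBoost, with random tensors standing in for CT volumes; not the authors' code):

    # A 3D ResNet encodes each CT volume into a feature vector; XGBoost then
    # classifies 30-day mortality from those features.
    import torch
    import torch.nn as nn
    from torchvision.models.video import r3d_18
    from xgboost import XGBClassifier

    encoder = r3d_18(weights=None)
    encoder.fc = nn.Identity()                   # keep the 512-d penultimate features
    encoder.eval()

    volumes = torch.randn(16, 3, 32, 112, 112)   # (batch, ch, depth, H, W) CT ROIs
    with torch.no_grad():
        feats = encoder(volumes).numpy()

    labels = torch.randint(0, 2, (16,)).numpy()
    clf = XGBClassifier(n_estimators=200, max_depth=4, scale_pos_weight=3.0)
    clf.fit(feats, labels)                       # scale_pos_weight eases class imbalance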

PRSI: Privacy-Preserving Recommendation Model Based on Vector Splitting and Interactive Protocols

With the development of the internet, recommending interesting products to users has become a highly valuable research topic for businesses. Recommendation systems play a crucial role in addressing this issue. To prevent the leakage of each user's (client's) private data, Federated Recommendation Systems (FedRec) have been proposed and widely used. However, extensive research has shown that FedRec suffers from security issues such as data privacy leakage, and it is challenging to train effective models with FedRec when each client only holds interaction information for a single user. To address these two problems, this paper proposes a new privacy-preserving recommendation system (PRSI), which includes a preprocessing module and two main phases. The preprocessing module employs split vectors and fake interaction items to protect clients' interaction information and recommendation results. The two main phases are: (1) the collection of interaction information and (2) the sending of recommendation results. In the interaction information collection phase, each client uses the preprocessing module and random communication methods (according to the designed interactive protocol) to protect their ID information and IP addresses. In the recommendation results sending phase, the central server uses the preprocessing module and triplets to distribute recommendation results to each client under secure conditions, following the designed interactive protocol. Finally, we conducted multiple sets of experiments to verify the security, accuracy, and communication cost of the proposed method.

Updated: 2024-11-27 05:14:15

标题: PRSI:基于向量分割和交互协议的隐私保护推荐模型

摘要: 随着互联网的发展,向用户推荐有趣的产品已经成为企业的一个极具价值的研究课题。推荐系统在解决这个问题中起着至关重要的作用。为了防止每个用户(客户)的个人数据泄露,联邦推荐系统(FedRec)被提出并广泛应用。然而,广泛的研究表明FedRec存在数据隐私泄露等安全问题,而且当每个客户仅持有单个用户的交互信息时,使用FedRec训练有效模型也是具有挑战性的。为了解决这两个问题,本文提出了一种新的隐私保护推荐系统(PRSI),包括一个预处理模块和两个主要阶段。预处理模块采用分割向量和虚假交互项目来保护客户的交互信息和推荐结果。这两个主要阶段分别是:(1)交互信息收集和(2)推荐结果发送。在交互信息收集阶段,每个客户使用预处理模块和随机通信方法(根据设计的交互协议)来保护他们的ID信息和IP地址。在推荐结果发送阶段,中央服务器使用预处理模块和三元组在安全条件下将推荐结果分发给每个客户,遵循设计的交互协议。最后,我们进行了多组实验来验证所提方法的安全性、准确性和通信成本。

更新时间: 2024-11-27 05:14:15

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2411.18653v1
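
One plausible reading of the split-vector idea, sketched as additive random shares plus injected fake interaction items (illustrative only; the paper's interactive protocol and communication scheme are not reproduced):

    import numpy as np

    def split_with_fakes(interactions, n_items, n_shares=2, n_fakes=5, rng=None):
        """Split a user's interaction vector into shares that sum back to it,
        after injecting fake items, so no single share reveals the real data."""
        rng = rng or np.random.default_rng()
        v = np.zeros(n_items)
        v[interactions] = 1.0
        candidates = np.setdiff1d(np.arange(n_items), interactions)
        v[rng.choice(candidates, size=n_fakes, replace=False)] = 1.0  # fake items
        shares = [rng.standard_normal(n_items) for _ in range(n_shares - 1)]
        shares.append(v - np.sum(shares, axis=0))   # shares sum back to v
        return shares

    shares = split_with_fakes([3, 17, 42], n_items=100)
    assert np.allclose(np.sum(shares, axis=0)[[3, 17, 42]], 1.0)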

ORIS: Online Active Learning Using Reinforcement Learning-based Inclusive Sampling for Robust Streaming Analytics System

Effective labeled data collection plays a critical role in developing and fine-tuning robust streaming analytics systems. However, continuously labeling documents to filter relevant information poses significant challenges like limited labeling budget or lack of high-quality labels. There is a need for efficient human-in-the-loop machine learning (HITL-ML) design to improve streaming analytics systems. One particular HITL-ML approach is online active learning, which involves iteratively selecting a small set of the most informative documents for labeling to enhance the ML model performance. The performance of such algorithms can get affected due to human errors in labeling. To address these challenges, we propose ORIS, a method to perform Online active learning using Reinforcement learning-based Inclusive Sampling of documents for labeling. ORIS aims to create a novel Deep Q-Network-based strategy to sample incoming documents that minimize human errors in labeling and enhance the ML model performance. We evaluate the ORIS method on emotion recognition tasks, and it outperforms traditional baselines in terms of both human labeling performance and the ML model performance.

Updated: 2024-11-27 05:11:37

标题: ORIS:使用基于强化学习的包容抽样的在线主动学习,用于稳健的流式分析系统

摘要: 有效的标记数据收集在开发和优化强大的流式分析系统中起着至关重要的作用。然而,持续标记文档以过滤相关信息面临着诸如有限的标记预算或缺乏高质量标签等重大挑战。需要一种高效的人机协同机器学习(HITL-ML)设计来改善流式分析系统。一种特定的HITL-ML方法是在线主动学习,它涉及迭代选择一小组最具信息量的文档进行标记,以增强机器学习模型的性能。这种算法的性能可能会受到标记中的人为错误的影响。为了解决这些挑战,我们提出了ORIS,一种使用基于强化学习的包容性采样来选取待标注文档的在线主动学习方法。ORIS旨在创建一种新颖的基于深度Q网络的策略,对传入的文档进行采样,以最小化标记中的人为错误,并增强机器学习模型的性能。我们在情绪识别任务上评估了ORIS方法,结果表明它在人工标记性能和机器学习模型性能方面均优于传统基线。

更新时间: 2024-11-27 05:11:37

领域: cs.LG

下载: http://arxiv.org/abs/2411.18060v1

Digital Twin-Centered Hybrid Data-Driven Multi-Stage Deep Learning Framework for Enhanced Nuclear Reactor Power Prediction

The accurate and efficient modeling of nuclear reactor transients is crucial for ensuring safe and optimal reactor operation. Traditional physics-based models, while valuable, can be computationally intensive and may not fully capture the complexities of real-world reactor behavior. This paper introduces a novel hybrid digital twin-focused multi-stage deep learning framework that addresses these limitations, offering a faster and more robust solution for predicting the final steady-state power of reactor transients. By leveraging a combination of feed-forward neural networks with both classification and regression stages, and training on a unique dataset that integrates real-world measurements of reactor power and controls state from the Missouri University of Science and Technology Reactor (MSTR) with noise-enhanced simulated data, our approach achieves remarkable accuracy (96% classification, 2.3% MAPE). The incorporation of simulated data with noise significantly improves the model's generalization capabilities, mitigating the risk of overfitting. Designed as a digital twin supporting system, this framework integrates real-time, synchronized predictions of reactor state transitions, enabling dynamic operational monitoring and optimization. This innovative solution not only enables rapid and precise prediction of reactor behavior but also has the potential to revolutionize nuclear reactor operations, facilitating enhanced safety protocols, optimized performance, and streamlined decision-making processes. By aligning data-driven insights with the principles of digital twins, this work lays the groundwork for adaptable and scalable solutions in nuclear system management.

Updated: 2024-11-27 05:02:36

标题: 数字孪生为中心的混合数据驱动多阶段深度学习框架,用于增强核反应堆功率预测

摘要: 核反应堆瞬态过程的准确和高效建模对于确保核反应堆安全和最佳运行至关重要。传统的基于物理的模型虽然有价值,但计算密集且可能无法完全捕捉现实世界反应堆行为的复杂性。本文介绍了一种新颖的、以数字孪生为中心的混合多阶段深度学习框架,解决了这些局限性,为预测反应堆瞬态的最终稳态功率提供了更快速和更稳健的解决方案。通过利用前馈神经网络与分类和回归阶段相结合,并在一个独特数据集上训练,该数据集整合了密苏里科技大学反应堆(MSTR)的反应堆功率和控制状态的真实测量数据以及增强噪声的模拟数据,我们的方法实现了显著的准确性(96%分类,2.3% MAPE)。将带有噪声的模拟数据纳入显著提高了模型的泛化能力,减少了过拟合的风险。作为一个支持数字孪生系统的框架,该框架整合了反应堆状态转换的实时、同步预测,实现了动态操作监测和优化。这一创新解决方案不仅实现了对反应堆行为的快速和精确预测,还有潜力革新核反应堆运行方式,促进增强安全协议、优化性能和简化决策流程。通过将数据驱动的见解与数字孪生原则相结合,这项工作为核系统管理提供了可适应和可扩展的解决方案奠定了基础。

更新时间: 2024-11-27 05:02:36

领域: stat.AP,cs.LG,stat.ML

下载: http://arxiv.org/abs/2211.13157v4

FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!

A widely-used technique in designing energy-efficient deep neural network (DNN) accelerators is quantization. Recent progress in this direction has reduced the bitwidths used in DNN down to 2. Meanwhile, many prior works apply approximate multipliers (AppMuls) in designing DNN accelerators to lower their energy consumption. Unfortunately, these works still assume a bitwidth much larger than 2, which falls far behind the state-of-the-art in quantization area and even challenges the meaningfulness of applying AppMuls in DNN accelerators, since a high-bitwidth AppMul consumes much more energy than a low-bitwidth exact multiplier! Thus, an important problem to study is: Can approximate multipliers be effectively applied to quantized DNN models with very low bitwidths? In this work, we give an affirmative answer to this question and present a systematic solution that achieves the answer: FAMES, a fast approximate multiplier substitution method for mixed-precision DNNs. Our experiments demonstrate an average 28.67% energy reduction on state-of-the-art mixed-precision quantized models with bitwidths as low as 2 bits and accuracy losses kept under 1%. Additionally, our approach is up to 300x faster than previous genetic algorithm-based methods.

Updated: 2024-11-27 04:58:10

标题: FAMES:面向混合精度量化DNN的快速近似乘法器替代——低至2位!

摘要: 在设计能效深度神经网络(DNN)加速器中广泛使用的技术是量化。最近在这方面取得的进展已将DNN中使用的比特宽度降低到2位。同时,许多先前的研究在设计DNN加速器时应用了近似乘法器(AppMuls)以降低能耗。不幸的是,这些研究仍然假设比特宽度远大于2,这远远落后于量化领域的最新技术水平,甚至挑战了在DNN加速器中应用AppMuls的意义,因为高比特宽度的AppMul消耗的能量比低比特宽度的精确乘法器要多得多!因此,一个重要的研究问题是:近似乘法器是否可以有效地应用于具有非常低比特宽度的量化DNN模型?在这项工作中,我们对这个问题给出了肯定的答案,并提出了一个系统解决方案来实现这个答案:FAMES,一种用于混合精度DNN的快速近似乘法器替代方法。我们的实验表明,在比特宽度仅为2位且精度损失保持在1%以下的最新混合精度量化模型上,能量平均减少了28.67%。此外,我们的方法比先前基于遗传算法的方法快300倍。

更新时间: 2024-11-27 04:58:10

领域: cs.LG,cs.ET

下载: http://arxiv.org/abs/2411.18055v1
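
A toy illustration of approximate-multiplier substitution for 2-bit operands (the LUT entry below is a made-up approximation, not one of FAMES' multipliers): a lookup table replaces exact products, trading small errors for energy.

    import numpy as np

    EXACT = np.outer(np.arange(4), np.arange(4))       # exact 2-bit x 2-bit products
    APPROX = EXACT.copy()
    APPROX[3, 3] = 8                                   # e.g., approximate 3*3 ~= 8

    def approx_dot(a, b, lut=APPROX):
        """Dot product of 2-bit vectors using the approximate-product LUT."""
        return int(lut[a, b].sum())

    a = np.random.randint(0, 4, size=64)
    b = np.random.randint(0, 4, size=64)
    print("exact:", int(EXACT[a, b].sum()), "approx:", approx_dot(a, b))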

Using different sources of ground truths and transfer learning to improve the generalization of photometric redshift estimation

In this work, we explore methods to improve galaxy redshift predictions by combining different ground truths. Traditional machine learning models rely on training sets with known spectroscopic redshifts, which are precise but only represent a limited sample of galaxies. To make redshift models more generalizable to the broader galaxy population, we investigate transfer learning and directly combining ground truth redshifts derived from photometry and spectroscopy. We use the COSMOS2020 survey to create a dataset, TransferZ, which includes photometric redshift estimates derived from up to 35 imaging filters using template fitting. This dataset spans a wider range of galaxy types and colors compared to spectroscopic samples, though its redshift estimates are less accurate. We first train a base neural network on TransferZ and then refine it using transfer learning on a dataset of galaxies with more precise spectroscopic redshifts (GalaxiesML). In addition, we train a neural network on a combined dataset of TransferZ and GalaxiesML. Both methods reduce bias by $\sim$ 5x, RMS error by $\sim$ 1.5x, and catastrophic outlier rates by 1.3x on GalaxiesML, compared to a baseline trained only on TransferZ. However, we also find a reduction in performance for RMS and bias when evaluated on TransferZ data. Overall, our results demonstrate these approaches can meet cosmological requirements.

Updated: 2024-11-27 04:55:37

标题: 利用不同来源的地面真相和迁移学习来提高光度红移估计的泛化能力

摘要: 在这项工作中,我们探讨了通过结合不同的基本事实来改进星系红移预测的方法。传统的机器学习模型依赖于具有已知光谱红移的训练集,这些红移精确但只代表了有限的星系样本。为了使红移模型更具普适性,我们研究了迁移学习和直接结合从光度学和光谱学中得出的基本事实红移的方法。我们使用COSMOS2020调查创建了一个数据集TransferZ,其中包括使用模板拟合从高达35个成像滤波器中得出的光度红移估计。与光谱样本相比,这个数据集涵盖了更广泛的星系类型和颜色,尽管其红移估计不太准确。我们首先在TransferZ上训练一个基本神经网络,然后使用更精确的光谱红移数据集(GalaxiesML)进行迁移学习来对其进行改进。此外,我们在TransferZ和GalaxiesML的组合数据集上训练了一个神经网络。与仅在TransferZ上训练的基线相比,这两种方法在GalaxiesML上将偏差减少了约5倍,均方根误差减少了约1.5倍,灾难性异常值率减少了1.3倍。然而,我们在TransferZ数据上评估时也发现了均方根误差和偏差性能的降低。总体而言,我们的结果表明这些方法可以满足宇宙学要求。

更新时间: 2024-11-27 04:55:37

领域: astro-ph.IM,astro-ph.GA,cs.LG

下载: http://arxiv.org/abs/2411.18054v1

Faster Accelerated First-order Methods for Convex Optimization with Strongly Convex Function Constraints

In this paper, we introduce faster accelerated primal-dual algorithms for minimizing a convex function subject to strongly convex function constraints. Prior to our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$, regardless of the strong convexity of the constraint function. It is unclear whether the strong convexity assumption can enable even better convergence results. To address this issue, we have developed novel techniques to progressively estimate the strong convexity of the Lagrangian function. Our approach, for the first time, effectively leverages the constraint strong convexity, obtaining an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$. This rate matches the complexity lower bound for strongly-convex-concave saddle point optimization and is therefore order-optimal. We show the superior performance of our methods in sparsity-inducing constrained optimization, notably Google's personalized PageRank problem. Furthermore, we show that a restarted version of the proposed methods can effectively identify the optimal solution's sparsity pattern within a finite number of steps, a result that appears to have independent significance.

Updated: 2024-11-27 04:54:18

标题: 更快的加速一阶方法用于带有强凸函数约束的凸优化

摘要: 在这篇论文中,我们介绍了一种更快的加速原始-对偶算法,用于在强凸函数约束条件下最小化凸函数。在我们的工作之前,最佳复杂度界限是$\mathcal{O}(1/{\varepsilon})$,无论约束函数的强凸性如何。尚不清楚强凸性假设是否能够实现更好的收敛结果。为了解决这个问题,我们开发了新颖的技术,逐渐估计拉格朗日函数的强凸性。我们的方法首次有效地利用了约束的强凸性,获得了改进的复杂度$\mathcal{O}(1/\sqrt{\varepsilon})$。这个速率与强凸-凹鞍点优化的复杂度下界相匹配,因此在阶上是最优的。我们展示了我们的方法在稀疏诱导的约束优化中的出色性能,尤其是在谷歌的个性化PageRank问题中。此外,我们展示了所提出方法的重新启动版本可以有效地在有限步数内识别出最优解的稀疏模式,这一结果似乎具有独立意义。

更新时间: 2024-11-27 04:54:18

领域: math.OC,cs.LG,90C25, 90C30, 90C06

下载: http://arxiv.org/abs/2212.11143v4
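
Restated for reference in our own notation (not the paper's), the problem class is

    $\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1, \dots, m,$

with $f$ convex and each $g_i$ strongly convex. Our reading of the abstract's key observation: the Lagrangian $\mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x)$ inherits strong convexity in $x$ whenever the multipliers $\lambda_i \ge 0$ are bounded away from zero, and progressively estimating this strong convexity is what improves the rate from $\mathcal{O}(1/\varepsilon)$ to the order-optimal $\mathcal{O}(1/\sqrt{\varepsilon})$.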

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases all captured using an egocentric camera attached to the surgeon's head. In addition to video, the EgoSurgery-Phase offers eye gaze. As far as we know, it is the first real open surgery video dataset for surgical phase recognition publicly available. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g., surgical field), in our GGMAE, the gaze information acts as an empirical semantic richness prior to guiding the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves the previous state-of-the-art recognition method (6.4% in Jaccard) and the masked autoencoder-based method (3.1% in Jaccard) on EgoSurgery-Phase. The dataset is released at https://github.com/Fujiry0/EgoSurgery.

Updated: 2024-11-27 04:52:51

标题: 自我手术阶段:来自自我中心开放手术视频的手术阶段识别数据集

摘要: 手术阶段识别因其有望满足现代手术室的诸多需求而备受关注。然而,现有方法大多集中于微创手术(MIS),针对开放手术的手术阶段识别研究不足。这一差距主要源于可公开获取、可用于手术阶段识别的开放手术视频数据集的稀缺。为解决这一问题,我们引入了一个新的用于阶段识别的自我中心(egocentric)开放手术视频数据集,名为EgoSurgery-Phase。该数据集包含15小时的真实开放手术视频,涵盖9个不同的手术阶段,全部由固定在外科医生头部的第一人称摄像头拍摄。除视频外,EgoSurgery-Phase还提供眼动注视信息。据我们所知,这是首个公开可用的用于手术阶段识别的真实开放手术视频数据集。此外,受掩码自编码器(MAE)在视频理解任务(如动作识别)中显著成功的启发,我们提出了一种注视引导的掩码自编码器(GGMAE)。考虑到外科医生注视聚焦的区域往往对手术阶段识别至关重要(例如手术区域),在GGMAE中,注视信息作为经验性的语义丰富度先验来引导掩码过程,促使模型更好地关注语义丰富的空间区域。GGMAE在EgoSurgery-Phase上显著超越此前最先进的识别方法(Jaccard指数提高6.4%)以及基于掩码自编码器的方法(Jaccard指数提高3.1%)。该数据集发布在https://github.com/Fujiry0/EgoSurgery。

更新时间: 2024-11-27 04:52:51

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19644v3

Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of cellular-level tissue heterogeneity. Cell annotation significantly contributes to the extensive downstream analysis of scRNA-seq data. However, the analysis of scRNA-seq for biological inference presents challenges owing to its intricate and indeterminate data distribution, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of the popular scRNA-seq data clustering solution GNN could be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose a single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-criteria (ChebAE) that combines three optimization objectives, including topology reconstruction loss of cell graphs, zero-inflated negative binomial (ZINB) loss, and clustering loss, to learn cell-cell topology representation. Meanwhile, we employ a selective training strategy to train GNN based on the features and entropy of nodes and prune the difficult nodes based on the difficulty scores to keep the high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods. The code of scCLG will be made publicly available at https://github.com/LFD-byte/scCLG.

Updated: 2024-11-27 04:46:17

标题: 基于单细胞课程学习的深度图嵌入聚类

摘要: 单细胞RNA测序(scRNA-seq)技术的迅速发展使得细胞水平组织异质性的研究成为可能。细胞注释对于scRNA-seq数据的广泛下游分析起着重要作用。然而,由于其复杂和不确定的数据分布,以及大量和高频的dropout事件,scRNA-seq数据的生物推断分析存在挑战。此外,训练样本的质量差异很大,并且流行的scRNA-seq数据聚类解决方案GNN的性能可能会受到两种低质量训练节点的影响:1)边界上的节点;2)对图中增加的信息贡献很少的节点。为了解决这些问题,我们提出了基于单细胞课程学习的深度图嵌入聚类(scCLG)。我们首先提出了一个结合了三个优化目标的Chebyshev图卷积自编码器(ChebAE),包括细胞图的拓扑重建损失、零膨胀负二项分布(ZINB)损失和聚类损失,以学习细胞-细胞拓扑表示。同时,我们采用一种选择性训练策略,基于节点的特征和熵来训练GNN,并根据困难分数修剪困难节点,以保持高质量的图。对各种基因表达数据集的实证结果表明,我们的模型优于最先进的方法。scCLG的代码将在https://github.com/LFD-byte/scCLG 上公开。

更新时间: 2024-11-27 04:46:17

领域: cs.LG,cs.AI,q-bio.GN

下载: http://arxiv.org/abs/2408.10511v3

FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing

Text-to-image diffusion models can be fine-tuned in custom domains to adapt to specific user preferences, but such adaptability has also been utilized for illegal purposes, such as forging public figures' portraits, duplicating copyrighted artworks and generating explicit contents. Existing work focused on detecting the illegally generated contents, but cannot prevent or mitigate illegal adaptations of diffusion models. Other schemes of model unlearning and reinitialization, similarly, cannot prevent users from relearning the knowledge of illegal model adaptation with custom data. In this paper, we present FreezeAsGuard, a new technique that addresses these limitations and enables irreversible mitigation of illegal adaptations of diffusion models. Our approach is that the model publisher selectively freezes tensors in pre-trained diffusion models that are critical to illegal model adaptations, to mitigate the fine-tuned model's representation power in illegal adaptations, but minimize the impact on other legal adaptations. Experiment results in multiple text-to-image application domains show that FreezeAsGuard provides 37% stronger power in mitigating illegal model adaptations compared to competitive baselines, while incurring less than 5% impact on legal model adaptations. The source code is available at: https://github.com/pittisl/FreezeAsGuard.

Updated: 2024-11-27 04:43:01

标题: FreezeAsGuard: 通过选择性张量冻结调节扩散模型的非法调整

摘要: 文本到图像扩散模型可以在定制领域进行微调,以适应特定用户偏好,但这种适应性也被用于非法目的,例如伪造公众人物的肖像、复制受版权保护的艺术作品和生成明确内容。现有工作侧重于检测非法生成的内容,但无法阻止或减轻扩散模型的非法适应。类似地,其他模型遗忘和重初始化方案也无法阻止用户通过定制数据重新学习非法模型适应的知识。本文介绍了一种新技术FreezeAsGuard,它解决了这些限制,并实现了对扩散模型的非法适应的不可逆缓解。我们的方法是,模型发布者选择性地冻结预训练扩散模型中对非法模型适应至关重要的张量,以减轻经过微调的模型在非法适应中的表征能力,但最大程度地减少对其他合法适应的影响。在多个文本到图像应用领域的实验结果显示,与竞争基线相比,FreezeAsGuard提供了37%更强的非法模型适应缓解能力,同时对合法模型适应的影响不到5%。源代码可在https://github.com/pittisl/FreezeAsGuard获取。

更新时间: 2024-11-27 04:43:01

领域: cs.LG,cs.AI,cs.CR,cs.CV

下载: http://arxiv.org/abs/2405.17472v2

Federated Learning for Time-Series Healthcare Sensing with Incomplete Modalities

Many healthcare sensing applications utilize multimodal time-series data from sensors embedded in mobile and wearable devices. Federated Learning (FL), with its privacy-preserving advantages, is particularly well-suited for health applications. However, most multimodal FL methods assume the availability of complete modality data for local training, which is often unrealistic. Moreover, recent approaches tackling incomplete modalities scale poorly and become inefficient as the number of modalities increases. To address these limitations, we propose FLISM, an FL training algorithm that remains efficient and accurate under incomplete sensing modalities. FLISM employs three key techniques: (1) modality-invariant representation learning to extract effective features from clients with diverse sets of modalities, (2) modality quality-aware aggregation to prioritize contributions from clients with higher-quality modality data, and (3) global-aligned knowledge distillation to reduce local update shifts caused by modality differences. Extensive experiments on real-world datasets show that FLISM not only achieves high accuracy but is also faster and more efficient than state-of-the-art methods handling incomplete modality problems in FL. We release the code as open-source at https://github.com/AdibaOrz/FLISM.
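
A minimal sketch of the second ingredient, quality-aware aggregation, is shown below: client updates are averaged with weights derived from per-client modality quality scores. The scores and the state-dict interface are assumptions for illustration, not FLISM's exact protocol.

    import torch

    def quality_aware_aggregate(client_states, quality_scores):
        # Higher-quality clients receive larger aggregation weights.
        weights = torch.softmax(torch.tensor(quality_scores, dtype=torch.float), dim=0)
        agg = {k: torch.zeros_like(v, dtype=torch.float)
               for k, v in client_states[0].items()}
        for w, state in zip(weights, client_states):
            for k, v in state.items():
                agg[k] += w * v.float()
        return agg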

Updated: 2024-11-27 04:41:17

标题: 使用不完整模态的时间序列医疗传感联邦学习

摘要: 许多医疗传感应用利用内置于移动和可穿戴设备中的多模态时间序列数据。联邦学习(FL)以其保护隐私的优势特别适用于健康应用。然而,大多数多模态FL方法假定本地训练需要完整的模态数据,这通常是不现实的。此外,最近处理不完整模态的方法随着模态数量的增加而扩展能力不佳,效率低下。为了解决这些限制,我们提出了FLISM,一种高效的FL训练算法,具有不完整传感模态,同时保持高准确性。FLISM采用三种关键技术:(1)模态不变表示学习,从具有不同模态集的客户中提取有效特征,(2)模态质量感知聚合,优先考虑具有更高质量模态数据的客户的贡献,以及(3)全局对齐知识蒸馏,减少由模态差异引起的本地更新偏移。对真实世界数据集的广泛实验表明,FLISM不仅实现了高准确性,而且与处理FL中不完整模态问题的最先进方法相比,速度更快,效率更高。我们将代码作为开源发布在https://github.com/AdibaOrz/FLISM。

更新时间: 2024-11-27 04:41:17

领域: cs.LG

下载: http://arxiv.org/abs/2405.11828v2

RL for Mitigating Cascading Failures: Targeted Exploration via Sensitivity Factors

The electricity grid's resiliency and climate change strongly impact one another, owing to an array of technical and policy-related decisions that affect both. This paper introduces a physics-informed machine learning-based framework to enhance the grid's resiliency. Specifically, when encountering disruptive events, this paper designs remedial control actions to prevent blackouts. The proposed Physics-Guided Reinforcement Learning (PG-RL) framework determines effective real-time remedial line-switching actions, considering their impact on power balance, system security, and grid reliability. To identify an effective blackout mitigation policy, PG-RL leverages power-flow sensitivity factors to guide the RL exploration during agent training. Comprehensive evaluations using the Grid2Op platform demonstrate that incorporating physical signals into RL significantly improves resource utilization within electric grids and achieves better blackout mitigation policies - both of which are critical in addressing climate change.
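
To make the targeted-exploration idea concrete, the sketch below biases epsilon-greedy exploration toward line-switching actions with large power-flow sensitivity factors. The per-action sensitivities are assumed precomputed; this is an illustrative reading of the approach, not the paper's agent.

    import numpy as np

    def guided_epsilon_greedy(q_values, sensitivities, eps=0.1, rng=np.random):
        if rng.random() < eps:
            # Explore in proportion to power-flow sensitivity magnitudes.
            probs = np.abs(sensitivities) / np.abs(sensitivities).sum()
            return int(rng.choice(len(q_values), p=probs))
        return int(np.argmax(q_values))  # otherwise act greedily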

Updated: 2024-11-27 04:34:31

标题: 强化学习用于减轻串联故障:通过敏感因素进行有针对性的探索

摘要: 电网的弹性和气候变化之间存在着强烈的相互影响,这是由于一系列技术和政策决策同时影响到两者。本文引入了一个基于物理学的机器学习框架来增强电网的弹性。具体来说,在遇到破坏性事件时,本文设计了补救控制措施以防止停电。提出的物理引导强化学习(PG-RL)框架确定了有效的实时补救线路切换行动,考虑了它们对电力平衡、系统安全和电网可靠性的影响。为了确定一个有效的停电缓解政策,PG-RL利用功率流敏感因子来引导RL在代理训练过程中的探索。使用Grid2Op平台进行的全面评估表明,将物理信号整合到RL中显著提高了电网内资源利用率,并实现了更好的停电缓解政策 - 这两者在应对气候变化方面都至关重要。

更新时间: 2024-11-27 04:34:31

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18050v1

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos

Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand-bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

Updated: 2024-11-27 04:30:46

标题: EgoSurgery-Tool:来自自我中心开放手术视频的手术工具和手部检测数据集

摘要: 手术工具检测是理解自我中心开放手术视频的基本任务。然而,由于手术工具的类别分布不平衡、形状相似、纹理相似以及严重遮挡等因素,检测手术工具存在显著挑战。缺乏全面大规模的数据集进一步加剧了这些挑战。本文介绍了EgoSurgery-Tool,这是现有EgoSurgery-Phase数据集的扩展,包含使用连接到外科医生头部的自我中心摄像头拍摄的真实开放手术视频,以及阶段注释。EgoSurgery-Tool已经密集注释了手术工具,涵盖了15个类别的超过49,000个手术工具边界框,构成一个大规模手术工具检测数据集。EgoSurgery-Tool还提供了手部检测的注释,包括超过46,000个手部边界框,捕捉了理解自我中心开放手术活动中关键的手部-物体交互。由于规模更大、手术工具种类更多、注释更多、场景更密集,EgoSurgery-Tool优于现有数据集。我们使用九种流行的目标检测器对EgoSurgery-Tool进行全面分析,评估它们在手术工具和手部检测方面的有效性。该数据集将在https://github.com/Fujiry0/EgoSurgery发布。

更新时间: 2024-11-27 04:30:46

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.03095v4

Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation

Spiking neural networks (SNNs) have received widespread attention as an ultra-low power computing paradigm. Recent studies have focused on improving the feature extraction capability of SNNs, but they suffer from inefficient inference and suboptimal performance. In this paper, we propose a simple yet effective temporal reversed training (TRT) method to optimize the spatio-temporal performance of SNNs and circumvent these problems. We perturb the input temporal data by temporal reversal, prompting the SNN to produce original-reversed consistent outputs and to learn perturbation-invariant representations. For static data without a temporal dimension, we generalize this strategy by exploiting the inherent temporal property of SNNs for spike feature temporal reversal. In addition, we utilize the lightweight "star operation" (element-wise multiplication) to hybridize the original and temporally reversed spike firing rates and expand the implicit dimensions, which serves as spatio-temporal regularization to further enhance the generalization of the SNN. Our method involves only a temporal reversal operation and element-wise multiplication during training, thus incurring negligible training overhead and not affecting the inference efficiency at all. Extensive experiments on static/neuromorphic object/action recognition, and 3D point cloud classification tasks demonstrate the effectiveness and generalizability of our method. In particular, with only two timesteps, our method achieves 74.77% and 90.57% accuracy on ImageNet and ModelNet40, respectively.
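
Since the method reduces to a temporal flip plus an element-wise product, a training step can be sketched in a few lines. The sketch assumes the SNN returns time-averaged firing rates of shape [batch, classes]; the loss weights are illustrative.

    import torch
    import torch.nn.functional as F

    def trt_step(snn, x, target, alpha=0.5):
        # x: [T, B, ...] spike input with time as the leading dimension.
        out = snn(x)
        out_rev = snn(torch.flip(x, dims=[0]))      # temporally reversed input
        hybrid = out * out_rev                      # the "star operation"
        loss = F.cross_entropy(out, target)
        loss = loss + alpha * F.mse_loss(out, out_rev)         # consistency term
        loss = loss + alpha * F.cross_entropy(hybrid, target)  # regularization term
        return loss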

Updated: 2024-11-27 04:25:26

标题: 时序反向训练用于具有广义时空表示的脉冲神经网络

摘要: 脉冲神经网络(SNNs)作为一种超低功耗的计算范式,受到了广泛关注。最近的研究集中在提高SNNs的特征提取能力,但它们存在推理效率低和性能亚优的问题。在本文中,我们提出了一种简单而有效的时间反转训练(TRT)方法,以优化SNNs的时空性能并规避这些问题。我们通过时间反转扰动输入时间数据,促使SNN产生原始-反转一致的输出并学习扰动不变表示。对于没有时间维度的静态数据,我们通过利用SNNs的固有时间属性进行脉冲特征时间反转来概括这种策略。此外,我们利用轻量级的“星操作”(逐元素乘法)来混合原始和时间反转的脉冲发射率,并扩展隐式维度,这作为时空正则化进一步增强了SNN的泛化能力。我们的方法在训练过程中仅涉及时间反转操作和逐元素乘法,因此产生可忽略的训练开销,并且完全不影响推理效率。对静态/神经形态对象/动作识别以及3D点云分类任务的大量实验证明了我们方法的有效性和泛化能力。特别地,仅使用两个时间步,我们的方法在ImageNet和ModelNet40上分别达到了74.77%和90.57%的准确率。

更新时间: 2024-11-27 04:25:26

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2408.09108v2

Heterogeneous Relationships of Subjects and Shapelets for Semi-supervised Multivariate Series Classification

Multivariate time series (MTS) classification is widely applied in fields such as industry, healthcare, and finance, aiming to extract key features from complex time series data for accurate decision-making and prediction. However, existing methods for MTS often struggle due to the challenges of effectively modeling high-dimensional data and the lack of labeled data, resulting in poor classification performance. To address this issue, we propose a method based on heterogeneous relationships of subjects and shapelets for semi-supervised MTS classification. This method offers a novel perspective by integrating various types of additional information while capturing the relationships between them. Specifically, we first utilize a contrast temporal self-attention module to obtain sparse MTS representations, and then model the similarities between these representations using soft dynamic time warping to construct a similarity graph. Secondly, we learn the shapelets for different subject types, incorporating both the subject features and their shapelets as additional information to further refine the similarity graph, ultimately generating a heterogeneous graph. Finally, we use a dual-level graph attention network to obtain predictions. Through this method, we successfully transform the dataset into a heterogeneous graph, integrating multiple types of additional information and achieving precise semi-supervised node classification. Experiments on the Human Activity Recognition, sleep stage classification, and University of East Anglia datasets demonstrate that our method outperforms current state-of-the-art methods in MTS classification tasks, validating its superiority.
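
The similarity-graph step can be sketched as follows: pairwise soft dynamic time warping distances between the sparse representations are turned into edge weights with a Gaussian kernel. This is a simplified soft-DTW for illustration; gamma and the kernel bandwidth are arbitrary choices, not the paper's settings.

    import numpy as np

    def soft_dtw(a, b, gamma=1.0):
        # a, b: sequences of feature vectors; returns a soft-DTW distance.
        n, m = len(a), len(b)
        R = np.full((n + 1, m + 1), np.inf)
        R[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.sum((a[i - 1] - b[j - 1]) ** 2)
                prev = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
                R[i, j] = cost - gamma * np.log(np.sum(np.exp(-prev / gamma)))
        return R[n, m]

    def similarity_graph(reps, sigma=1.0):
        n = len(reps)
        A = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                d = soft_dtw(reps[i], reps[j])
                A[i, j] = A[j, i] = np.exp(-d / (2 * sigma ** 2))  # kernelized edge weight
        return A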

Updated: 2024-11-27 04:25:13

标题: 主题和样本的异质关系在半监督多变量时间序列分类中的作用

摘要: 多元时间序列(MTS)分类广泛应用于工业、医疗保健和金融等领域,旨在从复杂的时间序列数据中提取关键特征,以实现准确的决策和预测。然而,由于高维数据的有效建模和缺乏标记数据的挑战,现有的MTS方法常常面临困难,导致分类性能较差。为解决这一问题,我们提出了一种基于主体和形状子异质关系的半监督MTS分类方法。该方法通过整合各种类型的附加信息并捕捉它们之间的关系,提供了一种新颖的视角。具体而言,我们首先利用对比时间自注意模块获得稀疏的MTS表示,然后使用软动态时间规整对这些表示之间的相似性进行建模,以构建相似图。其次,我们学习不同主体类型的形状子,将主体特征和它们的形状子作为附加信息以进一步优化相似图,最终生成异质图。最后,我们使用双级图注意网络进行预测。通过这种方法,我们成功将数据集转换为异质图,整合多种附加信息,并实现精确的半监督节点分类。在人类活动识别、睡眠阶段分类和东安格利亚大学(UEA)数据集上的实验证明,我们的方法在MTS分类任务中胜过当前的最新方法,验证了其优越性。

更新时间: 2024-11-27 04:25:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.18043v1

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. As a result, AutoDAN-Turbo can significantly outperform baseline methods, achieving a 74.3% higher average attack success rate on public benchmarks. Notably, AutoDAN-Turbo achieves an 88.5% attack success rate on GPT-4-1106-turbo. In addition, AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner. By integrating human-designed strategies, AutoDAN-Turbo can achieve an even higher attack success rate of 93.4% on GPT-4-1106-turbo.

Updated: 2024-11-27 04:24:57

标题: AutoDAN-Turbo: 一种终身代理,用于策略自我探索以越狱LLMs

摘要: 在这篇论文中,我们提出了AutoDAN-Turbo,这是一种黑盒越狱方法,可以自动从头开始发现尽可能多的越狱策略,无需任何人为干预或预定义范围(例如指定的候选策略),并将它们用于红队行动。结果,AutoDAN-Turbo可以显著优于基准方法,在公共基准测试中实现了74.3%更高的平均攻击成功率。值得注意的是,AutoDAN-Turbo在GPT-4-1106-turbo上实现了88.5的攻击成功率。此外,AutoDAN-Turbo是一个统一的框架,可以以即插即用的方式整合现有的人为设计的越狱策略。通过整合人为设计的策略,AutoDAN-Turbo甚至可以在GPT-4-1106-turbo上实现更高的攻击成功率,达到93.4。

更新时间: 2024-11-27 04:24:57

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05295v3

VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

The Large Vision Language Model (VLM) has recently made remarkable progress in bridging two fundamental modalities. A VLM trained on a sufficiently large dataset exhibits a comprehensive understanding of both the visual and linguistic domains, enabling it to perform diverse tasks. To distill this knowledge accurately, in this paper we introduce a novel approach that explicitly utilizes the VLM as an objective function for the Human-Object Interaction (HOI) detection task (VLM-HOI). Specifically, we propose a method that quantifies the similarity of the predicted HOI triplet using the image-text matching technique. We represent HOI triplets linguistically to fully utilize the language comprehension of VLMs, which are more suitable than CLIP models due to their localization and object-centric nature. This matching score is used as the objective for contrastive optimization. To our knowledge, this is the first utilization of VLM language abilities for HOI detection. Experiments demonstrate the effectiveness of our method, achieving state-of-the-art HOI detection accuracy on benchmarks. We believe integrating VLMs into HOI detection represents important progress towards more advanced and interpretable analysis of human-object interactions.
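
The scoring idea admits a very small sketch: each candidate <human, action, object> triplet is verbalized into a sentence and scored against the image. The matcher itm_score(image, text) is an assumed generic interface standing in for a VLM's image-text matching head, and the prompt template is illustrative.

    def score_hoi_triplets(image, triplets, itm_score):
        # Verbalize each triplet and score it against the image.
        scores = {}
        for human, action, obj in triplets:
            text = f"a photo of a {human} {action} a {obj}"
            scores[(human, action, obj)] = itm_score(image, text)
        best = max(scores, key=scores.get)
        return best, scores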

Updated: 2024-11-27 04:13:23

标题: VLM-HOI:用于可解释人物-物体交互分析的视觉语言模型

摘要: 最近,大规模视觉语言模型(VLM)在连接两个基本模态方面取得了显著进展。通过足够大的数据集训练,VLM表现出对视觉和语言的全面理解,可以执行多样化的任务。为了准确提炼这些知识,在本文中,我们引入了一种新颖的方法,明确地利用VLM作为人-物互动检测任务(VLM-HOI)的目标函数形式。具体来说,我们提出一种方法,通过图像-文本匹配技术量化预测的HOI三元组的相似度。我们在语言上表示HOI三元组,充分利用VLM的语言理解能力,这比CLIP模型更合适,因为后者具有定位和以物体为中心的特性。这种匹配分数被用作对比优化的目标。据我们所知,这是首次利用VLM的语言能力进行HOI检测。实验证明了我们方法的有效性,在基准测试中实现了最先进的HOI检测准确性。我们认为将VLM整合到HOI检测中代表了对人-物互动更高级和可解释分析的重要进展。

更新时间: 2024-11-27 04:13:23

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18038v1

Towards Black-Box Membership Inference Attack for Diffusion Models

Given the rising popularity of AI-generated art and the associated copyright concerns, identifying whether an artwork was used to train a diffusion model is an important research topic. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify the limitation of applying existing MIA methods to proprietary diffusion models: the required access to internal U-Nets. To address this problem, we introduce a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the model's internal U-Net. Our method is based on the intuition that the model can more easily obtain an unbiased noise-prediction estimate for images from the training set. By applying the API multiple times to the target image, averaging the outputs, and comparing the result to the original image, our approach can classify whether a sample was part of the training set. We validate our method using DDIM and Stable Diffusion setups and further extend both our approach and existing algorithms to the Diffusion Transformer architecture. Our experimental results consistently outperform previous methods.
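
The attack itself is compact enough to sketch: query the variation API several times, average the outputs, and threshold the distance to the original image, exploiting the intuition that training members are reconstructed with less bias. Here variation_api is an assumed black-box callable, and the threshold must be calibrated separately.

    import numpy as np

    def membership_score(image, variation_api, n_queries=8):
        variations = [np.asarray(variation_api(image), dtype=np.float32)
                      for _ in range(n_queries)]
        mean_variation = np.mean(variations, axis=0)
        # Smaller distance to the original suggests training-set membership.
        return np.linalg.norm(mean_variation - np.asarray(image, dtype=np.float32))

    def is_member(image, variation_api, threshold):
        return membership_score(image, variation_api) < threshold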

Updated: 2024-11-27 03:48:21

标题: 朝向黑盒成员推断攻击的扩散模型

摘要: 随着人工智能生成艺术的流行和相关的版权问题,确定一幅艺术作品是否被用于训练扩散模型成为一个重要的研究课题。该研究从成员推断攻击(MIA)的角度解决了这个问题。我们首先确定了将现有的MIA方法应用于专有扩散模型的局限性:需要访问内部U-网。为了解决上述问题,我们引入了一种新颖的成员推断攻击方法,只使用图像到图像变异API,并且在没有访问模型内部U-网的情况下运行。我们的方法基于这样的直觉:模型可以更容易地为训练集中的图像获得无偏噪声预测估计。通过将API多次应用于目标图像,对输出进行平均,并将结果与原始图像进行比较,我们的方法可以分类出样本是否属于训练集。我们使用DDIM和Stable Diffusion设置验证了我们的方法,并进一步将我们的方法和现有算法扩展到扩散变换器架构。我们的实验结果一致优于先前的方法。

更新时间: 2024-11-27 03:48:21

领域: cs.CR,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.20771v3

Privacy-preserving Robotic-based Multi-factor Authentication Scheme for Secure Automated Delivery System

Package delivery is a critical aspect of various industries, but it often incurs high financial costs and inefficiencies when relying solely on human resources. The last-mile transport problem, in particular, contributes significantly to the expenditure of human resources in major companies. Robot-based delivery systems have emerged as a potential solution for last-mile delivery to address this challenge. However, robotic delivery systems still face security and privacy issues, like impersonation, replay, man-in-the-middle attacks (MITM), unlinkability, and identity theft. In this context, we propose a privacy-preserving multi-factor authentication scheme specifically designed for robot delivery systems. Additionally, AI-assisted robotic delivery systems are susceptible to machine learning-based attacks (e.g. FGSM, PGD, etc.). We introduce the first transformer-based audio-visual fusion defender to tackle this issue, which effectively provides resilience against adversarial samples. Furthermore, we provide a rigorous formal analysis of the proposed protocol and also analyse the protocol security using a popular symbolic proof tool called ProVerif and Scyther. Finally, we present a real-world implementation of the proposed robotic system with the computation cost and energy consumption analysis. Code and pre-trained models are available at: https://drive.google.com/drive/folders/18B2YbxtV0Pyj5RSFX-ZzCGtFOyorBHil

Updated: 2024-11-27 03:48:00

标题: 隐私保护的基于机器人的安全自动交付系统多因素身份验证方案

摘要: 包裹递送是各行各业的一个关键方面,但当仅依赖人力资源时往往会产生高昂的财务成本和低效率。特别是最后一英里运输问题,对大公司的人力资源支出产生了显著影响。基于机器人的递送系统已经成为解决最后一英里递送挑战的潜在解决方案。然而,机器人递送系统仍面临安全和隐私问题,如冒充、重放、中间人攻击、不可关联性和身份盗窃。在这种情况下,我们提出了一种专为机器人递送系统设计的隐私保护多因素认证方案。此外,AI辅助的机器人递送系统容易受到基于机器学习的攻击(例如FGSM、PGD等)。我们引入了第一个基于转换器的音频视觉融合防御者来解决这个问题,从而有效提供对抗对抗样本的韧性。此外,我们对所提议的协议进行了严格的形式化分析,并使用一种流行的符号证明工具ProVerif和Scyther分析了协议的安全性。最后,我们展示了所提议的机器人系统的实际实现,包括计算成本和能量消耗分析。代码和预训练模型可在以下网址获取:https://drive.google.com/drive/folders/18B2YbxtV0Pyj5RSFX-ZzCGtFOyorBHil.

更新时间: 2024-11-27 03:48:00

领域: cs.CR

下载: http://arxiv.org/abs/2411.18027v1

Leveraging A New GAN-based Transformer with ECDH Crypto-system for Enhancing Energy Theft Detection in Smart Grid

Detecting energy theft is vital for effectively managing power grids, as it ensures precise billing and prevents financial losses. Split-learning emerges as a promising decentralized machine learning technique for identifying energy theft while preserving user data confidentiality. Nevertheless, traditional split learning approaches are vulnerable to privacy leakage attacks, which significantly threaten data confidentiality. To address this challenge, we propose a novel GAN-Transformer-based split learning framework in this paper. This framework leverages the strengths of the transformer architecture, which is known for its capability to process long-range dependencies in energy consumption data. Thus, it enhances the accuracy of energy theft detection without compromising user privacy. A distinctive feature of our approach is the deployment of a novel mask-based method, marking a first in its field to effectively combat privacy leakage in split learning scenarios targeted at AI-enabled adversaries. This method protects sensitive information during the model's training phase. Our experimental evaluations indicate that the proposed framework not only achieves accuracy levels comparable to conventional methods but also significantly enhances privacy protection. The results underscore the potential of the GAN-Transformer split learning framework as an effective and secure tool in the domain of energy theft detection.

Updated: 2024-11-27 03:41:38

标题: 利用基于GAN的新型变压器与ECDH加密系统增强智能电网中能量盗窃检测

摘要: 检测能量盗窃对于有效管理电力网络至关重要,因为它确保精确计费并防止财务损失。拆分学习作为一种有前途的去中心化机器学习技术,可以识别能量盗窃,同时保护用户数据的保密性。然而,传统的拆分学习方法容易受到隐私泄露攻击的威胁,严重威胁数据的保密性。为了解决这一挑战,本文提出了一种基于GAN-Transformer的拆分学习框架。该框架利用了Transformer架构的优势,该架构以处理能量消耗数据中的长距离依赖性而闻名。因此,它提高了能量盗窃检测的准确性,同时不会损害用户的隐私。我们方法的一个独特特征是采用了一种基于掩码的方法,这是该领域首次有效应对AI对手的拆分学习场景中的隐私泄露。这种方法在模型的训练阶段保护敏感信息。我们的实验评估表明,所提出的框架不仅达到了与传统方法相当的准确性水平,而且显著增强了隐私保护。结果强调了GAN-Transformer拆分学习框架在能量盗窃检测领域作为一种有效且安全工具的潜力。

更新时间: 2024-11-27 03:41:38

领域: cs.CR

下载: http://arxiv.org/abs/2411.18023v1

Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding

Efficient inference in large language models (LLMs) has become a critical focus as their scale and complexity grow. Traditional autoregressive decoding, while effective, suffers from computational inefficiencies due to its sequential token generation process. Speculative decoding addresses this bottleneck by introducing a two-stage framework: drafting and verification. A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model. This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches. We discuss key ideas associated with each method, highlighting their potential for scaling LLM inference. This survey aims to guide future research in optimizing speculative decoding and its integration into real-world LLM applications.
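
A skeletal draft-and-verify loop is sketched below for orientation. It uses greedy verification for clarity (production systems use rejection sampling to preserve the target distribution exactly), and draft_model.generate / target_model.logits are assumed interfaces, with logits[j] taken to be the next-token distribution after position j.

    def speculative_decode(draft_model, target_model, prompt, k=4, max_len=128):
        tokens = list(prompt)
        while len(tokens) < max_len:
            base = len(tokens)
            draft = draft_model.generate(tokens, num_tokens=k)  # cheap drafting pass
            logits = target_model.logits(tokens + draft)        # one large-model pass
            for i, tok in enumerate(draft):
                verified = int(logits[base + i - 1].argmax())   # target's pick at this slot
                tokens.append(verified)
                if verified != tok:  # first disagreement: discard the rest of the draft
                    break
        return tokens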

Updated: 2024-11-27 03:25:44

标题: 更深入地了解高效推理方法:一项关于推测解码的调查

摘要: 随着大型语言模型(LLMs)的规模和复杂性不断增长,高效的推理变得至关重要。传统的自回归解码虽然有效,但由于其顺序生成标记的过程而存在计算效率低下的问题。猜测性解码通过引入一个两阶段框架:起草和验证,来解决这一瓶颈问题。一个较小、高效的模型生成一个初步草稿,然后由一个更大、更复杂的模型进行优化。本文对猜测性解码方法进行了全面调查,将其分类为以草案为中心和以模型为中心的方法。我们讨论了与每种方法相关的关键思想,突出了它们在扩展LLM推理方面的潜力。这项调查旨在指导未来研究优化猜测性解码及其在现实世界LLM应用中的整合。

更新时间: 2024-11-27 03:25:44

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.13157v2

Time-aware Heterogeneous Graph Transformer with Adaptive Attention Merging for Health Event Prediction

The widespread application of Electronic Health Record (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence of many medical codes within EHR data, limiting their clinical applicability. Current research often falls short in critical areas: 1) incorporating disease domain knowledge; 2) heterogeneously learning disease representations with rich meanings; 3) capturing the temporal dynamics of disease progression. To overcome these limitations, we introduce a novel heterogeneous graph learning model designed to assimilate disease domain knowledge and elucidate the intricate relationships between drugs and diseases. This model innovatively incorporates temporal data into visit-level embeddings and leverages a time-aware transformer alongside an adaptive attention mechanism to produce patient representations. When evaluated on two healthcare datasets, our approach demonstrated notable enhancements in both prediction accuracy and interpretability over existing methodologies, signifying a substantial advancement towards personalized and proactive healthcare management.

Updated: 2024-11-27 03:21:03

标题: 时间感知的异构图变换器与自适应注意力合并用于健康事件预测

摘要: 电子健康记录(EHR)数据在医学领域的广泛应用已经导致了使用深度学习方法进行疾病风险预测的早期成功。这些方法通常需要大量数据进行训练,因为其参数集很大。然而,现有的研究并没有充分利用EHR数据的潜力。一个重要的挑战来自于EHR数据中许多医学代码的不经常出现,限制了它们在临床中的适用性。目前的研究在一些关键领域经常存在不足:1)整合疾病领域知识;2)异质学习疾病表示具有丰富含义;3)捕捉疾病进展的时间动态。为了克服这些限制,我们引入了一种新颖的异质图学习模型,旨在吸收疾病领域知识并阐明药物和疾病之间错综复杂的关系。该模型创新地将时间数据整合到访问级别的嵌入中,并利用一个时间感知变压器以及自适应注意机制来生成患者表示。在两个医疗保健数据集上评估时,我们的方法在预测准确性和可解释性方面均表现出显著的提升,标志着向个性化和积极主动的医疗管理迈出了重要的一步。

更新时间: 2024-11-27 03:21:03

领域: cs.LG

下载: http://arxiv.org/abs/2404.14815v3

Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis

This paper presents the Multimodal Laryngoscopic Video Analyzing System (MLVAS), a novel system that leverages both audio and video data to automatically extract key segments and metrics from raw laryngeal videostroboscopic videos for assisted clinical assessment. The system integrates video-based glottis detection with an audio keyword spotting method to analyze both video and audio data, identifying patient vocalizations and refining video highlights to ensure optimal inspection of vocal fold movements. Additionally, MLVAS features an advanced strobing video extraction module that identifies strobing frames from laryngeal videostroboscopy by analyzing hue, saturation, and value fluctuations. Beyond key segment extraction, MLVAS provides effective metrics for Vocal Fold Paralysis (VFP) detection. It employs a novel two-stage glottis segmentation process using a U-Net for initial segmentation, followed by a diffusion-based refinement to reduce false positives, providing better segmentation masks for downstream tasks. MLVAS estimates the vibration dynamics of both the left and right vocal folds from the segmented glottis masks to detect unilateral VFP by measuring the angle deviation from the estimated glottal midline. By comparing the variance between the left and right folds' dynamics, the system effectively distinguishes between left and right VFP. We conducted several ablation studies to demonstrate the effectiveness of each module in the proposed MLVAS. The experimental results on a public segmentation dataset show the effectiveness of our proposed segmentation module. In addition, VFP classification results on a real-world clinic dataset demonstrate MLVAS's ability to provide reliable and objective metrics as well as visualization for assisted clinical diagnosis.
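
The strobing-frame extraction can be illustrated with a short sketch: per-frame HSV statistics are computed, and frames whose statistics deviate sharply from the video's median are flagged. The z-score threshold is illustrative and not the paper's criterion.

    import cv2
    import numpy as np

    def detect_strobing_frames(frames, z_thresh=2.5):
        stats = []
        for f in frames:
            hsv = cv2.cvtColor(f, cv2.COLOR_BGR2HSV)
            stats.append(hsv.reshape(-1, 3).mean(axis=0))  # mean H, S, V per frame
        stats = np.array(stats)
        z = np.abs(stats - np.median(stats, axis=0)) / (stats.std(axis=0) + 1e-8)
        return np.where(z.max(axis=1) > z_thresh)[0]  # indices of flagged frames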

Updated: 2024-11-27 03:19:11

标题: 多模式喉镜视频分析辅助诊断声带麻痹

摘要: 这篇论文介绍了多模式喉镜视频分析系统(MLVAS),这是一个利用音频和视频数据自动提取原始喉镜视频中关键片段和指标以辅助临床评估的新系统。该系统整合了基于视频的声门检测和音频关键词识别方法,分析视频和音频数据,识别患者的语音,并通过精细化视频高亮部分确保声带运动的最佳检查。此外,MLVAS具有先进的频闪视频提取模块,通过分析色调、饱和度和值的波动,特别识别喉镜频闪术中的频闪帧。除了关键片段提取,MLVAS还提供了用于检测声带麻痹(VFP)的有效指标。它采用了一种新颖的两阶段声门分割过程,首先使用U-Net进行初始分割,然后通过扩散式细化来减少误判,为下游任务提供更好的分割掩模。MLVAS从分割的声门掩模中估计左右声带的振动动态,通过测量与估计的声门中线的角度偏差来检测单侧VFP。通过比较左右动态之间的方差,系统有效区分左侧和右侧VFP。我们进行了几项消融研究,以展示所提出的MLVAS中每个模块的有效性。在一个公共分割数据集上的实验结果显示了我们提出的分割模块的有效性。此外,在一个真实世界的临床数据集上的VFP分类结果表明,MLVAS能够提供可靠和客观的指标,以及辅助临床诊断的可视化能力。

更新时间: 2024-11-27 03:19:11

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2409.03597v2

AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions

In software maintenance, bug reproduction is essential for effective fault localization and repair. Manually writing reproduction scripts is a time-consuming task that places high demands on developers. Hence, automation of bug reproduction has increasingly attracted attention from researchers and practitioners. However, existing studies on bug reproduction are generally limited to specific bug types such as program crashes, and are hard to apply to general bug reproduction. In this paper, considering the superior performance of agent-based methods in code intelligence tasks, we focus on designing an agent-based framework for the task. Directly employing agents would lead to limited bug reproduction performance, due to entangled subtasks, lengthy retrieved context, and unregulated actions. To mitigate these challenges, we propose an Automated gEneral buG reproductIon Scripts generation framework, named AEGIS, which is the first agent-based framework for the task. AEGIS mainly contains two modules: (1) a concise context construction module, which aims to guide the code agent in extracting structured information from issue descriptions, identifying issue-related code with detailed explanations, and integrating these elements to construct the concise context; (2) an FSM-based multi-feedback optimization module to further regulate the behavior of the code agent within a finite state machine (FSM), ensuring a controlled and efficient script generation process based on multi-dimensional feedback. Extensive experiments on the public benchmark dataset show that AEGIS outperforms the state-of-the-art baseline by 23.0% in the F->P metric. In addition, the bug reproduction scripts generated by AEGIS can improve the relative resolved rate of Agentless by 12.5%.

Updated: 2024-11-27 03:16:47

标题: AEGIS:基于代理的框架,用于从问题描述中进行一般性错误重现

摘要: 在软件维护中,错误再现对于有效的故障定位和修复至关重要。手动编写再现脚本是一项耗时且对开发人员要求较高的任务。因此,错误再现的自动化越来越受到研究人员和实践者的关注。然而,现有的关于错误再现的研究通常局限于特定类型的错误,如程序崩溃,难以应用于一般的错误再现。本文考虑到基于代理的方法在代码智能任务中的卓越性能,着重设计了一个基于代理的框架用于此任务。直接使用代理将导致错误再现性能有限,因为任务交织在一起,检索到的上下文过长,行为没有规范。为了缓解这些挑战,我们提出了一个名为AEGIS的自动化通用错误再现脚本生成框架,这是该任务的第一个基于代理的框架。AEGIS主要包含两个模块:(1)一个简洁的上下文构建模块,旨在指导代码代理从问题描述中提取结构化信息,识别具有详细说明的与问题相关的代码,并整合这些元素以构建简洁的上下文;(2)基于FSM的多反馈优化模块,进一步规范代码代理在有限状态机(FSM)内的行为,确保基于多维反馈的受控和高效的脚本生成过程。对公共基准数据集的大量实验表明,AEGIS在F->P指标上优于最先进的基准线23.0%。此外,AEGIS生成的错误再现脚本可以将Agentless的相对解决率提高12.5%。

更新时间: 2024-11-27 03:16:47

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2411.18015v1

Diffeomorphic Latent Neural Operator Learning for Data-Efficient Predictions of Solutions to Partial Differential Equations

A computed approximation of the solution operator to a system of partial differential equations (PDEs) is needed in various areas of science and engineering. Neural operators have been shown to be quite effective at predicting these solution generators after training on high-fidelity ground truth data (e.g., numerical simulations). However, in order to generalize well to unseen spatial domains, neural operators must be trained on an extensive amount of geometrically varying data samples that may not be feasible to acquire or simulate in certain contexts (e.g., patient-specific medical data or large-scale, computationally intensive simulations). We propose that, in order to learn a PDE solution operator that can generalize across multiple domains without sampling data expressive enough for all possible geometries, we can instead train a latent neural operator on just a few ground truth solution fields diffeomorphically mapped from different geometric/spatial domains to a fixed reference configuration. Furthermore, the form of the solutions is dependent on the choice of mapping to and from the reference domain. We emphasize that preserving properties of the differential operator when constructing these mappings can significantly reduce the data requirement for achieving an accurate model, owing to the regularity of the solution fields that the latent neural operator trains on. We provide motivating numerical experimentation that demonstrates an extreme case of this consideration by exploiting the conformal invariance of the Laplacian.

Updated: 2024-11-27 03:16:00

标题: 同胚潜在神经算子学习用于部分微分方程数据高效预测解的研究

摘要: 在科学和工程的各个领域中,需要对偏微分方程(PDEs)系统的解算子进行计算近似。经过训练,神经算子已被证明在预测这些解生成器方面非常有效,使用高保真度的真实数据(如数值模拟)。然而,为了在未知空间域中具有良好的泛化能力,神经算子必须在大量几何变化的数据样本上进行训练,这在某些情况下可能无法获取或模拟(即患者特定的医学数据,大规模计算密集型模拟)。我们建议为了学习一个可以跨多个领域泛化的PDE解算子,而不需要对所有可能几何形状具有足够表现力的数据样本进行采样,可以训练一个潜在神经算子,该算子仅基于从不同几何/空间域差分映射到固定参考配置的少量真实解场。此外,解的形式取决于到参考域和从参考域的映射选择。我们强调,在构建这些映射时保留微分算子的属性可以显著减少实现准确模型的数据需求,因为潜在神经算子训练的解场的规则性。我们提供了激励性的数值实验,通过利用拉普拉斯算子的共形不变性来演示此考虑的极端情况。

更新时间: 2024-11-27 03:16:00

领域: cs.LG

下载: http://arxiv.org/abs/2411.18014v1

Causal and Local Correlations Based Network for Multivariate Time Series Classification

Recently, time series classification has attracted the attention of a large number of researchers, and hundreds of methods have been proposed. However, these methods often ignore the spatial correlations among dimensions and the local correlations among features. To address this issue, the causal and local correlations based network (CaLoNet) is proposed in this study for multivariate time series classification. First, pairwise spatial correlations between dimensions are modeled using causality modeling to obtain the graph structure. Then, a relationship extraction network is used to fuse local correlations to obtain long-term dependency features. Finally, the graph structure and long-term dependency features are integrated into the graph neural network. Experiments on the UEA datasets show that CaLoNet can obtain competitive performance compared with state-of-the-art methods.

Updated: 2024-11-27 02:54:26

标题: 基于因果关系和局部相关性的多变量时间序列分类网络

摘要: 最近,时间序列分类引起了大量研究人员的关注,并提出了数百种方法。然而,这些方法通常忽略了维度间的空间相关性和特征间的局部相关性。为了解决这个问题,本研究提出了基于因果和局部相关性的网络(CaLoNet)用于多变量时间序列分类。首先,利用因果建模来建模维度间的成对空间相关性以获取图形结构。然后,使用关系提取网络来融合局部相关性以获取长期依赖特征。最后,将图形结构和长期依赖特征整合到图神经网络中。在UEA数据集上的实验表明,与最先进的方法相比,CaLoNet能够获得竞争性的性能。

更新时间: 2024-11-27 02:54:26

领域: cs.LG,cs.AI,stat.ME,stat.ML

下载: http://arxiv.org/abs/2411.18008v1

Generative Semantic Communication for Joint Image Transmission and Segmentation

Semantic communication has emerged as a promising technology for enhancing communication efficiency. However, most existing research emphasizes single-task reconstruction, neglecting model adaptability and generalization across multi-task systems. In this paper, we propose a novel generative semantic communication system that supports both image reconstruction and segmentation tasks. Our approach builds upon semantic knowledge bases (KBs) at both the transmitter and receiver, with each semantic KB comprising a source KB and a task KB. The source KB at the transmitter leverages a hierarchical Swin-Transformer, a generative AI scheme, to extract multi-level features from the input image. Concurrently, the counterpart source KB at the receiver utilizes hierarchical residual blocks to generate task-specific knowledge. Furthermore, the two task KBs adopt a semantic similarity model to map different task requirements into pre-defined task instructions, thereby facilitating the feature selection of the source KBs. Additionally, we develop a unified residual block-based joint source and channel (JSCC) encoder and two task-specific JSCC decoders to achieve the two image tasks. In particular, a generative diffusion model is adopted to construct the JSCC decoder for the image reconstruction task. Experimental results demonstrate that our multi-task generative semantic communication system outperforms previous single-task communication systems in terms of peak signal-to-noise ratio and segmentation accuracy.

Updated: 2024-11-27 02:51:26

标题: 生成语义通信用于联合图像传输和分割

摘要: 语义交流已经成为一种提高通信效率的有前途的技术。然而,大多数现有研究强调单一任务重建,忽视了模型适应性和跨多任务系统的泛化。在本文中,我们提出了一种支持图像重建和分割任务的新颖生成语义通信系统。我们的方法建立在发射机和接收机都有的语义知识库(KBs)之上,每个语义KB包括一个源KB和一个任务KB。发射机上的源KB利用分层Swin-Transformer,一种生成式人工智能方案,从输入图像中提取多级特征。与此同时,接收机上的对应源KB利用分层残差块来生成任务特定的知识。此外,两个任务KB采用语义相似性模型将不同任务要求映射到预定义的任务指令,从而促进源KB的特征选择。另外,我们开发了一个基于统一残差块的联合源和信道(JSCC)编码器以及两个特定任务的JSCC解码器来实现这两个图像任务。特别地,采用生成扩散模型构建了用于图像重建任务的JSCC解码器。实验结果表明,我们的多任务生成语义通信系统在峰值信噪比和分割准确度方面优于先前的单一任务通信系统。

更新时间: 2024-11-27 02:51:26

领域: cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2411.18005v1

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with 100 trillion tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model's training level when evaluating low-bit quantization research. To facilitate future research on this problem, we release all the 1500+ quantized checkpoints used in this work at https://huggingface.co/Xu-Ouyang.
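
To illustrate what fitting such a scaling law involves, the sketch below regresses QiD against training tokens D, model size N, and bit width P under an assumed power-law form; the functional form and initial guesses are illustrative, not the paper's fitted law.

    import numpy as np
    from scipy.optimize import curve_fit

    def qid_law(X, k, a, b, c):
        D, N, P = X  # training tokens, parameter count, bit width
        return k * D**a / (N**b * P**c)

    def fit_scaling_law(D, N, P, qid):
        # D, N, P, qid: 1-D arrays of observed checkpoint statistics (assumed given).
        popt, _ = curve_fit(qid_law, (D, N, P), qid,
                            p0=[1.0, 0.5, 0.5, 1.0], maxfev=20000)
        return dict(zip(["k", "a", "b", "c"], popt))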

Updated: 2024-11-27 02:51:04

标题: 低比特量化有利于未训练充分的LLM:具有100T训练标记的量化LLM的缩放定律

摘要: 我们揭示了低比特量化有利于未充分训练的大型语言模型(LLMs),观察到具有更大规模或较少训练令牌的模型在应用低比特量化时经历的量化诱导退化(QiD)较少,而具有大量训练令牌的较小模型遭受显著的QiD。为了更深入地了解这一趋势,我们在受控环境中研究了1500多个不同规模和不同训练水平(未充分训练或完全训练)的量化LLM检查点,推导出用于理解QiD与训练令牌数量、模型大小和比特宽度等因素之间关系的标度律。 通过推导的标度律,我们提出了一个新颖的观点,即我们可以利用QiD来衡量LLM的训练水平,并确定各种规模的LLM完全训练所需的训练令牌数量。此外,我们利用这些标度律预测了使用100万亿令牌进行训练的不同规模LLM的量化性能。我们的预测显示,未来模型的低比特量化性能可能并不理想,这对未来的低比特量化构成潜在挑战,并强调在评估低比特量化研究时需要意识到模型的训练水平。为了促进未来对这一问题的研究,我们在https://huggingface.co/Xu-Ouyang上发布了本文中使用的所有1500多个量化检查点。

更新时间: 2024-11-27 02:51:04

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2411.17691v2

HAAT: Hybrid Attention Aggregation Transformer for Image Super-Resolution

In the research area of image super-resolution, Swin-Transformer-based models are favored for their global spatial modeling and shifting-window attention mechanism. However, existing methods often limit self-attention to non-overlapping windows to cut computational costs, ignoring the useful information that exists across channels. To address this issue, this paper introduces a novel model, the Hybrid Attention Aggregation Transformer (HAAT), designed to better leverage feature information. HAAT is constructed by integrating Swin-Dense-Residual-Connected Blocks (SDRCB) with Hybrid Grid Attention Blocks (HGAB). SDRCB expands the receptive field while maintaining a streamlined architecture, resulting in enhanced performance. HGAB incorporates channel attention, sparse attention, and window attention to improve nonlocal feature fusion and achieve more visually compelling results. Experimental evaluations demonstrate that HAAT surpasses state-of-the-art methods on benchmark datasets. Keywords: Image super-resolution, Computer vision, Attention mechanism, Transformer

Updated: 2024-11-27 02:47:17

标题: HAAT:用于图像超分辨率的混合注意力聚合变压器

摘要: 在图像超分辨率研究领域,基于Swin-transformer的模型因其全局空间建模和移动窗口注意机制而备受青睐。然而,现有方法通常将自注意力限制在非重叠窗口中以节约成本,并忽略存在于通道之间的有用信息。为解决这一问题,本文引入了一种新颖的模型,混合注意力聚合变压器(HAAT),旨在更好地利用特征信息。HAAT由Swin-Dense-Residual-Connected Blocks(SDRCB)与Hybrid Grid Attention Blocks(HGAB)集成构建。SDRCB扩展了感受野,同时保持了简化的架构,从而提高了性能。HGAB融合了通道注意力、稀疏注意力和窗口注意力,以改善非局部特征融合,并实现更具视觉吸引力的结果。实验评估表明,HAAT在基准数据集上超越了最先进的方法。 关键词:图像超分辨率、计算机视觉、注意机制、变压器

更新时间: 2024-11-27 02:47:17

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.18003v1

An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition

With the rapid advancements in deep learning, computer vision tasks have seen significant improvements, making two-stream neural networks a popular focus for video-based action recognition. Traditional models using RGB and optical flow streams achieve strong performance but at a high computational cost. To address this, we introduce a representation flow algorithm to replace the optical flow branch in the egocentric action recognition model, enabling end-to-end training while reducing computational cost and prediction time. Our model, designed for egocentric action recognition, uses class activation maps (CAMs) to improve accuracy and ConvLSTM for spatio-temporal encoding with spatial attention. When evaluated on the GTEA61, EGTEA GAZE+, and HMDB datasets, our model matches the accuracy of the original model on GTEA61 and exceeds it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB, respectively. Prediction runtimes are significantly reduced to 0.1881s, 0.1503s, and 0.1459s, compared to the original model's 101.6795s, 25.3799s, and 203.9958s. Ablation studies were also conducted to study the impact of different parameters on model performance. Keywords: two-stream, egocentric, action recognition, CAM, representation flow, ConvLSTM

Updated: 2024-11-27 02:46:46

标题: 一种基于RGB流和表示流的端到端双流网络用于人类动作识别

摘要: 随着深度学习的快速发展,计算机视觉任务有了显著的改进,使得基于视频的动作识别成为热门研究重点。传统模型使用RGB和光流流实现了强大的性能,但计算成本较高。为了解决这个问题,我们引入了一种表示流算法来替代自我中心动作识别模型中的光流分支,实现端到端训练,并降低计算成本和预测时间。我们的模型专为自我中心动作识别设计,使用类激活图(CAMs)来提高准确性,并使用ConvLSTM进行时空编码和空间注意力。在GTEA61、EGTEA GAZE+和HMDB数据集上评估时,我们的模型在GTEA61上与原始模型的准确性相匹配,并分别超出EGTEA GAZE+和HMDB 0.65%和0.84%。相比原始模型的101.6795秒、25.3799秒和203.9958秒,预测运行时间显著减少至0.1881秒、0.1503秒和0.1459秒。还进行了消融研究,研究不同参数对模型性能的影响。关键词:双流、自我中心、动作识别、CAM、表示流、ConvLSTM

更新时间: 2024-11-27 02:46:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18002v1

An unconditional distribution learning advantage with shallow quantum circuits

One of the core challenges of research in quantum computing concerns the question of whether quantum advantages can be found for near-term quantum circuits that have implications for practical applications. Motivated by this mindset, in this work we prove an unconditional quantum advantage in the probably approximately correct (PAC) distribution learning framework with shallow quantum circuit hypotheses. We identify a meaningful generative distribution learning problem where constant-depth quantum circuits using one- and two-qubit gates (QNC^0) are superior to constant-depth bounded fan-in classical circuits (NC^0) as a choice of hypothesis class. We hence prove a PAC distribution learning separation for shallow quantum circuits over shallow classical circuits. We do so by building on recent results by Bene Watts and Parham on unconditional quantum advantages for sampling tasks with shallow circuits, which we technically uplift to a hyperplane learning problem, identifying non-local correlations as the origin of the quantum advantage.

Updated: 2024-11-27 02:44:36

标题: 使用浅层量子电路的无条件分布学习优势

摘要: 量子计算研究中的一个核心挑战是关于是否可以在近期量子电路中找到量子优势,这对实际应用有重要意义。在这项工作中,我们受到这种思维方式的启发,证明了在浅量子电路假设下,在可能近似正确(PAC)分布学习框架中存在无条件的量子优势。我们确定了一个有意义的生成分布学习问题,其中使用一和两量子比特门(QNC^0)的常深度量子电路优于常深度有界扇入的经典电路(NC^0)作为假设类的选择。因此,我们证明了浅量子电路优于浅经典电路的PAC分布学习分离。我们通过在Bene Watts和Parham最近的研究结果基础上构建,将其技术上提升到超平面学习问题,将非局部相关性确定为量子优势的起源。

更新时间: 2024-11-27 02:44:36

领域: quant-ph,cs.AI

下载: http://arxiv.org/abs/2411.15548v2

BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models

While large language models (LLMs) exhibit remarkable capabilities across various tasks, they encounter potential security risks such as jailbreak attacks, which exploit vulnerabilities to bypass security measures and generate harmful outputs. Existing jailbreak strategies mainly focus on maximizing attack success rate (ASR), frequently neglecting other critical factors, including the relevance of the jailbreak response to the query and the level of stealthiness. This narrow focus on single objectives can result in ineffective attacks that either lack contextual relevance or are easily recognizable. In this work, we introduce BlackDAN, an innovative black-box attack framework with multi-objective optimization, aiming to generate high-quality prompts that effectively facilitate jailbreaking while maintaining contextual relevance and minimizing detectability. BlackDAN leverages Multiobjective Evolutionary Algorithms (MOEAs), specifically the NSGA-II algorithm, to optimize jailbreaks across multiple objectives including ASR, stealthiness, and semantic relevance. By integrating mechanisms like mutation, crossover, and Pareto-dominance, BlackDAN provides a transparent and interpretable process for generating jailbreaks. Furthermore, the framework allows customization based on user preferences, enabling the selection of prompts that balance harmfulness, relevance, and other factors. Experimental results demonstrate that BlackDAN outperforms traditional single-objective methods, yielding higher success rates and improved robustness across various LLMs and multimodal LLMs, while ensuring jailbreak responses are both relevant and less detectable.

Updated: 2024-11-27 02:41:48

标题: BlackDAN:一种黑盒多目标方法,用于对大型语言模型进行有效和上下文化的越狱

摘要: 尽管大型语言模型(LLMs)在各种任务中展现出卓越的能力,但它们面临潜在的安全风险,如越狱攻击,这种攻击利用漏洞绕过安全措施并生成有害输出。现有的越狱策略主要集中在最大化攻击成功率(ASR),经常忽略其他关键因素,包括越狱响应与查询的相关性和潜在性水平。这种对单一目标的狭隘关注可能导致无效的攻击,要么缺乏上下文相关性,要么容易被识别。在这项工作中,我们引入了BlackDAN,一种创新的黑盒攻击框架,具有多目标优化,旨在生成高质量提示,有效促进越狱同时保持上下文相关性和最小化可检测性。BlackDAN利用多目标进化算法(MOEAs),特别是NSGA-II算法,优化跨多个目标的越狱,包括ASR、潜在性和语义相关性。通过集成突变、交叉和帕累托支配等机制,BlackDAN提供了一个透明和可解释的生成越狱的过程。此外,该框架允许基于用户偏好进行定制,从而选择平衡有害性、相关性和其他因素的提示。实验结果表明,BlackDAN优于传统的单一目标方法,提高了各种LLMs和多模态LLMs的成功率和鲁棒性,同时确保越狱响应既相关又不易被检测。

更新时间: 2024-11-27 02:41:48

领域: cs.CR,cs.AI,cs.CL,cs.LG,cs.NE

下载: http://arxiv.org/abs/2410.09804v3

A Novel Pareto-optimal Ranking Method for Comparing Multi-objective Optimization Algorithms

As the interest in multi- and many-objective optimization algorithms grows, the performance comparison of these algorithms becomes increasingly important. A large number of performance indicators for multi-objective optimization algorithms have been introduced, each of which evaluates these algorithms based on a certain aspect. Therefore, assessing the quality of multi-objective results using multiple indicators is essential to guarantee that the evaluation considers all quality perspectives. This paper proposes a novel multi-metric comparison method to rank the performance of multi-/many-objective optimization algorithms based on a set of performance indicators. We utilize the Pareto optimality concept (i.e., non-dominated sorting) to create rank levels of algorithms by simultaneously considering multiple performance indicators as criteria/objectives. As a result, four different techniques are proposed to rank algorithms based on their contribution at each Pareto level. This method allows researchers to utilize a set of existing or newly developed performance metrics to adequately assess and rank multi-/many-objective algorithms. The proposed methods are scalable and can accommodate any newly introduced metric in their comprehensive scheme. The method was applied to rank 10 competing algorithms in the 2018 CEC competition solving 15 many-objective test problems. The Pareto-optimal ranking was conducted based on 10 well-known multi-objective performance indicators, and the results were compared to the final ranks reported by the competition, which were based on the inverted generational distance (IGD) and hypervolume (HV) measures. The techniques suggested in this paper have broad applications in science and engineering, particularly in areas where multiple metrics are used for comparisons. Examples include machine learning and data mining.
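
The rank-level construction is standard non-dominated sorting over the indicator matrix, sketched below; all indicators are assumed to be minimized (negate maximization metrics such as HV first).

    import numpy as np

    def non_dominated_fronts(scores):
        # scores: (n_algorithms, n_indicators), smaller is better on every column.
        remaining, fronts = set(range(len(scores))), []
        while remaining:
            front = [i for i in remaining
                     if not any(np.all(scores[j] <= scores[i]) and
                                np.any(scores[j] < scores[i])
                                for j in remaining if j != i)]
            fronts.append(front)
            remaining -= set(front)
        return fronts  # fronts[0] contains the Pareto-optimal algorithms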

Updated: 2024-11-27 02:34:54

标题: 一种用于比较多目标优化算法的新型帕累托最优排序方法

摘要: 随着对多目标和超多目标优化算法的兴趣增长,这些算法的性能比较变得越来越重要。已经引入了大量用于评估多目标优化算法的性能指标,每个指标基于某种方面评估这些算法。因此,使用多个指标评估多目标结果的质量是必不可少的,以确保评估考虑所有质量视角。本文提出了一种新颖的多指标比较方法,根据一组性能指标对多目标/超多目标优化算法的性能进行排名。我们利用帕累托最优性概念(即非支配排序算法)通过同时考虑多个性能指标作为准则/目标来创建算法的排名水平。因此,提出了四种不同的技术来根据每个帕累托水平的贡献来对算法进行排名。这种方法允许研究人员利用一组现有/新开发的性能指标充分评估/排名多目标优化算法。所提出的方法是可扩展的,并且可以在其全面方案中适应任何新引入的度量。该方法应用于对2018年CEC竞赛中解决了15个多目标测试问题的10个竞争算法进行排名。帕累托最优排名是基于10个著名的多目标性能指标进行的,并将结果与竞赛报告的最终排名进行比较,后者是基于反向代际距离(IGD)和超体积指标(HV)度量。本文提出的技术在科学和工程领域具有广泛的应用,特别是在使用多个度量进行比较的领域,例如机器学习和数据挖掘。

更新时间: 2024-11-27 02:34:54

领域: cs.AI,cs.NE

下载: http://arxiv.org/abs/2411.17999v1

BAHOP: Similarity-based Basin Hopping for A fast hyper-parameter search in WSI classification

Pre-processing whole slide images (WSIs) can impact classification performance. Our study shows that using fixed hyper-parameters for pre-processing out-of-domain WSIs can significantly degrade performance. Therefore, it is critical to search for domain-specific hyper-parameters during inference. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose BAHOP, a novel similarity-based Basin Hopping optimization for fast parameter tuning that enhances inference performance on out-of-domain data. The proposed BAHOP achieves a 5% to 30% improvement in accuracy while being, on average, five times faster.
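
Basin hopping itself fits in a few lines; the sketch below shows the generic loop with Metropolis acceptance over a hyper-parameter space. The objective and perturbation are assumed callables, and BAHOP's similarity heuristic for warm-starting the search is not shown.

    import math
    import random

    def basin_hopping(objective, init_params, perturb, n_hops=50, temp=1.0):
        best = cur = init_params
        best_val = cur_val = objective(cur)
        for _ in range(n_hops):
            cand = perturb(cur)            # hop to a nearby basin
            cand_val = objective(cand)
            # Metropolis acceptance lets the search escape poor basins.
            if cand_val < cur_val or random.random() < math.exp(-(cand_val - cur_val) / temp):
                cur, cur_val = cand, cand_val
            if cur_val < best_val:
                best, best_val = cur, cur_val
        return best, best_val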

Updated: 2024-11-27 02:19:33

标题: BAHOP:基于相似性的盆地跳跃算法用于WSI分类中的快速超参数搜索

摘要: 对全切片图像(WSIs)进行预处理可能会影响分类性能。我们的研究表明,对于预处理不在领域内的WSIs,使用固定的超参数可能会显著降低性能。因此,在推断期间搜索特定领域的超参数至关重要。然而,搜索最佳参数集是耗时的。为了克服这一问题,我们提出了BAHOP,一种新颖的基于相似性的盆地跳跃优化方法,用于快速调整参数以提高在不在领域内数据上的推断性能。提出的BAHOP平均达到5%至30%的准确度提升,平均速度提高了5倍。

更新时间: 2024-11-27 02:19:33

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.11161v3

Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9

We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for caption augmentation with a smaller number of captions. A LASS model trained with these augmented captions demonstrates improved performance on the DCASE 2024 Task 9 validation set compared to that trained without augmentation. This study highlights the effectiveness of LLM-based caption augmentation in advancing language-queried audio source separation.

Updated: 2024-11-27 02:17:54

标题: 基于大型语言模型生成的字幕增强语言查询音频源分离的性能改进,用于DCASE挑战2024任务9

摘要: 我们提出了一种基于提示工程的文本增强方法,应用于语言查询音频源分离(LASS)任务。为了增强LASS的性能,所提出的方法利用大型语言模型(LLMs)生成与训练数据集中每个句子对应的多个标题。为此,我们首先进行实验,以确定用较少数量的标题进行标题增强的最有效提示。使用这些增强的标题训练的LASS模型在DCASE 2024任务9验证集上表现出比没有增强训练的模型更好的性能。这项研究强调了LLM-based标题增强在推进语言查询音频源分离方面的有效性。

更新时间: 2024-11-27 02:17:54

领域: eess.AS,cs.AI,cs.SD

下载: http://arxiv.org/abs/2406.11248v2

New Faithfulness-Centric Interpretability Paradigms for Natural Language Processing

As machine learning becomes more widespread and is used in more critical applications, it is important to provide explanations for these models to prevent unintended behavior. Unfortunately, many current interpretability methods struggle with faithfulness. Therefore, this Ph.D. thesis investigates the question "How to provide and ensure faithful explanations for complex general-purpose neural NLP models?" The main thesis is that we should develop new paradigms in interpretability. This is achieved by first developing solid faithfulness metrics and then applying the lessons learned from this investigation to develop new paradigms. The two new paradigms explored are faithfulness measurable models (FMMs) and self-explanations. The idea of self-explanations is to have large language models explain themselves; we identify that current models are not capable of doing this consistently, but we suggest how it could be achieved. The idea of FMMs is to create models designed such that measuring faithfulness is cheap and precise, making it possible to optimize an explanation towards maximum faithfulness; FMMs are thus designed to be explained. We find that FMMs yield explanations that are near theoretically optimal in terms of faithfulness. Overall, across all investigations of faithfulness, results show that post-hoc and intrinsic explanations are by default model- and task-dependent. However, this was not the case when using FMMs, even with the same post-hoc explanation methods. This shows that even simple modifications to the model, such as randomly masking the training dataset as done in FMMs, can drastically change the situation and result in consistently faithful explanations. This answers the question of how to provide and ensure faithful explanations.

Updated: 2024-11-27 02:17:34

标题: 以忠实性为中心的自然语言处理新可解释性范式

摘要: 随着机器学习的普及和在更多关键应用中的使用,为这些模型提供解释以防止意外行为变得越来越重要。不幸的是,许多当前的可解释性方法在忠实度方面遇到困难。因此,这篇博士论文探讨了一个问题:“如何为复杂的通用神经自然语言处理模型提供并确保忠实的解释?”主要论点是我们应该发展新的解释范式。首先通过开发坚实的忠实度度量标准,然后应用从这项研究中学到的经验教训来发展新的范式。探讨的两种新范式是忠实度可衡量模型(FMMs)和自解释。自解释的想法是让大型语言模型解释自己,我们发现当前模型无法一致做到这一点。但是,我们提出了如何实现这一点的建议。FMMs的想法是创建设计良好的模型,以便测量忠实度廉价且精确。这使得可以将解释优化到最大忠实度,使FMMs被设计为可解释的。我们发现,FMMs提供的解释在忠实度方面几乎达到了理论最优。总的来说,在所有忠实度调查中,结果表明事后和内在解释默认是模型和任务相关的。然而,当使用FMMs时,即使使用相同的事后解释方法,情况也不是这样。这表明,即使对模型进行简单的修改,例如像在FMMs中所做的随机屏蔽训练数据集,也可以大幅改变情况,并产生一致忠实的解释。这回答了如何提供并确保忠实的解释的问题。

更新时间: 2024-11-27 02:17:34

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.17992v1

Regularized Multi-LLMs Collaboration for Enhanced Score-based Causal Discovery

As the significance of understanding the cause-and-effect relationships among variables increases in the development of modern systems and algorithms, learning causality from observational data has become a preferred and efficient approach over conducting randomized control trials. However, purely observational data could be insufficient to reconstruct the true causal graph. Consequently, many researchers tried to utilise some form of prior knowledge to improve causal discovery process. In this context, the impressive capabilities of large language models (LLMs) have emerged as a promising alternative to the costly acquisition of prior expert knowledge. In this work, we further explore the potential of using LLMs to enhance causal discovery approaches, particularly focusing on score-based methods, and we propose a general framework to utilise the capacity of not only one but multiple LLMs to augment the discovery process.

Updated: 2024-11-27 01:56:21

标题: 正则化多LLMs协作以增强基于分数的因果发现

摘要: 随着在现代系统和算法的发展中理解变量之间因果关系的重要性增加,从观测数据中学习因果关系已成为优选且高效的方法,而不是进行随机对照试验。然而,纯观测数据可能不足以重建真实的因果图。因此,许多研究人员尝试利用某种形式的先验知识来改进因果发现过程。在这种背景下,大型语言模型(LLMs)的强大能力已经成为昂贵先验专家知识获取的有希望的替代方案。在这项工作中,我们进一步探讨了利用LLMs来增强因果发现方法的潜力,特别是集中在基于分数的方法上,并提出了一个通用框架来利用不仅一个而是多个LLMs的能力来增强发现过程。

更新时间: 2024-11-27 01:56:21

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2411.17989v1

Verbalized Representation Learning for Interpretable Few-Shot Generalization

Humans recognize objects after observing only a few examples, a remarkable capability enabled by their inherent language understanding of the real-world environment. Developing verbalized and interpretable representations can significantly improve model generalization in low-data settings. In this work, we propose Verbalized Representation Learning (VRL), a novel approach for automatically extracting human-interpretable features for object recognition using few-shot data. Our method uniquely captures inter-class differences and intra-class commonalities in the form of natural language by employing a Vision-Language Model (VLM) to identify key discriminative features between different classes and shared characteristics within the same class. These verbalized features are then mapped to numeric vectors through the VLM. The resulting feature vectors can be further utilized to train and infer with downstream classifiers. Experimental results show that, at the same model scale, VRL achieves a 24% absolute improvement over prior state-of-the-art methods while using 95% less data and a smaller model. Furthermore, compared to human-labeled attributes, the features learned by VRL exhibit a 20% absolute gain when used for downstream classification tasks. Code is available at: https://github.com/joeyy5588/VRL/tree/main.
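
The pipeline can be sketched end to end: a VLM proposes verbalized attributes, each image is scored against every attribute to form a numeric feature vector, and a light classifier is fit on the few-shot data. Here vlm_score is an assumed interface and the classifier choice is illustrative, not VRL's exact design.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def verbalized_features(images, attributes, vlm_score):
        # One feature per verbalized attribute: the VLM's agreement score.
        return np.array([[vlm_score(img, attr) for attr in attributes]
                         for img in images])

    def fit_fewshot_classifier(images, labels, attributes, vlm_score):
        X = verbalized_features(images, attributes, vlm_score)
        return LogisticRegression(max_iter=1000).fit(X, labels)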

Updated: 2024-11-27 01:55:08

标题: 可解释的少样本泛化的口头表征学习

摘要: 人类在观察几个例子后就能识别物体,这是一种非常显著的能力,这得益于他们对现实世界环境的内在语言理解。开发口头化和可解释的表示形式可以显著提高模型在低数据情况下的泛化能力。在这项工作中,我们提出了一种新颖的方法,称为Verbalized Representation Learning(VRL),用于自动提取人类可解释特征,以进行物体识别使用少量数据。我们的方法通过使用视觉语言模型(VLM)来以自然语言的形式捕捉类间差异和类内共同特征,从而识别不同类别之间的关键区别特征和同一类别内的共享特征。然后,这些口头化特征通过VLM映射到数值向量。生成的特征向量可以进一步用于训练和推断下游分类器。实验结果显示,在相同的模型规模下,VRL相对于先前的最先进方法实现了24%的绝对改进,同时使用的数据量减少了95%并且模型更小。此外,与人工标记的属性相比,VRL学习的特征在用于下游分类任务时表现出20%的绝对增益。代码可在以下链接找到:https://github.com/joeyy5588/VRL/tree/main.

更新时间: 2024-11-27 01:55:08

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.18651v1

Optimized Conformal Selection: Powerful Selective Inference After Conformity Score Optimization

Model selection/optimization in conformal inference is challenging, since it may break the exchangeability between labeled and unlabeled data. We study this problem in the context of conformal selection, which uses conformal p-values to select "interesting" instances with large unobserved labels from a pool of unlabeled data, while controlling the FDR in finite sample. For validity, existing solutions require the model choice to be independent of the data used to construct the p-values and calibrate the selection set. However, when presented with many model choices and limited labeled data, it is desirable to (i) select the best model in a data-driven manner, and (ii) mitigate power loss due to sample splitting. This paper presents OptCS, a general framework that allows valid statistical testing (selection) after flexible data-driven model optimization. We introduce general conditions under which OptCS constructs valid conformal p-values despite substantial data reuse and handles complex p-value dependencies to maintain finite-sample FDR control via a novel multiple testing procedure. We instantiate this general recipe to propose three FDR-controlling procedures, each optimizing the models differently: (i) selecting the most powerful one among multiple pre-trained candidate models, (ii) using all data for model fitting without sample splitting, and (iii) combining full-sample model fitting and selection. We demonstrate the efficacy of our methods via simulation studies and real applications in drug discovery and alignment of large language models in radiology report generation.
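
For orientation, the textbook conformal-selection construction underlying this framework is sketched below: conformal p-values from calibration scores, followed by a Benjamini-Hochberg step for finite-sample FDR control. OptCS's model-optimization layer and its dependence-handling procedure are not shown.

    import numpy as np

    def conformal_pvalue(test_score, calib_scores):
        # Larger conformity score = stronger evidence of a large unobserved label.
        return (1 + np.sum(calib_scores >= test_score)) / (len(calib_scores) + 1)

    def conformal_select(test_scores, calib_scores, alpha=0.1):
        pvals = np.array([conformal_pvalue(s, calib_scores) for s in test_scores])
        order = np.argsort(pvals)
        m = len(pvals)
        # Benjamini-Hochberg over the conformal p-values.
        passed = [i + 1 for i in range(m) if pvals[order[i]] <= alpha * (i + 1) / m]
        k = max(passed, default=0)
        return order[:k]  # indices of selected "interesting" instances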

Updated: 2024-11-27 01:40:50

Domains: stat.ME,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.17983v1

The importance of visual modelling languages in generative software engineering

Multimodal GPTs represent a watershed in the interplay between Software Engineering and Generative Artificial Intelligence. GPT-4 accepts image and text inputs, rather than simply natural language. We investigate relevant use cases stemming from these enhanced capabilities of GPT-4. To the best of our knowledge, no other work has investigated similar use cases involving Software Engineering tasks carried out via multimodal GPTs prompted with a mix of diagrams and natural language.

Updated: 2024-11-27 01:15:36

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2411.17976v1

Improved implicit diffusion model with knowledge distillation to estimate the spatial distribution density of carbon stock in remote sensing imagery

Forests serve as the most significant terrestrial carbon stock mechanism, effectively reducing atmospheric CO$_2$ concentrations and mitigating climate change. Remote sensing provides high data accuracy and enables large-scale observations. Optical images facilitate long-term monitoring, which is crucial for future carbon stock estimation studies. This study focuses on Huize County, Qujing City, Yunnan Province, China, utilizing GF-1 WFV satellite imagery. The KD-VGG and KD-UNet modules were introduced for initial feature extraction, and the improved implicit diffusion model (IIDM) was proposed. The results showed: (1) the VGG module improved initial feature extraction, improving accuracy and reducing inference time through optimized model parameters; (2) the cross-attention + MLPs module enabled effective feature fusion, establishing critical relationships between global and local features and achieving high-accuracy estimation; (3) the IIDM model, a novel contribution, demonstrated the highest estimation accuracy with an RMSE of 12.17\%, a significant improvement of 41.69\% to 42.33\% over the regression model. In carbon stock estimation, the generative model excelled at extracting deeper features, significantly outperforming other models and demonstrating the feasibility of AI-generated content in quantitative remote sensing. The 16-meter resolution estimates provide a robust basis for tailoring forest carbon sink regulations and enhance regional carbon stock management.
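
A minimal PyTorch sketch of the cross-attention + MLP fusion idea, in which local patch features attend to global context (our own illustration; dimensions and layer counts are arbitrary, not the paper's configuration):

```python
import torch
import torch.nn as nn

class CrossAttnFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, local_feats, global_feats):
        # Queries = local features; keys/values = global features.
        x, _ = self.attn(self.norm1(local_feats), global_feats, global_feats)
        x = local_feats + x
        return x + self.mlp(self.norm2(x))

fusion = CrossAttnFusion()
local = torch.randn(2, 256, 64)   # e.g. per-patch local features
glob = torch.randn(2, 16, 64)     # coarse global tokens
print(fusion(local, glob).shape)  # torch.Size([2, 256, 64])
```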

Updated: 2024-11-27 01:06:05

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.17973v1

Graph Neural Network for Cerebral Blood Flow Prediction With Clinical Datasets

Accurate prediction of cerebral blood flow is essential for the diagnosis and treatment of cerebrovascular diseases. Traditional computational methods, however, often incur significant computational costs, limiting their practicality in real-time clinical applications. This paper proposes a graph neural network (GNN) to predict blood flow and pressure in previously unseen cerebral vascular network structures that were not included in training data. The GNN was developed using clinical datasets from patients with stenosis, featuring complex and abnormal vascular geometries. Additionally, the GNN model was trained on data incorporating a wide range of inflow conditions, vessel topologies, and network connectivities to enhance its generalization capability. The approach achieved Pearson's correlation coefficients of 0.727 for pressure and 0.824 for flow rate, with sufficient training data. These findings demonstrate the potential of the GNN for real-time cerebrovascular diagnostics, particularly in handling intricate and pathological vascular networks.
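
Schematically, one message-passing layer over a vascular graph could look like the following (our sketch, not the paper's architecture; node features might encode vessel geometry, with pressure regressed per node):

```python
import torch
import torch.nn as nn

class MPLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h, edges):
        src, dst = edges                      # edge list, shape [2, E]
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum messages per node
        return self.upd(agg, h)

dim = 32
h = torch.randn(5, dim)                       # 5 vessel-junction nodes
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
layer, head = MPLayer(dim), nn.Linear(dim, 1)
pressure = head(layer(h, edges))              # one scalar per node
print(pressure.shape)                         # torch.Size([5, 1])
```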

Updated: 2024-11-27 01:01:37

Domains: eess.IV,cs.AI,cs.CE,cs.LG

Download: http://arxiv.org/abs/2411.17971v1

FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting

Time Series Forecasting (TSF) is key functionality in numerous fields, including finance, weather services, and energy management. While many TSF methods have emerged recently, most require domain-specific data collection and model training and generalize poorly to new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they exhibit promising inference capabilities on new or unseen data. This has spurred a surge in new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pretrained on time series. Next, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, thereby facilitating more thorough evaluations. Finally, FoundTS offers a pipeline that standardizes evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, thereby facilitating fair evaluations. Building on this, we report on an extensive evaluation of TSF foundation models on a broad range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify the pros, cons, and inherent limitations of existing foundation models, and we identify directions for future model design. We make our code and datasets available at https://anonymous.4open.science/r/FoundTS-C2B0.
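
The standardized pipeline steps might be captured by a helper along these lines (a sketch with assumed split fractions, not FoundTS code):

```python
import numpy as np

def standardized_split(series, train_frac=0.6, val_frac=0.2, few_shot=0.05, seed=0):
    # Split chronologically, normalize with train statistics only,
    # and draw a reproducible few-shot subset of the training data.
    n = len(series)
    tr, va = int(n * train_frac), int(n * (train_frac + val_frac))
    train, val, test = series[:tr], series[tr:va], series[va:]
    mu, sd = train.mean(), train.std() + 1e-8
    norm = lambda x: (x - mu) / sd
    rng = np.random.default_rng(seed)
    k = max(1, int(few_shot * len(train)))
    few = rng.choice(len(train), size=k, replace=False)  # few-shot indices
    return norm(train), norm(val), norm(test), few

train, val, test, few = standardized_split(np.sin(np.linspace(0, 50, 1000)))
print(len(train), len(val), len(test), len(few))
```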

Updated: 2024-11-27 00:58:59

Domains: cs.LG

Download: http://arxiv.org/abs/2410.11802v4

Resolution-Agnostic Transformer-based Climate Downscaling

Understanding future weather changes at regional and local scales is crucial for planning and decision-making, particularly in the context of extreme weather events, as well as for broader applications in agriculture, insurance, and infrastructure development. However, the computational cost of downscaling Global Climate Models (GCMs) to the fine resolutions needed for such applications presents a significant barrier. Drawing on advancements in weather forecasting models, this study introduces a cost-efficient downscaling method using a pretrained Earth Vision Transformer (Earth ViT) model. Initially trained on ERA5 data to downscale from 50 km to 25 km resolution, the model is then tested on the higher resolution BARRA-SY dataset at a 3 km resolution. Remarkably, it performs well without additional training, demonstrating its ability to generalize across different resolutions. This approach holds promise for generating large ensembles of regional climate simulations by downscaling GCMs with varying input resolutions without incurring additional training costs. Ultimately, this method could provide more comprehensive estimates of potential future changes in key climate variables, aiding in effective planning for extreme weather events and climate change adaptation strategies.
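
The resolution-agnostic property follows from patch-based processing: the same weights apply to any input whose side length is a multiple of the patch size. A toy illustration of this (ours, not the Earth ViT architecture; note the deliberate absence of fixed-size positional embeddings):

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, in_ch=1, dim=64, patch=8):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(dim, in_ch, 1)

    def forward(self, x):
        z = self.embed(x)                       # [B, dim, H/p, W/p]
        B, D, H, W = z.shape
        z = self.encoder(z.flatten(2).transpose(1, 2))
        z = z.transpose(1, 2).reshape(B, D, H, W)
        return self.head(z)

model = TinyViT()
for size in (64, 192):                          # e.g. coarse vs. fine grids
    out = model(torch.randn(1, 1, size, size))  # same weights, both sizes
    print(size, out.shape)
```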

Updated: 2024-11-27 00:55:18

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.14774v2

Optimized Tradeoffs for Private Prediction with Majority Ensembling

We study a classical problem in private prediction, the problem of computing an $(m\epsilon, \delta)$-differentially private majority of $K$ $(\epsilon, \Delta)$-differentially private algorithms for $1 \leq m \leq K$ and $1 > \delta \geq \Delta \geq 0$. Standard methods such as subsampling or randomized response are widely used, but do they provide optimal privacy-utility tradeoffs? To answer this, we introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm. It is parameterized by a data-dependent noise function $\gamma$, and enables efficient utility optimization over the class of all private algorithms, encompassing those standard methods. We show that maximizing the utility of an $(m\epsilon, \delta)$-private majority algorithm can be computed tractably through an optimization problem for any $m \leq K$ by a novel structural result that reduces the infinitely many privacy constraints into a polynomial set. In some settings, we show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility. Lastly, we demonstrate the strong empirical effectiveness of our first-of-its-kind privacy-constrained utility optimization for ensembling labels for private prediction from private teachers in image classification. Notably, our DaRRM framework with an optimized $\gamma$ exhibits substantial utility gains when compared against several baselines.
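
Our reading of the DaRRM template, with the optimized data-dependent $\gamma$ replaced by a constant purely for illustration:

```python
import numpy as np

def darrm(private_votes, gamma=0.8, rng=np.random.default_rng(0)):
    # Output the true majority with probability gamma; otherwise answer
    # with a fair coin (randomized-response-style noise). In DaRRM,
    # gamma is a function of the vote tally, optimized for utility.
    majority = int(np.sum(private_votes) > len(private_votes) / 2)
    if rng.random() < gamma:
        return majority
    return int(rng.random() < 0.5)

votes = [1, 1, 0, 1, 1]   # outputs of K private sub-algorithms
print(darrm(votes))
```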

Updated: 2024-11-27 00:48:48

Domains: cs.LG,cs.CR

Download: http://arxiv.org/abs/2411.17965v1

ESS-ReduNet: Enhancing Subspace Separability of ReduNet via Dynamic Expansion with Bayesian Inference

ReduNet is a deep neural network model that leverages the principle of maximal coding rate \textbf{redu}ction to transform original data samples into a low-dimensional, linear discriminative feature representation. Unlike traditional deep learning frameworks, ReduNet constructs its parameters explicitly layer by layer, with each layer's parameters derived based on the features transformed from the preceding layer. Rather than directly using labels, ReduNet uses the similarity between each category's spanned subspace and the data samples for feature updates at each layer. This may lead to features being updated in the wrong direction, impairing the correct construction of network parameters and reducing the network's convergence speed. To address this issue, based on the geometric interpretation of the network parameters, this paper presents ESS-ReduNet to enhance the separability of each category's subspace by dynamically controlling the expansion of the overall spanned space of the samples. Meanwhile, label knowledge is incorporated with Bayesian inference to encourage the decoupling of subspaces. Finally, stability, as assessed by the condition number, serves as an auxiliary criterion for halting training. Experiments on the ESR, HAR, Covertype, and Gas datasets demonstrate that ESS-ReduNet achieves more than 10x improvement in convergence compared to ReduNet. Notably, on the ESR dataset, the features transformed by ESS-ReduNet achieve a 47\% improvement in SVM classification accuracy.
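
The condition-number halting criterion could be implemented as simply as the following (our sketch; the threshold and the choice of matrix are illustrative):

```python
import numpy as np

def should_halt(features, threshold=1e3):
    # A large condition number of the feature covariance signals
    # ill-conditioned, unstable subspaces: stop adding layers.
    cov = features.T @ features / len(features)
    return np.linalg.cond(cov) > threshold

Z = np.random.default_rng(0).normal(size=(500, 32))
print(should_halt(Z))   # False for well-conditioned random features
```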

Updated: 2024-11-27 00:37:12

Domains: cs.LG

Download: http://arxiv.org/abs/2411.17961v1

Adversarial Training in Low-Label Regimes with Margin-Based Interpolation

Adversarial training has emerged as an effective approach to train robust neural network models that are resistant to adversarial attacks, even in low-label regimes where labeled data is scarce. In this paper, we introduce a novel semi-supervised adversarial training approach that enhances both robustness and natural accuracy by generating effective adversarial examples. Our method begins by applying linear interpolation between clean and adversarial examples to create interpolated adversarial examples that cross decision boundaries by a controlled margin. This sample-aware strategy tailors adversarial examples to the characteristics of each data point, enabling the model to learn from the most informative perturbations. Additionally, we propose a global epsilon scheduling strategy that progressively adjusts the upper bound of perturbation strengths during training. The combination of these strategies allows the model to develop increasingly complex decision boundaries with better robustness and natural accuracy. Empirical evaluations show that our approach effectively enhances performance against various adversarial attacks, such as PGD and AutoAttack.
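
A minimal sketch of margin-controlled interpolation between a clean example and its adversary (our bisection-based illustration on a toy 2-D model; the paper's exact margin rule and global epsilon schedule are not reproduced here):

```python
import torch

def interpolate_with_margin(model, x, x_adv, y, steps=8):
    # Bisect for the smallest lambda at which the interpolated point
    # crosses the decision boundary, then return that interpolation.
    lo, hi = torch.zeros(len(x)), torch.ones(len(x))
    for _ in range(steps):
        lam = ((lo + hi) / 2).view(-1, 1)
        x_mix = (1 - lam) * x + lam * x_adv
        flipped = model(x_mix).argmax(1) != y
        hi = torch.where(flipped, lam.view(-1), hi)
        lo = torch.where(flipped, lo, lam.view(-1))
    return (1 - hi.view(-1, 1)) * x + hi.view(-1, 1) * x_adv

model = torch.nn.Linear(2, 2)              # toy classifier
x, y = torch.randn(4, 2), torch.tensor([0, 1, 0, 1])
x_adv = x + 0.5 * torch.randn(4, 2)        # stand-in for PGD adversaries
print(interpolate_with_margin(model, x, x_adv, y).shape)
```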

Updated: 2024-11-27 00:35:13

Domains: cs.LG,cs.CR,cs.CV

Download: http://arxiv.org/abs/2411.17959v1

Dynamic Logistic Ensembles with Recursive Probability and Automatic Subset Splitting for Enhanced Binary Classification

This paper presents a novel approach to binary classification using dynamic logistic ensemble models. The proposed method addresses the challenges posed by datasets containing inherent internal clusters that lack explicit feature-based separations. By extending traditional logistic regression, we develop an algorithm that automatically partitions the dataset into multiple subsets and constructs an ensemble of logistic models to enhance classification accuracy. A key innovation in this work is the recursive probability calculation, derived through algebraic manipulation and mathematical induction, which enables scalable and efficient model construction. Compared to traditional ensemble methods such as Bagging and Boosting, our approach maintains interpretability while offering competitive performance. Furthermore, we systematically employ maximum likelihood and cost functions to facilitate the analytical derivation of recursive gradients as functions of ensemble depth. The effectiveness of the proposed approach is validated on a custom dataset created by introducing noise and shifting data to simulate group structures, with additional ensemble layers yielding significant performance improvements. Implemented in Python, this work balances computational efficiency with theoretical rigor, providing a robust and interpretable solution for complex classification tasks with broad implications for machine learning applications. Code at https://github.com/ensemble-art/Dynamic-Logistic-Ensembles
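
One plausible form of such a recursion (our construction for illustration; the paper derives its own): a logistic gate softly assigns each sample to a subset, and the ensemble probability mixes the child models accordingly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ensemble_proba(x, node):
    if "gate" not in node:             # leaf: a plain logistic model
        return sigmoid(x @ node["w"])
    g = sigmoid(x @ node["gate"])      # soft subset assignment
    return (g * ensemble_proba(x, node["left"])
            + (1 - g) * ensemble_proba(x, node["right"]))

rng = np.random.default_rng(0)
tree = {"gate": rng.normal(size=3),
        "left": {"w": rng.normal(size=3)},
        "right": {"w": rng.normal(size=3)}}
X = rng.normal(size=(5, 3))
print(ensemble_proba(X, tree))         # P(y=1 | x) per sample
```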

Updated: 2024-11-27 00:22:55

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.18649v1

A Semantic Framework for Neuro-Symbolic Computing

The field of neuro-symbolic AI aims to benefit from the combination of neural networks and symbolic systems. A cornerstone of the field is the translation or encoding of symbolic knowledge into neural networks. Although many neuro-symbolic methods and approaches have been proposed, with a large increase in recent years, no common definition of encoding exists that would enable a precise, theoretical comparison of neuro-symbolic methods. This paper addresses this problem by introducing a semantic framework for neuro-symbolic AI. We start by providing a formal definition of semantic encoding, specifying the components and conditions under which a knowledge base can be encoded correctly by a neural network. We then show that many neuro-symbolic approaches are accounted for by this definition. We provide a number of examples and correspondence proofs applying the proposed framework to the neural encoding of various forms of knowledge representation. Many neuro-symbolic methods that appear disparate at first sight are shown to fall within the proposed formalization. This is expected to provide guidance for future neuro-symbolic encodings by placing them in the broader context of semantic encodings of entire families of existing neuro-symbolic systems. The paper hopes to help initiate a discussion around the provision of a theory for neuro-symbolic AI and a semantics for deep learning.

Updated: 2024-11-27 00:22:09

Domains: cs.AI

Download: http://arxiv.org/abs/2212.12050v5

Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

Burgeoning navigation services based on digital maps provide great convenience to drivers. Nevertheless, anomalies in lane rendering map images occasionally introduce potential hazards, as they can mislead human drivers and consequently contribute to unsafe driving conditions. To detect such anomalies accurately and effectively, this paper casts lane rendering image anomaly detection as a classification problem and proposes a four-phase pipeline that tackles it with state-of-the-art deep learning techniques, especially Transformer models: data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using a cross-entropy loss with label smoothing, and post-processing. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, self-supervised pre-training with MiM can greatly enhance detection accuracy while significantly reducing total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pre-training (Swin-Trans-UM) yielded a higher accuracy of 94.77% and an improved Area Under The Curve (AUC) score of 0.9743, compared with an accuracy of 94.01% and an AUC of 0.9498 for the pure Swin Transformer without pre-training (Swin-Trans). The number of fine-tuning epochs was dramatically reduced from the original 280 to 41. In conclusion, the proposed pipeline, incorporating self-supervised pre-training with MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.
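
The fine-tuning loss, cross-entropy with label smoothing, is available directly in PyTorch (the smoothing value below is illustrative, not the paper's setting):

```python
import torch
import torch.nn as nn

# Label smoothing spreads a little probability mass off the hard label,
# which regularizes the anomaly/normal classifier during fine-tuning.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(4, 2)          # normal vs. anomalous lane rendering
labels = torch.tensor([0, 1, 0, 0])
print(criterion(logits, labels))
```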

Updated: 2024-11-27 00:21:57

Domains: cs.CV,cs.AI,cs.LG,eess.IV,stat.ML

Download: http://arxiv.org/abs/2312.04398v4

Algorithmic Collusion by Large Language Models

The rise of algorithmic pricing raises concerns of algorithmic collusion. We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs). We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) variation in seemingly innocuous phrases in LLM instructions ("prompts") may increase collusion. Novel off-path analysis techniques uncover price-war concerns as contributing to these phenomena. Our results extend to auction settings. Our findings uncover unique challenges to any future regulation of LLM-based pricing agents, and black-box pricing agents more broadly.

Updated: 2024-11-27 00:19:55

Domains: econ.GN,cs.AI,cs.GT,q-fin.EC

Download: http://arxiv.org/abs/2404.00806v2

SelfEval: Leveraging the discriminative nature of generative models for evaluation

We present an automated way to evaluate the text alignment of text-to-image generative diffusion models using standard image-text recognition datasets. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, and this likelihood can then be used to perform recognition tasks with the generative model. We evaluate generative models on standard datasets created for multimodal text-image discriminative learning and assess fine-grained aspects of their performance: attribute binding, color recognition, counting, shape recognition, and spatial understanding. Existing automated metrics rely on an external pretrained model such as CLIP (VLMs) or LLMs, and are sensitive to the exact pretrained model and its limitations. SelfEval sidesteps these issues and, to the best of our knowledge, is the first automated metric to show a high degree of agreement with gold-standard human evaluations of text faithfulness across multiple generative models, benchmarks, and evaluation metrics. SelfEval also reveals that generative models showcase competitive recognition performance on challenging tasks such as Winoground image-score compared to discriminative models. We hope SelfEval enables easy and reliable automated evaluation for diffusion models.
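
Conceptually, the scoring loop resembles the following (a toy sketch; `denoise` stands in for a real conditional diffusion model and the forward process is simplified):

```python
import torch

def denoise(noisy, t, prompt_emb):          # placeholder noise predictor
    return noisy * 0.9 + prompt_emb.mean() * 0.1

def classify(image, prompt_embs, n_samples=16):
    # Lower denoising error under a prompt acts as a proxy for higher
    # likelihood of the image given that prompt; pick the best prompt.
    errs = []
    for emb in prompt_embs:
        e = 0.0
        for _ in range(n_samples):          # Monte Carlo over noise/timesteps
            noise = torch.randn_like(image)
            t = torch.rand(())
            noisy = (1 - t) * image + t * noise   # toy forward process
            e += ((denoise(noisy, t, emb) - noise) ** 2).mean()
        errs.append(e / n_samples)
    return int(torch.stack(errs).argmin())

img = torch.randn(3, 8, 8)
prompts = [torch.randn(4), torch.randn(4)]  # e.g. "a red cube", "a blue ball"
print(classify(img, prompts))
```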

Updated: 2024-11-27 00:15:47

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2311.10708v2

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Value-based reinforcement learning (RL) can in principle learn effective policies for a wide range of multi-turn problems, from games to dialogue to robotic control, including via offline RL from static previously collected datasets. However, despite the widespread use of policy gradient methods to train large language models for single turn tasks (e.g., question answering), value-based methods for multi-turn RL in an off-policy or offline setting have proven particularly challenging to scale to the setting of large language models. This setting requires effectively leveraging pretraining, scaling to large architectures with billions of parameters, and training on large datasets, all of which represent major challenges for current value-based RL methods. In this work, we propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning (SFT) problem where the probabilities of tokens directly translate to Q-values. In this way we obtain an algorithm that smoothly transitions from maximizing the likelihood of the data during pretraining to learning a near-optimal Q-function during finetuning. Our algorithm has strong theoretical foundations, enjoying performance bounds similar to state-of-the-art Q-learning methods, while in practice utilizing an objective that closely resembles SFT. Because of this, our approach can enjoy the full benefits of the pretraining of language models, without the need to reinitialize any weights before RL finetuning, and without the need to initialize new heads for predicting values or advantages. Empirically, we evaluate our method on both pretrained LLMs and VLMs, on a variety of tasks including both natural language dialogue and robotic manipulation and navigation from images.
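
Our sketch of the core idea, in which the SFT-style cross-entropy target for each chosen token is weighted by its Q target so token probabilities play the role of Q-values (the paper's exact loss may differ):

```python
import torch
import torch.nn.functional as F

def q_sft_loss(logits, actions, q_targets):
    # logits: [B, V]; actions: [B] chosen tokens; q_targets: [B] in [0, 1].
    logp = F.log_softmax(logits, dim=-1)
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Push probability mass toward the chosen token in proportion to its
    # Q target; the remaining mass goes to the other tokens collectively.
    other = torch.logsumexp(
        logp.scatter(1, actions.unsqueeze(1), float("-inf")), dim=-1)
    return -(q_targets * chosen + (1 - q_targets) * other).mean()

logits = torch.randn(4, 10)
actions = torch.tensor([1, 3, 5, 7])
q = torch.tensor([0.9, 0.2, 0.6, 0.5])      # bootstrapped Q estimates
print(q_sft_loss(logits, actions, q))
```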

Updated: 2024-11-27 00:05:44

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.05193v2

By Xinhai (Sean) Zou.