WROOM: An Autonomous Driving Approach for Off-Road Navigation
Off-road navigation is a challenging problem, both at the planning level, to obtain a smooth trajectory, and at the control level, to avoid flipping over, hitting obstacles, or getting stuck on a rough patch. Several recent works take classical approaches involving depth-map prediction followed by smooth trajectory planning and a controller to track the plan. We design an end-to-end reinforcement learning (RL) system for an autonomous vehicle in off-road environments using a custom-designed simulator in the Unity game engine. We warm-start the agent by imitating a rule-based controller and use Proximal Policy Optimization (PPO) to improve the policy based on a reward that incorporates Control Barrier Functions (CBF), helping the agent generalize effectively to real-world scenarios. Training involves agents concurrently undergoing domain-randomized trials in various environments. We also propose a novel simulation environment that replicates off-road driving scenarios and deploy our approach on a real RC buggy. Videos and additional results: https://sites.google.com/view/wroom-utd/home
Updated: 2024-04-12 23:55:59
Categories: cs.RO,cs.LG
Uncertainty Quantification in Detecting Choroidal Metastases on MRI via Evolutionary Strategies
Uncertainty quantification plays a vital role in facilitating the practical implementation of AI in radiology by addressing growing concerns around trustworthiness. Given the challenges associated with acquiring large, annotated datasets in this field, there is a need for methods that enable uncertainty quantification in small-data AI approaches tailored to radiology images. In this study, we focused on uncertainty quantification within the context of the small-data, evolutionary-strategies-based technique of deep neuroevolution (DNE). Specifically, we employed DNE to train a simple Convolutional Neural Network (CNN) with MRI images of the eyes for binary classification. The goal was to distinguish between normal eyes and those with metastatic tumors called choroidal metastases. The training set comprised 18 images with choroidal metastases and 18 without tumors, while the testing set contained 15 tumor and 15 normal images. We trained CNN model weights via DNE for approximately 40,000 episodes, ultimately converging to 100% accuracy on the training set. We saved all models that achieved maximal training-set accuracy. Then, by applying these models to the testing set, we established an ensemble method for uncertainty quantification. The saved set of models produced, for each testing-set image, a distribution over the two classes, normal and tumor-containing. The relative frequencies permitted uncertainty quantification of model predictions. Intriguingly, we found that subjective features appreciated by human radiologists explained the images for which uncertainty was high, highlighting the significance of uncertainty quantification in AI-driven radiological analyses.
Updated: 2024-04-12 23:49:37
Categories: cs.CV,cs.LG,cs.NE
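The ensemble step above reduces to counting class votes across the saved maximal-accuracy models. Below is a minimal sketch of that relative-frequency uncertainty estimate, assuming each saved model is a callable returning 0 (normal) or 1 (tumor); the binary-entropy summary is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def ensemble_uncertainty(models, image):
    """Apply every saved maximal-accuracy model to one test image and
    summarize the ensemble's disagreement."""
    votes = np.array([m(image) for m in models])  # assumed: each model returns 0 or 1
    p_tumor = votes.mean()                        # relative frequency of the tumor class
    # Uncertainty peaks when the ensemble splits evenly (p_tumor near 0.5);
    # binary entropy gives a scalar summary of that disagreement.
    eps = 1e-12
    entropy = -(p_tumor * np.log2(p_tumor + eps)
                + (1 - p_tumor) * np.log2(1 - p_tumor + eps))
    return p_tumor, entropy
```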
Assessing Economic Viability: A Comparative Analysis of Total Cost of Ownership for Domain-Adapted Large Language Models versus State-of-the-art Counterparts in Chip Design Coding Assistance
This paper presents a comparative analysis of total cost of ownership (TCO) and performance between domain-adapted large language models (LLM) and state-of-the-art (SoTA) LLMs, with a particular emphasis on tasks related to coding assistance for chip design. We examine the TCO and performance metrics of a domain-adaptive LLM, ChipNeMo, against two leading LLMs, Claude 3 Opus and ChatGPT-4 Turbo, to assess their efficacy in chip design code generation. Through a detailed evaluation of model accuracy, training methodologies, and operational expenditures, this study aims to provide stakeholders with critical information to select the most economically viable and performance-efficient solutions for their specific needs. Our results underscore the benefits of employing domain-adapted models, such as ChipNeMo, which demonstrate improved performance at significantly reduced costs compared to their general-purpose counterparts. In particular, we reveal the potential of domain-adapted LLMs to decrease TCO by approximately 90%-95%, with the cost advantages becoming increasingly pronounced as the deployment scale expands, making domain-adaptive LLMs an attractive option for organizations with substantial LLM-supported coding needs.
Updated: 2024-04-12 23:37:56
Categories: cs.AI,cs.CE,cs.LG
LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models
Differential privacy (DP) is widely employed in industry as a practical standard for privacy protection. While private training of computer vision and natural language processing applications has been studied extensively, the computational challenges of training recommender systems (RecSys) with DP have not been explored. In this work, we first present a detailed characterization of private RecSys training using DP-SGD, root-causing several of its performance bottlenecks. Specifically, we identify that DP-SGD's noise-sampling and noisy-gradient-update stages suffer from severe compute and memory-bandwidth limitations, respectively, causing significant performance overhead when training private RecSys. Based on these findings, we propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD. Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119x training throughput improvement while ensuring that mathematically equivalent, differentially private RecSys models are trained.
Updated: 2024-04-12 23:32:06
Categories: cs.IR,cs.CR,cs.LG
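For context, a plain DP-SGD step makes the two stages the paper profiles explicit. This is textbook DP-SGD in numpy, not LazyDP's optimized pipeline, and the hyperparameters are illustrative.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip=1.0, sigma=1.0):
    """One vanilla DP-SGD update over a flattened parameter vector w."""
    # Clip each per-example gradient to bound any individual's influence.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip)
    g = clipped.mean(axis=0)
    # Stage 1 (compute-bound per the paper): sample Gaussian noise for
    # EVERY trainable parameter, even rarely touched embedding rows.
    noise = np.random.normal(0.0, sigma * clip / len(per_example_grads), size=g.shape)
    # Stage 2 (memory-bandwidth-bound): the noisy update writes all
    # parameters, not just the embedding rows the batch accessed.
    return w - lr * (g + noise)
```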
Experimental Design for Active Transductive Inference in Large Language Models
Transduction, the ability to include query-specific examples in the prompt at inference time, is one of the emergent abilities of large language models (LLMs). In this work, we propose a framework for adaptive prompt design called active transductive inference (ATI). We design the LLM prompt by adaptively choosing few-shot examples for a given inference query. The examples are initially unlabeled, and we query the user to label the most informative ones, which maximally reduces the uncertainty in the LLM prediction. We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen. We analyze these algorithms in linear models: first GO, and then SAL via its equivalence with GO. We experiment with many different tasks and show that GO and SAL outperform other methods for choosing few-shot examples in the LLM prompt at inference time.
Updated: 2024-04-12 23:27:46
Categories: cs.LG,cs.CL
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
We introduce Tied-LoRA, a novel paradigm leveraging weight tying and selective training to enhance the parameter efficiency of Low-rank Adaptation (LoRA). Our exploration encompasses different plausible combinations of parameter training and freezing, coupled with weight tying, aimed at identifying the optimal trade-off between performance and the count of trainable parameters. Across 5 diverse tasks and two foundational language models with different parameter counts, our experiments provide comprehensive insights into the inherent trade-offs between efficiency and performance. Our findings reveal a specific Tied-LoRA configuration that distinguishes itself by showcasing comparable performance to LoRA across multiple tasks while utilizing only a fraction of the parameters employed by the standard LoRA method, particularly at elevated ranks. This underscores the efficacy of Tied-LoRA in achieving impressive results with significantly reduced model complexity.
Updated: 2024-04-12 23:15:51
Categories: cs.CL,cs.AI,cs.LG
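A minimal PyTorch sketch of the weight-tying idea: one shared low-rank pair for the whole model, with only small per-layer vectors left trainable. This shows the mechanism; the exact tying/freezing combination the paper recommends may differ.

```python
import torch
import torch.nn as nn

class TiedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose A/B factors are
    shared (tied) across all layers; only small per-layer vectors u, v
    remain layer-specific. A sketch of the weight-tying idea, not the
    exact configuration studied in the paper."""
    def __init__(self, base: nn.Linear, A: nn.Parameter, B: nn.Parameter, alpha=16.0):
        super().__init__()
        self.base = base.requires_grad_(False)          # frozen pretrained weight
        self.A, self.B = A, B                           # tied across layers
        r = A.shape[0]
        self.u = nn.Parameter(torch.ones(B.shape[0]))   # per-layer output scaling
        self.v = nn.Parameter(torch.ones(r))            # per-layer rank scaling
        self.scale = alpha / r

    def forward(self, x):
        delta = (self.u[:, None] * self.B) @ (self.v[:, None] * self.A)  # (d_out, d_in)
        return self.base(x) + self.scale * (x @ delta.T)

d, r = 512, 8
A = nn.Parameter(torch.randn(r, d) * 0.01)   # one shared pair for the whole model
B = nn.Parameter(torch.zeros(d, r))
layers = [TiedLoRALinear(nn.Linear(d, d), A, B) for _ in range(6)]
```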
Post-Quantum Hybrid Digital Signatures with Hardware-Support for Digital Twins
Digital Twins (DT) virtually model cyber-physical objects using Internet of Things (IoT) components (e.g., sensors) to gather and process sensitive information stored in the cloud. Trustworthiness of the streamed data is crucial, which requires quantum safety and breach resiliency. Digital signatures are essential for scalable authentication and non-repudiation. Yet, NIST PQC signature standards are exorbitantly costly for low-end IoT and do not consider forward security. Moreover, Post-Quantum (PQ) signatures lack aggregation, which is highly desirable to reduce the transmission and storage burdens in DTs. Hence, there is an urgent need for lightweight digital signatures that offer compromise resiliency and compactness while permitting an effective transition into the PQ era for DTs. We create a series of highly lightweight digital signatures called Hardware-ASsisted Efficient Signature (HASES) that meets the above requirements. The core of HASES is a hardware-assisted cryptographic commitment construct oracle (CCO) that permits verifiers to obtain expensive commitments without signer interaction. We created three HASES schemes: PQ-HASES is a forward-secure PQ signature, LA-HASES is an efficient aggregate Elliptic-Curve signature, and HY-HASES is a novel hybrid scheme that combines PQ-HASES and LA-HASES with novel strong nesting and sequential aggregation. HASES does not require secure hardware on the signer. We proved that HASES schemes are secure and implemented them on commodity hardware and an 8-bit AVR ATmega2560. Our experiments confirm that PQ-HASES and LA-HASES are two orders of magnitude more signer-efficient than their PQ and conventional-secure counterparts, respectively. HY-HASES outperforms NIST PQC and conventional signature combinations, offering a standard-compliant transitional solution for emerging DTs. We open-source the HASES schemes for public testing and adaptation.
Updated: 2024-04-12 23:13:34
Categories: cs.CR
Multi-fingered Robotic Hand Grasping in Cluttered Environments through Hand-object Contact Semantic Mapping
The integration of optimization methods and generative models has significantly advanced dexterous manipulation techniques for five-fingered hand grasping. Yet, the application of these techniques in cluttered environments is a relatively unexplored area. To address this research gap, we have developed a novel method for generating five-fingered hand grasp samples in cluttered settings. This method emphasizes simulated grasp quality and the nuanced interaction between the hand and surrounding objects. A key aspect of our approach is our data generation method, capable of estimating contact spatial and semantic representations and affordance grasps based on object affordance information. Furthermore, our Contact Semantic Conditional Variational Autoencoder (CoSe-CVAE) network is adept at creating comprehensive contact maps from point clouds, incorporating both spatial and semantic data. We introduce a unique grasp detection technique that efficiently formulates mechanical hand grasp poses from these maps. Additionally, our evaluation model is designed to assess grasp quality and collision probability, significantly improving the practicality of five-fingered hand grasping in complex scenarios. Our data generation method outperforms previous datasets in grasp diversity, scene diversity, and modality diversity. Our grasp generation method has demonstrated remarkable success, outperforming established baselines with an 81.0% average success rate in real-world single-object grasping and a 75.3% success rate in multi-object grasping. The dataset and supplementary materials can be found at https://sites.google.com/view/ffh-clutteredgrasping, and we will release the code upon publication.
Updated: 2024-04-12 23:11:36
Categories: cs.RO,cs.AI
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
In this work, we propose sequence-level certainty as a common theme over hallucination in Knowledge Grounded Dialogue Generation (KGDG). We explore the correlation between the level of hallucination in model responses and two types of sequence-level certainty: probabilistic certainty and semantic certainty. Empirical results reveal that higher levels of both types of certainty in model responses are correlated with lower levels of hallucination. We further propose Certainty-based Response Ranking (CRR), a decoding-time hallucination mitigation method that samples several response candidates, ranks them based on sequence-level certainty, and outputs the response with the highest certainty level. Aligning with our definitions of sequence-level certainty, we design 2 types of CRR approaches: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses using the arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from meaning-space, and ranks model response candidates based on their semantic certainty level as measured by an entailment-based Agreement Score (AS). Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and 4 KGDG models, we validate the effectiveness of CRR for reducing hallucination in KGDG task.
Updated: 2024-04-12 23:09:52
Categories: cs.CL,cs.AI
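P-CRR's ranking rule is concrete enough to sketch directly: sample several responses, score each by the arithmetic mean of its token log-probabilities, and output the highest-certainty one. The candidate format below is an assumption.

```python
import math

def p_crr(candidates):
    """Probabilistic Certainty-based Response Ranking: pick the sampled
    response whose sequence-level certainty -- the arithmetic mean of its
    token log-probabilities -- is highest.
    `candidates` is a list of (response_text, token_logprobs) pairs."""
    def mean_logprob(token_logprobs):
        return sum(token_logprobs) / max(1, len(token_logprobs))
    return max(candidates, key=lambda c: mean_logprob(c[1]))[0]

# Illustrative usage with made-up log-probabilities:
cands = [("Paris is the capital.", [-0.2, -0.1, -0.3, -0.2]),
         ("It might be Lyon.",     [-1.1, -0.9, -2.3, -0.8])]
print(p_crr(cands))  # -> "Paris is the capital."
```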
Multiply-Robust Causal Change Attribution
Comparing two samples of data, we observe a change in the distribution of an outcome variable. In the presence of multiple explanatory variables, how much of the change can be explained by each possible cause? We develop a new estimation strategy that, given a causal model, combines regression and re-weighting methods to quantify the contribution of each causal mechanism. Our proposed methodology is multiply robust, meaning that it still recovers the target parameter under partial misspecification. We prove that our estimator is consistent and asymptotically normal. Moreover, it can be incorporated into existing frameworks for causal attribution, such as Shapley values, which will inherit the consistency and large-sample distribution properties. Our method demonstrates excellent performance in Monte Carlo simulations, and we show its usefulness in an empirical application.
Updated: 2024-04-12 22:57:01
Categories: stat.ME,cs.LG,econ.EM,stat.ML
Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling
Traffic congestion at intersections is a significant issue in urban areas, leading to increased commute times, safety hazards, and operational inefficiencies. This study aims to develop a predictive model for congestion at intersections in major U.S. cities, utilizing a dataset of trip-logging metrics from commercial vehicles across 4,800 intersections. The dataset encompasses 27 features, including intersection coordinates, street names, time of day, and traffic metrics (Kashyap et al., 2019). Additional features, such as rainfall/snowfall percentage, distance from downtown and outskirts, and road types, were incorporated to enhance the model's predictive power. The methodology involves data exploration, feature transformation, and handling missing values through low-rank models and label encoding. The proposed model has the potential to assist city planners and governments in anticipating traffic hot spots, optimizing operations, and identifying infrastructure challenges.
Updated: 2024-04-12 22:53:41
Categories: cs.LG
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers
An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent improvements in performance and sample efficiency are observed.
Updated: 2024-04-12 22:49:28
Categories: stat.ML,cs.LG
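A hedged PyTorch sketch of relational cross-attention as described above: queries and keys come from the input objects, but the values are learned, input-independent symbol vectors, so the output carries the relation pattern disentangled from object-level features. Single-head and simplified; not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class RelationalCrossAttention(nn.Module):
    """Attention over input objects in which the values are learned
    symbol vectors rather than object features. Assumes n_symbols is at
    least the number of objects; head counts etc. are illustrative."""
    def __init__(self, d_model, n_symbols):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.symbols = nn.Parameter(torch.randn(n_symbols, d_model))  # input-independent

    def forward(self, x):                     # x: (batch, n_objects, d_model)
        q, k = self.q(x), self.k(x)
        rel = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        # The (batch, n, n) relation matrix routes the learned symbols,
        # so the output depends on relations, not object content:
        return rel @ self.symbols[: x.shape[1]]   # (batch, n_objects, d_model)
```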
Vehicle-to-Vehicle Charging: Model, Complexity, and Heuristics
The rapid adoption of Electric Vehicles (EVs) poses challenges for electricity grids to accommodate or mitigate peak demand. Vehicle-to-Vehicle Charging (V2VC) has been recently adopted by popular EVs, posing new opportunities and challenges to the management and operation of EVs. We present a novel V2VC model that allows decision-makers to take V2VC into account when optimizing their EV operations. We show that optimizing V2VC is NP-Complete and find that even small problem instances are computationally challenging. We propose R-V2VC, a heuristic that takes advantage of the resulting totally unimodular constraint matrix to efficiently solve problems of realistic sizes. Our results demonstrate that R-V2VC presents a linear growth in the solution time as the problem size increases, while achieving solutions of optimal or near-optimal quality. R-V2VC can be used for real-world operations and to study what-if scenarios when evaluating the costs and benefits of V2VC.
Updated: 2024-04-12 22:46:37
Categories: cs.AI
BERT-LSH: Reducing Absolute Compute For Attention
This study introduces a novel BERT-LSH model that incorporates Locality Sensitive Hashing (LSH) to approximate the attention mechanism in the BERT architecture. We examine the computational efficiency and performance of this model compared to a standard baseline BERT model. Our findings reveal that BERT-LSH significantly reduces computational demand for the self-attention layer while unexpectedly outperforming the baseline model in pretraining and fine-tuning tasks. These results suggest that the LSH-based attention mechanism not only offers computational advantages but also may enhance the model's ability to generalize from its training data. For more information, visit our GitHub repository: https://github.com/leo4life2/algoml-final
Updated: 2024-04-12 22:35:00
Categories: cs.CL,cs.AI,cs.LG
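A small numpy sketch of the general LSH-attention idea: hash queries and keys with random hyperplanes and run softmax attention only within matching buckets. BERT-LSH's actual scheme may differ (bucket handling, multiple hash rounds); this shows the mechanism.

```python
import numpy as np

def lsh_attention(Q, K, V, n_planes=8, seed=0):
    """Approximate attention: tokens attend only to keys whose random-
    hyperplane bit signature matches their query's signature."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(Q.shape[-1], n_planes))
    def bucket(X):   # sign pattern across hyperplanes, packed into an int
        return ((X @ planes) > 0).astype(int) @ (1 << np.arange(n_planes))
    qb, kb = bucket(Q), bucket(K)
    out = np.zeros_like(Q)
    for i in range(Q.shape[0]):
        idx = np.where(kb == qb[i])[0]
        if idx.size == 0:                 # no colliding keys: leave a zero row
            continue
        scores = Q[i] @ K[idx].T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V[idx]
    return out
```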
General surgery vision transformer: A video pre-trained foundation model for general surgery
The absence of openly accessible data and specialized foundation models is a major barrier for computational research in surgery. Toward this, (i) we open-source the largest dataset of general surgery videos to date, consisting of 680 hours of surgical videos, including data from robotic and laparoscopic techniques across 28 procedures; (ii) we propose a technique for video pre-training a general surgery vision transformer (GSViT) on surgical videos based on forward video prediction that can run in real time for surgical applications, and we open-source the code and weights of GSViT; (iii) we also release code and weights for procedure-specific fine-tuned versions of GSViT across 10 procedures; (iv) we demonstrate the performance of GSViT on the Cholec80 phase annotation task, displaying improved performance over state-of-the-art single-frame predictors.
Updated: 2024-04-12 22:30:54
Categories: cs.CV,cs.LG,q-bio.TO
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, general-purpose geographic location encoder learns an implicit representation of locations by matching CNN and ViT inferred visual patterns of openly available satellite imagery with their geographic coordinates. The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks. In our experiments, we use SatCLIP embeddings to improve prediction performance on nine diverse location-dependent tasks including temperature prediction, animal recognition, and population density estimation. Across tasks, SatCLIP consistently outperforms alternative location encoders and improves geographic generalization by encoding visual similarities of spatially distant environments. These results demonstrate the potential of vision-location models to learn meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.
Updated: 2024-04-12 22:23:32
Categories: cs.CV,cs.AI,cs.CY,cs.LG
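The matching objective is a symmetric CLIP-style InfoNCE loss applied to (location, image) pairs. A sketch assuming precomputed embeddings; the encoders themselves are out of scope here.

```python
import torch
import torch.nn.functional as F

def satclip_loss(loc_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss: pull each location embedding toward the
    embedding of the satellite image taken there, and push it away from
    the other images in the batch (and vice versa)."""
    loc = F.normalize(loc_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    logits = loc @ img.T / temperature                       # (batch, batch) similarities
    targets = torch.arange(loc.shape[0], device=loc.device)  # matches on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```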
Variance Reduction based Experience Replay for Policy Optimization
For reinforcement learning on complex stochastic systems, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay, while effective, treats all observations uniformly, neglecting their relative importance. To address this limitation, we introduce a novel Variance Reduction Experience Replay (VRER) framework, enabling the selective reuse of relevant samples to improve policy gradient estimation. VRER, as an adaptable method that can seamlessly integrate with different policy optimization algorithms, forms the foundation of our sample efficient off-policy learning algorithm known as Policy Gradient with VRER (PG-VRER). Furthermore, the lack of a rigorous understanding of the experience replay approach in the literature motivates us to introduce a novel theoretical framework that accounts for sample dependencies induced by Markovian noise and behavior policy interdependencies. This framework is then employed to analyze the finite-time convergence of the proposed PG-VRER algorithm, revealing a crucial bias-variance trade-off in policy gradient estimation: the reuse of older experience tends to introduce a larger bias while simultaneously reducing gradient estimation variance. Extensive experiments have shown that VRER offers a notable and consistent acceleration in learning optimal policies and enhances the performance of state-of-the-art (SOTA) policy optimization approaches.
Updated: 2024-04-12 22:13:14
Categories: cs.LG,cs.AI
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Differential Privacy (DP) image data synthesis leverages DP techniques to generate synthetic data that can replace sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous methods incorporate advanced generative-model techniques and pre-training on a public dataset to produce exceptional DP image data, but suffer from unstable training and massive computational resource demands. This paper proposes a novel DP image synthesis method, termed PRIVIMAGE, which meticulously selects pre-training data, promoting the efficient creation of DP datasets with high fidelity and utility. PRIVIMAGE first establishes a semantic query function using a public dataset. Then, this function assists in querying the semantic distribution of the sensitive dataset, facilitating the selection of public data with analogous semantics for pre-training. Finally, we pre-train an image generative model using the selected data and then fine-tune this model on the sensitive dataset using Differentially Private Stochastic Gradient Descent (DP-SGD). PRIVIMAGE allows us to train a lightly parameterized generative model, reducing the noise in the gradient during DP-SGD training and enhancing training stability. Extensive experiments demonstrate that, compared to the state-of-the-art method, PRIVIMAGE uses only 1% of the public dataset for pre-training and 7.6% of the parameters in the generative model, while achieving superior synthetic performance and conserving more computational resources. On average, PRIVIMAGE achieves 30.1% lower FID and 12.6% higher classification accuracy than the state-of-the-art method. The replication package and datasets can be accessed online.
Updated: 2024-04-12 22:08:40
Categories: cs.CV,cs.CR,cs.LG
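The selection step can be sketched as: privately estimate the sensitive set's semantic (label) distribution, then sample public examples to match it. The Laplace-noised histogram below is an illustrative stand-in for whatever mechanism the paper actually uses.

```python
import numpy as np

def select_pretraining_data(public_labels, sensitive_labels, n_select, eps=1.0, seed=0):
    """Query the sensitive dataset's semantic distribution under noise and
    sample public examples whose (classifier-predicted) semantics match.
    Illustrative DP stand-in, not PRIVIMAGE's exact mechanism."""
    rng = np.random.default_rng(seed)
    n_classes = int(max(public_labels.max(), sensitive_labels.max())) + 1
    hist = np.bincount(sensitive_labels, minlength=n_classes).astype(float)
    hist += rng.laplace(scale=1.0 / eps, size=n_classes)   # privatize the counts
    probs = np.clip(hist, 1e-9, None)
    probs /= probs.sum()
    # Draw public examples so their semantic mix mirrors the sensitive data.
    per_example = probs[public_labels]
    per_example /= per_example.sum()
    return rng.choice(len(public_labels), size=n_select, replace=False, p=per_example)
```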
Public-private funding models in open source software development: A case study on scikit-learn
Governments are increasingly funding open source software (OSS) development to address concerns regarding software security, digital sovereignty, and national competitiveness in science and innovation. While announcements of governmental funding are generally well-received by OSS developers, we still have a limited understanding of how they evaluate the relative benefits and drawbacks of such funding compared to other types of funding. This paper explores this question through a case study on scikit-learn, a Python library for machine learning, whose funding combines research grants, commercial sponsorship, community donations, and a 32 million Euro grant from France's artificial intelligence strategy. Through 25 interviews with scikit-learn's maintainers and funders, this study makes two key contributions to research and practice. First, the study contributes novel findings about the design and implementation of a public-private funding model in an OSS project. It sheds light on the respective roles that public and private funders have played in supporting scikit-learn, and the processes and governance mechanisms employed by the maintainers to balance their funders' diverse interests and to safeguard community interests. Second, it offers practical recommendations. For OSS developer communities, it illustrates the benefits of a diversified funding model for balancing the merits and drawbacks of different funding sources and mitigating dependence on single funders. For companies, it serves as a reminder that sponsoring developers or OSS projects can significantly help maintainers, who often struggle with limited resources and towering workloads. For governments, it emphasises the importance of funding the maintenance of existing OSS in addition to funding the development of new software or features. The paper concludes with suggestions for future research.
Updated: 2024-04-12 22:06:02
Categories: cs.SE,cs.AI,cs.CY,cs.LG,K.4.1
Structured Model Pruning for Efficient Inference in Computational Pathology
Recent years have seen significant efforts to adopt Artificial Intelligence (AI) in healthcare for various use cases, from computer-aided diagnosis to ICU triage. However, the size of AI models has been rapidly growing due to scaling laws and the success of foundational models, which poses an increasing challenge to leveraging advanced models in practical applications. It is thus imperative to develop efficient models, especially for deploying AI solutions under resource constraints or with time sensitivity. One potential solution is to perform model compression, a set of techniques that remove less important model components or reduce parameter precision, to reduce model computation demand. In this work, we demonstrate that model pruning, as a model compression technique, can effectively reduce inference cost for computational and digital pathology based analysis with a negligible loss of analysis performance. To this end, we develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging, with which we evaluate multiple pruning heuristics on nuclei instance segmentation and classification, and empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance.
Updated: 2024-04-12 22:05:01
Categories: eess.IV,cs.CV,cs.LG
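One representative structured-pruning heuristic of the kind the paper evaluates is L1-norm channel ranking on a convolution. Removing a channel here also requires dropping the corresponding input channels of the next layer, which this sketch omits for brevity.

```python
import torch
import torch.nn as nn

def l1_channel_mask(conv: nn.Conv2d, keep_ratio=0.3):
    """Rank output channels of a convolution by the L1 norm of their
    filters and keep only the strongest fraction."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per output channel
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    return torch.topk(scores, n_keep).indices.sort().values

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
keep = l1_channel_mask(conv, keep_ratio=0.3)        # prunes ~70% of channels
pruned = nn.Conv2d(64, len(keep), kernel_size=3, padding=1)
pruned.weight.data = conv.weight.data[keep]         # copy surviving filters
pruned.bias.data = conv.bias.data[keep]
```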
MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers
Adversarial robustness often comes at the cost of degraded accuracy, impeding the real-life application of robust classification models. Training-based solutions for better trade-offs are limited by incompatibilities with already-trained high-performance large models, necessitating the exploration of training-free ensemble approaches. Observing that robust models are more confident in correct predictions than in incorrect ones on clean and adversarial data alike, we speculate amplifying this "benign confidence property" can reconcile accuracy and robustness in an ensemble setting. To achieve so, we propose "MixedNUTS", a training-free method where the output logits of a robust classifier and a standard non-robust classifier are processed by nonlinear transformations with only three parameters, which are optimized through an efficient algorithm. MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output. On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness -- it boosts CIFAR-100 clean accuracy by 7.86 points, sacrificing merely 0.87 points in robust accuracy.
Updated: 2024-04-12 22:03:06
Categories: cs.LG,cs.AI,cs.CV,68T07
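A hedged sketch of the mixing rule: a three-parameter nonlinear transform (scale s, exponent p, offset c) amplifies the robust classifier's confident margins before its probabilities are averaged with the standard classifier's. The exact transform and the optimized parameter values are the paper's; everything numeric here is illustrative.

```python
import torch
import torch.nn.functional as F

def mixed_nuts(logits_std, logits_rob, s=1.0, p=2.0, c=1.0, alpha=0.5):
    """Training-free mix of a standard (accurate) and a robust classifier.
    The clamp-power-scale transform is an assumed form; the paper
    optimizes the three parameters with an efficient search rather than
    hand-setting them as done here."""
    z = logits_rob - logits_rob.max(dim=-1, keepdim=True).values  # stabilize
    transformed = s * torch.clamp(z + c, min=0.0) ** p            # amplify confident margins
    return (alpha * F.softmax(logits_std, dim=-1)
            + (1 - alpha) * F.softmax(transformed, dim=-1))       # mixed output probabilities
```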
Measuring the Predictability of Recommender Systems using Structural Complexity Metrics
Recommender systems (RS) are central to the filtering and curation of online content. These algorithms predict user ratings for unseen items based on past preferences. Despite their importance, the innate predictability of RS has received limited attention. This study introduces data-driven metrics to measure the predictability of RS based on the structural complexity of the user-item rating matrix. A low predictability score indicates complex and unpredictable user-item interactions, while a high predictability score reveals less complex patterns with predictive potential. We propose two strategies that use singular value decomposition (SVD) and matrix factorization (MF) to measure structural complexity. By perturbing the data and evaluating the prediction of the perturbed version, we explore the structural consistency indicated by the SVD singular vectors. The assumption is that a random perturbation of highly structured data does not change its structure. Empirical results show a high correlation between our metrics and the accuracy of the best-performing prediction algorithms on real data sets.
Updated: 2024-04-12 22:00:27
Categories: cs.IR,cs.IT,cs.LG,math.IT
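The structural-consistency idea can be sketched in a few lines of numpy: perturb the rating matrix, recompute the SVD, and measure how much the dominant singular subspace moved. The paper's exact metrics (and its MF variant) may differ.

```python
import numpy as np

def svd_predictability(R, k=10, noise=0.1, seed=0):
    """Perturb the user-item matrix and measure how much its top-k
    singular subspace moves; structured (predictable) matrices barely
    move. Simplest form of the perturbation idea described above."""
    rng = np.random.default_rng(seed)
    U, _, _ = np.linalg.svd(R, full_matrices=False)
    Rp = R + noise * rng.normal(size=R.shape)
    Up, _, _ = np.linalg.svd(Rp, full_matrices=False)
    # Singular values of U_k^T U'_k are cosines of the principal angles
    # between the two k-dimensional subspaces:
    overlap = np.linalg.svd(U[:, :k].T @ Up[:, :k], compute_uv=False)
    return overlap.mean()          # near 1.0: highly structured, predictable

R = np.outer(np.random.rand(100), np.random.rand(50))  # rank-1, very structured
print(svd_predictability(R, k=1))                       # close to 1
```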
Hindsight PRIORs for Reward Learning from Human Preferences
Preference-based Reinforcement Learning (PbRL) removes the need to hand-specify a reward function by learning a reward from preference feedback over policy behaviors. Current approaches to PbRL do not address the credit-assignment problem inherent in determining which parts of a behavior most contributed to a preference, which results in data-intensive approaches and subpar reward functions. We address such limitations by introducing a credit-assignment strategy (Hindsight PRIOR) that uses a world model to approximate state importance within a trajectory and then guides rewards to be proportional to state importance through an auxiliary predicted-return redistribution objective. Incorporating state importance into reward learning improves the speed of policy learning, overall policy performance, and reward recovery on both locomotion and manipulation tasks. For example, Hindsight PRIOR recovers on average significantly (p<0.05) more reward on MetaWorld (20%) and DMC (15%). The performance gains and our ablations demonstrate the benefits even a simple credit-assignment strategy can have on reward learning, and that state importance in forward dynamics prediction is a strong proxy for a state's contribution to a preference decision. Code repository: https://github.com/apple/ml-rlhf-hindsight-prior.
Updated: 2024-04-12 21:59:42
Categories: cs.LG,cs.AI,cs.HC
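The redistribution objective in its simplest form: per-state rewards proportional to state importance, summing to the trajectory's (predicted) return. How importance is obtained from the world model is abstracted into an input vector here.

```python
import numpy as np

def redistribute_return(total_return, state_importance):
    """Hindsight-PRIOR-style credit assignment sketch: spread a
    trajectory's return across states in proportion to their importance
    (here an arbitrary non-negative vector; the paper derives it from a
    world model's attention over the trajectory)."""
    w = np.clip(np.asarray(state_importance, dtype=float), 0, None)
    w = w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))
    return total_return * w        # per-state rewards summing to the return

print(redistribute_return(10.0, [0.1, 0.1, 0.7, 0.1]))  # -> [1. 1. 7. 1.]
```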
Negative Feedback Training: A Novel Concept to Improve Robustness of NVCIM DNN Accelerators
Compute-in-memory (CIM) accelerators built upon non-volatile memory (NVM) devices excel in energy efficiency and latency when performing Deep Neural Network (DNN) inference, thanks to their in-situ data processing capability. However, the stochastic nature and intrinsic variations of NVM devices often result in performance degradation during DNN inference. Introducing these non-ideal device behaviors during DNN training enhances robustness, but the drawbacks include limited accuracy improvement, reduced prediction confidence, and convergence issues. This arises from a mismatch between the deterministic training and the non-deterministic device variations, as such training, though considering variations, relies solely on the model's final output. In this work, we draw inspiration from control theory and propose a novel training concept: Negative Feedback Training (NFT), leveraging multi-scale noisy information captured from the network. We develop two specific NFT instances, Oriented Variational Forward (OVF) and Intermediate Representation Snapshot (IRS). Extensive experiments show that our methods outperform existing state-of-the-art methods with up to a 46.71% improvement in inference accuracy while reducing epistemic uncertainty, boosting output confidence, and improving convergence probability. Their effectiveness highlights the generality and practicality of our NFT concept in enhancing DNN robustness against device variations.
Updated: 2024-04-12 21:56:21
Categories: cs.LG,cs.AI,cs.AR
Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents
This paper introduces a novel zero-shot motion planning method that allows users to quickly design smooth robot motions in Cartesian space. A Bézier curve-based Cartesian plan is transformed into a joint space trajectory by our neuro-inspired inverse kinematics (IK) method CycleIK, for which we enable platform independence by scaling it to arbitrary robot designs. The motion planner is evaluated on the physical hardware of the two humanoid robots NICO and NICOL in a human-in-the-loop grasping scenario. Our method is deployed with an embodied agent that is a large language model (LLM) at its core. We generalize the embodied agent, that was introduced for NICOL, to also be embodied by NICO. The agent can execute a discrete set of physical actions and allows the user to verbally instruct various different robots. We contribute a grasping primitive to its action space that allows for precise manipulation of household objects. The new CycleIK method is compared to popular numerical IK solvers and state-of-the-art neural IK methods in simulation and is shown to be competitive with or outperform all evaluated methods when the algorithm runtime is very short. The grasping primitive is evaluated on both NICOL and NICO robots with a reported grasp success of 72% to 82% for each robot, respectively.
Updated: 2024-04-12 21:42:34
Categories: cs.RO,cs.AI
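The Cartesian side of the planner is just Bézier-curve sampling; each sampled pose is then handed to the IK solver (CycleIK in the paper) for the joint-space trajectory. Plain Bernstein-polynomial evaluation, with illustrative control points.

```python
import numpy as np
from math import comb

def bezier(control_points, n_samples=50):
    """Sample a smooth Cartesian path from Bezier control points; each
    sampled position would then be mapped to joint space by the IK
    solver."""
    P = np.asarray(control_points, dtype=float)     # (n+1, 3) control points
    n = len(P) - 1
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    # Bernstein basis: B_i(t) = C(n, i) * (1-t)^(n-i) * t^i
    curve = sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * P[i] for i in range(n + 1))
    return curve                                    # (n_samples, 3) Cartesian positions

path = bezier([[0.3, 0.0, 0.4], [0.4, 0.1, 0.5], [0.5, -0.1, 0.5], [0.5, 0.0, 0.3]])
```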
Into the Fog: Evaluating Multiple Object Tracking Robustness
State-of-the-art (SOTA) trackers have shown remarkable Multiple Object Tracking (MOT) performance when trained and evaluated on current benchmarks. However, these benchmarks primarily consist of clear scenarios, overlooking adverse atmospheric conditions such as fog, haze, smoke, and dust. As a result, the robustness of SOTA trackers remains underexplored. To address these limitations, we propose a pipeline for physics-based volumetric fog simulation in arbitrary real-world MOT datasets, utilizing frame-by-frame monocular depth estimation and a fog-formation optical model. Moreover, we enhance our simulation by rendering both homogeneous and heterogeneous fog effects. We propose to use the dark channel prior method to estimate fog (smoke) color, which shows promising results even in night and indoor scenes. We present the leading tracking benchmark MOTChallenge (MOT17 dataset) overlaid by fog (smoke for indoor scenes) of various intensity levels and conduct a comprehensive evaluation of SOTA MOT methods, revealing their limitations under fog and fog-like challenges.
Updated: 2024-04-12 21:41:50
Categories: cs.CV,cs.AI
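A minimal sketch of the standard fog-formation optical model that such pipelines build on: I(x) = J(x)·t(x) + A·(1 − t(x)), with transmission t(x) = exp(−β·d(x)) from the per-frame depth estimate. This is the homogeneous case only; the paper additionally renders heterogeneous fog and estimates the fog color A with the dark channel prior.

```python
import numpy as np

def add_fog(image, depth, beta=1.5, airlight=0.8):
    """Homogeneous fog via the atmospheric-scattering model:
    I(x) = J(x) * t(x) + A * (1 - t(x)),  t(x) = exp(-beta * d(x))."""
    t = np.exp(-beta * depth)[..., None]   # per-pixel transmission from depth
    return image * t + airlight * (1.0 - t)

img = np.random.rand(240, 320, 3)          # J: clear image in [0, 1]
depth = np.random.rand(240, 320) * 3.0     # stand-in for a monocular depth estimate
foggy = add_fog(img, depth, beta=1.0)
```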
Adversarial Patterns: Building Robust Android Malware Classifiers
Machine learning models are increasingly being adopted across various fields, such as medicine, business, autonomous vehicles, and cybersecurity, to analyze vast amounts of data, detect patterns, and make predictions or recommendations. In the field of cybersecurity, these models have made significant improvements in malware detection. However, despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks that perform slight modifications in malware samples, leading to misclassification from malignant to benign. Numerous defense approaches have been proposed to either detect such adversarial attacks or improve model robustness. These approaches have resulted in a multitude of attack and defense techniques and the emergence of a field known as `adversarial machine learning.' In this survey paper, we provide a comprehensive review of adversarial machine learning in the context of Android malware classifiers. Android is the most widely used operating system globally and is an easy target for malicious agents. The paper first presents an extensive background on Android malware classifiers, followed by an examination of the latest advancements in adversarial attacks and defenses. Finally, the paper provides guidelines for designing robust malware classifiers and outlines research directions for the future.
Updated: 2024-04-12 21:41:08
Categories: cs.CR,cs.LG
Single-image driven 3d viewpoint training data augmentation for effective wine label recognition
Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition. This method enhances deep learning model performance by generating visually realistic training samples from a single real-world wine label image, overcoming the challenges posed by the intricate combinations of text and logos. Classical Generative Adversarial Network (GAN) methods fall short in synthesizing such intricate content combinations. Our proposed solution leverages time-tested computer vision and image processing strategies to expand our training dataset, thereby broadening the range of training samples for deep learning applications. This innovative approach to data augmentation circumvents the constraints of limited training resources. Using the augmented training images with batch-all triplet metric learning on a Vision Transformer (ViT) architecture, we can obtain the most discriminative embedding features for every wine label, enabling one-shot recognition both of wine labels seen in the training classes and of newly collected wine labels unavailable during training. Experimental results show a significant increase in recognition accuracy over conventional 2D data augmentation techniques.
Updated: 2024-04-12 21:30:09
Categories: cs.CV,cs.LG
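The viewpoint part of such augmentation can be sketched as a random homography warp of the single label image (OpenCV); the full pipeline described above also models effects like the bottle's curvature, which this sketch omits.

```python
import cv2
import numpy as np

def random_viewpoint(label_img, max_shift=0.15, seed=None):
    """Generate one viewpoint-augmented training sample from a single
    label image by jittering the image corners and warping with the
    induced homography. Viewpoint jitter only; not the paper's full
    3D pipeline."""
    rng = np.random.default_rng(seed)
    h, w = label_img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [w, h]
    dst = (src + jitter).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(label_img, H, (w, h), borderMode=cv2.BORDER_REPLICATE)
```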
The Illusion of State in State-Space Models
State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill and Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are provably unable to accurately track chess moves with certain notation, evaluate code, or track entities in a long narrative. To supplement our formal analysis, we report experiments showing that Mamba-style SSMs indeed struggle with state tracking. Thus, despite its recurrent formulation, the "state" in an SSM is an illusion: SSMs have similar expressiveness limitations to non-recurrent models like transformers, which may fundamentally limit their ability to solve real-world state-tracking problems.
Updated: 2024-04-12 21:30:06
Categories: cs.LG,cs.CC,cs.CL,cs.FL
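Permutation composition, the state-tracking problem cited above, is easy to state concretely: track the composition of a stream of permutations of five elements. A constant-size recurrent state solves it in one left-to-right pass, yet the word problem for S5 is NC^1-complete (Barrington) and hence believed to lie outside TC^0:

```python
from itertools import permutations
import random

# Composing a sequence of permutations of 5 elements: trivial with a true
# recurrent state, but (per the paper) inexpressible by SSMs and
# transformers alike, since both are limited to TC^0.
S5 = list(permutations(range(5)))

def compose(p, q):                       # apply p, then q
    return tuple(q[p[i]] for i in range(5))

seq = [random.choice(S5) for _ in range(20)]
state = tuple(range(5))                  # identity permutation
for p in seq:
    state = compose(state, p)            # constant-size state, updated sequentially
print(state)
```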
Empowering Malware Detection Efficiency within Processing-in-Memory Architecture
The widespread integration of embedded systems across various industries has facilitated seamless connectivity among devices and bolstered computational capabilities. Despite their extensive applications, embedded systems encounter significant security threats, with one of the most critical vulnerabilities being malicious software, commonly known as malware. In recent times, malware detection techniques leveraging Machine Learning have gained popularity. Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) have proven particularly efficient in image processing tasks. However, one major drawback of neural network architectures is their substantial computational resource requirements. Continuous training of malware detection models with updated malware and benign samples demands immense computational resources, presenting a challenge for real-world applications. In response to these concerns, we propose a Processing-in-Memory (PIM)-based architecture to mitigate memory access latency, thereby reducing the resources consumed during model updates. To further enhance throughput and minimize energy consumption, we incorporate precision scaling techniques tailored for CNN models. Our proposed PIM architecture exhibits a 1.09x higher throughput compared to existing Lookup Table (LUT)-based PIM architectures. Additionally, precision scaling combined with PIM enhances energy efficiency by 1.5x compared to full-precision operations, without sacrificing performance. This innovative approach offers a promising solution to the resource-intensive nature of malware detection model updates, paving the way for more efficient and sustainable cybersecurity practices.
Updated: 2024-04-12 21:28:43
Categories: cs.CR,cs.AR
PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions. This susceptibility, combined with the black-box nature of such networks, limits their adoption in critical applications like autonomous driving. Feature-attribution-based explanation methods provide the relevance of input features for model predictions on input samples, thus explaining model decisions. However, we observe that both model predictions and feature attributions for input samples are sensitive to noise. We develop a practical method that exploits this sensitivity of model predictions and feature attributions to detect adversarial samples. Our method, PASA, requires the computation of two test statistics using model prediction and feature attribution and can reliably detect adversarial samples using thresholds learned from benign samples. We validate our lightweight approach by evaluating the performance of PASA on varying strengths of FGSM, PGD, BIM, and CW attacks on multiple image and non-image datasets. On average, we outperform state-of-the-art statistical unsupervised adversarial detectors on CIFAR-10 and ImageNet by 14% and 35% ROC-AUC scores, respectively. Moreover, our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.
Updated: 2024-04-12 21:22:21
Fields: cs.CR,cs.AI,cs.LG
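To make the two-statistic idea concrete, here is a minimal PyTorch sketch that measures how much a model's prediction and a simple input-x-gradient attribution shift under input noise. The noise scale, the attribution choice, and the demo model are our assumptions, not PASA's exact configuration.

```python
import torch
import torch.nn as nn

def pasa_statistics(model, x, sigma=0.05):
    """Noise sensitivity of the prediction and of an input-x-gradient
    attribution; both quantities are illustrative guesses at the paper's
    two test statistics."""
    noisy = x + sigma * torch.randn_like(x)

    def attribution(inp):
        inp = inp.clone().requires_grad_(True)
        model(inp).max(dim=1).values.sum().backward()
        return (inp.grad * inp).detach()  # input-x-gradient attribution

    with torch.no_grad():
        pred_shift = (model(x) - model(noisy)).norm().item()
    attr_shift = (attribution(x) - attribution(noisy)).norm().item()
    return pred_shift, attr_shift

# Demo on a toy classifier; thresholds would come from, e.g., a high
# percentile of each statistic over benign validation samples.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
print(pasa_statistics(model, torch.randn(1, 1, 28, 28)))
```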
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced models are instrumental in tackling more intricate tasks such as image captioning and visual question answering. In our comprehensive survey paper, we delve into the key advancements within the realm of VLMs. Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs, and models that both accept and produce multimodal inputs and outputs. This classification is based on their respective capabilities and functionalities in processing and generating various modalities of data. We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible, providing readers with a comprehensive understanding of its essential components. We also analyze the performance of VLMs on various benchmark datasets. By doing so, we aim to offer a nuanced understanding of the diverse landscape of VLMs. Additionally, we underscore potential avenues for future research in this dynamic domain, anticipating further breakthroughs and advancements.
Updated: 2024-04-12 21:20:37
Fields: cs.CV,cs.AI,cs.CL
E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited Data
As generative AI progresses rapidly, new synthetic image generators continue to emerge at a swift pace. Traditional detection methods face two main challenges in adapting to these generators: the forensic traces of synthetic images from new techniques can vastly differ from those learned during training, and access to data for these new generators is often limited. To address these issues, we introduce the Ensemble of Expert Embedders (E3), a novel continual learning framework for updating synthetic image detectors. E3 enables the accurate detection of images from newly emerged generators using minimal training data. Our approach does this by first employing transfer learning to develop a suite of expert embedders, each specializing in the forensic traces of a specific generator. Then, all embeddings are jointly analyzed by an Expert Knowledge Fusion Network to produce accurate and reliable detection decisions. Our experiments demonstrate that E3 outperforms existing continual learning methods, including those developed specifically for synthetic image detection.
Updated: 2024-04-12 21:14:20
Fields: cs.CV,cs.AI,cs.LG
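The E3 architecture reads naturally as "one embedder per generator, one fusion head." Below is a schematic PyTorch sketch of that structure; the backbones, embedding dimension, and fusion layer sizes are placeholders, not the paper's actual design.

```python
import torch
import torch.nn as nn

class E3Detector(nn.Module):
    """Ensemble of Expert Embedders, schematically: each expert is a
    (transfer-learned) feature extractor specialized to one generator's
    forensic traces; a fusion network maps the concatenated embeddings
    to a real/synthetic decision."""

    def __init__(self, experts, embed_dim=128):
        super().__init__()
        self.experts = nn.ModuleList(experts)  # one backbone per known generator
        self.fusion = nn.Sequential(
            nn.Linear(embed_dim * len(experts), 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # real vs. synthetic
        )

    def forward(self, x):
        z = torch.cat([expert(x) for expert in self.experts], dim=1)
        return self.fusion(z)

# Toy usage with linear "experts" standing in for fine-tuned backbones.
experts = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128)) for _ in range(3)]
detector = E3Detector(experts)
print(detector(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 2])
```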
Optimizing Malware Detection in IoT Networks: Leveraging Resource-Aware Distributed Computing for Enhanced Security
In recent years, networked IoT systems have revolutionized connectivity, portability, and functionality, offering a myriad of advantages. However, these systems are increasingly targeted by adversaries due to inherent security vulnerabilities and limited computational and storage resources. Malicious applications, commonly known as malware, pose a significant threat to IoT devices and networks. While numerous malware detection techniques have been proposed, existing approaches often overlook the resource constraints inherent in IoT environments, assuming abundant resources for detection tasks. This oversight is compounded by ongoing workloads such as sensing and on-device computations, further diminishing available resources for malware detection. To address these challenges, we present a novel resource- and workload-aware malware detection framework integrated with distributed computing for IoT networks. Our approach begins by analyzing available resources for malware detection using a lightweight regression model. Depending on resource availability, ongoing workload executions, and communication costs, the malware detection task is dynamically allocated either on-device or offloaded to neighboring IoT nodes with sufficient resources. To safeguard data integrity and user privacy, rather than transferring the entire malware detection task, the classifier is partitioned and distributed across multiple nodes, and subsequently integrated at the parent node for comprehensive malware detection. Experimental analysis demonstrates the efficacy of our proposed technique, achieving a remarkable speed-up of 9.8x compared to on-device inference, while maintaining a high malware detection accuracy of 96.7%.
Updated: 2024-04-12 21:11:29
Fields: cs.CR,cs.DC
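The placement logic described in the abstract boils down to a resource-and-cost decision rule. The sketch below is a deliberately toy rendering of that rule; the Node and Partition fields stand in for the paper's regression-predicted quantities, and the greedy neighbor ordering by communication cost is our simplification.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free: float        # predicted free resources (lightweight regression output)
    comm_cost: float = 0.0

@dataclass
class Partition:
    cost: float

def place_detection_task(device, neighbors, partitions, total_cost):
    """Toy resource- and workload-aware placement: run the full detector
    on-device when predicted free resources suffice; otherwise spread
    classifier partitions over the cheapest capable neighbors, to be
    integrated later at the parent node."""
    if device.free >= total_cost:
        return [(device.name, "full model")]       # on-device inference
    plan, remaining = [], list(partitions)
    for nb in sorted(neighbors, key=lambda n: n.comm_cost):
        budget = nb.free
        while remaining and budget >= remaining[-1].cost:
            budget -= remaining[-1].cost
            plan.append((nb.name, remaining.pop()))
    return plan  # any leftover partitions would fall back to the device

plan = place_detection_task(
    Node("device", free=1.0),
    [Node("nb1", free=3.0, comm_cost=0.2), Node("nb2", free=2.0, comm_cost=0.5)],
    [Partition(1.5), Partition(1.5), Partition(1.0)],
    total_cost=4.0,
)
print(plan)
```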
Can Public Large Language Models Help Private Cross-device Federated Learning?
We study (differentially) private federated learning (FL) of language models. The language models in cross-device FL are relatively small, and can be trained with meaningful formal user-level differential privacy (DP) guarantees when massive parallelism in training is enabled by the participation of a moderate number of users. Recently, public data has been used to improve privacy-utility trade-offs for both large and small language models. In this work, we provide a systematic study of using large-scale public data and LLMs to help differentially private training of on-device FL models, and further improve the privacy-utility tradeoff via distillation techniques. Moreover, we propose a novel distribution matching algorithm with theoretical grounding to sample public data close to the private data distribution, which significantly improves the sample efficiency of (pre-)training on public data. The proposed method is efficient and effective for training private models by taking advantage of public data, especially for customized on-device architectures that do not have ready-to-use pre-trained models.
Updated: 2024-04-12 21:01:12
Fields: cs.LG
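As a rough illustration of what "sampling public data close to the private distribution" can mean, here is a deliberately crude embedding-space sketch. It is not the paper's theoretically grounded algorithm: we simply rank public examples by distance to the private centroid, and in a real DP pipeline that centroid would itself need to be privatized (e.g., with calibrated Gaussian noise) before touching public data.

```python
import numpy as np

def match_public_to_private(pub_embs, priv_embs, k):
    """Select the k public examples whose embeddings lie closest to the
    private-data centroid (a crude stand-in for distribution matching)."""
    centroid = priv_embs.mean(axis=0)
    dists = np.linalg.norm(pub_embs - centroid, axis=1)
    return np.argsort(dists)[:k]  # indices of best-matching public samples

rng = np.random.default_rng(0)
pub = rng.normal(size=(1000, 16))            # public-corpus embeddings
priv = rng.normal(loc=0.5, size=(200, 16))   # private-corpus embeddings
print(match_public_to_private(pub, priv, k=5))
```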
Reducing the Barriers to Entry for Foundation Model Training
The world has recently witnessed an unprecedented acceleration in demands for Machine Learning and Artificial Intelligence applications. This spike in demand has imposed tremendous strain on the underlying technology stack in supply chain, GPU-accelerated hardware, software, datacenter power density, and energy consumption. If left on the current technological trajectory, future demands show insurmountable spending trends, further limiting market players, stifling innovation, and widening the technology gap. To address these challenges, we propose a fundamental change in the AI training infrastructure throughout the technology ecosystem. The changes require advancements in supercomputing and novel AI training approaches, from high-end software to low-level hardware, microprocessor, and chip design, while advancing the energy efficiency required by a sustainable infrastructure. This paper presents the analytical framework that quantitatively highlights the challenges and points to the opportunities to reduce the barriers to entry for training large language models.
Updated: 2024-04-12 20:58:25
Fields: cs.ET,cs.AI,cs.AR,cs.LG
Leveraging viscous Hamilton-Jacobi PDEs for uncertainty quantification in scientific machine learning
Uncertainty quantification (UQ) in scientific machine learning (SciML) combines the predictive power of SciML with methods for quantifying the reliability of the learned models. However, two major challenges remain: limited interpretability and expensive training procedures. We provide a new interpretation for UQ problems by establishing a new theoretical connection between some Bayesian inference problems arising in SciML and viscous Hamilton-Jacobi partial differential equations (HJ PDEs). Namely, we show that the posterior mean and covariance can be recovered from the spatial gradient and Hessian of the solution to a viscous HJ PDE. As a first exploration of this connection, we specialize to Bayesian inference problems with linear models, Gaussian likelihoods, and Gaussian priors. In this case, the associated viscous HJ PDEs can be solved using Riccati ODEs, and we develop a new Riccati-based methodology that provides computational advantages when continuously updating the model predictions. Specifically, our Riccati-based approach can efficiently add or remove data points to the training set invariant to the order of the data and continuously tune hyperparameters. Moreover, neither update requires retraining on or access to previously incorporated data. We provide several examples from SciML involving noisy data and epistemic uncertainty to illustrate the potential advantages of our approach. In particular, this approach's amenability to data streaming applications demonstrates its potential for real-time inferences, which, in turn, allows for applications in which the predicted uncertainty is used to dynamically alter the learning process.
Updated: 2024-04-12 20:54:01
Fields: cs.LG,stat.ML,35F21, 62F15, 65L99, 65N99, 68T05, 35B37
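In the linear-Gaussian special case, the order-invariant, retraining-free updates the paper highlights reduce (in discrete form) to standard conjugate algebra: adding or removing a data point is a rank-one update of the posterior precision. The sketch below is that textbook recursion, not the paper's PDE or Riccati machinery.

```python
import numpy as np

class GaussianLinearPosterior:
    """Bayesian linear regression with prior N(mu0, Sigma0) and noise
    variance sigma2; data points can be added or removed in any order
    without revisiting previously incorporated data."""

    def __init__(self, mu0, Sigma0, sigma2):
        self.prec = np.linalg.inv(Sigma0)   # posterior precision
        self.b = self.prec @ mu0            # precision-weighted mean
        self.sigma2 = sigma2

    def add(self, x, y, sign=+1.0):
        self.prec += sign * np.outer(x, x) / self.sigma2
        self.b += sign * y * x / self.sigma2

    def remove(self, x, y):
        self.add(x, y, sign=-1.0)           # exact inverse of add()

    def posterior(self):
        cov = np.linalg.inv(self.prec)
        return cov @ self.b, cov            # posterior mean and covariance

post = GaussianLinearPosterior(np.zeros(2), np.eye(2), sigma2=0.1)
post.add(np.array([1.0, 0.0]), 2.0)
post.add(np.array([0.0, 1.0]), -1.0)
mean, cov = post.posterior()
print(mean)  # ~[1.82, -0.91]
```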
Enhancing IoT Malware Detection through Adaptive Model Parallelism and Resource Optimization
The widespread integration of IoT devices has greatly improved connectivity and computational capabilities, facilitating seamless communication across networks. Despite their global deployment, IoT devices are frequently targeted for security breaches due to inherent vulnerabilities. Among these threats, malware poses a significant risk to IoT devices. The lack of built-in security features and limited resources present challenges for implementing effective malware detection techniques on IoT devices. Moreover, existing methods assume access to all device resources for malware detection, which is often not feasible for IoT devices deployed in critical real-world scenarios. To overcome this challenge, this study introduces a novel approach to malware detection tailored for IoT devices, leveraging resource and workload awareness inspired by model parallelism. Initially, the device assesses available resources for malware detection using a lightweight regression model. Based on resource availability, ongoing workload, and communication costs, the malware detection task is dynamically allocated either on-device or offloaded to neighboring IoT nodes with sufficient resources. To uphold data integrity and user privacy, instead of transferring the entire malware detection task, the classifier is divided and distributed across multiple nodes, then integrated at the parent node for detection. Experimental results demonstrate that this proposed technique achieves a significant speedup of 9.8x compared to on-device inference, while maintaining a high malware detection accuracy of 96.7%.
Updated: 2024-04-12 20:51:25
Fields: cs.CR,cs.DC
Real-time guidewire tracking and segmentation in intraoperative x-ray
During endovascular interventions, physicians have to perform accurate and immediate operations based on the available real-time information, such as the shape and position of guidewires observed on the fluoroscopic images, haptic information and the patients' physiological signals. For this purpose, real-time and accurate guidewire segmentation and tracking can enhance the visualization of guidewires and provide visual feedback for physicians during the intervention as well as for robot-assisted interventions. Nevertheless, this task often comes with the challenge of elongated deformable structures that present themselves with low contrast in the noisy fluoroscopic image sequences. To address these issues, a two-stage deep learning framework for real-time guidewire segmentation and tracking is proposed. In the first stage, a Yolov5s detector is trained, using the original X-ray images as well as synthetic ones, which is employed to output the bounding boxes of possible target guidewires. More importantly, a refinement module based on spatiotemporal constraints is incorporated to robustly localize the guidewire and remove false detections. In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box. The network contains two major modules, namely a hessian-based enhancement embedding module and a dual self-attention module. Quantitative and qualitative evaluations on clinical intra-operative images demonstrate that the proposed approach significantly outperforms our baselines as well as the current state of the art and, in comparison, shows higher robustness to low quality images.
Updated: 2024-04-12 20:39:19
Fields: eess.IV,cs.CV,cs.LG
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
The quadratic complexity and weak length extrapolation of Transformers limits their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Megalodon inherits the architecture of Mega (exponential moving average with gated attention), and further introduces multiple technical components to improve its capability and stability, including complex exponential moving average (CEMA), timestep normalization layer, normalized attention mechanism and pre-norm with two-hop residual configuration. In a controlled head-to-head comparison with Llama2, Megalodon achieves better efficiency than Transformer in the scale of 7 billion parameters and 2 trillion training tokens. Megalodon reaches a training loss of 1.70, landing mid-way between Llama2-7B (1.75) and 13B (1.67). Code: https://github.com/XuezheMax/megalodon
Updated: 2024-04-12 20:28:14
Fields: cs.LG,cs.CL
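For intuition about the CEMA component, here is a toy scalar sketch of an exponential moving average with a complex-valued decay, keeping only the real part of the state. Megalodon's actual CEMA uses learned, per-dimension complex decays inside a multi-headed architecture; the decay magnitude/phase and the (1 - alpha) input scaling below are illustrative assumptions.

```python
import torch

def complex_ema(x, decay_mag=0.9, decay_phase=0.3):
    """Toy complex EMA over a (T, D) real sequence:
    h_t = alpha * h_{t-1} + (1 - alpha) * x_t, with alpha complex,
    so the state can oscillate as well as decay."""
    alpha = decay_mag * torch.exp(1j * torch.tensor(decay_phase))
    h = torch.zeros(x.shape[-1], dtype=torch.cfloat)
    out = []
    for x_t in x.to(torch.cfloat):
        h = alpha * h + (1 - alpha) * x_t
        out.append(h.real.clone())
    return torch.stack(out)  # the real part would feed the attention block

y = complex_ema(torch.randn(16, 8))
print(y.shape)  # torch.Size([16, 8])
```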
Generative AI-Based Effective Malware Detection for Embedded Computing Systems
One of the pivotal security threats for embedded computing systems is malicious software, a.k.a. malware. Owing to its efficiency and efficacy, Machine Learning (ML) has been widely adopted for malware detection in recent times. Despite being efficient, the existing techniques require a tremendous number of benign and malware samples for training and modeling an efficient malware detector. Furthermore, such constraints limit the detection of emerging malware samples due to the lack of sufficient malware samples required for efficient training. To address such concerns, we introduce a code-aware data generation technique that generates multiple mutated samples of the malware seen only in limited quantities by the devices. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware and mitigates impractical samples. Such developed malware is further incorporated into the training set to formulate a model that can efficiently detect emerging malware despite having limited exposure. The experimental results demonstrate that the proposed technique achieves an accuracy of 90% in detecting limitedly seen malware, approximately 3x higher than the accuracy attained by state-of-the-art techniques.
Updated: 2024-04-12 20:18:00
Fields: cs.CR,cs.CV
Semantic Approach to Quantifying the Consistency of Diffusion Model Image Generation
In this study, we identify the need for an interpretable, quantitative score of the repeatability, or consistency, of image generation in diffusion models. We propose a semantic approach, using a pairwise mean CLIP (Contrastive Language-Image Pretraining) score as our semantic consistency score. We applied this metric to compare two state-of-the-art open-source image generation diffusion models, Stable Diffusion XL and PixArt-α, and we found statistically significant differences between the semantic consistency scores for the models. Agreement between the Semantic Consistency Score selected model and aggregated human annotations was 94%. We also explored the consistency of SDXL and a LoRA-fine-tuned version of SDXL and found that the fine-tuned model had significantly higher semantic consistency in generated images. The Semantic Consistency Score proposed here offers a measure of image generation alignment, facilitating the evaluation of model architectures for specific tasks and aiding in informed decision-making regarding model selection.
Updated: 2024-04-12 20:16:03
Fields: cs.CV,cs.AI,cs.HC,cs.LG
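A pairwise mean CLIP score over a set of same-prompt generations is straightforward to compute; the sketch below is our reading of it using Hugging Face Transformers. The specific CLIP checkpoint is an assumption, not necessarily the one used in the paper.

```python
import itertools
import torch
from transformers import CLIPModel, CLIPProcessor

def semantic_consistency(images, model_name="openai/clip-vit-base-patch32"):
    """Mean pairwise cosine similarity of CLIP image embeddings over a
    list of PIL images generated from the same prompt."""
    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)      # unit-normalize
    sims = [float(emb[i] @ emb[j]) for i, j in
            itertools.combinations(range(len(images)), 2)]
    return sum(sims) / len(sims)
```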
Diffusion-Based Joint Temperature and Precipitation Emulation of Earth System Models
Earth system models (ESMs) are the principal tools used in climate science to generate future climate projections under various atmospheric emissions scenarios on a global or regional scale. Generative deep learning approaches are suitable for emulating these tools due to their computational efficiency and ability, once trained, to generate realizations in a fraction of the time required by ESMs. We extend previous work that used a generative probabilistic diffusion model to emulate ESMs by targeting the joint emulation of multiple variables, temperature and precipitation, by a single diffusion model. Joint generation of multiple variables is critical to generate realistic samples of phenomena resulting from the interplay of multiple variables. The diffusion model emulator takes in the monthly mean-maps of temperature and precipitation and produces the daily values of each of these variables that exhibit statistical properties similar to those generated by ESMs. Our results show the outputs from our extended model closely resemble those from ESMs on various climate metrics including dry spells and hot streaks, and that the joint distribution of temperature and precipitation in our sample closely matches those of ESMs.
Updated: 2024-04-12 20:13:19
Fields: physics.ao-ph,cs.LG,physics.geo-ph
PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck
CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder. Therefore, they perform poorly on new classes or the classes whose names rarely appear on the Internet (e.g., scientific names of birds). For fine-grained classification, we propose PEEB - an explainable and editable classifier to (1) express the class name into a set of text descriptors that describe the visual parts of that class; and (2) match the embeddings of the detected parts to their textual descriptors in each class to compute a logit score for classification. In a zero-shot setting where the class names are unknown, PEEB outperforms CLIP by a huge margin (~10x in top-1 accuracy). Compared to part-based classifiers, PEEB is not only the state-of-the-art (SOTA) on the supervised-learning setting (88.80% and 92.20% accuracy on CUB-200 and Dogs-120, respectively) but also the first to enable users to edit the text descriptors to form a new classifier without any re-training. Compared to concept bottleneck models, PEEB is also the SOTA in both zero-shot and supervised-learning settings.
Updated: 2024-04-12 20:10:29
Fields: cs.CV,cs.AI,cs.CL
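The core scoring step, matching detected part embeddings to per-class textual part descriptors, can be sketched as a sum of part-wise similarities. The sum-of-dot-products form below is our simplification of PEEB's scoring, and the shapes are assumptions.

```python
import torch

def part_logits(part_embs, descriptor_embs):
    """Schematic PEEB-style scoring.

    part_embs:       (P, D) embeddings of P detected parts
    descriptor_embs: (C, P, D) text embeddings of part descriptors
                     for each of C classes
    Returns one logit per class: the similarity of each part to the
    matching class descriptor, summed over parts.
    """
    part_embs = part_embs / part_embs.norm(dim=-1, keepdim=True)
    descriptor_embs = descriptor_embs / descriptor_embs.norm(dim=-1, keepdim=True)
    return torch.einsum("pd,cpd->c", part_embs, descriptor_embs)

print(part_logits(torch.randn(12, 512), torch.randn(200, 12, 512)).shape)  # (200,)
```

Because the descriptors are plain text, editing one descriptor and re-embedding it yields a new classifier with no retraining, which is the editability the abstract emphasizes.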
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential weaknesses. However, the complexity of evaluating jailbreak performance and understanding prompt characteristics makes this analysis laborious. We collaborate with domain experts to characterize problems and propose an LLM-assisted framework to streamline the analysis process. It provides automatic jailbreak assessment to facilitate performance evaluation and support analysis of components and keywords in prompts. Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in helping users evaluate model security and identify model weaknesses.
Updated: 2024-04-12 19:54:42
Fields: cs.CR,cs.CL,cs.HC
The Path To Autonomous Cyber Defense
Defenders are overwhelmed by the number and scale of attacks against their networks. This problem will only be exacerbated as attackers leverage artificial intelligence to automate their workflows. We propose a path to autonomous cyber agents able to augment defenders by automating critical steps in the cyber defense life cycle.
Updated: 2024-04-12 19:51:45
Fields: cs.CR,cs.AI
Convergence of coordinate ascent variational inference for log-concave measures via optimal transport
Mean field variational inference (VI) is the problem of finding the closest product (factorized) measure, in the sense of relative entropy, to a given high-dimensional probability measure $\rho$. The well known Coordinate Ascent Variational Inference (CAVI) algorithm aims to approximate this product measure by iteratively optimizing over one coordinate (factor) at a time, which can be done explicitly. Despite its popularity, the convergence of CAVI remains poorly understood. In this paper, we prove the convergence of CAVI for log-concave densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a linear rate of convergence, and if also $\rho$ is strongly log-concave, we find an exponential rate. Our analysis starts from the observation that mean field VI, while notoriously non-convex in the usual sense, is in fact displacement convex in the sense of optimal transport when $\rho$ is log-concave. This allows us to adapt techniques from the optimization literature on coordinate descent algorithms in Euclidean space.
Updated: 2024-04-12 19:43:54
Fields: stat.ML,cs.LG,math.OC,math.PR,math.ST,stat.TH
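For readers unfamiliar with CAVI, the bivariate Gaussian target is the classic closed-form example (Bishop, PRML Sec. 10.1) of the one-coordinate-at-a-time updates the paper analyzes; the target here is log-concave, so the iteration converges, as the paper proves in far greater generality.

```python
import numpy as np

# Mean-field CAVI for a bivariate Gaussian target N(m, Sigma).
m = np.array([1.0, -1.0])
Lam = np.linalg.inv(np.array([[1.0, 0.7], [0.7, 1.0]]))  # target precision

mu = np.zeros(2)                     # factor means, initialized arbitrarily
for sweep in range(20):
    for i in range(2):               # optimize one coordinate (factor) at a time
        j = 1 - i
        # Closed-form update of the i-th factor's mean; its variance
        # stays fixed at 1 / Lam[i, i] throughout.
        mu[i] = m[i] - Lam[i, j] * (mu[j] - m[j]) / Lam[i, i]

print(mu)  # -> approaches the target mean [1, -1]
```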
Handling Reward Misspecification in the Presence of Expectation Mismatch
Detecting and handling misspecified objectives, such as reward functions, has been widely recognized as one of the central challenges within the domain of Artificial Intelligence (AI) safety research. However, even with the recognition of the importance of this problem, we are unaware of any works that attempt to provide a clear definition for what constitutes (a) misspecified objectives and (b) successfully resolving such misspecifications. In this work, we use the theory of mind, i.e., the human user's beliefs about the AI agent, as a basis to develop a formal explanatory framework called Expectation Alignment (EAL) to understand the objective misspecification and its causes. Our EAL framework not only acts as an explanatory framework for existing works but also provides us with concrete insights into the limitations of existing methods to handle reward misspecification and novel solution strategies. We use these insights to propose a new interactive algorithm that uses the specified reward to infer potential user expectations about the system behavior. We show how one can efficiently implement this algorithm by mapping the inference problem into linear programs. We evaluate our method on a set of standard Markov Decision Process (MDP) benchmarks.
Updated: 2024-04-12 19:43:37
Fields: cs.AI,cs.LG
Differentiable and Stable Long-Range Tracking of Multiple Posterior Modes
Particle filters flexibly represent multiple posterior modes nonparametrically, via a collection of weighted samples, but have classically been applied to tracking problems with known dynamics and observation likelihoods. Such generative models may be inaccurate or unavailable for high-dimensional observations like images. We instead leverage training data to discriminatively learn particle-based representations of uncertainty in latent object states, conditioned on arbitrary observations via deep neural network encoders. While prior discriminative particle filters have used heuristic relaxations of discrete particle resampling, or biased learning by truncating gradients at resampling steps, we achieve unbiased and low-variance gradient estimates by representing posteriors as continuous mixture densities. Our theory and experiments expose dramatic failures of existing reparameterization-based estimators for mixture gradients, an issue we address via an importance-sampling gradient estimator. Unlike standard recurrent neural networks, our mixture density particle filter represents multimodal uncertainty in continuous latent states, improving accuracy and robustness. On a range of challenging tracking and robot localization problems, our approach achieves dramatic improvements in accuracy, while also showing much greater stability across multiple training runs.
Updated: 2024-04-12 19:33:52
Fields: cs.LG,cs.AI,cs.RO
Detecting AI-Generated Images via CLIP
As AI-generated image (AIGI) methods become more powerful and accessible, it has become a critical task to determine if an image is real or AI-generated. Because AIGI lack the signatures of photographs and have their own unique patterns, new models are needed to determine if an image is AI-generated. In this paper, we investigate the ability of the Contrastive Language-Image Pre-training (CLIP) architecture, pre-trained on massive internet-scale data sets, to perform this differentiation. We fine-tune CLIP on real images and AIGI from several generative models, enabling CLIP to determine if an image is AI-generated and, if so, determine what generation method was used to create it. We show that the fine-tuned CLIP architecture is able to differentiate AIGI as well or better than models whose architecture is specifically designed to detect AIGI. Our method will significantly increase access to AIGI-detecting tools and reduce the negative effects of AIGI on society, as our CLIP fine-tuning procedures require no architecture changes from publicly available model repositories and consume significantly less GPU resources than other AIGI detection models.
Updated: 2024-04-12 19:29:10
Fields: cs.CV,cs.LG
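Since the approach is "CLIP plus a classification head, fine-tuned," a minimal sketch is easy to write down. The checkpoint, head shape, and (K+1)-way real-vs-generator labeling below are our choices, not necessarily the paper's exact setup.

```python
import torch
import torch.nn as nn
from transformers import CLIPModel

class CLIPAIGIDetector(nn.Module):
    """Linear head over CLIP image features for (K+1)-way classification:
    class 0 = real photograph, classes 1..K = known generators."""

    def __init__(self, num_generators, model_name="openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(model_name)
        dim = self.clip.config.projection_dim
        self.head = nn.Linear(dim, num_generators + 1)

    def forward(self, pixel_values):
        feats = self.clip.get_image_features(pixel_values=pixel_values)
        return self.head(feats)

det = CLIPAIGIDetector(num_generators=4)
print(det(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 5])
```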
CiFlow: Dataflow Analysis and Optimization of Key Switching for Homomorphic Encryption
Homomorphic encryption (HE) is a privacy-preserving computation technique that enables computation on encrypted data. Today, the potential of HE remains largely unrealized as it is impractically slow, preventing it from being used in real applications. A major computational bottleneck in HE is the key-switching operation, accounting for approximately 70% of the overall HE execution time and involving a large amount of data for inputs, intermediates, and keys. Prior research has focused on hardware accelerators to improve HE performance, typically featuring large on-chip SRAMs and high off-chip bandwidth to deal with large scale data. In this paper, we present a novel approach to improve key-switching performance by rigorously analyzing its dataflow. Our primary goal is to optimize data reuse with limited on-chip memory to minimize off-chip data movement. We introduce three distinct dataflows: Max-Parallel (MP), Digit-Centric (DC), and Output-Centric (OC), each with unique scheduling approaches for key-switching computations. Through our analysis, we show how our proposed Output-Centric technique can effectively reuse data by significantly lowering the intermediate key-switching working set and alleviating the need for massive off-chip bandwidth. We thoroughly evaluate the three dataflows using the RPU, a recently published vector processor tailored for ring processing algorithms, which includes HE. This evaluation considers sweeps of bandwidth and computational throughput, and whether keys are buffered on-chip or streamed. With OC, we demonstrate up to 4.16x speedup over the MP dataflow and show how OC can save 12.25x on-chip SRAM by streaming keys for minimal performance penalty.
Updated: 2024-04-12 19:17:58
Fields: cs.CR,cs.AR,cs.PF
NeuroLGP-SM: Scalable Surrogate-Assisted Neuroevolution for Deep Neural Networks
Evolutionary Algorithms (EAs) play a crucial role in the architectural configuration and training of Artificial Deep Neural Networks (DNNs), a process known as neuroevolution. However, neuroevolution is hindered by its inherent computational expense, requiring multiple generations, a large population, and numerous epochs. The most computationally intensive aspect lies in evaluating the fitness function of a single candidate solution. To address this challenge, we employ Surrogate-assisted EAs (SAEAs). While a few SAEA approaches have been proposed in neuroevolution, none have been applied to truly large DNNs due to issues like intractable information usage. In this work, drawing inspiration from Genetic Programming semantics, we use phenotypic distance vectors, outputted from DNNs, alongside Kriging Partial Least Squares (KPLS), an approach that is effective in handling these large vectors, making them suitable for search. Our proposed approach, named Neuro-Linear Genetic Programming surrogate model (NeuroLGP-SM), efficiently and accurately estimates DNN fitness without the need for complete evaluations. NeuroLGP-SM demonstrates competitive or superior results compared to 12 other methods, including NeuroLGP without SM, convolutional neural networks, support vector machines, and autoencoders. Additionally, it is worth noting that NeuroLGP-SM is 25% more energy-efficient than its NeuroLGP counterpart. This efficiency advantage adds to the overall appeal of our proposed NeuroLGP-SM in optimising the configuration of large DNNs.
Updated: 2024-04-12 19:15:38
Fields: cs.NE,cs.AI
Stochastic Halpern iteration in normed spaces and applications to reinforcement learning
We analyze the oracle complexity of the stochastic Halpern iteration with variance reduction, where we aim to approximate fixed-points of nonexpansive and contractive operators in a normed finite-dimensional space. We show that if the underlying stochastic oracle is with uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. Also, we establish a lower bound of $\Omega(\varepsilon^{-3})$, which applies to a wide range of algorithms, including all averaged iterations even with minibatching. Using a suitable modification of our approach, we derive a $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound in the case in which the operator is a $\gamma$-contraction. As an application, we propose new synchronous algorithms for average reward and discounted reward Markov decision processes. In particular, for the average reward, our method improves on the best-known sample complexity.
Updated: 2024-04-12 19:14:59
Fields: math.OC,cs.LG,stat.ML
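The skeleton of the Halpern iteration is a single anchored update. The sketch below is the plain stochastic version with minibatch-averaged operator evaluations; the paper's contribution adds a recursive variance-reduction estimator on top of this, and the anchoring schedule beta_k = 1/(k+2) is the standard choice, not necessarily the paper's.

```python
import numpy as np

def stochastic_halpern(T_noisy, x0, steps=1000, batch=8):
    """Stochastic Halpern iteration
        x_{k+1} = beta_k * x0 + (1 - beta_k) * T_hat(x_k),
    where T_hat averages noisy evaluations of the operator T."""
    x = x0.copy()
    for k in range(steps):
        T_hat = np.mean([T_noisy(x) for _ in range(batch)], axis=0)
        beta = 1.0 / (k + 2)
        x = beta * x0 + (1.0 - beta) * T_hat
    return x

# Example: a 0.9-contraction with additive evaluation noise;
# its fixed point is [10, 10, 10].
rng = np.random.default_rng(0)
T = lambda x: 0.9 * x + 1.0 + 0.01 * rng.normal(size=x.shape)
print(stochastic_halpern(T, np.zeros(3)))
```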
Under pressure: learning-based analog gauge reading in the wild
We propose an interpretable framework for reading analog gauges that is deployable on real world robotic systems. Our framework splits the reading task into distinct steps, such that we can detect potential failures at each step. Our system needs no prior knowledge of the type of gauge or the range of the scale and is able to extract the units used. We show that our gauge reading algorithm is able to extract readings with a relative reading error of less than 2%.
Updated: 2024-04-12 19:13:42
Fields: cs.CV,cs.LG,cs.RO
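The learned stages of such a pipeline do the hard work (detecting the dial, the needle, and the scale marks); the final step of turning a needle angle into a reading is plain geometry. The decomposition below is a plausible sketch, not the paper's exact stage layout.

```python
def gauge_value(needle_angle, min_angle, max_angle, min_value, max_value):
    """Map a detected needle angle to a reading by linear interpolation
    along the scale arc, assuming the upstream steps have already
    recovered the scale endpoints and their values (plus the unit)."""
    frac = (needle_angle - min_angle) / (max_angle - min_angle)
    return min_value + frac * (max_value - min_value)

# A needle a third of the way around a 0-10 scale reads ~3.33.
print(gauge_value(135.0, 45.0, 315.0, 0.0, 10.0))
```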
Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset
This paper is about effectively utilizing synthetic data for training deep neural networks for industrial parts classification, in particular, by taking into account the domain gap against real-world images. To this end, we introduce a synthetic dataset that may serve as a preliminary testbed for the Sim-to-Real challenge; it contains 17 objects of six industrial use cases, including isolated and assembled parts. A few subsets of objects exhibit large similarities in shape and albedo, reflecting challenging cases of industrial parts. All the sample images come with and without random backgrounds and post-processing for evaluating the importance of domain randomization. We call it the Synthetic Industrial Parts dataset (SIP-17). We study the usefulness of SIP-17 through benchmarking the performance of five state-of-the-art deep network models, supervised and self-supervised, trained only on the synthetic data while testing them on real data. By analyzing the results, we deduce some insights on the feasibility and challenges of using synthetic data for industrial parts classification and for further developing larger-scale synthetic datasets. Our dataset and code are publicly available.
Updated: 2024-04-12 19:04:59
Fields: cs.CV,cs.LG
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data. The resulting end-to-end model, named AudioChatLlama, can utilize audio prompts as a replacement for text and sustain a conversation. Such a model also has extended cross-modal capabilities such as being able to perform spoken question answering (QA), speech translation, and audio summarization amongst many other closed and open-domain tasks. This is unlike prior approaches in speech, in which LLMs are extended to handle audio for a limited number of pre-designated tasks. On both synthesized and recorded speech QA test sets, evaluations show that our end-to-end approach is on par with or outperforms cascaded systems (speech recognizer + LLM) in terms of modeling the response to a prompt. Furthermore, unlike cascades, our approach can interchange text and audio modalities and intrinsically utilize prior context in a conversation to provide better results.
Updated: 2024-04-12 18:55:22
Fields: cs.CL,cs.AI
Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach
Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and coarse geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. We show that predictions of our model are within 12% accuracy and that independent measurements performed by other authors agree well with our model. Finally, we present a geothermal gradient map for Colombia that highlights regions where further exploration and data collection should be performed.
Updated: 2024-04-12 18:52:02
Fields: physics.geo-ph,cs.LG
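A gradient-boosted regression tree on tabular geophysical predictors is a few lines with scikit-learn. The features, targets, and hyperparameters below are synthetic placeholders standing in for, e.g., heat flow, crustal thickness, and elevation; they are not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 6 geophysical predictors and a geothermal
# gradient target in degC/km with a known linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 25 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=2, size=500)

model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```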
Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support System
Crop yield prediction has typically been modeled on the assumption that there is no interaction between weather and soil variables. However, this paper argues that such an interaction exists, and that it can be finely modelled using the Kendall correlation coefficient. Given the nonlinearity of the interaction between weather and soil variables, a deep neural network regressor (DNNR) is carefully designed with consideration of the depth, the number of neurons in the hidden layers, and the hyperparameters with their optimizations. Additionally, a new metric, the average of absolute root squared error (ARSE), is proposed to combine the strengths of root mean square error (RMSE) and mean absolute error (MAE). With the ARSE metric, the proposed DNNR(s), the optimised random forest regressor (RFR), and the extreme gradient boosting regressor (XGBR) achieved impressively small yield errors of 0.0172 t/ha and 0.0243 t/ha, 0.0001 t/ha, and 0.001 t/ha, respectively. However, with changes to the explanatory variables to ensure generalizability to unforeseen data, the DNNR(s) performed best. Further analysis reveals that a strong interaction does exist between weather and soil variables. Precisely, yield is observed to increase when precipitation is reduced and silt is increased, and vice versa, although the degree of decrease or increase is not quantified in this paper. Contrary to existing yield models targeted towards agricultural policies and global food security, the goal of the proposed corn yield model is to empower the smallholder farmer to farm smartly and intelligently; thus the prediction model is integrated into a mobile application that includes education and a farmer-to-market access module.
Updated: 2024-04-12 18:49:46
Fields: cs.LG,cs.AI,cs.CY,cs.HC
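The weather-soil interaction the paper leans on is measured with Kendall's tau, a rank correlation that captures monotone (not just linear) association. Here is a minimal check with SciPy on synthetic placeholder data; the variables and their relationship are invented for illustration.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
precipitation = rng.gamma(shape=2.0, scale=30.0, size=200)   # mm
silt = 60 - 0.1 * precipitation + rng.normal(scale=5, size=200)  # % content

tau, p_value = kendalltau(precipitation, silt)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3g})")
```

The ARSE metric is paper-specific, so we do not guess at its exact formula here; it is described only as combining the strengths of RMSE and MAE.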
BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks
Novices frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI-based scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through a user study where 10 novices used BISCUIT for machine learning tutorials, we discover that BISCUIT offers user semantic representation of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas. We discuss the implications of our findings for UI-centric interactive paradigm in code generation LLMs.
Updated: 2024-04-12 18:46:18
Fields: cs.HC,cs.AI
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Understanding human instructions to identify the target objects is vital for perception systems. In recent years, the advancements of Large Language Models (LLMs) have introduced new possibilities for image segmentation. In this work, we delve into reasoning segmentation, a novel task that enables segmentation system to reason and interpret implicit user intention via large language model reasoning and then segment the corresponding target. Our work on reasoning segmentation contributes on both the methodological design and dataset labeling. For the model, we propose a new framework named LLM-Seg. LLM-Seg effectively connects the current foundational Segmentation Anything Model and the LLM by mask proposals selection. For the dataset, we propose an automatic data generation pipeline and construct a new reasoning segmentation dataset named LLM-Seg40K. Experiments demonstrate that our LLM-Seg exhibits competitive performance compared with existing methods. Furthermore, our proposed pipeline can efficiently produce high-quality reasoning segmentation datasets. The LLM-Seg40K dataset, developed through this pipeline, serves as a new benchmark for training and evaluating various reasoning segmentation approaches. Our code, models and dataset are at https://github.com/wangjunchi/LLMSeg.
Updated: 2024-04-12 18:45:51
Fields: cs.CV,cs.LG
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Large Language Models (LLMs) have dramatically advanced AI applications, yet their deployment remains challenging due to their immense inference costs. Recent studies ameliorate the computational costs of LLMs by increasing their activation sparsity but suffer from significant performance degradation on downstream tasks. In this work, we introduce a new framework for sparsifying the activations of base LLMs and reducing inference costs, dubbed Contextually Aware Thresholding for Sparsity (CATS). CATS is relatively simple, easy to implement, and highly effective. At the heart of our framework is a new non-linear activation function. We demonstrate that CATS can be applied to various base models, including Mistral-7B and Llama2-7B, and outperforms existing sparsification techniques in downstream task performance. More precisely, CATS-based models often achieve downstream task performance within 1-2% of their base models without any fine-tuning and even at activation sparsity levels of 50%. Furthermore, CATS-based models converge faster and display better task performance than competing techniques when fine-tuning is applied. Finally, we develop a custom GPU kernel for efficient implementation of CATS that translates the activation of sparsity of CATS to real wall-clock time speedups. Our custom kernel implementation of CATS results in a ~15% improvement in wall-clock inference latency of token generation on both Llama-7B and Mistral-7B.
Updated: 2024-04-12 18:42:18
Fields: cs.LG,cs.CL
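The thresholding idea at the heart of CATS can be sketched in a few lines: zero out low-magnitude activations so the subsequent matrix multiplies can skip them. The hard-gating form and the percentile calibration below are our simplification of the paper's activation function, not its exact definition.

```python
import torch

def cats_activation(x, threshold):
    """Zero activations whose magnitude falls below a calibrated cutoff;
    a custom kernel can then skip the corresponding weight rows."""
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))

# Calibrating the cutoff for ~50% activation sparsity on held-out data:
acts = torch.randn(4096)                  # stand-in MLP activations
cutoff = acts.abs().quantile(0.5)         # 50th percentile of |activations|
sparse = cats_activation(acts, cutoff)
print((sparse == 0).float().mean())       # -> ~0.5
```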
'Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image. This is an oversimplification of the process of novel category recognition, where different regions of the image may have properties from different seen classes and thus have different predominant attributes. With this in mind, we take a fundamentally different approach: a pre-trained Vision-Language detector (VINVL) sensitive to attribute information is employed to efficiently obtain region features. A learned function maps the region features to region-specific attribute attention used to construct class part prototypes. We conduct experiments on a popular GZSL benchmark consisting of the CUB, SUN, and AWA2 datasets where our proposed Part Prototype Network (PPN) achieves promising results when compared with other popular base models. Corresponding ablation studies and analysis show that our approach is highly practical and has a distinct advantage over global attribute attention when localized proposals are available.
Updated: 2024-04-12 18:37:00
标题: "鹰眼与狐耳:通用零样本学习的部分原型网络"
摘要: 目前广义零样本学习(GZSL)的方法建立在基础模型之上,这些模型仅考虑整个图像上的单个类属性向量表示。这是对新类别识别过程的过度简化:图像的不同区域可能具有来自不同已见类别的属性,因此具有不同的主要属性。考虑到这一点,我们采取了一种根本不同的方法:使用对属性信息敏感的预训练视觉语言检测器(VINVL)来高效获取区域特征。一个学习到的函数将区域特征映射到区域特定的属性注意力,用于构建类部分原型。我们在一个包含CUB、SUN和AWA2数据集的流行GZSL基准测试上进行实验,我们提出的部分原型网络(PPN)与其他流行的基础模型相比取得了有希望的结果。相应的消融研究和分析表明,我们的方法非常实用,并且在可获得局部提议时,相比全局属性注意力具有明显优势。
更新时间: 2024-04-12 18:37:00
领域: cs.CV,cs.LG
The Generation Gap: Exploring Age Bias in Large Language Models
In this paper, we explore the alignment of values in Large Language Models (LLMs) with specific age groups, leveraging data from the World Value Survey across thirteen categories. Through a diverse set of prompts tailored to ensure response robustness, we find a general inclination of LLM values towards younger demographics. Additionally, we explore the impact of incorporating age identity information in prompts and observe challenges in mitigating value discrepancies with different age cohorts. Our findings highlight the age bias in LLMs and provide insights for future work.
Updated: 2024-04-12 18:36:20
标题: 代沟:探究大型语言模型中的年龄偏见
摘要: 在这篇论文中,我们探讨了大型语言模型(LLMs)的价值观与特定年龄群体的一致性,利用了来自世界价值调查的数据,涵盖了十三个类别。通过一系列旨在确保回应稳健性的多样化提示,我们发现LLM的价值观普遍倾向于年龄较轻的人群。此外,我们探讨了在提示中加入年龄身份信息的影响,并观察到在减轻与不同年龄群体之间的价值差异方面存在挑战。我们的发现突显了LLMs中的年龄偏见,并为未来的研究提供了见解。
更新时间: 2024-04-12 18:36:20
领域: cs.CL,cs.AI
Generating Illustrated Instructions
We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs. We identify desiderata unique to this task, and formalize it through a suite of automatic and human evaluation metrics, designed to measure the validity, consistency, and efficacy of the generations. We combine the power of large language models (LLMs) together with strong text-to-image generation diffusion models to propose a simple approach called StackedDiffusion, which generates such illustrated instructions given text as input. The resulting model strongly outperforms baseline approaches and state-of-the-art multimodal LLMs; and in 30% of cases, users even prefer it to human-generated articles. Most notably, it enables various new and exciting applications far beyond what static articles on the web can provide, such as personalized instructions complete with intermediate steps and pictures in response to a user's individual situation.
Updated: 2024-04-12 18:34:31
标题: 生成图解说明书
摘要: 我们引入了生成图解说明书这一新任务,即根据用户需求定制的视觉说明书。我们确定了这项任务特有的要求,并通过一套自动和人工评估指标对其进行形式化,旨在衡量生成结果的有效性、一致性和效果。我们结合了大型语言模型(LLMs)的强大力量以及强大的文本到图像生成扩散模型,提出了一种称为StackedDiffusion的简单方法,该方法以文本作为输入生成这种图解说明书。结果模型远远优于基线方法和最先进的多模态LLMs;在30%的情况下,用户甚至更喜欢它而不是人工生成的文章。最值得注意的是,它使得各种新颖而令人兴奋的应用成为可能,远远超出了网页上静态文章所能提供的范围,例如根据用户个人情况提供的包含中间步骤和图片的个性化说明书。
更新时间: 2024-04-12 18:34:31
领域: cs.CV,cs.AI,cs.LG,cs.MM
Training a Vision Language Model as Smartphone Assistant
Addressing the challenge of a digital assistant capable of executing a wide array of user tasks, our research focuses on the realm of instruction-based mobile device control. We leverage recent advancements in large language models (LLMs) and present a visual language model (VLM) that can fulfill diverse tasks on mobile devices. Our model functions by interacting solely with the user interface (UI). It uses the visual input from the device screen and mimics human-like interactions, encompassing gestures such as tapping and swiping. This generality in the input and output space allows our agent to interact with any application on the device. Unlike previous methods, our model operates not only on a single screen image but on vision-language sentences created from sequences of past screenshots along with corresponding actions. Evaluating our method on the challenging Android in the Wild benchmark demonstrates its promising efficacy and potential.
Updated: 2024-04-12 18:28:44
标题: 训练一个视觉语言模型作为智能手机助手
摘要: 我们的研究致力于解决数字助手面临的执行各种用户任务的挑战,重点关注基于指令的移动设备控制领域。我们利用最近在大型语言模型(LLMs)方面取得的进展,并提出了一个可以完成移动设备上各种任务的视觉语言模型(VLM)。我们的模型通过与用户界面(UI)进行交互来运作。它利用设备屏幕的视觉输入,并模仿人类般的互动,包括轻触和滑动等手势。输入和输出空间的广泛性使我们的代理能够与设备上的任何应用程序进行交互。与之前的方法不同,我们的模型不仅在单个屏幕图像上运行,还在从过去截图序列和相应操作创建的视觉语言句子上运行。我们在具有挑战性的Android in the Wild基准测试上评估我们的方法,证明了其有希望的有效性和潜力。
更新时间: 2024-04-12 18:28:44
领域: cs.LG,cs.AI,cs.CV,cs.HC
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to diversified data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. Overall, OpenTab leverages table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Utilizing the intermediate data derived from the SQL executions, it conducts grounded inference to produce accurate response. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system.
Updated: 2024-04-12 18:27:34
标题: OpenTab:推动大型语言模型作为开放领域表格推理器的发展
摘要: 大型语言模型(LLMs)在大量数据上训练后,在各种自然语言任务上表现出色,但无法处理需要先前未经训练的知识的任务。一个解决方案是使用检索器来获取相关信息,以扩展LLM的知识范围。然而,由于数据模态多样化和表格大小巨大,现有的面向文本的基于检索的LLMs在结构化表格数据上并不理想。在这项工作中,我们提出了OpenTab,一个由LLMs驱动的开放领域表格推理框架。总的来说,OpenTab利用表格检索器来获取相关表格,然后生成SQL程序以有效地解析检索到的表格。利用从SQL执行中派生的中间数据,它进行基于事实的推理以产生准确的响应。大量的实验评估表明,OpenTab在开放和封闭领域设置中明显优于基线,准确率提高了高达21.5%。我们进一步进行消融研究来验证我们提出的系统设计的有效性。
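A sketch of the retrieve-parse-ground loop described above; `retriever` (returning named pandas DataFrames) and `llm` are hypothetical stand-ins for the framework's components:

import sqlite3
import pandas as pd

def opentab_answer(question: str, retriever, llm) -> str:
    # 1) Fetch candidate tables relevant to the question.
    tables = retriever(question, top_k=3)          # {name: DataFrame}
    # 2) Load them into an in-memory database the generated SQL can run on.
    conn = sqlite3.connect(":memory:")
    for name, df in tables.items():
        df.to_sql(name, conn, index=False)
    schema = "\n".join(f"{n}({', '.join(df.columns)})" for n, df in tables.items())
    # 3) Ask the LLM for a SQL program over the retrieved schemas.
    sql = llm(f"Schema:\n{schema}\n\nWrite SQLite SQL answering: {question}")
    # 4) Execute, then answer grounded in the intermediate result.
    result = pd.read_sql_query(sql, conn)
    return llm(f"Question: {question}\nSQL result:\n{result.to_string()}\nAnswer:")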
更新时间: 2024-04-12 18:27:34
领域: cs.LG
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
Reinforcement learning (RL) theory has largely focused on proving minimax sample complexity bounds. These require strategic exploration algorithms that use relatively limited function classes for representing the policy or value function. Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks. Our work arrives at an explanation by showing that many stochastic MDPs can be solved by performing only a few steps of value iteration on the random policy's Q function and then acting greedily. When this is true, we find that it is possible to separate the exploration and learning components of RL, making it much easier to analyze. We introduce a new RL algorithm, SQIRL, that iteratively learns a near-optimal policy by exploring randomly to collect rollouts and then performing a limited number of steps of fitted-Q iteration over those rollouts. Any regression algorithm that satisfies basic in-distribution generalization properties can be used in SQIRL to efficiently solve common MDPs. This can explain why deep RL works, since it is empirically established that neural networks generalize well in-distribution. Furthermore, SQIRL explains why random exploration works well in practice. We leverage SQIRL to derive instance-dependent sample complexity bounds for RL that are exponential only in an "effective horizon" of lookahead and on the complexity of the class used for function approximation. Empirically, we also find that SQIRL performance strongly correlates with PPO and DQN performance in a variety of stochastic environments, supporting that our theoretical analysis is predictive of practical performance. Our code and data are available at https://github.com/cassidylaidlaw/effective-horizon.
Updated: 2024-04-12 18:26:36
标题: 有效视野解释了随机环境中深度强化学习的表现
摘要: 强化学习(RL)理论主要集中在证明极小极大(minimax)样本复杂度界。这些界要求采用策略性探索算法,并使用相对有限的函数类来表示策略或价值函数。我们的目标是解释为什么深度RL算法在实践中通常表现良好,尽管它们使用随机探索和表现力强得多的函数类(如神经网络)。我们的工作通过展示许多随机MDP可以通过在随机策略的Q函数上执行几步值迭代然后贪婪地行动来解决,从而得出一个解释。当这一点成立时,我们发现可以将RL的探索和学习组件分开,这样分析就变得更容易。我们引入了一种新的RL算法SQIRL,它通过随机探索收集轨迹,然后对这些轨迹执行有限次数的拟合Q迭代,来迭代地学习一个接近最优的策略。任何满足基本分布内泛化性质的回归算法都可以在SQIRL中使用,以有效解决常见的MDP。这可以解释为什么深度RL有效,因为经验表明神经网络在分布内具有良好的泛化能力。此外,SQIRL解释了为什么随机探索在实践中很有效。我们利用SQIRL推导了RL的实例相关样本复杂度界,它仅在前瞻的"有效视野"以及用于函数逼近的函数类的复杂度上呈指数增长。在实证方面,我们还发现,在各种随机环境中,SQIRL的性能与PPO和DQN的性能密切相关,支持我们的理论分析对实际性能具有预测力。我们的代码和数据可在https://github.com/cassidylaidlaw/effective-horizon上找到。
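The abstract specifies SQIRL's loop closely enough to sketch: explore randomly, run a few steps of fitted-Q iteration with any in-distribution regressor, then act greedily. A minimal version for a Gymnasium-style discrete-action environment with vector observations (the regressor and hyperparameters are illustrative choices):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def sqirl(env, n_rollouts=200, horizon=50, k_iters=3, gamma=0.99):
    # 1) Random exploration: collect transitions under the uniform policy.
    trans = []
    for _ in range(n_rollouts):
        s, _ = env.reset()
        for _ in range(horizon):
            a = env.action_space.sample()
            s2, r, term, trunc, _ = env.step(a)
            trans.append((s, a, r, s2, float(term)))
            s = s2
            if term or trunc:
                break
    S = np.array([t[0] for t in trans]); A = np.array([t[1] for t in trans])
    R = np.array([t[2] for t in trans]); S2 = np.array([t[3] for t in trans])
    D = np.array([t[4] for t in trans])
    X, nA = np.column_stack([S, A]), env.action_space.n
    # 2) Only k_iters steps of fitted-Q iteration (the "effective horizon").
    q = None
    for _ in range(k_iters):
        if q is None:
            y = R  # first step: immediate reward under the random policy
        else:
            qn = np.column_stack([q.predict(np.column_stack([S2, np.full(len(S2), a)]))
                                  for a in range(nA)])
            y = R + gamma * (1 - D) * qn.max(axis=1)
        q = RandomForestRegressor(n_estimators=50).fit(X, y)
    # 3) Act greedily with respect to the fitted Q function.
    return lambda s: int(np.argmax([q.predict(np.append(s, a)[None])[0] for a in range(nA)]))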
更新时间: 2024-04-12 18:26:36
领域: stat.ML,cs.AI,cs.LG
Computing distances and means on manifolds with a metric-constrained Eikonal approach
Computing distances on Riemannian manifolds is a challenging problem with numerous applications, from physics, through statistics, to machine learning. In this paper, we introduce the metric-constrained Eikonal solver to obtain continuous, differentiable representations of distance functions on manifolds. The differentiable nature of these representations allows for the direct computation of globally length-minimising paths on the manifold. We showcase the use of metric-constrained Eikonal solvers for a range of manifolds and demonstrate the applications. First, we demonstrate that metric-constrained Eikonal solvers can be used to obtain the Fr\'echet mean on a manifold, employing the definition of a Gaussian mixture model, which has an analytical solution to verify the numerical results. Second, we demonstrate how the obtained distance function can be used to conduct unsupervised clustering on the manifold -- a task for which existing approaches are computationally prohibitive. This work opens opportunities for distance computations on manifolds.
Updated: 2024-04-12 18:26:32
标题: 使用度量约束的Eikonal方法在流形上计算距离和均值
摘要: 在黎曼流形上计算距离是一个具有挑战性的问题,有着从物理学、统计学到机器学习的众多应用。在本文中,我们引入了度量约束Eikonal求解器,以获得流形上距离函数的连续、可微表示。这些表示的可微性使得可以直接计算流形上全局长度最小化的路径。我们展示了度量约束Eikonal求解器在一系列流形上的使用,并演示了其应用。首先,我们展示了度量约束Eikonal求解器可以用于在流形上获得Fr\'echet均值,利用高斯混合模型的定义,该定义具有解析解以验证数值结果。其次,我们展示了如何利用获得的距离函数在流形上进行无监督聚类——这是一项现有方法因计算代价过高而难以完成的任务。这项工作为流形上的距离计算开辟了机会。
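For concreteness, the constraint being solved can be written down in standard notation (an assumed formulation consistent with the abstract; the paper's parametrisation of the solver may differ). On a manifold with metric tensor $G(x)$, the distance $d(x)$ from a source $x_0$ satisfies the metric-constrained Eikonal equation
\[
\sqrt{\nabla d(x)^{\top}\, G^{-1}(x)\, \nabla d(x)} = 1, \qquad d(x_0) = 0,
\]
and since a neural representation of $d$ is differentiable, a globally length-minimising path can be traced by integrating $\dot{\gamma}(t) = -G^{-1}(\gamma(t))\,\nabla d(\gamma(t))$ back toward the source.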
更新时间: 2024-04-12 18:26:32
领域: cs.LG,cs.CG,math.MG
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseudo-labeled data results in remarkable improvements in relative Word Error Rate (WER) by 11.5% and 24.3% for our asynchronous and realtime models, respectively. Additionally, the model is more robust to background noise owing to the addition of these data. The results obtained in this study demonstrate that the incorporation of pseudo-labeled publicly available data is a highly effective strategy for improving ASR accuracy and noise robustness.
Updated: 2024-04-12 18:23:35
标题: Conformer-1:通过大规模半监督引导实现鲁棒的自动语音识别(Automatic Speech Recognition, ASR)
摘要: 本文介绍了Conformer-1,这是一个端到端的自动语音识别(ASR)模型,在570k小时的语音音频数据上训练,其中91%来自公开来源。为了实现这一目标,我们在使用强大的Conformer RNN-T基线模型为未标记的公共数据生成伪标签后,进行了Noisy Student Training。这些伪标记数据的加入使我们的异步和实时模型的词错误率(WER)分别相对降低了11.5%和24.3%。此外,由于增加了这些数据,模型对背景噪声更加稳健。本研究取得的结果表明,纳入伪标记的公开数据是提高ASR准确性和噪声鲁棒性的高效策略。
更新时间: 2024-04-12 18:23:35
领域: eess.AS,cs.CL,cs.LG,cs.SD
FastLogAD: Log Anomaly Detection with Mask-Guided Pseudo Anomaly Generation and Discrimination
Nowadays large computers extensively output logs to record the runtime status and it has become crucial to identify any suspicious or malicious activities from the information provided by the realtime logs. Thus, fast log anomaly detection is a necessary task to be implemented for automating the infeasible manual detection. Most of the existing unsupervised methods are trained only on normal log data, but they usually require either additional abnormal data for hyperparameter selection or auxiliary datasets for discriminative model optimization. In this paper, aiming for a highly effective discriminative model that enables rapid anomaly detection,we propose FastLogAD, a generator-discriminator framework trained to exhibit the capability of generating pseudo-abnormal logs through the Mask-Guided Anomaly Generation (MGAG) model and efficiently identifying the anomalous logs via the Discriminative Abnormality Separation (DAS) model. Particularly, pseudo-abnormal logs are generated by replacing randomly masked tokens in a normal sequence with unlikely candidates. During the discriminative stage, FastLogAD learns a distinct separation between normal and pseudoabnormal samples based on their embedding norms, allowing the selection of a threshold without exposure to any test data and achieving competitive performance. Extensive experiments on several common benchmarks show that our proposed FastLogAD outperforms existing anomaly detection approaches. Furthermore, compared to previous methods, FastLogAD achieves at least x10 speed increase in anomaly detection over prior work. Our implementation is available at https://github.com/YifeiLin0226/FastLogAD.
Updated: 2024-04-12 18:23:29
标题: FastLogAD:使用掩码引导的伪异常生成与判别的日志异常检测
摘要: 当今大型计算机广泛输出日志以记录运行时状态,从实时日志提供的信息中识别任何可疑或恶意活动已变得至关重要。因此,快速日志异常检测是一项必要的任务,用以自动化不可行的手动检测。大多数现有的无监督方法只在正常日志数据上进行训练,但它们通常需要额外的异常数据用于超参数选择,或需要辅助数据集用于判别模型优化。在本文中,为了得到一种能够实现快速异常检测的高效判别模型,我们提出了FastLogAD,这是一个生成器-判别器框架,经过训练可以通过Mask-Guided Anomaly Generation(MGAG)模型生成伪异常日志,并通过Discriminative Abnormality Separation(DAS)模型有效地识别异常日志。特别地,伪异常日志是通过将正常序列中随机屏蔽的标记替换为不太可能的候选标记而生成的。在判别阶段,FastLogAD基于嵌入范数学习正常样本和伪异常样本之间的明显分离,从而在不接触任何测试数据的情况下选择阈值,并实现有竞争力的性能。对几个常见基准进行的广泛实验表明,我们提出的FastLogAD优于现有的异常检测方法。此外,与以前的方法相比,FastLogAD在异常检测上至少实现了10倍的提速。我们的实现可在https://github.com/YifeiLin0226/FastLogAD 上找到。
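A toy sketch of the mask-guided generation step. For illustration the "unlikely candidate" is simply a random vocabulary token different from the original; MGAG itself scores candidates with a learned generator:

import random

def make_pseudo_anomaly(log_tokens, vocab, mask_rate=0.3, seed=None):
    # Randomly "mask" a fraction of positions in a normal log sequence and
    # replace each with an unlikely token, yielding a pseudo-abnormal log.
    rng = random.Random(seed)
    out = list(log_tokens)
    for i in range(len(out)):
        if rng.random() < mask_rate:
            choices = [v for v in vocab if v != out[i]]
            if choices:
                out[i] = rng.choice(choices)
    return out

# Usage: make_pseudo_anomaly(["open", "session", "for", "user", "root"],
#                            vocab=["open", "fail", "kill", "user", "eth0"])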
更新时间: 2024-04-12 18:23:29
领域: cs.LG
Observation-specific explanations through scattered data approximation
This work introduces the definition of observation-specific explanations to assign a score to each data point proportional to its importance in the definition of the prediction process. Such explanations involve the identification of the most influential observations for the black-box model of interest. The proposed method involves estimating these explanations by constructing a surrogate model through scattered data approximation utilizing the orthogonal matching pursuit algorithm. The proposed approach is validated on both simulated and real-world datasets.
Updated: 2024-04-12 18:20:26
标题: 通过分散数据近似进行的观测特定解释
摘要: 这项工作引入了观测特定解释的定义,为每个数据点分配一个与其在预测过程定义中的重要性成正比的分数。这样的解释涉及识别对所关注的黑盒模型最具影响力的观测。所提出的方法通过利用正交匹配追踪算法进行散乱数据逼近来构建代理模型,从而估计这些解释。所提出的方法在模拟数据集和真实世界数据集上均得到验证。
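One concrete reading (a sketch under assumptions, not the paper's exact formulation): fit a sparse surrogate $\hat{f}(x) = \sum_i w_i\, k(x, x_i)$ over the training observations with orthogonal matching pursuit, then score each observation by its coefficient magnitude:

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def observation_scores(X_train, y_pred, kernel, n_nonzero=10):
    # y_pred are the black-box model's predictions on X_train; the surrogate
    # approximates them as a sparse combination of per-observation basis
    # functions, so |w_i| measures observation i's influence.
    K = kernel(X_train, X_train)                      # (n, n) basis matrix
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero).fit(K, y_pred)
    return np.abs(omp.coef_)                          # importance per observation

# e.g. kernel = lambda A, B: np.exp(-((A[:, None] - B[None]) ** 2).sum(-1))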
更新时间: 2024-04-12 18:20:26
领域: stat.ML,cs.AI,cs.LG,cs.NA,math.NA
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?
Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.
Updated: 2024-04-12 17:58:52
标题: 状态对抗多智能体强化学习的解决方案是什么?
摘要: 多种多样的多智能体强化学习(MARL)方法已经开发出来,假设智能体的策略是基于准确的状态信息。然而,通过深度强化学习(DRL)学习的策略容易受到对抗性状态扰动攻击的影响。在这项工作中,我们提出了一个状态对抗性马尔可夫博弈(SAMG),并首次尝试在状态不确定性下研究MARL的不同解决方案概念。我们的分析显示,在SAMG中,通常使用的最优智能体策略和鲁棒纳什均衡的解决方案概念并不总是存在。为了避免这种困难,我们考虑了一个称为鲁棒智能体策略的新解决方案概念,其中智能体的目标是最大化最坏情况下的期望状态值。我们证明了在有限状态和有限动作的SAMG中存在鲁棒智能体策略。此外,我们提出了一个名为Robust Multi-Agent Adversarial Actor-Critic(RMA3C)算法,用于在状态不确定性下学习MARL智能体的鲁棒策略。我们的实验表明,我们的算法在面对状态扰动时优于现有方法,并极大地提高了MARL策略的鲁棒性。我们的代码已在https://songyanghan.github.io/what_is_solution/上公开。
更新时间: 2024-04-12 17:58:52
领域: cs.AI,cs.GT,cs.MA
Pre-training Small Base LMs with Fewer Tokens
We study the effectiveness of a simple approach to develop a small base language model (LM) starting from an existing large base LM: first inherit a few transformer blocks from the larger LM, and then train this smaller model on a very small subset (0.1\%) of the raw pretraining data of the larger model. We call our simple recipe Inheritune and first demonstrate it for building a small base LM with 1.5B parameters using 1B tokens (and a starting few layers of larger LM of 3B parameters); we do this using a single A6000 GPU for less than half a day. Across 9 diverse evaluation datasets as well as the MMLU benchmark, the resulting model compares favorably to publicly available base models of 1B-2B size, some of which have been trained using 50-1000 times more tokens. We investigate Inheritune in a slightly different setting where we train small LMs utilizing larger LMs and their full pre-training dataset. Here we show that smaller LMs trained utilizing some of the layers of GPT2-medium (355M) and GPT-2-large (770M) can effectively match the val loss of their bigger counterparts when trained from scratch for the same number of training steps on OpenWebText dataset with 9B tokens. We analyze our recipe with extensive experiments and demonstrate it efficacy on diverse settings. Our code is available at https://github.com/sanyalsunny111/LLM-Inheritune.
Updated: 2024-04-12 17:53:34
标题: 使用更少标记预训练小基础语言模型
摘要: 我们研究了一种简单方法的有效性:从一个现有的大型基础语言模型(LM)出发开发一个小型基础LM——首先从较大的LM继承几个Transformer块,然后在较大模型原始预训练数据的一个非常小的子集(0.1%)上训练这个较小的模型。我们称这一简单方法为Inheritune,并首先展示如何用它构建一个具有15亿参数的小型基础LM,使用10亿标记(以及一个30亿参数的较大LM的起始几层);我们仅使用单个A6000 GPU、耗时不到半天完成此操作。在9个不同的评估数据集以及MMLU基准测试中,生成的模型与公开可用的1B-2B规模的基础模型相比表现出色,其中一些模型使用了多达50-1000倍的标记进行训练。 我们还在一个略有不同的设置中研究了Inheritune:利用较大的LM及其完整的预训练数据集来训练小型LM。在这里,我们展示了利用GPT2-medium(355M)和GPT-2-large(770M)的部分层训练的较小LM,在含有90亿标记的OpenWebText数据集上从头训练相同步数时,可以有效匹配其更大对应模型的验证损失(val loss)。我们通过广泛的实验分析了我们的方法,并展示了它在不同设置下的有效性。我们的代码可在https://github.com/sanyalsunny111/LLM-Inheritune获取。
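The first step of the recipe is mechanical enough to sketch with Hugging Face Transformers. The model name and number of inherited blocks below are illustrative (though the paper's second setting does derive smaller LMs from GPT-2 variants):

import copy
from transformers import AutoModelForCausalLM

large = AutoModelForCausalLM.from_pretrained("gpt2-large")  # 36 transformer blocks
n_keep = 12                                                 # inherit the first few

small = copy.deepcopy(large)
small.transformer.h = small.transformer.h[:n_keep]          # keep first n_keep blocks
small.config.n_layer = n_keep

# `small` keeps the large model's embeddings, lm_head and first n_keep
# blocks; per the recipe, it is then further pre-trained on a ~0.1% subset
# of the large model's pretraining data.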
更新时间: 2024-04-12 17:53:34
领域: cs.CL,cs.AI,cs.LG
FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models
Few-shot classification with foundation models (e.g., CLIP, DINOv2, PaLM-2) enables users to build an accurate classifier with a few labeled training samples (called support samples) for a classification task. However, an attacker could perform data poisoning attacks by manipulating some support samples such that the classifier makes the attacker-desired, arbitrary prediction for a testing input. Empirical defenses cannot provide formal robustness guarantees, leading to a cat-and-mouse game between the attacker and defender. Existing certified defenses are designed for traditional supervised learning, resulting in sub-optimal performance when extended to few-shot classification. In our work, we propose FCert, the first certified defense against data poisoning attacks to few-shot classification. We show our FCert provably predicts the same label for a testing input under arbitrary data poisoning attacks when the total number of poisoned support samples is bounded. We perform extensive experiments on benchmark few-shot classification datasets with foundation models released by OpenAI, Meta, and Google in both vision and text domains. Our experimental results show our FCert: 1) maintains classification accuracy without attacks, 2) outperforms existing state-of-the-art certified defenses for data poisoning attacks, and 3) is efficient and general.
Updated: 2024-04-12 17:50:40
标题: FCert:在基础模型时代实现具有可证明鲁棒性的少样本分类
摘要: 利用基础模型(例如CLIP、DINOv2、PaLM-2)进行少样本分类,使用户能够利用少量标记的训练样本(称为支持样本)为分类任务构建准确的分类器。然而,攻击者可以通过操纵一些支持样本来进行数据毒化攻击,使分类器对测试输入做出攻击者所期望的任意预测。经验性防御措施无法提供正式的鲁棒性保证,导致攻击者和防御者之间陷入一场猫鼠游戏。现有的认证防御措施是为传统监督学习设计的,将其扩展到少样本分类时性能欠佳。在我们的工作中,我们提出了FCert,这是首个针对少样本分类数据毒化攻击的认证防御方法。我们证明,当被毒化的支持样本总数有界时,FCert在任意数据毒化攻击下对同一测试输入可证明地预测相同的标签。我们使用OpenAI、Meta和Google发布的视觉与文本领域的基础模型,在基准少样本分类数据集上进行了大量实验。我们的实验结果表明,FCert:1)在没有攻击的情况下保持分类准确性,2)在防御数据毒化攻击方面优于现有最先进的认证防御方法,3)高效且通用。
更新时间: 2024-04-12 17:50:40
领域: cs.CR
A Conceptual Framework for Conversational Search and Recommendation: Conceptualizing Agent-Human Interactions During the Conversational Search Process
The conversational search task aims to enable a user to resolve information needs via natural language dialogue with an agent. In this paper, we aim to develop a conceptual framework of the actions and intents of users and agents explaining how these actions enable the user to explore the search space and resolve their information need. We outline the different actions and intents, before discussing key decision points in the conversation where the agent needs to decide how to steer the conversational search process to a successful and/or satisfactory conclusion. Essentially, this paper provides a conceptualization of the conversational search process between an agent and user, which provides a framework and a starting point for research, development and evaluation of conversational search agents.
Updated: 2024-04-12 17:48:18
标题: 一个用于对话式搜索和推荐的概念框架:在对话式搜索过程中概念化代理-人类互动
摘要: 对话式搜索任务旨在通过用户与代理人之间的自然语言对话来解决信息需求。本文旨在开发一个概念框架,描述用户和代理人的行为和意图,解释这些行为如何使用户探索搜索空间并解决他们的信息需求。我们概述了不同的行为和意图,然后讨论对话中的关键决策点,代理人需要决定如何引导对话式搜索过程以取得成功和/或令人满意的结论。本文本质上提供了代理人和用户之间对话式搜索过程的概念化,为对话式搜索代理人的研究、开发和评估提供了一个框架和起点。
更新时间: 2024-04-12 17:48:18
领域: cs.IR,cs.AI
Is ChatGPT Transforming Academics' Writing Style?
Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in their abstracts by means of a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of ChatGPT-revised abstracts is estimated to be approximately 35%, if we take the output of one of the simplest prompts, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of ChatGPT into academics' writing style.
Updated: 2024-04-12 17:41:05
标题: ChatGPT是否改变了学术写作风格?
摘要: 基于2018年5月至2024年1月提交的100万篇arXiv论文,我们通过对词频变化的统计分析,评估这些论文摘要中ChatGPT写作风格的文本密度。我们的模型在经过仔细的噪声分析后,在真实摘要和经ChatGPT修改的摘要(模拟数据)的混合数据上进行了校准和验证。我们发现ChatGPT对arXiv摘要的影响正在增加,特别是在计算机科学领域:如果以最简单的提示之一"修改以下句子"的输出为基准,那么经ChatGPT修改的摘要比例估计约为35%。最后,我们对ChatGPT渗入学术写作风格的积极和消极方面进行了分析。
更新时间: 2024-04-12 17:41:05
领域: cs.CL,cs.AI,cs.DL,cs.LG
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions provided that the net is of sufficient width. We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with the state-of-the-art deep-learning heuristics. Hence the algorithm presented here constitutes a new approach to rigorous deep learning. The modification we do to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Lojasiewicz inequality which was recently proven to be true for various neural networks for any depth within a neighborhood of the initialisation.
Updated: 2024-04-12 17:37:42
标题: 正则化梯度截断可证明地训练宽而深的神经网络
摘要: 在这项工作中,我们实例化了梯度裁剪算法的一种正则化形式,并证明只要网络足够宽,它就可以收敛到深度神经网络损失函数的全局最小值。我们提供的经验证据表明,这一具有理论基础的正则化梯度裁剪算法与最先进的深度学习启发式算法相比也具有竞争力。因此,这里提出的算法构成了一种通往严格深度学习的新途径。 我们对标准梯度裁剪所做的修改旨在利用PL*条件,这是Polyak-Lojasiewicz不等式的一种变体,最近已被证明对任意深度的各种神经网络在初始化的邻域内成立。
更新时间: 2024-04-12 17:37:42
领域: cs.LG,math.OC
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Large Multimodal Models (LMMs) have shown significant reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically use a fixed amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which increase the number of visual tokens significantly. However, due to the design of the Transformer architecture, computational costs associated with these models tend to increase quadratically with the number of input tokens. To tackle this problem, we explore a token reduction mechanism and find, similar to prior work, that many visual tokens are spatially redundant. Based on this, we propose PruMerge, a novel adaptive visual token reduction approach, which largely reduces the number of visual tokens while maintaining comparable model performance. We first select the unpruned visual tokens based on their similarity to class tokens and spatial tokens. We then cluster the pruned tokens based on key similarity and merge the clustered tokens with the unpruned tokens to supplement their information. Empirically, when applied to LLaVA-1.5, our approach can compress the visual tokens by 18 times on average, and achieve comparable performance across diverse visual question-answering and reasoning tasks. Code and checkpoints are at https://llava-prumerge.github.io/.
Updated: 2024-04-12 17:34:29
标题: LLaVA-PruMerge:用于高效大型多模态模型的自适应标记减少
摘要: 大型多模态模型(LMMs)通过连接视觉编码器和大型语言模型展现了显著的推理能力。LMMs通常使用固定数量的视觉标记作为前缀内容,比如CLIP视觉编码器中的倒数第二层特征。最近的LMMs融入了更复杂的视觉输入,如高分辨率图像和视频,这显著增加了视觉标记的数量。然而,由于Transformer架构的设计,这些模型的计算成本往往随输入标记的数量呈二次增长。为了解决这个问题,我们探索了一种标记减少机制,并与先前的工作类似地发现,许多视觉标记在空间上是冗余的。基于此,我们提出了PruMerge,一种新颖的自适应视觉标记减少方法,它大大减少了视觉标记的数量,同时保持了可比较的模型性能。我们首先基于视觉标记与类标记和空间标记的相似性,选择未修剪的视觉标记。然后,我们根据键(key)相似性对被修剪的标记进行聚类,并将聚类后的标记与未修剪的标记合并,以补充其信息。从经验上看,当应用于LLaVA-1.5时,我们的方法平均可以将视觉标记压缩18倍,并在各种视觉问答和推理任务中实现可比较的性能。代码和检查点位于https://llava-prumerge.github.io/。
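An illustrative sketch of the select-cluster-merge step, assuming access to the class-token attention over visual tokens and their key vectors; PruMerge's exact selection rule and merge weighting may differ:

import torch

def prumerge(tokens, cls_attn, keys, keep_ratio=1/18):
    # tokens: (N, D) visual token features; cls_attn: (N,) class-token
    # attention; keys: (N, Dk) attention keys used for token similarity.
    n_keep = max(1, int(tokens.size(0) * keep_ratio))
    keep_idx = cls_attn.topk(n_keep).indices              # important tokens survive
    pruned = torch.ones(tokens.size(0), dtype=torch.bool)
    pruned[keep_idx] = False

    merged = tokens[keep_idx].clone()
    if pruned.any():
        # Assign each pruned token to its most similar kept token by key
        # similarity, then fold the cluster mean into the kept token.
        sim = keys[pruned] @ keys[keep_idx].T             # (n_pruned, n_keep)
        assign = sim.argmax(dim=1)
        pruned_tokens = tokens[pruned]
        for j in range(n_keep):
            members = pruned_tokens[assign == j]
            if len(members):
                merged[j] = (merged[j] + members.mean(dim=0)) / 2
    return merged                                         # (n_keep, D)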
更新时间: 2024-04-12 17:34:29
领域: cs.CV,cs.AI,cs.CL
Inferentialist Resource Semantics
In systems modelling, a system typically comprises located resources relative to which processes execute. One important use of logic in informatics is in modelling such systems for the purpose of reasoning (perhaps automated) about their behaviour and properties. To this end, one requires an interpretation of logical formulae in terms of the resources and states of the system; such an interpretation is called a resource semantics of the logic. This paper shows how inferentialism -- the view that meaning is given in terms of inferential behaviour -- enables a versatile and expressive framework for resource semantics. Specifically, how inferentialism seamlessly incorporates the assertion-based approach of the logic of Bunched Implications, foundational in program verification (e.g., as the basis of Separation Logic), and the renowned number-of-uses reading of Linear Logic. This integration enables reasoning about shared and separated resources in intuitive and familiar ways, as well as about the composition and interfacing of system components.
Updated: 2024-04-12 17:24:57
标题: 推理主义资源语义
摘要: 在系统建模中,一个系统通常由位于特定位置的资源组成,进程相对于这些资源执行。逻辑在信息学中的一个重要用途是对此类系统进行建模,以便(或许是自动化地)推理其行为和属性。为此,需要用系统的资源和状态来解释逻辑公式;这种解释被称为该逻辑的资源语义。本文展示了推理主义的观点——即意义由推理行为给出——如何为资源语义提供一个灵活且富有表现力的框架。具体来说,推理主义如何无缝融合束蕴涵逻辑(Bunched Implications)中基于断言的方法(该方法在程序验证中具有基础地位,例如作为分离逻辑的基础)以及著名的线性逻辑"使用次数"解读。这种整合使人们能够以直观和熟悉的方式推理共享与分离的资源,以及系统组件的组合与接口。
更新时间: 2024-04-12 17:24:57
领域: cs.LO,cs.CR,cs.SY,eess.SY,math.LO
Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network
$\textbf{Purpose}$: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. $\textbf{Materials and Methods}$: This retrospective study included baseline (PET1) and interim (PET2) PET/CT images from 297 patients enrolled in two Children's Oncology Group clinical trials (AHOD1331 and AHOD0831). LAS-Net incorporates longitudinal cross-attention, allowing relevant features from PET1 to inform the analysis of PET2. Model performance was evaluated using Dice coefficients for PET1 and detection F1 scores for PET2. Additionally, we extracted and compared quantitative PET metrics, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in PET1, as well as qPET and $\Delta$SUVmax in PET2, against physician measurements. We quantified their agreement using Spearman's $\rho$ correlations and employed bootstrap resampling for statistical analysis. $\textbf{Results}$: LAS-Net detected residual lymphoma in PET2 with an F1 score of 0.606 (precision/recall: 0.615/0.600), outperforming all comparator methods (P<0.01). For baseline segmentation, LAS-Net achieved a mean Dice score of 0.772. In PET quantification, LAS-Net's measurements of qPET, $\Delta$SUVmax, MTV and TLG were strongly correlated with physician measurements, with Spearman's $\rho$ of 0.78, 0.80, 0.93 and 0.96, respectively. The performance remained high, with a slight decrease, in an external testing cohort. $\textbf{Conclusion}$: LAS-Net achieved high performance in quantifying PET metrics across serial scans, highlighting the value of longitudinal awareness in evaluating multi-time-point imaging datasets.
Updated: 2024-04-12 17:20:57
标题: 使用纵向感知分割网络对儿童霍奇金淋巴瘤患者的连续PET/CT图像进行自动量化
摘要: $\textbf{目的}$:自动量化淋巴瘤患者PET扫描中的纵向变化一直具有挑战性,因为中期治疗扫描中的残留病变通常微妙且难以检测。我们的目标是开发一个纵向感知分割网络(LAS-Net),用于量化儿童霍奇金淋巴瘤患者的连续PET/CT图像。 $\textbf{材料和方法}$:这项回顾性研究包括了两个儿童肿瘤学组临床试验(AHOD1331和AHOD0831)中297名患者的基线(PET1)和中期(PET2)PET/CT图像。LAS-Net结合了纵向交叉注意力,允许来自PET1的相关特征指导对PET2的分析。我们使用PET1的Dice系数和PET2的检测F1分数评估模型性能。此外,我们提取了PET1中的代谢肿瘤体积(MTV)和总病灶糖酵解(TLG),以及PET2中的qPET和$\Delta$SUVmax,并与医生的测量结果进行比较。我们使用Spearman's $\rho$相关系数量化它们的一致性,并采用bootstrap重采样进行统计分析。 $\textbf{结果}$:LAS-Net在PET2中检测到残留淋巴瘤的F1分数为0.606(精确率/召回率:0.615/0.600),优于所有对比方法(P<0.01)。在基线分割上,LAS-Net取得了0.772的平均Dice分数。在PET量化方面,LAS-Net对qPET、$\Delta$SUVmax、MTV和TLG的测量与医生的测量结果高度相关,Spearman's $\rho$分别为0.78、0.80、0.93和0.96。在外部测试队列中,性能仍保持较高水平,仅略有下降。 $\textbf{结论}$:LAS-Net在量化连续扫描的PET指标方面表现出高性能,突显了纵向感知在评估多时间点成像数据集时的价值。
更新时间: 2024-04-12 17:20:57
领域: cs.CV,cs.AI,physics.med-ph
A Dynamical Model of Neural Scaling Laws
On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is the compute-optimal scaling law, which reports the performance as a function of units of compute when choosing model sizes optimally. We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization. This reproduces many observations about neural scaling laws. First, our model makes a prediction about why the scaling of performance with training time and with model size have different power law exponents. Consequently, the theory predicts an asymmetric compute-optimal scaling rule where the number of training steps are increased faster than model parameters, consistent with recent empirical observations. Second, it has been observed that early in training, networks converge to their infinite-width dynamics at a rate $1/\textit{width}$ but at late time exhibit a rate $\textit{width}^{-c}$, where $c$ depends on the structure of the architecture and task. We show that our model exhibits this behavior. Lastly, our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
Updated: 2024-04-12 17:16:09
标题: 神经缩放定律的动力学模型
摘要: 在各种任务中,神经网络的性能随着训练时间、数据集大小和模型大小的增加而可预见地提高,跨越多个数量级。这种现象被称为神经缩放定律。计算优化缩放定律是至关重要的,它报告了在选择模型大小最佳时,性能与计算单元的关系。我们分析了使用梯度下降训练的随机特征模型作为网络训练和泛化的可解模型。这重现了关于神经缩放定律的许多观察。首先,我们的模型对于性能随训练时间和模型大小缩放的不同幂律指数做出了预测。因此,理论预测了一个不对称的计算优化缩放规则,其中训练步骤的数量增加速度快于模型参数,与最近的经验观察一致。其次,观察到在训练早期,网络以速率$1/\textit{width}$收敛到无限宽度动态,但在后期表现出速率$\textit{width}^{-c}$,其中$c$取决于架构和任务的结构。我们展示了我们的模型表现出这种行为。最后,我们的理论展示了由于数据重复使用而导致训练和测试损失之间的差距如何随着时间逐渐增加。
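To make the compute-optimal logic concrete, here is the standard power-law bookkeeping (an illustrative ansatz, not the paper's solvable model): write the loss as $L(N,T) \approx L_\infty + aN^{-\alpha} + bT^{-\beta}$ for $N$ parameters and $T$ training steps, with compute $C \propto NT$. Minimising over the split gives
\[
N^* \propto C^{\beta/(\alpha+\beta)}, \qquad T^* \propto C^{\alpha/(\alpha+\beta)},
\]
so whenever $\alpha > \beta$ — different exponents for model size and training time, as the abstract's model predicts — the optimal number of training steps grows faster than the number of parameters, which is the asymmetric compute-optimal rule described above.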
更新时间: 2024-04-12 17:16:09
领域: stat.ML,cond-mat.dis-nn,cs.LG
Hyperbolic Delaunay Geometric Alignment
Hyperbolic machine learning is an emerging field aimed at representing data with a hierarchical structure. However, there is a lack of tools for evaluation and analysis of the resulting hyperbolic data representations. To this end, we propose Hyperbolic Delaunay Geometric Alignment (HyperDGA) -- a similarity score for comparing datasets in a hyperbolic space. The core idea is counting the edges of the hyperbolic Delaunay graph connecting datapoints across the given sets. We provide an empirical investigation on synthetic and real-life biological data and demonstrate that HyperDGA outperforms the hyperbolic version of classical distances between sets. Furthermore, we showcase the potential of HyperDGA for evaluating latent representations inferred by a Hyperbolic Variational Auto-Encoder.
Updated: 2024-04-12 17:14:58
标题: 双曲Delaunay几何对齐
摘要: 双曲机器学习是一个新兴领域,旨在表示具有层次结构的数据。然而,目前缺乏用于评估和分析所得双曲数据表示的工具。为此,我们提出了双曲Delaunay几何对齐(HyperDGA)——一种用于在双曲空间中比较数据集的相似度分数。其核心思想是计算连接给定集合间数据点的双曲Delaunay图的边数。我们对合成数据和真实生物数据进行了实证研究,并证明HyperDGA优于集合间经典距离的双曲版本。此外,我们展示了HyperDGA在评估由双曲变分自编码器推断出的潜在表示方面的潜力。
更新时间: 2024-04-12 17:14:58
领域: cs.LG
Mayhem: Targeted Corruption of Register and Stack Variables
In the past decade, many vulnerabilities were discovered in microarchitectures which yielded attack vectors and motivated the study of countermeasures. Further, architectural and physical imperfections in DRAMs led to the discovery of Rowhammer attacks which give an adversary power to introduce bit flips in a victim's memory space. Numerous studies analyzed Rowhammer and proposed techniques to prevent it altogether or to mitigate its effects. In this work, we push the boundary and show how Rowhammer can be further exploited to inject faults into stack variables and even register values in a victim's process. We achieve this by targeting the register value that is stored in the process's stack, which subsequently is flushed out into the memory, where it becomes vulnerable to Rowhammer. When the faulty value is restored into the register, it will end up used in subsequent iterations. The register value can be stored in the stack via latent function calls in the source or by actively triggering signal handlers. We demonstrate the power of the findings by applying the techniques to bypass SUDO and SSH authentication. We further outline how MySQL and other cryptographic libraries can be targeted with the new attack vector. There are a number of challenges this work overcomes with extensive experimentation before coming together to yield an end-to-end attack on an OpenSSL digital signature: achieving co-location with stack and register variables, with synchronization provided via a blocking window. We show that stack and registers are no longer safe from the Rowhammer attack.
Updated: 2024-04-12 17:06:54
标题: Mayhem:定向破坏寄存器与栈变量
摘要: 在过去的十年中,人们在微架构中发现了许多漏洞,这些漏洞产生了攻击向量并推动了对抗措施的研究。此外,DRAM中的架构和物理缺陷导致了Rowhammer攻击的发现,使对手有能力在受害者的内存空间中引入位翻转。许多研究对Rowhammer进行了分析,并提出了彻底防止它或减轻其影响的技术。 在这项工作中,我们推进了边界,展示了如何进一步利用Rowhammer向受害者进程中的栈变量甚至寄存器值注入故障。我们通过瞄准存储在进程栈中的寄存器值来实现这一点:该值随后被换出到内存中,在那里变得容易受到Rowhammer攻击;当错误值被恢复到寄存器时,它将在后续迭代中被使用。寄存器值可以通过源代码中的潜在函数调用或通过主动触发信号处理程序存储到栈中。我们通过将这些技术应用于绕过SUDO和SSH身份验证来展示这些发现的威力。我们进一步概述了如何用这一新攻击向量针对MySQL和其他加密库。这项工作通过广泛实验克服了诸多挑战,最终实现了对OpenSSL数字签名的端到端攻击:与栈和寄存器变量实现共置,并通过阻塞窗口提供同步。我们证明栈和寄存器不再能免受Rowhammer攻击。
更新时间: 2024-04-12 17:06:54
领域: cs.CR
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Neural networks extract features from data using stochastic gradient descent (SGD). In particular, higher-order input cumulants (HOCs) are crucial for their performance. However, extracting information from the $p$th cumulant of $d$-dimensional inputs is computationally hard: the number of samples required to recover a single direction from an order-$p$ tensor (tensor PCA) using online SGD grows as $d^{p-1}$, which is prohibitive for high-dimensional inputs. This result raises the question of how neural networks extract relevant directions from the HOCs of their inputs efficiently. Here, we show that correlations between latent variables along the directions encoded in different input cumulants speed up learning from higher-order correlations. We show this effect analytically by deriving nearly sharp thresholds for the number of samples required by a single neuron to weakly-recover these directions using online SGD from a random start in high dimensions. Our analytical results are confirmed in simulations of two-layer neural networks and unveil a new mechanism for hierarchical learning in neural networks.
Updated: 2024-04-12 17:01:25
标题: 滑下楼梯:如何利用相关的潜变量加速神经网络学习
摘要: 神经网络利用随机梯度下降(SGD)从数据中提取特征。特别地,高阶输入累积量(HOCs)对其性能至关重要。然而,从$d$维输入的第$p$阶累积量中提取信息在计算上是困难的:使用在线SGD从一个$p$阶张量(张量PCA)中恢复单个方向所需的样本数量以$d^{p-1}$的速度增长,这对于高维输入来说是不可行的。这个结果引发了一个问题:神经网络如何高效地从其输入的HOCs中提取相关方向。在这里,我们展示了沿不同输入累积量所编码方向上的潜变量之间的相关性,可以加速从高阶相关性中的学习。我们通过推导单个神经元在高维中从随机初始化出发、使用在线SGD弱恢复这些方向所需样本数量的近乎精确的阈值,从分析上证明了这种效果。我们的分析结果在两层神经网络的模拟中得到了确认,并揭示了神经网络中分层学习的一种新机制。
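For reference, the threshold the abstract quotes for learning a single direction from the $p$-th cumulant alone with online SGD is
\[
n \gtrsim d^{\,p-1},
\]
and the result described here is that a direction whose latent variable is correlated with one already recoverable from a lower-order cumulant can be weakly recovered well below this isotropic threshold (the precise reduced exponents are derived in the paper).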
更新时间: 2024-04-12 17:01:25
领域: stat.ML,cond-mat.stat-mech,cs.LG,math.PR,math.ST,stat.TH
Generating Synthetic Time Series Data for Cyber-Physical Systems
Data augmentation is an important facilitator of deep learning applications in the time series domain. A gap is identified in the literature, demonstrating sparse exploration of the transformer, the dominant sequence model, for data augmentation in time series. An architecture hybridizing several successful priors is put forth and tested using a powerful time domain similarity metric. Results suggest the challenge of this domain, and several valuable directions for future work.
Updated: 2024-04-12 16:55:08
标题: 生成用于网络物理系统的合成时间序列数据
摘要: 数据增强是深度学习应用在时间序列领域中的重要促进因素。文献中发现了一个空白,表明对于时间序列数据增强,主导的序列模型Transformer的探索较为有限。提出了一个融合了几种成功先验的架构,并使用强大的时间域相似度度量进行测试。结果表明了这一领域的挑战,以及未来工作的几个有价值的方向。
更新时间: 2024-04-12 16:55:08
领域: cs.LG
Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases
Large Language Models (LLMs) can automate or substitute different types of tasks in the software engineering process. This study evaluates the resource utilization and accuracy of LLM in interpreting and executing natural language queries against traditional SQL within relational database management systems. We empirically examine the resource utilization and accuracy of nine LLMs varying from 7 to 34 Billion parameters, including Llama2 7B, Llama2 13B, Mistral, Mixtral, Optimus-7B, SUS-chat-34B, platypus-yi-34b, NeuralHermes-2.5-Mistral-7B and Starling-LM-7B-alpha, using a small transaction dataset. Our findings indicate that using LLMs for database queries incurs significant energy overhead (even small and quantized models), making it an environmentally unfriendly approach. Therefore, we advise against replacing relational databases with LLMs due to their substantial resource utilization.
Updated: 2024-04-12 16:44:28
标题: LLMs能否替代SQL?比较LLMs和传统关系数据库查询的资源利用情况
摘要: 大型语言模型(LLM)可以自动化或替代软件工程过程中的不同类型任务。本研究评估了LLM在解释和执行自然语言查询方面,与关系数据库管理系统中传统SQL相比的资源利用率和准确性。我们以实证方式检验了9个LLM的资源利用率和准确性,参数规模从70亿到340亿不等,包括Llama2 7B、Llama2 13B、Mistral、Mixtral、Optimus-7B、SUS-chat-34B、platypus-yi-34b、NeuralHermes-2.5-Mistral-7B 和 Starling-LM-7B-alpha,使用了一个小型交易数据集。我们的研究结果表明,使用LLM进行数据库查询会产生显著的能源开销(即使是小型和量化的模型),这使其成为一种环境不友好的方法。因此,鉴于其巨大的资源消耗,我们建议不要用LLM替代关系数据库。
更新时间: 2024-04-12 16:44:28
领域: cs.DB,cs.AI,cs.CL,68-04,H.2.m
Improving Referring Image Segmentation using Vision-Aware Text Features
Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. This over-reliance on visual features can lead to suboptimal results, especially in complex scenarios where text prompts are ambiguous or context-dependent. To overcome these challenges, we present a novel framework VATEX to improve referring image segmentation by enhancing object and context understanding with Vision-Aware Text Feature. Our method involves using CLIP to derive a CLIP Prior that integrates an object-centric visual heatmap with text description, which can be used as the initial query in DETR-based architecture for the segmentation task. Furthermore, by observing that there are multiple ways to describe an instance in an image, we enforce feature similarity between text variations referring to the same visual input by two components: a novel Contextual Multimodal Decoder that turns text embeddings into vision-aware text features, and a Meaning Consistency Constraint to ensure further the coherent and consistent interpretation of language expressions with the context understanding obtained from the image. Our method achieves a significant performance improvement on three benchmark datasets RefCOCO, RefCOCO+ and G-Ref. Code is available at: https://nero1342.github.io/VATEX\_RIS.
Updated: 2024-04-12 16:38:48
标题: 通过视觉感知文本特征改进参考图像分割
摘要: 参考图像分割是一项具有挑战性的任务,涉及根据自然语言描述生成像素级的分割掩码。现有方法大多依赖视觉特征来生成分割掩码,而将文本特征视为辅助组件。对视觉特征的过度依赖可能导致次优结果,特别是在文本提示模棱两可或依赖上下文的复杂场景中。为了克服这些挑战,我们提出了一个新颖的框架VATEX,通过视觉感知文本特征增强对象和上下文理解,来改进参考图像分割。我们的方法使用CLIP推导出一个CLIP先验,它将以物体为中心的视觉热图与文本描述集成在一起,可以作为基于DETR架构的分割任务中的初始查询。此外,注意到图像中同一实例可以有多种描述方式,我们通过两个组件来强制指代同一视觉输入的不同文本表述之间的特征相似性:一种将文本嵌入转换为视觉感知文本特征的新颖上下文多模态解码器,以及一种意义一致性约束,进一步确保语言表达的解释与从图像获得的上下文理解保持连贯一致。我们的方法在三个基准数据集RefCOCO、RefCOCO+和G-Ref上实现了显著的性能改进。代码可在https://nero1342.github.io/VATEX\_RIS 上获得。
更新时间: 2024-04-12 16:38:48
领域: cs.CV,cs.AI
ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification
Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks. Our simple yet effective approach employs supervised contrastive learning, in which samples that share enough labels with an anchor image based on a decision threshold are introduced as a positive set. This structure captures label dependencies by pulling positive pair embeddings together and pushing away negative samples that fall below the threshold. We enhance representation learning by incorporating a mixture density network into contrastive learning and generating Gaussian mixture distributions to explore the epistemic uncertainty of the feature encoder. We validate the effectiveness of our framework through experimentation with datasets from the computer vision and medical imaging domains. Our method outperforms the existing state-of-the-art methods while achieving a low computational footprint on both datasets. Visualization analyses also demonstrate that ProbMCL-learned classifiers maintain a meaningful semantic topology.
Updated: 2024-04-12 16:37:46
标题: ProbMCL:多标签视觉分类的简单概率对比学习
摘要: 多标签图像分类在许多领域中都是一个具有挑战性的任务,包括计算机视觉和医学影像。最近的进展引入了基于图和变压器的方法来提高性能并捕获标签依赖关系。然而,这些方法通常包括复杂的模块,需要大量计算,并且缺乏可解释性。在本文中,我们提出了概率多标签对比学习(ProbMCL),这是一个新颖的框架,用于解决多标签图像分类任务中的这些挑战。我们的简单而有效的方法采用了监督对比学习,其中基于决策阈值与锚图像共享足够标签的样本被引入为正集。这种结构通过将正对嵌入样本拉在一起并将低于阈值的负样本推开来捕捉标签依赖关系。我们通过将混合密度网络纳入对比学习并生成高斯混合分布来增强表示学习,以探索特征编码器的认知不确定性。我们通过在计算机视觉和医学影像领域的数据集上进行实验验证了我们框架的有效性。我们的方法在两个数据集上都表现优于现有的最先进方法,并且具有低的计算占用。可视化分析还表明,ProbMCL学习的分类器保持着有意义的语义拓扑结构。
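A sketch of the threshold-based positive-set rule inside a supervised contrastive loss, taking Jaccard label overlap as an assumed instantiation of "shares enough labels" and omitting the mixture-density component:

import torch
import torch.nn.functional as F

def multilabel_supcon_loss(z, labels, overlap_thresh=0.5, tau=0.1):
    # z: (B, D) embeddings; labels: (B, C) float multi-hot label matrix.
    z = F.normalize(z, dim=1)
    n = z.size(0)
    sim = z @ z.T / tau
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    inter = labels @ labels.T                       # shared-label counts
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    pos = (inter / union.clamp(min=1) >= overlap_thresh) & ~self_mask
    # Log-probability of each candidate pair, excluding self-similarity.
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")),
                                     dim=1, keepdim=True)
    n_pos = pos.sum(1)
    keep = n_pos > 0                                # anchors with >= 1 positive
    loss = -(pos.float() * log_prob).sum(1)[keep] / n_pos[keep]
    return loss.mean()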
更新时间: 2024-04-12 16:37:46
领域: cs.CV,cs.LG
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Visual question answering (VQA) is known as an AI-complete task as it requires understanding, reasoning, and inferring about the vision and the language content. Over the past few years, numerous neural architectures have been suggested for the VQA problem. However, achieving success in zero-shot VQA remains a challenge due to its requirement for advanced generalization and reasoning skills. This study explores the impact of incorporating image captioning as an intermediary process within the VQA pipeline. Specifically, we explore the efficacy of utilizing image captions instead of images and leveraging large language models (LLMs) to establish a zero-shot setting. Since image captioning is the most crucial step in this process, we compare the impact of state-of-the-art image captioning models on VQA performance across various question types in terms of structure and semantics. We propose a straightforward and efficient question-driven image captioning approach within this pipeline to transfer contextual information into the question-answering (QA) model. This method involves extracting keywords from the question, generating a caption for each image-question pair using the keywords, and incorporating the question-driven caption into the LLM prompt. We evaluate the efficacy of using general-purpose and question-driven image captions in the VQA pipeline. Our study highlights the potential of employing image captions and harnessing the capabilities of LLMs to achieve competitive performance on GQA under the zero-shot setting. Our code is available at \url{https://github.com/ovguyo/captions-in-VQA}.
Updated: 2024-04-12 16:35:23
标题: 通过问题驱动的图像标题作为提示增强视觉问答
摘要: 视觉问答(VQA)被认为是一项AI完整的任务,因为它需要理解、推理和推断关于视觉和语言内容。在过去几年中,已经提出了许多神经架构来解决VQA问题。然而,在零样本VQA中取得成功仍然是一个挑战,因为它需要高级泛化和推理技能。本研究探讨了在VQA管道中加入图像字幕作为中间过程的影响。具体地,我们探讨了利用图像字幕而不是图像以及利用大型语言模型(LLMs)建立零样本设置的有效性。由于图像字幕是这一过程中最关键的步骤,我们比较了先进的图像字幕模型对VQA性能在不同结构和语义问题类型上的影响。我们提出了一个简单高效的问题驱动图像字幕方法,将上下文信息传递到问答(QA)模型中。该方法涉及从问题中提取关键词,使用这些关键词为每个图像-问题对生成字幕,并将问题驱动的字幕整合到LLM提示中。我们评估了在VQA管道中使用通用和问题驱动图像字幕的有效性。我们的研究突出了利用图像字幕和发挥LLMs能力,在零样本设置下取得GQA上有竞争力的表现的潜力。我们的代码可在\url{https://github.com/ovguyo/captions-in-VQA}找到。
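The pipeline itself reduces to a few lines; `extract_keywords`, `caption_model`, and `llm` below are hypothetical stand-ins for a keyword extractor, an image-captioning model, and an LLM:

def answer(image, question, extract_keywords, caption_model, llm):
    keywords = extract_keywords(question)             # e.g. content words of the question
    # Question-driven caption: steer the captioner toward what is asked about.
    caption = caption_model(image, prompt=" ".join(keywords))
    prompt = (f"Caption: {caption}\n"
              f"Question: {question}\n"
              f"Answer with a short phrase:")
    return llm(prompt)                                # zero-shot QA over text only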
更新时间: 2024-04-12 16:35:23
领域: cs.CV,cs.AI
An Integrated Toolbox for Creating Neuromorphic Edge Applications
Spiking Neural Networks (SNNs) and neuromorphic models are more efficient and have more biological realism than the activation functions typically used in deep neural networks, transformer models and generative AI. SNNs have local learning rules, are able to learn on small data sets, and can adapt through neuromodulation. Although research has shown their advantages, there are still few compelling practical applications, especially at the edge where sensors and actuators need to be processed in a timely fashion. One reason for this might be that SNNs are much more challenging to understand, build, and operate due to their intrinsic properties. For instance, the mathematical foundation involves differential equations rather than basic activation functions. To address these challenges, we have developed CARLsim++. It is an integrated toolbox that enables fast and easy creation of neuromorphic applications. It encapsulates the mathematical intrinsics and low-level C++ programming by providing a graphical user interface for users who do not have a background in software engineering but still want to create neuromorphic models. Developers can easily configure inputs and outputs to devices and robots. These can be accurately simulated before deploying on physical devices. CARLsim++ can lead to rapid development of neuromorphic applications for simulation or edge processing.
Updated: 2024-04-12 16:34:55
标题: 一个用于创建神经形态边缘应用的综合工具箱
摘要: 脉冲神经网络(SNNs)和神经形态模型比深度神经网络、Transformer模型和生成式人工智能中通常使用的激活函数更高效,也更具生物学真实性。SNNs具有局部学习规则,能够在小数据集上学习,并能通过神经调节进行适应。尽管研究表明了它们的优势,但仍然缺乏引人注目的实际应用,特别是在需要及时处理传感器和执行器的边缘侧。其中一个原因可能是,由于其内在特性,SNN在理解、构建和操作上要困难得多。例如,其数学基础涉及微分方程而不是基本的激活函数。为了应对这些挑战,我们开发了CARLsim++。它是一个集成工具箱,可以快速、轻松地创建神经形态应用程序。它通过为没有软件工程背景但仍想创建神经形态模型的用户提供图形用户界面,封装了数学内在机制和低级C++编程。开发人员可以轻松配置设备和机器人的输入和输出,并可以在部署到物理设备之前对其进行准确模拟。CARLsim++能够推动用于模拟或边缘处理的神经形态应用的快速开发。
更新时间: 2024-04-12 16:34:55
领域: cs.NE,cs.AI
FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation
In the realm of fashion object detection and segmentation for online shopping images, existing state-of-the-art fashion parsing models encounter limitations, particularly when exposed to non-model-worn apparel and close-up shots. To address these failures, we introduce FashionFail; a new fashion dataset with e-commerce images for object detection and segmentation. The dataset is efficiently curated using our novel annotation tool that leverages recent foundation models. The primary objective of FashionFail is to serve as a test bed for evaluating the robustness of models. Our analysis reveals the shortcomings of leading models, such as Attribute-Mask R-CNN and Fashionformer. Additionally, we propose a baseline approach using naive data augmentation to mitigate common failure cases and improve model robustness. Through this work, we aim to inspire and support further research in fashion item detection and segmentation for industrial applications. The dataset, annotation tool, code, and models are available at \url{https://rizavelioglu.github.io/fashionfail/}.
Updated: 2024-04-12 16:28:30
标题: 时尚失败:解决时尚目标检测和分割中的失败案例
摘要: 在面向在线购物图像的时尚目标检测和分割领域,现有的最先进时尚解析模型存在局限性,特别是在面对非模特穿着的服装和特写镜头时。为了解决这些失败案例,我们引入了FashionFail:一个新的时尚数据集,包含用于目标检测和分割的电子商务图像。该数据集是通过我们利用最新基础模型的新型标注工具高效构建的。FashionFail的主要目标是作为评估模型鲁棒性的测试平台。我们的分析揭示了Attribute-Mask R-CNN和Fashionformer等领先模型的缺陷。此外,我们提出了一种使用朴素数据增强的基线方法,以减轻常见的失败案例并改善模型的鲁棒性。通过这项工作,我们的目标是激发并支持时尚物品检测和分割领域面向工业应用的进一步研究。数据集、标注工具、代码和模型可在\url{https://rizavelioglu.github.io/fashionfail/}上获得。
更新时间: 2024-04-12 16:28:30
领域: cs.CV,cs.AI
Leap: molecular synthesisability scoring with intermediates
Assessing whether a molecule can be synthesised is a primary task in drug discovery. It enables computational chemists to filter for viable compounds or bias molecular generative models. The notion of synthesisability is dynamic as it evolves depending on the availability of key compounds. A common approach in drug discovery involves exploring the chemical space surrounding synthetically-accessible intermediates. This strategy improves the synthesisability of the derived molecules due to the availability of key intermediates. Existing synthesisability scoring methods such as SAScore, SCScore and RAScore, cannot condition on intermediates dynamically. Our approach, Leap, is a GPT-2 model trained on the depth, or longest linear path, of predicted synthesis routes that allows information on the availability of key intermediates to be included at inference time. We show that Leap surpasses all other scoring methods by at least 5% on AUC score when identifying synthesisable molecules, and can successfully adapt predicted scores when presented with a relevant intermediate compound.
Updated: 2024-04-12 16:26:04
标题: Leap:具有中间体的分子合成能力评分
摘要: 评估一种分子是否能够被合成是药物发现中的首要任务。它使计算化学家能够筛选可行的化合物,或对分子生成模型进行偏置。可合成性的概念是动态的,它随关键化合物的可得性而演变。药物发现中的一种常见方法是探索可合成中间体周围的化学空间。由于关键中间体可得,这种策略提高了衍生分子的可合成性。现有的可合成性评分方法,如SAScore、SCScore和RAScore,无法根据中间体进行动态调节。我们的方法Leap是一个在预测合成路线的深度(即最长线性路径)上训练的GPT-2模型,允许在推断时纳入有关关键中间体可得性的信息。我们展示了Leap在识别可合成分子时,AUC分数至少比所有其他评分方法高出5%,并且在给定相关中间体化合物时能够成功调整预测分数。
更新时间: 2024-04-12 16:26:04
领域: q-bio.BM,cs.LG,physics.chem-ph
Small Models Are (Still) Effective Cross-Domain Argument Extractors
Effective ontology transfer has been a major goal of recent work on event argument extraction (EAE). Two methods in particular -- question answering (QA) and template infilling (TI) -- have emerged as promising approaches to this problem. However, detailed explorations of these techniques' ability to actually enable this transfer are lacking. In this work, we provide such a study, exploring zero-shot transfer using both techniques on six major EAE datasets at both the sentence and document levels. Further, we challenge the growing reliance on LLMs for zero-shot extraction, showing that vastly smaller models trained on an appropriate source ontology can yield zero-shot performance superior to that of GPT-3.5 or GPT-4.
Updated: 2024-04-12 16:23:41
标题: 小模型(仍然)是有效的跨领域论元抽取器
摘要: 有效的本体迁移一直是近期事件论元抽取(EAE)工作的主要目标。其中两种方法——问答(QA)和模板填充(TI)——已成为解决这一问题的有希望的途径。然而,对这些技术能否真正实现这种迁移的详细探讨仍然缺乏。在这项工作中,我们提供了这样一项研究,在六个主要的EAE数据集上,于句子和文档两个级别探讨了使用这两种技术的零样本迁移。此外,我们质疑零样本抽取对LLM日益增长的依赖,表明在合适的源本体上训练的小得多的模型,其零样本表现可以优于GPT-3.5或GPT-4。
更新时间: 2024-04-12 16:23:41
领域: cs.CL,cs.AI,cs.LG
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. It is proven that the establishment of a closed feedback loop between the data generation pipeline and the training process can enhance the learning rate during training, elevate overall system performance, and augment safety resilience. Our evaluations, conducted using the Proximal Policy Optimization (PPO) and the HighwayEnv simulation environment, demonstrate noticeable performance improvements with the integration of critical case generation and LLM analysis, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to hasten the development of AV agents, expand the general scope of RL training, and ameliorate validation efforts for AV safety.
Updated: 2024-04-12 16:13:10
标题: 通过语言模型集成与关键场景生成增强自主车辆训练
摘要: 本文介绍了CRITICAL,这是一个用于自主车辆(AV)训练和测试的新型闭环框架。CRITICAL以其生成多样化场景的能力而著称,它聚焦于关键驾驶情况,针对强化学习(RL)代理中已识别的特定学习和性能差距。该框架通过整合现实世界交通动态、驾驶行为分析、替代安全度量以及一个可选的大型语言模型(LLM)组件来实现这一目标。已经证明,在数据生成管道和训练过程之间建立闭环反馈可以提高训练期间的学习速率,提升整体系统性能,并增强安全弹性。我们使用近端策略优化(PPO)和HighwayEnv仿真环境进行的评估表明,整合关键案例生成与LLM分析后性能明显提升,显示出CRITICAL在提高AV系统鲁棒性和简化关键场景生成方面的潜力。这最终有助于加快AV代理的开发,扩大RL训练的总体范围,并改善AV安全验证工作。
更新时间: 2024-04-12 16:13:10
领域: cs.RO,cs.AI,cs.LG
Incremental Extractive Opinion Summarization Using Cover Trees
Extractive opinion summarization involves automatically producing a summary of text about an entity (e.g., a product's reviews) by extracting representative sentences that capture prevalent opinions in the review set. Typically, in online marketplaces user reviews accumulate over time, and opinion summaries need to be updated periodically to provide customers with up-to-date information. In this work, we study the task of extractive opinion summarization in an incremental setting, where the underlying review set evolves over time. Many of the state-of-the-art extractive opinion summarization approaches are centrality-based, such as CentroidRank (Radev et al., 2004; Chowdhury et al., 2022). CentroidRank performs extractive summarization by selecting a subset of review sentences closest to the centroid in the representation space as the summary. However, these methods are not capable of operating efficiently in an incremental setting, where reviews arrive one at a time. In this paper, we present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting. Our approach, CoverSumm, relies on indexing review representations in a cover tree and maintaining a reservoir of candidate summary review sentences. CoverSumm's efficacy is supported by a theoretical and empirical analysis of running time. Empirically, on a diverse collection of data (both real and synthetically created to illustrate scaling considerations), we demonstrate that CoverSumm is up to 36x faster than baseline methods, and capable of adapting to nuanced changes in data distribution. We also conduct human evaluations of the generated summaries and find that CoverSumm is capable of producing informative summaries consistent with the underlying review set.
Updated: 2024-04-12 16:13:06
标题: 使用Cover Trees进行增量式抽取式意见总结
摘要: 抽取式意见总结涉及通过提取能够捕捉评论集中普遍意见的代表性句子,自动生成有关某一实体(例如某产品的评论)的文本摘要。通常,在在线市场中,用户评论会随时间累积,意见摘要需要定期更新,以向客户提供最新信息。在这项工作中,我们研究了在增量设置下进行抽取式意见总结的任务,其中底层评论集随时间演变。许多最先进的抽取式意见总结方法是基于中心性的,例如CentroidRank(Radev et al., 2004; Chowdhury et al., 2022)。CentroidRank通过选择表示空间中最接近质心的评论句子子集作为摘要来进行抽取式总结。然而,这些方法无法在评论逐条到达的增量设置中高效运行。在本文中,我们提出了一种在增量设置中准确计算CentroidRank摘要的高效算法。我们的方法CoverSumm依赖于在覆盖树(cover tree)中索引评论表示,并维护一个候选摘要句子的蓄水池。CoverSumm的有效性得到了运行时间的理论和实证分析的支持。在实证方面,在多样的数据集(既包括真实数据,也包括为说明扩展性考虑而合成的数据)上,我们证明CoverSumm比基线方法最多快36倍,并能适应数据分布中的细微变化。我们还对生成的摘要进行了人工评估,发现CoverSumm能够生成与底层评论集一致的信息性摘要。
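For reference, the batch CentroidRank selection rule that CoverSumm maintains incrementally — recomputed here from scratch for clarity:

import numpy as np

def centroid_rank(sentence_embs: np.ndarray, k: int = 5) -> list:
    # Summary = the k review sentences whose embeddings lie closest to the
    # centroid of all sentence embeddings. CoverSumm's contribution is
    # keeping this answer current as new reviews stream in, via a cover tree
    # index and a reservoir of candidates, instead of recomputing.
    centroid = sentence_embs.mean(axis=0)
    dists = np.linalg.norm(sentence_embs - centroid, axis=1)
    return np.argsort(dists)[:k].tolist()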
更新时间: 2024-04-12 16:13:06
领域: cs.CL,cs.LG
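To make the incremental CentroidRank setting above concrete, here is a naive sketch that maintains a running centroid and re-selects the sentences nearest to it as each review arrives. CoverSumm's contribution is to replace the linear scan below with cover-tree nearest-neighbor queries over a candidate reservoir, which is what yields the reported speedups; the class and variable names are illustrative, not the authors' API.

import numpy as np

class NaiveIncrementalCentroidSummarizer:
    """Recomputes the CentroidRank summary after every new review sentence."""
    def __init__(self, k=3):
        self.k = k
        self.sentences, self.embeddings = [], []
        self.running_sum = None

    def add(self, sentence, embedding):
        self.sentences.append(sentence)
        self.embeddings.append(embedding)
        if self.running_sum is None:
            self.running_sum = embedding.copy()
        else:
            self.running_sum = self.running_sum + embedding

    def summary(self):
        centroid = self.running_sum / len(self.embeddings)
        # O(N) scan per update; CoverSumm avoids this with a cover tree.
        dists = np.linalg.norm(np.stack(self.embeddings) - centroid, axis=1)
        return [self.sentences[i] for i in np.argsort(dists)[: self.k]]

rng = np.random.default_rng(0)
summ = NaiveIncrementalCentroidSummarizer(k=2)
for i in range(10):
    summ.add(f"review sentence {i}", rng.normal(size=8))
print(summ.summary())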
Towards Measuring and Modeling "Culture" in LLMs: A Survey
We present a survey of 39 recent papers that aim to study cultural representation and inclusion in large language models. We observe that none of the studies define "culture," which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture." We call these aspects the proxies of cultures, and organize them across three dimensions of demographic, semantic and linguistic-cultural interaction proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of "culture," such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness and situatedness of the current methods. Based on these observations, we provide several recommendations for a holistic and practically useful research agenda for furthering cultural inclusion in LLMs and LLM-based applications.
Updated: 2024-04-12 16:09:59
标题: 朝向在LLMs中测量和建模“文化”:一项调查
摘要: 我们提供了一份对近期39篇旨在研究大型语言模型中文化表达和包容性的论文的调查。我们观察到,这些研究中没有定义"文化",这是一个复杂、多方面的概念;相反,它们在一些专门设计的数据集上对模型进行探讨,这些数据集代表了"文化"的某些方面。我们称这些方面为文化的代理,将它们组织在人口统计、语义和语言文化交互代理的三个维度上。我们还对使用的探测方法进行了分类。我们的分析表明,只有"文化"的某些方面,如价值观和目标,已被研究,留下了其他几个有趣且重要的方面,尤其是众多的语义领域(Thompson等人,2020)和"aboutness"(Hershcovich等人,2022),尚未被探究。另外两个关键的空白是当前方法缺乏健壮性和情境性。基于这些观察,我们提出了几项建议,以形成一个全面且实用的研究议程,推动大型语言模型和基于大型语言模型的应用中的文化包容性。
更新时间: 2024-04-12 16:09:59
领域: cs.CY,cs.AI,cs.CL
Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation
Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in developing state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, where the RFFI model is trained and deployed on different receivers. Due to altered receiver characteristics, direct deployment of RFFI model on a new receiver leads to significant performance degradation. To address this issue, we formulate the cross-receiver RFFI as a model adaptation problem, which adapts the trained model to unlabeled signals from a new receiver. We first develop a theoretical generalization error bound for the adaptation model. Motivated by the bound, we propose a novel method to solve the cross-receiver RFFI problem, which includes domain alignment and adaptive pseudo-labeling. The former aims at finding a feature space where both domains exhibit similar distributions, effectively reducing the domain discrepancy. Meanwhile, the latter employs a dynamic pseudo-labeling scheme to implicitly transfer the label information from the labeled receiver to the new receiver. Experimental results indicate that the proposed method can effectively mitigate the receiver impact and improve the cross-receiver RFFI performance.
Updated: 2024-04-12 16:08:32
标题: 通过域适应缓解接收机对射频指纹识别的影响
摘要: 无线电频率指纹识别(RFFI)利用发射信号中由硬件引起的非理想唯一失真来识别发射器,正成为增强通信系统安全性的一种手段。最近,机器学习在开发最先进的RFFI模型方面取得了巨大成功。然而,很少有研究考虑跨接收器RFFI问题,其中RFFI模型在不同接收器上进行训练和部署。由于接收器特性的改变,直接将RFFI模型部署到新接收器上会导致性能显着下降。为解决这一问题,我们将跨接收器RFFI作为模型适应问题来制定,该问题将训练好的模型适应新接收器上的未标记信号。我们首先为适应模型开发了一个理论泛化误差界限。受该界限启发,我们提出了一种解决跨接收器RFFI问题的新方法,其中包括领域对齐和自适应伪标记。前者旨在找到一个特征空间,其中两个领域展示出相似的分布,有效减小领域差异。同时,后者采用动态伪标记方案,将标记信息从标记接收器隐式转移到新接收器。实验结果表明,所提出的方法可以有效减轻接收器的影响,并提高跨接收器RFFI的性能。
更新时间: 2024-04-12 16:08:32
领域: eess.SP,cs.LG
A Change Detection Reality Check
In recent years, there has been an explosion of proposed change detection deep learning architectures in the remote sensing literature. These approaches claim to offer state-of-the-art performance on different standard benchmark datasets. However, has the field truly made significant progress? In this paper we perform experiments which conclude a simple U-Net segmentation baseline without training tricks or complicated architectural changes is still a top performer for the task of change detection.
Updated: 2024-04-12 16:07:55
标题: 变化检测现实检验
摘要: 近年来,在遥感文献中提出了大量的变化检测深度学习架构。这些方法声称在不同的标准基准数据集上提供了最先进的性能。然而,该领域是否真正取得了显著进展?本文进行了实验证明,一个简单的U-Net分割基线,无需训练技巧或复杂的架构更改,仍然是变化检测任务的顶级表现者。
更新时间: 2024-04-12 16:07:55
领域: cs.CV,cs.LG
The Impact of Variable Ordering on Bayesian Network Structure Learning
Causal Bayesian Networks provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions. In this paper, we show that the order in which the variables are read from data can have much greater impact on the accuracy of the algorithm than these factors. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and this raises questions about the validity of the results produced by algorithms that are sensitive to, but have not been assessed against, different variable orderings.
Updated: 2024-04-12 16:05:03
标题: 变量排序对贝叶斯网络结构学习的影响
摘要: 因果贝叶斯网络为在不确定性下进行推理提供了一种重要工具,可能应用于许多复杂因果系统。能够告诉我们有关这些系统因果结构的结构学习算法变得越来越重要。在文献中,这些算法的有效性通常通过在不同样本量、超参数以及(偶尔)目标函数上进行敏感性测试来检验。本文中,我们展示了从数据中读取变量的顺序对算法准确性的影响可能远远大于这些因素。由于变量排序是任意的,它对学习到的图的准确性产生的任何显著影响都令人担忧,这引发了对那些对变量排序敏感、但尚未针对不同变量排序进行评估的算法所产生结果的有效性的质疑。
更新时间: 2024-04-12 16:05:03
领域: cs.LG
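The sensitivity test described in the abstract above can be scripted directly: feed the same data to a structure learner under every column ordering and count how many distinct graphs come back. A sketch assuming pgmpy's HillClimbSearch/BicScore API; any score-based learner can be substituted.

import itertools
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = a + rng.normal(scale=0.5, size=n)   # ground truth: A -> B -> C
c = b + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"A": a, "B": b, "C": c})

edge_sets = {}
for order in itertools.permutations(df.columns):
    view = df[list(order)]  # identical data, different variable reading order
    model = HillClimbSearch(view).estimate(scoring_method=BicScore(view))
    edge_sets[order] = frozenset(model.edges())

# An ordering-invariant learner would produce exactly one distinct graph here.
print(f"{len(set(edge_sets.values()))} distinct graphs across orderings")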
IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic
Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries where traffic situations are often dense and unstructured with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects covering 10 categories and 19 explanation label categories. The dataset also incorporates rearview information to provide a more complete representation of the driving environment. We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction. Overall, our dataset and introduced prediction models form the foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.
Updated: 2024-04-12 16:00:03
标题: IDD-X:用于密集和非结构化交通环境中自我相关重要物体定位和解释的多视图数据集
摘要: 智能车辆系统需要深入理解道路状况、周围实体和自动驾驶车辆的驾驶行为之间的相互作用,以确保安全高效的导航。这在发展中国家尤为关键,因为交通情况通常密集且无结构,道路上的车辆种类繁多。现有数据集主要针对结构化和稀疏的交通场景,无法捕捉在这种环境中驾驶的复杂性。为了填补这一空白,我们提出了IDD-X,一个大规模双视角驾驶视频数据集。IDD-X包含697K个边界框、9K个重要对象轨迹,每个视频1-12个对象,为多个重要道路对象提供全面的自我相对标注,涵盖10个类别和19个解释标签类别。该数据集还整合了后视信息,提供了对驾驶环境更完整的表征。我们还介绍了专门设计的深度网络,旨在实现多个重要对象的定位和每个对象的解释预测。总的来说,我们的数据集和引入的预测模型为研究道路状况和周围实体如何影响复杂交通情况下的驾驶行为奠定了基础。
更新时间: 2024-04-12 16:00:03
领域: cs.CV,cs.AI,cs.RO
Scalability in Building Component Data Annotation: Enhancing Facade Material Classification with Synthetic Data
Computer vision models trained on Google Street View images can create material cadastres. However, current approaches need manually annotated datasets that are difficult to obtain and often have class imbalance. To address these challenges, this paper fine-tuned a Swin Transformer model on a synthetic dataset generated with DALL-E and compared the performance to a similar manually annotated dataset. Although manual annotation remains the gold standard, the synthetic dataset performance demonstrates a reasonable alternative. The findings will ease annotation needed to develop material cadastres, offering architects insights into opportunities for material reuse, thus contributing to the reduction of demolition waste.
Updated: 2024-04-12 15:54:48
标题: 建筑构件数据标注的可扩展性:利用合成数据增强立面材料分类
摘要: 在谷歌街景图像上训练的计算机视觉模型可以创建材料地籍。然而,目前的方法需要手动注释的数据集,这些数据集很难获得,并且经常存在类别不平衡。为了解决这些挑战,本文在使用DALL-E生成的合成数据集上对Swin Transformer模型进行了微调,并将其性能与类似的手动注释数据集进行了比较。尽管手动注释仍然是黄金标准,但合成数据集的性能表现出了一个合理的替代方案。这些发现将简化开发材料地籍所需的注释,为建筑师提供关于材料再利用机会的见解,从而有助于减少拆除废物。
更新时间: 2024-04-12 15:54:48
领域: cs.CV,cs.LG
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations. Yet, an understanding of RLHF for LLMs is largely entangled with initial design choices that popularized the method and current research focuses on augmenting those choices rather than fundamentally improving the framework. In this paper, we analyze RLHF through the lens of reinforcement learning principles to develop an understanding of its fundamentals, dedicating substantial focus to the core component of RLHF -- the reward model. Our study investigates modeling choices, caveats of function approximation, and their implications on RLHF training algorithms, highlighting the underlying assumptions made about the expressivity of reward. Our analysis improves the understanding of the role of reward models and methods for their training, concurrently revealing limitations of the current methodology. We characterize these limitations, including incorrect generalization, model misspecification, and the sparsity of feedback, along with their impact on the performance of a language model. The discussion and analysis are substantiated by a categorical review of current literature, serving as a reference for researchers and practitioners to understand the challenges of RLHF and build upon existing efforts.
Updated: 2024-04-12 15:54:15
标题: RLHF解读:对LLM的人类反馈强化学习的关键分析
摘要: 最先进的大型语言模型(LLMs)已经成为各种任务中不可或缺的工具。然而,训练LLMs以成为有效的人类助手需要谨慎考虑。一种有前途的方法是通过人类反馈进行强化学习(RLHF),利用人类反馈根据人类偏好更新模型,并减轻毒性和幻觉等问题。然而,对LLMs进行RLHF的理解很大程度上受到最初设计选择的影响,这些选择推广了该方法,当前的研究重点是增强这些选择,而不是从根本上改进框架。在本文中,我们通过强化学习原则的视角分析RLHF,以发展对其基本原理的理解,重点放在RLHF的核心组件 - 奖励模型上。我们的研究调查了建模选择、函数逼近的警告以及它们对RLHF训练算法的影响,突出了对奖励表达能力所做的基本假设。我们的分析提高了对奖励模型的角色及其训练方法的理解,同时揭示了当前方法的局限性。我们对这些局限性进行了表征,包括不正确的泛化、模型规范不当和反馈的稀疏性,以及它们对语言模型性能的影响。讨论和分析得到了当前文献的分类评论的支持,为研究人员和从业者理解RLHF的挑战并建立在现有工作基础上提供了参考。
更新时间: 2024-04-12 15:54:15
领域: cs.LG,cs.AI,cs.CL
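Since the paper's analysis centers on the reward model, it helps to recall how that component is usually trained: a Bradley-Terry objective on preference pairs. A minimal sketch follows; the random feature vectors stand in for pooled LLM representations of chosen and rejected responses, which the paper does not specify here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.score(x).squeeze(-1)  # scalar reward per response

rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)  # placeholder features
# Bradley-Terry: maximize log sigmoid(r_chosen - r_rejected) over preference pairs.
loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()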
Rotation-equivariant Graph Neural Networks for Learning Glassy Liquids Representations
The difficult problem of relating the static structure of glassy liquids and their dynamics is a good target for Machine Learning, an approach which excels at finding complex patterns hidden in data. Indeed, this approach is currently a hot topic in the glassy liquids community, where the state of the art consists in Graph Neural Networks (GNNs), which have great expressive power but are heavy models and lack interpretability. Inspired by recent advances in the field of Machine Learning group-equivariant representations, we build a GNN that learns a robust representation of the glass' static structure by constraining it to preserve the roto-translation (SE(3)) equivariance. We show that this constraint significantly improves the predictive power at comparable or reduced number of parameters but most importantly, improves the ability to generalize to unseen temperatures. While remaining a Deep network, our model has improved interpretability compared to other GNNs, as the action of our basic convolution layer relates directly to well-known rotation-invariant expert features. Through transfer-learning experiments displaying unprecedented performance, we demonstrate that our network learns a robust representation, which allows us to push forward the idea of a learned structural order parameter for glasses.
Updated: 2024-04-12 15:52:37
标题: 旋转等变图神经网络用于学习玻璃态液体表示
摘要: 困扰玻璃液体静态结构与动力学关系的困难问题是机器学习的一个良好目标,这种方法擅长发现数据中隐藏的复杂模式。事实上,这种方法目前在玻璃液体社区中是一个热门话题,其中最先进的技术是图神经网络(GNNs),它具有很强的表达能力,但是模型复杂且缺乏可解释性。受到机器学习群等变表示领域的最新进展的启发,我们构建了一个GNN,通过限制其保持旋转平移(SE(3))等变性,学习了玻璃的静态结构的稳健表示。我们展示了这一约束显著提高了预测能力,同时具有相当或减少的参数数量,但更重要的是,提高了对未见温度的泛化能力。尽管仍然是一个深度网络,我们的模型相对于其他GNNs具有更好的可解释性,因为我们基本卷积层的作用直接与众所周知的旋转不变专家特征相关。通过展示前所未有的性能的迁移学习实验,我们证明了我们的网络学习了一个稳健的表示,这使我们能够推动关于玻璃的学习结构有序参数的想法。
更新时间: 2024-04-12 15:52:37
领域: cond-mat.soft,cond-mat.dis-nn,cs.LG
Generalization in diffusion models arises from geometry-adaptive harmonic representations
Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
Updated: 2024-04-12 15:48:47
标题: 扩散模型中的泛化源自几何自适应谐波表示
摘要: Image denoising deep neural networks have shown impressive capabilities in generating high-quality samples using score-based reverse diffusion algorithms. However, concerns have been raised regarding the memorization of training data and whether these networks truly learn the continuous density of the data. In this study, we demonstrate that two DNNs trained on separate subsets of a dataset learn similar score functions and densities when the number of training images is sufficiently large. This indicates strong generalization and suggests that the inductive biases of the networks align well with the data density. Analysis of the learned denoising functions reveals a shrinkage operation in a basis adapted to the underlying image, characterized by oscillating harmonic structures along contours and in homogeneous regions. Trained denoisers exhibit inductive bias towards geometry-adaptive harmonic bases, even when trained on image classes supported on low-dimensional manifolds where the harmonic basis is suboptimal. Furthermore, when trained on regular image classes where the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks approaches optimality.
更新时间: 2024-04-12 15:48:47
领域: cs.CV,cs.LG
Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning
This study explores object detection in historical aerial photographs of Namibia to identify long-term environmental changes. Specifically, we aim to identify key objects -- \textit{Waterholes}, \textit{Omuti homesteads}, and \textit{Big trees} -- around Oshikango in Namibia using sub-meter gray-scale aerial imagery from 1943 and 1972. In this work, we propose a workflow for analyzing historical aerial imagery using a deep semantic segmentation model on sparse hand-labels. To this end, we employ a number of strategies including class-weighting, pseudo-labeling and empirical p-value-based filtering to balance skewed and sparse representations of objects in the ground truth data. Results demonstrate the benefits of these different training strategies resulting in an average $F_1=0.661$ and $F_1=0.755$ over the three objects of interest for the 1943 and 1972 imagery, respectively. We also identified that the average size of Waterhole and Big trees increased while the average size of Omutis decreased between 1943 and 1972 reflecting some of the local effects of the massive post-Second World War economic, agricultural, demographic, and environmental changes. This work also highlights the untapped potential of historical aerial photographs in understanding long-term environmental changes beyond Namibia (and Africa). With the lack of adequate satellite technology in the past, archival aerial photography offers a great alternative to uncover decades-long environmental changes.
Updated: 2024-04-12 15:37:53
标题: 利用档案航空摄影和深度学习分析纳米比亚长达数十年的环境变化
摘要: 这项研究探讨了利用纳米比亚历史航空照片进行目标检测,以识别长期环境变化。具体来说,我们旨在利用1943年和1972年的亚米尺度灰度航空影像,识别纳米比亚奥希坎戈周围的关键对象——水坑、奥穆蒂家园和大树。在这项工作中,我们提出了一个工作流程,使用深度语义分割模型对稀疏手工标记的历史航空影像进行分析。为此,我们采用了一些策略,包括类别加权、伪标记和基于经验p值的过滤,以平衡地面真实数据中对象的倾斜和稀疏表示。结果表明,这些不同的训练策略带来了好处,分别在1943年和1972年的影像中,对三个感兴趣的对象的平均$F_1=0.661$和$F_1=0.755$。我们还发现,1943年至1972年间,水坑和大树的平均大小增加,而奥穆蒂的平均大小减小,反映了二战后经济、农业、人口和环境变化的一些局部影响。这项工作还强调了利用历史航空照片理解纳米比亚(和非洲)以外的长期环境变化的潜力。过去缺乏足够的卫星技术,档案航空摄影为揭示数十年来的环境变化提供了一个重要替代方案。
更新时间: 2024-04-12 15:37:53
领域: cs.CV,cs.AI
Memory Traces: Are Transformers Tulving Machines?
Memory traces--changes in the memory system that result from the perception and encoding of an event--were measured in pioneering studies by Endel Tulving and Michael J. Watkins in 1975. These and further experiments informed the maturation of Tulving's memory model, from the GAPS (General Abstract Processing System) to the SPI (Serial-Parallel Independent) model. Having current state-of-the-art LLMs revisit the original Tulving-Watkins tests may help assess whether foundation models fully instantiate this class of psychological models.
Updated: 2024-04-12 15:37:35
标题: 记忆痕迹:Transformer是否是图尔文机器?
摘要: 记忆痕迹——由事件的感知和编码产生的记忆系统中的变化——是由Endel Tulving和Michael J. Watkins在1975年的开创性研究中测量的。这些以及更多的实验启发了Tulving记忆模型的发展,从GAPS(General Abstract Processing System)到SPI(Serial-Parallel Independent)模型。让当前最先进的LLMs重新审视原始的Tulving-Watkins测试可能有助于评估基础模型是否完全实现了这类心理模型。
更新时间: 2024-04-12 15:37:35
领域: cs.AI,I.2.4
Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization
Vehicle routing problems (VRPs), which can be found in numerous real-world applications, have been an important research topic for several decades. Recently, the neural combinatorial optimization (NCO) approach that leverages a learning-based model to solve VRPs without manual algorithm design has gained substantial attention. However, current NCO methods typically require building one model for each routing problem, which significantly hinders their practical application for real-world industry problems with diverse attributes. In this work, we make the first attempt to tackle the crucial challenge of cross-problem generalization. In particular, we formulate VRPs as different combinations of a set of shared underlying attributes and solve them simultaneously via a single model through attribute composition. In this way, our proposed model can successfully solve VRPs with unseen attribute combinations in a zero-shot generalization manner. Extensive experiments are conducted on eleven VRP variants, benchmark datasets, and industry logistic scenarios. The results show that the unified model demonstrates superior performance in the eleven VRPs, reducing the average gap to around 5% from over 20% in the existing approach and achieving a significant performance boost on benchmark datasets as well as a real-world logistics application. The source code is included in https://github.com/FeiLiu36/MTNCO.
Updated: 2024-04-12 15:34:18
标题: 多任务学习用于具有跨问题零样本泛化的路由问题
摘要: 车辆路径问题(VRPs)可以在许多现实世界的应用中找到,多年来一直是一个重要的研究课题。最近,利用基于学习的模型来解决VRPs而无需手动设计算法的神经组合优化(NCO)方法引起了广泛关注。然而,当前的NCO方法通常需要为每个路径问题构建一个模型,这极大地阻碍了它们在具有各种属性的实际行业问题中的实际应用。在这项工作中,我们首次尝试解决跨问题泛化的关键挑战。具体而言,我们将VRPs构建为一组共享的基础属性的不同组合,并通过属性组合同时通过单一模型解决它们。通过这种方式,我们提出的模型可以以零样本泛化方式成功解决具有未见属性组合的VRPs。我们在十一种VRP变体、基准数据集和工业物流场景上进行了大量实验。结果显示,统一模型在十一种VRPs中表现出优越性能,将平均差距从现有方法的超过20%减少到约5%,并在基准数据集以及真实世界的物流应用中实现了显著的性能提升。源代码包含在https://github.com/FeiLiu36/MTNCO。
更新时间: 2024-04-12 15:34:18
领域: cs.LG,cs.AI
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular contrastive frameworks typically learn from binary relevance, making them ineffective at incorporating direct fine-grained rankings. In this paper, we curate a large-scale dataset featuring detailed relevance scores for each query-document pair to facilitate future research and evaluation. Subsequently, we propose Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking (GCL), which is designed to learn from fine-grained rankings beyond binary relevance scores. Our results show that GCL achieves a 94.5% increase in NDCG@10 for in-domain and 26.3 to 48.8% increases for cold-start evaluations, all relative to the CLIP baseline and involving ground truth rankings.
Updated: 2024-04-12 15:30:03
标题: 多模态检索和排序的广义对比学习
摘要: 对比学习由于其对手动注释的最小需求而在检索任务中广泛被采用。然而,流行的对比框架通常从二元相关性中学习,使它们难以有效地整合直接的细粒度排名。在本文中,我们整理了一个大规模数据集,为每个查询-文档对提供了详细的相关性评分,以促进未来的研究和评估。随后,我们提出了适用于多模态检索和排名的广义对比学习(GCL),旨在从二元相关性分数之外的细粒度排名中学习。我们的结果表明,相对于CLIP基线并涉及基准排名,GCL在领域内NDCG@10上实现了94.5%的增长,并在冷启动评估中实现了26.3至48.8%的增长。
更新时间: 2024-04-12 15:30:03
领域: cs.IR,cs.CV,cs.LG
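One plausible reading of "learning from fine-grained rankings beyond binary relevance" in the GCL abstract above is to replace InfoNCE's one-hot target with a distribution derived from the graded relevance scores. The sketch below illustrates that idea only; it is not the paper's exact objective, and all shapes and names are illustrative.

import torch
import torch.nn.functional as F

def graded_contrastive_loss(query, docs, relevance, tau=0.1):
    # query: (d,), docs: (n, d), relevance: (n,) graded scores, higher = better.
    sims = F.cosine_similarity(query.unsqueeze(0), docs) / tau
    # Soft target distribution from the graded relevance instead of a one-hot label.
    target = F.softmax(relevance / relevance.max(), dim=0)
    return F.cross_entropy(sims.unsqueeze(0), target.unsqueeze(0))

q, docs = torch.randn(32), torch.randn(5, 32)
relevance = torch.tensor([1.0, 0.8, 0.5, 0.1, 0.0])
print(graded_contrastive_loss(q, docs, relevance))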
Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement
Over the past decades, the increase in both frequency and intensity of large-scale wildfires due to climate change has emerged as a significant natural threat. The pressing need to design resilient landscapes capable of withstanding such disasters has become paramount, requiring the development of advanced decision-support tools. Existing methodologies, including Mixed Integer Programming, Stochastic Optimization, and Network Theory, have proven effective but are hindered by computational demands, limiting their applicability. In response to this challenge, we propose using artificial intelligence techniques, specifically Deep Reinforcement Learning, to address the complex problem of firebreak placement in the landscape. We employ value-function based approaches like Deep Q-Learning, Double Deep Q-Learning, and Dueling Double Deep Q-Learning. Utilizing the Cell2Fire fire spread simulator combined with Convolutional Neural Networks, we have successfully implemented a computational agent capable of learning firebreak locations within a forest environment, achieving good results. Furthermore, we incorporate a pre-training loop, initially teaching our agent to mimic a heuristic-based algorithm, and observe that it consistently exceeds the performance of these solutions. Our findings underscore the immense potential of Deep Reinforcement Learning for operational research challenges, especially in fire prevention. Our approach demonstrates convergence with highly favorable results in problem instances as large as 40 x 40 cells, marking a significant milestone in applying Reinforcement Learning to this critical issue. To the best of our knowledge, this study represents a pioneering effort in using Reinforcement Learning to address the aforementioned problem, offering promising perspectives in fire prevention and landscape management.
Updated: 2024-04-12 15:10:57
标题: 推动森林火灾预防:深度强化学习用于有效的防火带放置
摘要: 在过去几十年中,由于气候变化导致大规模野火的频率和强度增加,已经成为一个重要的自然威胁。迫切需要设计能够抵御这类灾害的有弹性的景观,这要求开发先进的决策支持工具。现有的方法,包括混合整数规划、随机优化和网络理论,已经被证明是有效的,但受到计算需求的限制,限制了它们的适用性。 为了应对这一挑战,我们提议使用人工智能技术,特别是深度强化学习,来解决景观中火线布置的复杂问题。我们采用基于价值函数的方法,如深度Q学习、双深度Q学习和对抗双深度Q学习。利用Cell2Fire火灾传播模拟器结合卷积神经网络,我们成功实现了一个能够学习森林环境中火线位置的计算代理,并取得了良好的结果。 此外,我们还加入了一个预训练循环,最初教导我们的代理模仿启发式算法,并观察到它始终超过了这些解决方案的性能。我们的研究结果强调了深度强化学习在操作研究挑战中的巨大潜力,特别是在防火方面。我们的方法表明,在问题实例规模达到40 x 40个单元格时,具有高度有利的结果,标志着将强化学习应用于这一关键问题的重要里程碑。 据我们所知,这项研究代表了在解决上述问题中使用强化学习的开创性努力,为防火和景观管理提供了有希望的展望。
更新时间: 2024-04-12 15:10:57
领域: cs.LG,cs.AI
Rethinking How to Evaluate Language Model Jailbreak
Large language models (LLMs) have become increasingly integrated with various applications. To ensure that LLMs do not generate unsafe responses, they are aligned with safeguards that specify what content is restricted. However, such alignment can be bypassed to produce prohibited content using a technique commonly referred to as jailbreak. Different systems have been proposed to perform the jailbreak automatically. These systems rely on evaluation methods to determine whether a jailbreak attempt is successful. However, our analysis reveals that current jailbreak evaluation methods have two limitations. (1) Their objectives lack clarity and do not align with the goal of identifying unsafe responses. (2) They oversimplify the jailbreak result as a binary outcome, successful or not. In this paper, we propose three metrics, safeguard violation, informativeness, and relative truthfulness, to evaluate language model jailbreak. Additionally, we demonstrate how these metrics correlate with the goal of different malicious actors. To compute these metrics, we introduce a multifaceted approach that extends the natural language generation evaluation method after preprocessing the response. We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems. The benchmark dataset is labeled by three annotators. We compare our multifaceted approach with three existing jailbreak evaluation methods. Experiments demonstrate that our multifaceted evaluation outperforms existing methods, with F1 scores improving on average by 17% compared to existing baselines. Our findings motivate the need to move away from the binary view of the jailbreak problem and incorporate a more comprehensive evaluation to ensure the safety of the language model.
Updated: 2024-04-12 15:02:15
标题: 重新思考如何评估语言模型越狱
摘要: 大型语言模型(LLMs)越来越多地与各种应用程序集成。为确保LLMs不会生成不安全的响应,它们与规范相结合,规定了受限内容是什么。然而,这种对齐可以被绕过,使用一种通常被称为越狱的技术来生成被禁止的内容。已经提出了不同的系统来自动执行越狱。这些系统依赖于评估方法来确定越狱尝试是否成功。然而,我们的分析显示,当前的越狱评估方法存在两个限制。 (1)它们的目标缺乏清晰性,不符合识别不安全响应的目标。 (2)它们过度简化越狱结果为二进制结果,成功与否。在本文中,我们提出了三个指标,即安全违规、信息量和相对真实性,用于评估语言模型的越狱。此外,我们展示了这些指标与不同恶意行为者的目标的相关性。为了计算这些指标,我们介绍了一个多方面的方法,扩展了自然语言生成评估方法,在预处理响应后进行评估。我们在从三个恶意意图数据集和三个越狱系统产生的基准数据集上评估我们的指标。基准数据集由三位注释者标记。我们将我们的多方面方法与三种现有的越狱评估方法进行比较。实验证明,我们的多方面评估优于现有方法,平均F1分数比现有基线提高了17%。我们的发现促使我们需要摆脱对越狱问题的二进制观点,并纳入更全面的评估,以确保语言模型的安全。
更新时间: 2024-04-12 15:02:15
领域: cs.CL,cs.AI,cs.CR,cs.LG
Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations
Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance. Nevertheless, the development of an efficient DA system poses significant challenges, particularly in establishing intricate relationships between the background data and the vast amount of multi-source observation data within limited time windows in operational settings. To address these challenges, researchers design complex pre-processing methods for each observation type, leveraging approximate modeling and the power of super-computing clusters to expedite solutions. The emergence of deep learning (DL) models has been a game-changer, offering unified multi-modal modeling, enhanced nonlinear representation capabilities, and superior parallelization. These advantages have spurred efforts to integrate DL models into various domains of weather modeling. Remarkably, DL models have shown promise in matching, even surpassing, the forecast accuracy of leading operational NWP models worldwide. This success motivates the exploration of DL-based DA frameworks tailored for weather forecasting models. In this study, we introduce Fuxi-DA, a generalized DL-based DA framework for assimilating satellite observations. By assimilating data from Advanced Geosynchronous Radiation Imager (AGRI) aboard Fengyun-4B, Fuxi-DA consistently mitigates analysis errors and significantly improves forecast performance. Furthermore, through a series of single-observation experiments, Fuxi-DA has been validated against established atmospheric physics, demonstrating its consistency and reliability.
Updated: 2024-04-12 15:02:14
标题: Fuxi-DA:用于同化卫星观测的通用深度学习数据同化框架
摘要: 数据同化(DA)作为当代数值天气预报(NWP)系统中不可或缺的组成部分,在生成显著影响预报性能的分析中发挥着至关重要的作用。然而,开发高效的DA系统面临着重大挑战,特别是在运行设置中在有限时间窗口内建立背景数据和大量多源观测数据之间的复杂关系。为了解决这些挑战,研究人员为每种观测类型设计了复杂的预处理方法,利用近似建模和超级计算集群的能力来加快解决方案。深度学习(DL)模型的出现改变了游戏规则,提供了统一的多模态建模、增强的非线性表示能力和卓越的并行化。这些优势推动了将DL模型整合到各个天气建模领域的努力。值得注意的是,DL模型已经显示出在全球领先的运营NWP模型的预报准确性方面具有匹配甚至超越的潜力。这一成功激励着探索为天气预报模型量身定制的基于DL的DA框架。在本研究中,我们介绍了FuxiDA,一个通用的基于DL的DA框架,用于同化卫星观测数据。通过同化来自风云四号B卫星上的高级地球同步辐射成像仪(AGRI)的数据,FuXi-DA不断减少分析误差,显著改善了预报性能。此外,通过一系列单次观测实验,Fuxi-DA已经经过了对已建立的大气物理学的验证,展示了其一致性和可靠性。
更新时间: 2024-04-12 15:02:14
领域: cs.LG,physics.ao-ph
SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, Conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects with backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. The code and dataset will be made available for public access.
Updated: 2024-04-12 14:58:21
标题: SpikeNVS:通过尖峰相机增强模糊图像的新视角合成
摘要: 训练图像的质量是使用神经辐射场(NeRF)和3D高斯喷斑(3DGS)等神经场方法实现清晰新视角合成(NVS)的最关键因素之一。然而,传统的RGB相机容易受到运动模糊的影响。相比之下,类似事件和脉冲相机的神经形态摄像头固有地捕获更全面的时间信息,这可以作为额外的训练数据提供对场景的尖锐表示。最近的方法探索了整合事件相机以改善NVS质量。事件-RGB方法存在一些局限性,例如高训练成本和无法有效处理背景区域。相反,我们的研究引入了一种使用脉冲相机来克服这些限制的新方法。通过将脉冲流的纹理重建作为地面实况,我们设计了脉冲纹理(TfS)损失。由于脉冲相机依赖于时间积分,而不是事件相机使用的时间微分,我们提出的TfS损失保持了可管理的训练成本。它同时处理前景对象和背景。我们还提供了使用我们的脉冲-RGB相机系统捕获的真实世界数据集,以促进未来的研究努力。我们使用合成和真实世界数据集进行了广泛的实验,以证明我们的设计可以增强NeRF和3DGS的新视角合成。代码和数据集将提供给公众访问。
更新时间: 2024-04-12 14:58:21
领域: cs.CV,cs.AI
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
While Large Language Models (LLMs) have seen widespread applications across numerous fields, their limited interpretability poses concerns regarding their safe operations from multiple aspects, e.g., truthfulness, robustness, and fairness. Recent research has started developing quality assurance methods for LLMs, introducing techniques such as offline detector-based or uncertainty estimation methods. However, these approaches predominantly concentrate on post-generation analysis, leaving the online safety analysis for LLMs during the generation phase an unexplored area. To bridge this gap, we conduct in this work a comprehensive evaluation of the effectiveness of existing online safety analysis methods on LLMs. We begin with a pilot study that validates the feasibility of detecting unsafe outputs in the early generation process. Following this, we establish the first publicly available benchmark of online safety analysis for LLMs, including a broad spectrum of methods, models, tasks, datasets, and evaluation metrics. Utilizing this benchmark, we extensively analyze the performance of state-of-the-art online safety analysis methods on both open-source and closed-source LLMs. This analysis reveals the strengths and weaknesses of individual methods and offers valuable insights into selecting the most appropriate method based on specific application scenarios and task requirements. Furthermore, we also explore the potential of using hybridization methods, i.e., combining multiple methods to derive a collective safety conclusion, to enhance the efficacy of online safety analysis for LLMs. Our findings indicate a promising direction for the development of innovative and trustworthy quality assurance methodologies for LLMs, facilitating their reliable deployments across diverse domains.
Updated: 2024-04-12 14:55:16
标题: LLM的在线安全分析:基准、评估和前进路径
摘要: 尽管大型语言模型(LLMs)在许多领域得到了广泛的应用,但它们有限的可解释性引发了关于它们安全运行的多方面担忧,例如真实性、稳健性和公平性。最近的研究已经开始为LLMs开发质量保证方法,引入了离线基于检测器或不确定性估计方法等技术。然而,这些方法主要集中在后生成分析上,而在生成阶段对LLMs进行在线安全分析的研究尚未开展。为了弥补这一差距,我们在这项工作中对现有在线安全分析方法在LLMs上的有效性进行了全面评估。我们从一项试点研究开始,验证了在早期生成过程中检测不安全输出的可行性。随后,我们建立了第一个公开可用的LLMs在线安全分析基准,包括广泛的方法、模型、任务、数据集和评估指标。利用这个基准,我们广泛分析了最先进的在线安全分析方法在开源和闭源LLMs上的性能。这一分析揭示了各种方法的优势和劣势,并为根据特定应用场景和任务需求选择最适合的方法提供了宝贵的见解。此外,我们还探讨了使用混合方法的潜力,即结合多种方法得出集体安全结论,以增强LLMs的在线安全分析效果。我们的研究结果表明了为LLMs开发创新和可靠的质量保证方法的有希望方向,促进它们在各个领域的可靠部署。
更新时间: 2024-04-12 14:55:16
领域: cs.SE,cs.AI,cs.CL,cs.CR,cs.LG
Adversarial Imitation Learning via Boosting
Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective is on-policy and DAC's ad-hoc application of off-policy training does not guarantee successful imitation (Kostrikov et al., 2019; 2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this issue by deriving a fully off-policy AIL objective. Instead, in this work, we develop a novel and principled AIL algorithm via the framework of boosting. Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly weighted weak learners (i.e., policies) and trains a discriminator that witnesses the maximum discrepancy between the distributions of the ensemble and the expert policy. We maintain a weighted replay buffer to represent the state-action distribution induced by the ensemble, allowing us to train discriminators using the entire data collected so far. In the weighted replay buffer, the contribution of the data from older policies is properly discounted with the weight computed based on the boosting framework. Empirically, we evaluate our algorithm on both controller state-based and pixel-based environments from the DeepMind Control Suite. AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training. On state-based environments, AILBoost outperforms ValueDICE and IQ-Learn (Garg et al., 2021), achieving competitive performance with as little as one expert trajectory.
Updated: 2024-04-12 14:53:36
标题: 对抗性模仿学习通过增强学习
摘要: 对抗性模仿学习(AIL)已经成为各种模仿学习(IL)应用中的主导框架,其中区分器演员评论家(DAC)(Kostrikov等人,2019)展示了离策略学习算法在提高样本效率和扩展到更高维观测方面的有效性。尽管DAC在经验上取得了成功,但原始的AIL目标是在策略上的,DAC对离策略训练的临时应用不能保证成功的模仿(Kostrikov等人,2019;2020)。后续工作,如ValueDICE(Kostrikov等人,2020)通过导出一个完全离策略的AIL目标来解决这个问题。在这项工作中,我们通过增强框架开发了一种新颖且有原则的AIL算法。与增强相似,我们的新算法AILBoost维护一个适当加权的弱学习器(即策略)集合,并训练一个区分器,它观察集合和专家策略之间的分布之间的最大差异。我们维护一个加权重播缓冲区,以表示集合诱导的状态-动作分布,从而使我们能够使用迄今收集的所有数据来训练区分器。在加权重播缓冲区中,从较旧策略获取的数据的贡献是正确折扣的,权重是根据增强框架计算的。从经验上看,我们在DeepMind控制套件中评估了我们的算法,包括基于控制器状态和基于像素的环境。AILBoost在这两种类型的环境中表现优于DAC,展示了正确加权重播缓冲区数据对离策略训练的好处。在基于状态的环境中,AILBoost优于ValueDICE和IQ-Learn(Garg等人,2021),即使只有一条专家轨迹,也能实现竞争性表现。
更新时间: 2024-04-12 14:53:36
领域: cs.LG,cs.AI
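The weighted replay buffer is the piece of AILBoost that is easy to sketch: transitions from older ensemble policies are down-weighted when sampled for discriminator training. The geometric decay below is a placeholder; the paper derives the weights from its boosting framework.

import random

class WeightedReplayBuffer:
    def __init__(self, decay=0.8):
        self.rounds = []   # one list of transitions per boosting round / policy
        self.decay = decay

    def add_round(self, transitions):
        self.rounds.append(list(transitions))

    def sample(self, k):
        # Round t (0 = oldest) gets weight decay**(num_rounds - 1 - t),
        # so data from older policies contributes less to discriminator updates.
        n = len(self.rounds)
        weights = [self.decay ** (n - 1 - t) for t in range(n)]
        picks = random.choices(range(n), weights=weights, k=k)
        return [random.choice(self.rounds[t]) for t in picks]

buf = WeightedReplayBuffer()
for t in range(3):
    buf.add_round([(t, i) for i in range(100)])  # toy (policy_round, step) transitions
print(buf.sample(5))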
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PFCG that incorporates em-beddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multi-modal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8x reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.
Updated: 2024-04-12 14:53:30
标题: 重新评估在无监督语法归纳中多模态信号的必要性
摘要: 是否需要多模态输入来进行语法归纳?最近的研究表明,多模态训练输入可以改善语法归纳。然而,这些改进是基于与仅基于相对较少的文本数据训练的弱文本基线的比较而得出的。为了确定在具有大量文本训练数据的情况下是否需要多模态输入,我们设计了一个更强大的仅文本基线,我们将其称为LC-PCFG。LC-PCFG是一个包含来自仅文本大型语言模型(LLMs)的嵌入的C-PFCG。我们使用固定的语法族直接将LC-PCFG与各种多模态语法归纳方法进行比较。我们在四个基准数据集上比较性能。与最先进的多模态语法归纳方法相比,LC-PCFG在语料库F1上提供了高达17%的相对改进。LC-PCFG还更具计算效率,与多模态方法相比,参数数量减少高达85%,训练时间减少8.8倍。这些结果表明,对于语法归纳,可能并不需要多模态输入,并强调了评估多模态方法的好处的强大无视觉基线的重要性。
更新时间: 2024-04-12 14:53:30
领域: cs.CL,cs.AI,cs.LG
RFFNet: Large-Scale Interpretable Kernel Methods via Random Fourier Features
Kernel methods provide a flexible and theoretically grounded approach to nonlinear and nonparametric learning. While memory and run-time requirements hinder their applicability to large datasets, many low-rank kernel approximations, such as random Fourier features, were recently developed to scale up such kernel methods. However, these scalable approaches are based on approximations of isotropic kernels, which cannot remove the influence of irrelevant features. In this work, we design random Fourier features for a family of automatic relevance determination (ARD) kernels, and introduce RFFNet, a new large-scale kernel method that learns the kernel relevances on the fly via first-order stochastic optimization. We present an effective initialization scheme for the method's non-convex objective function, evaluate if hard-thresholding RFFNet's learned relevances yield a sensible rule for variable selection, and perform an extensive ablation study of RFFNet's components. Numerical validation on simulated and real-world data shows that our approach has a small memory footprint and run-time, achieves low prediction error, and effectively identifies relevant features, thus leading to more interpretable solutions. We supply users with an efficient, PyTorch-based library, that adheres to the scikit-learn standard API and code for fully reproducing our results.
Updated: 2024-04-12 14:51:32
标题: RFFNet:通过随机傅里叶特征实现大规模可解释的核方法
摘要: 核方法提供了一种灵活且理论基础扎实的非线性和非参数学习方法。尽管内存和运行时间要求限制了它们对大型数据集的适用性,但许多低秩核近似方法,如随机傅立叶特征,最近被开发用于扩展这些核方法。然而,这些可扩展方法基于各向同性核的近似,无法消除无关特征的影响。在这项工作中,我们为一系列自动相关性确定(ARD)核设计了随机傅立叶特征,引入了RFFNet,一种新的大规模核方法,通过一阶随机优化实时学习核相关性。我们提出了一种有效的初始化方案,用于该方法的非凸目标函数,并评估了硬阈值化RFFNet学习到的相关性是否产生合理的变量选择规则,并对RFFNet的组件进行了广泛的消融研究。在模拟和真实数据上进行的数值验证表明,我们的方法具有较小的内存占用和运行时间,实现了低预测误差,并有效识别了相关特征,从而导致更可解释的解决方案。我们为用户提供了一个高效的基于PyTorch的库,符合scikit-learn标准API,并提供完全复现我们结果的代码。
更新时间: 2024-04-12 14:51:32
领域: stat.ML,cs.LG
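A compact sketch of the mechanism the RFFNet abstract describes: random Fourier features for an ARD Gaussian kernel, with per-dimension relevances learned by first-order optimization. This illustrates the idea only; the authors ship their own PyTorch library with a scikit-learn-style API, and the layer sizes here are arbitrary.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ARDRandomFourierFeatures(nn.Module):
    def __init__(self, d_in, n_features=256):
        super().__init__()
        self.register_buffer("W", torch.randn(d_in, n_features))       # frozen spectral draws
        self.register_buffer("b", 2 * torch.pi * torch.rand(n_features))
        self.log_relevance = nn.Parameter(torch.zeros(d_in))           # ARD relevances, learned
        self.head = nn.Linear(n_features, 1)

    def forward(self, x):
        # Scale each input dimension by its relevance before the random projection;
        # irrelevant dimensions should be driven toward zero relevance by SGD.
        z = torch.cos((x * self.log_relevance.exp()) @ self.W + self.b)
        z = z * (2.0 / self.W.shape[1]) ** 0.5
        return self.head(z).squeeze(-1)

model = ARDRandomFourierFeatures(d_in=10)
x, y = torch.randn(64, 10), torch.randn(64)
loss = F.mse_loss(model(x), y)
loss.backward()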
Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery
In the rapidly evolving field of artificial intelligence, the ability to harness and integrate knowledge across various domains stands as a paramount challenge and opportunity. This study introduces a novel approach to cross-domain knowledge discovery through the deployment of multi-AI agents, each specialized in distinct knowledge domains. These AI agents, designed to function as domain-specific experts, collaborate in a unified framework to synthesize and provide comprehensive insights that transcend the limitations of single-domain expertise. By facilitating seamless interaction among these agents, our platform aims to leverage the unique strengths and perspectives of each, thereby enhancing the process of knowledge discovery and decision-making. We present a comparative analysis of the different multi-agent workflow scenarios evaluating their performance in terms of efficiency, accuracy, and the breadth of knowledge integration. Through a series of experiments involving complex, interdisciplinary queries, our findings demonstrate the superior capability of domain specific multi-AI agent system in identifying and bridging knowledge gaps. This research not only underscores the significance of collaborative AI in driving innovation but also sets the stage for future advancements in AI-driven, cross-disciplinary research and application. Our methods were evaluated on a small pilot dataset and showed the trend we expected; as we increase the amount of data used to custom-train the agents, we expect this trend to become smoother.
Updated: 2024-04-12 14:50:41
标题: 利用多个AI代理进行跨领域知识发现
摘要: 在快速发展的人工智能领域,利用和整合各个领域的知识能力是一个重要的挑战和机遇。本研究引入了一种新颖的跨领域知识发现方法,通过部署多个专门针对不同知识领域的AI代理来实现。这些AI代理被设计为领域特定的专家,它们在统一框架中合作,综合提供超越单一领域专业知识限制的全面见解。通过促进这些代理之间的无缝交互,我们的平台旨在利用每个代理的独特优势和视角,从而增强知识发现和决策过程。我们对不同多代理工作流场景进行了比较分析,评估它们在效率、准确性和知识整合广度方面的表现。通过一系列涉及复杂的跨学科查询的实验,我们的研究结果表明,领域特定的多AI代理系统在识别和弥合知识缺口方面具有卓越能力。这项研究不仅强调了合作人工智能推动创新的重要性,还为未来基于人工智能的跨学科研究和应用的进展奠定了基础。我们的方法在小规模试验数据上进行了评估,并显示了我们预期的趋势,如果增加训练代理的数据量,预计趋势会更加平稳。
更新时间: 2024-04-12 14:50:41
领域: cs.AI,cs.CL
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
Large language models (LLMs) have been driving a new wave of interactive AI applications across numerous domains. However, efficiently serving LLM inference requests is challenging due to their unpredictable execution times originating from the autoregressive nature of generative models. Existing LLM serving systems exploit first-come-first-serve (FCFS) scheduling, suffering from head-of-line blocking issues. To address the non-deterministic nature of LLMs and enable efficient interactive LLM serving, we present a speculative shortest-job-first (SSJF) scheduler that uses a light proxy model to predict LLM output sequence lengths. Our open-source SSJF implementation does not require changes to memory management or batching strategies. Evaluations on real-world datasets and production workload traces show that SSJF reduces average job completion times by 30.5-39.6% and increases throughput by 2.2-3.6x compared to FCFS schedulers, across no batching, dynamic batching, and continuous batching settings.
Updated: 2024-04-12 14:46:15
标题: 高效的交互式LLM服务与基于代理模型的序列长度预测
摘要: 大型语言模型(LLMs)正在推动新一波跨多个领域的交互式人工智能应用。然而,由于生成模型的自回归特性导致执行时间不可预测,因此有效地处理LLM推断请求是具有挑战性的。现有的LLM服务系统利用先到先服务(FCFS)调度,存在头部阻塞问题。为了解决LLMs的非确定性特性并实现高效的交互式LLM服务,我们提出了一种猜测最短作业优先(SSJF)调度器,该调度器使用轻量级代理模型来预测LLM输出序列长度。我们的开源SSJF实现不需要更改内存管理或批处理策略。对真实数据集和生产工作负载跟踪的评估显示,与FCFS调度器相比,SSJF将平均作业完成时间缩短了30.5-39.6%,吞吐量提高了2.2-3.6倍,在不批处理、动态批处理和连续批处理设置下均如此。
更新时间: 2024-04-12 14:46:15
领域: cs.DC,cs.CL,cs.LG
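The scheduling core of SSJF fits in a few lines: predict each request's output length with a cheap proxy and pop the shortest predicted job first. The word-count heuristic below is a stand-in for the paper's learned proxy model, and the queue names are illustrative.

import heapq
import itertools

counter = itertools.count()  # tie-breaker so the heap never compares request payloads

def predict_length(prompt):
    # Stand-in for the light proxy model's sequence-length prediction.
    return len(prompt.split()) * 4

queue = []

def submit(prompt):
    heapq.heappush(queue, (predict_length(prompt), next(counter), prompt))

def serve_next():
    pred_len, _, prompt = heapq.heappop(queue)  # shortest predicted job first
    return prompt, pred_len

for p in ["summarize this long article about schedulers", "hi",
          "translate the following paragraph into French"]:
    submit(p)
while queue:
    print(serve_next())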
Approximate Stein Classes for Truncated Density Estimation
Estimating truncated density models is difficult, as these models have intractable normalising constants and hard to satisfy boundary conditions. Score matching can be adapted to solve the truncated density estimation problem, but requires a continuous weighting function which takes zero at the boundary and is positive elsewhere. Evaluation of such a weighting function (and its gradient) often requires a closed-form expression of the truncation boundary and finding a solution to a complicated optimisation problem. In this paper, we propose approximate Stein classes, which in turn leads to a relaxed Stein identity for truncated density estimation. We develop a novel discrepancy measure, truncated kernelised Stein discrepancy (TKSD), which does not require fixing a weighting function in advance, and can be evaluated using only samples on the boundary. We estimate a truncated density model by minimising the Lagrangian dual of TKSD. Finally, experiments show the accuracy of our method to be an improvement over previous works even without the explicit functional form of the boundary.
Updated: 2024-04-12 14:45:07
标题: 截断密度估计的近似Stein类别
摘要: 估计截断密度模型是困难的,因为这些模型具有难以处理的归一化常数和难以满足边界条件。得分匹配可以被改进以解决截断密度估计问题,但需要一个连续的加权函数,该函数在边界处为零,在其他地方为正。这种加权函数(及其梯度)的评估通常需要一个截断边界的闭合形式表达,并找到一个解决复杂优化问题的解。在本文中,我们提出了近似斯坦类,从而导致截断密度估计的放松斯坦恒等式。我们开发了一种新颖的差异度量,截断核斯坦距离(TKSD),它不需要事先固定一个加权函数,并且只能使用边界上的样本进行评估。我们通过最小化TKSD的Lagrange对偶来估计截断密度模型。最后,实验证明,即使没有边界的显式函数形式,我们的方法的准确性也优于先前的工作。
更新时间: 2024-04-12 14:45:07
领域: stat.ML,cs.LG,stat.ME
Identifying Important Group of Pixels using Interactions
To better understand the behavior of image classifiers, it is useful to visualize the contribution of individual pixels to the model prediction. In this study, we propose a method, MoXI ($\textbf{Mo}$del e$\textbf{X}$planation by $\textbf{I}$nteractions), that efficiently and accurately identifies a group of pixels with high prediction confidence. The proposed method employs game-theoretic concepts, Shapley values and interactions, taking into account the effects of individual pixels and the cooperative influence of pixels on model confidence. Theoretical analysis and experiments demonstrate that our method better identifies the pixels that are highly contributing to the model outputs than widely-used visualization by Grad-CAM, Attention rollout, and Shapley value. While prior studies have suffered from the exponential computational cost in the computation of Shapley value and interactions, we show that this can be reduced to quadratic cost for our task. The code is available at https://github.com/KosukeSumiyasu/MoXI.
Updated: 2024-04-12 14:44:04
标题: 通过交互识别重要像素组
摘要: 为了更好地理解图像分类器的行为,有必要可视化单个像素对模型预测的贡献。在这项研究中,我们提出了一种方法,MoXI(模型解释通过交互),能够高效准确地识别预测置信度高的像素组。所提出的方法采用博弈论概念、Shapley值和交互作用,考虑了单个像素的影响以及像素对模型置信度的协同影响。理论分析和实验证明,我们的方法能够更好地识别对模型输出有高贡献的像素,比Grad-CAM、Attention rollout和Shapley值等广泛使用的可视化方法表现更好。尽管先前的研究在计算Shapley值和交互作用时受到指数级计算成本的困扰,但我们展示出对于我们的任务,这可以降低到二次成本。代码可在https://github.com/KosukeSumiyasu/MoXI 上找到。
更新时间: 2024-04-12 14:44:04
领域: cs.CV,cs.LG
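A greedy selection loop in the spirit of the MoXI abstract above: starting from a fully masked input, repeatedly add the patch that most increases model confidence given the patches already chosen, so cooperative (interaction) effects are picked up, at cost quadratic in the number of patches. The exact game-theoretic weighting in MoXI differs, and `toy_confidence` is a placeholder for a masked forward pass through a real classifier.

import numpy as np

def greedy_patch_selection(confidence, n_patches, k):
    # confidence(mask) -> float, where mask marks which patches are revealed.
    selected = np.zeros(n_patches, dtype=bool)
    for _ in range(k):
        gains = []
        for j in np.flatnonzero(~selected):
            trial = selected.copy()
            trial[j] = True
            gains.append((confidence(trial), j))  # gain conditioned on current set
        _, best = max(gains)
        selected[best] = True
    return np.flatnonzero(selected)

def toy_confidence(mask):
    score = 0.05 * mask.sum()
    score += 0.2 * mask[0] + 0.2 * mask[3]  # individually informative patches
    if mask[0] and mask[3]:
        score += 0.5                        # cooperative (interaction) effect
    return score

print(greedy_patch_selection(toy_confidence, n_patches=8, k=3))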
Automated Verification of Equivalence Properties in Advanced Logic Programs -- Bachelor Thesis
With the increase in industrial applications using Answer Set Programming, the need for formal verification tools, particularly for critical applications, has also increased. During the program optimisation process, it would be desirable to have a tool which can automatically verify whether an optimised subprogram can replace the original subprogram. Formally this corresponds to the problem of verifying the strong equivalence of two programs. In order to do so, the translation tool anthem was developed. It can be used in conjunction with an automated theorem prover for classical logic to verify that two programs are strongly equivalent. With the current version of anthem, only the strong equivalence of positive programs with a restricted input language can be verified. This is a result of the translation $\tau^*$ implemented in anthem that produces formulas in the logic of here-and-there, which coincides with classical logic only for positive programs. This thesis extends anthem in order to overcome these limitations. First, the transformation $\sigma^*$ is presented, which transforms formulas from the logic of here-and-there to classical logic. A theorem formalises how $\sigma^*$ can be used to express equivalence in the logic of here-and-there in classical logic. Second, the translation $\tau^*$ is extended to programs containing pools. Another theorem shows how $\sigma^*$ can be combined with $\tau^*$ to express the strong equivalence of two programs in classical logic. With $\sigma^*$ and the extended $\tau^*$, it is possible to express the strong equivalence of logic programs containing negation, simple choices, and pools. Both the extended $\tau^*$ and $\sigma^*$ are implemented in a new version of anthem. Several examples of logic programs containing pools, negation, and simple choice rules, which the new version of anthem can translate to classical logic, are presented. Some a...
Updated: 2024-04-12 14:43:21
标题: 高级逻辑程序中等价性属性的自动验证 -- 学士论文
摘要: 随着工业应用中使用Answer Set Programming的增加,对于正式验证工具的需求,特别是对于关键应用程序,也在增加。在程序优化过程中,希望有一个工具可以自动验证优化的子程序是否可以替代原始子程序。从形式上来说,这对应于验证两个程序的强等价性的问题。为了做到这一点,开发了翻译工具anthem。它可以与经典逻辑的自动定理证明器一起使用,以验证两个程序是否强等价。在当前版本的anthem中,只能验证具有受限输入语言的正程序的强等价性。这是anthem中实现的翻译$\tau^*$的结果,它生成了与古典逻辑中的此处和彼处逻辑相一致的公式,仅适用于正程序。本论文扩展了anthem以克服这些限制。首先,介绍了转换$\sigma^*$,它将此处和彼处逻辑中的公式转换为古典逻辑。一个定理形式化了如何使用$\sigma^*$在古典逻辑中表达此处和彼处逻辑中的等价性。其次,扩展了包含池的程序的翻译$\tau^*$。另一个定理显示了如何将$\sigma^*$与$\tau^*$结合起来在古典逻辑中表达两个程序的强等价性。借助$\sigma^*$和扩展的$\tau^*$,可以表达包含否定、简单选择和池的逻辑程序的强等价性。新版本的anthem中实现了扩展的$\tau^*$和$\sigma^*$。展示了新版本的anthem可以将包含池、否定和简单选择规则的逻辑程序转换为古典逻辑的几个示例。其中一些...
更新时间: 2024-04-12 14:43:21
领域: cs.LO,cs.AI
Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support
The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths. This is potentially problematic, as BMA weights can be unstable due to model misspecification or inference approximations, leading to sub-optimal predictions in turn. To remedy this issue, we propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes. We show how both can be implemented as a cheap post-processing step on top of existing inference engines. In our experiments, we find them to be more robust and lead to better predictions compared to the default BMA weights.
Updated: 2024-04-12 14:36:18
标题: 在具有随机支持的概率程序中超越贝叶斯模型平均路径
摘要: 在具有随机支持的概率程序中,后验分解为与每个可能程序路径相关联的局部后验分布的加权和。我们表明,使用这个完整后验进行预测隐式执行了对路径的贝叶斯模型平均(BMA)。这可能存在问题,因为由于模型误设或推断近似,BMA权重可能不稳定,从而导致次优预测。为了解决这个问题,我们提出了基于堆叠和基于PAC-Bayes思想的路径加权的替代机制。我们展示了如何将两种方法作为现有推断引擎的廉价后处理步骤来实现。在我们的实验中,我们发现它们比默认的BMA权重更加稳健,并且导致更好的预测结果。
更新时间: 2024-04-12 14:36:18
领域: cs.LG,cs.PL
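The stacking alternative to BMA weights mentioned above can be sketched directly: choose simplex weights that maximize the held-out log predictive density of the mixture of per-path posteriors. The matrix of per-path log densities below is random placeholder data; in practice it would come from the program's inference engine.

import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
path_lpd = rng.normal(size=(50, 3))  # placeholder: log p_k(y_i | x_i) for path k

def neg_stacking_objective(theta):
    w = np.exp(theta - theta.max())
    w /= w.sum()                      # softmax maps theta onto the simplex
    # Held-out log density of the weighted mixture over program paths.
    return -logsumexp(path_lpd + np.log(w), axis=1).sum()

res = minimize(neg_stacking_objective, np.zeros(path_lpd.shape[1]))
w = np.exp(res.x - res.x.max())
w /= w.sum()
print("stacking weights:", np.round(w, 3))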
Integrated Variational Fourier Features for Fast Spatial Modelling with Gaussian Processes
Sparse variational approximations are popular methods for scaling up inference and learning in Gaussian processes to larger datasets. For $N$ training points, exact inference has $O(N^3)$ cost; with $M \ll N$ features, state of the art sparse variational methods have $O(NM^2)$ cost. Recently, methods have been proposed using more sophisticated features; these promise $O(M^3)$ cost, with good performance in low dimensional tasks such as spatial modelling, but they only work with a very limited class of kernels, excluding some of the most commonly used. In this work, we propose integrated Fourier features, which extends these performance benefits to a very broad class of stationary covariance functions. We motivate the method and choice of parameters from a convergence analysis and empirical exploration, and show practical speedup in synthetic and real world spatial regression tasks.
Updated: 2024-04-12 14:31:51
标题: 集成变分傅立叶特征用于高斯过程快速空间建模
摘要: 稀疏变分逼近是用于扩展高斯过程推理和学习至更大数据集的流行方法。对于$N$个训练点,精确推理的成本为$O(N^3)$;而当$M \ll N$时,最先进的稀疏变分方法的成本为$O(NM^2)$。最近,提出了使用更复杂特征的方法;这些方法承诺具有$O(M^3)$的成本,在低维任务如空间建模中表现良好,但它们只适用于一类非常有限的核函数,排除了一些常用的核函数。在这项工作中,我们提出了整合傅里叶特征,将这些性能优势扩展至非常广泛的平稳协方差函数类。我们从收敛分析和经验探索的角度来解释该方法和参数选择,并展示了在合成和实际空间回归任务中的实际加速效果。
更新时间: 2024-04-12 14:31:51
领域: stat.ML,cs.LG
Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing
Handling graph data is one of the most difficult tasks. Traditional techniques, such as those based on geometry and matrix factorization, rely on assumptions about the data relations that become inadequate when handling large and complex graph data. On the other hand, deep learning approaches demonstrate promising results in handling large graph data, but they often fall short of providing interpretable explanations. To equip the graph processing with both high accuracy and explainability, we introduce a novel approach that harnesses the power of a large language model (LLM), enhanced by an uncertainty-aware module to provide a confidence score on the generated answer. We experiment with our approach on two graph processing tasks: few-shot knowledge graph completion and graph classification. Our results demonstrate that through parameter efficient fine-tuning, the LLM surpasses state-of-the-art algorithms by a substantial margin across ten diverse benchmark datasets. Moreover, to address the challenge of explainability, we propose an uncertainty estimation based on perturbation, along with a calibration scheme to quantify the confidence scores of the generated answers. Our confidence measure achieves an AUC of 0.8 or higher on seven out of the ten datasets in predicting the correctness of the answer generated by LLM.
Updated: 2024-04-12 14:30:10
标题: 利用大型语言模型的力量进行不确定性感知图处理
摘要: 处理图数据是最困难的任务之一。传统技术,如基于几何和矩阵分解的技术,依赖于对数据关系的假设,当处理大规模和复杂的图数据时,这些假设变得不足够。另一方面,深度学习方法在处理大规模图数据方面表现出有希望的结果,但它们常常无法提供可解释的解释。为了使图处理具有高精度和可解释性,我们引入了一种新颖的方法,利用大型语言模型(LLM)的能力,并通过一个带有不确定性感知模块来提供生成答案的置信度分数。我们在两个图处理任务上通过我们的方法进行实验:少样本知识图完成和图分类。我们的结果表明,通过参数高效的微调,LLM在十个不同的基准数据集上超过了最先进的算法相当大的幅度。此外,为了解决可解释性的挑战,我们提出了基于扰动的不确定性估计,以及一个校准方案来量化生成答案的置信度分数。我们的置信度测量在十个数据集中有七个达到了0.8或更高的AUC,用于预测LLM生成的答案的正确性。
更新时间: 2024-04-12 14:30:10
领域: cs.LG,cs.CL
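The perturbation-based confidence score from the abstract above reduces to: query the model on several perturbed versions of the input and measure agreement among its answers. Both `ask_llm` and `perturb` below are placeholders for the model call and the paper's perturbation scheme, and the calibration step is omitted.

from collections import Counter
import random

def perturb(prompt, rng):
    words = prompt.split()
    rng.shuffle(words)   # toy perturbation: reorder the serialized graph context
    return " ".join(words)

def answer_with_confidence(ask_llm, prompt, n=8, seed=0):
    rng = random.Random(seed)
    answers = [ask_llm(perturb(prompt, rng)) for _ in range(n)]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / n   # majority answer plus agreement-based confidence

toy_llm = lambda p: "Paris" if random.random() < 0.9 else "Lyon"  # placeholder model
print(answer_with_confidence(toy_llm, "(head: France) (relation: capital) -> ?"))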
Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms
When addressing the challenge of complex multi-objective optimization problems, particularly those with non-convex and non-uniform Pareto fronts, Decomposition-based Multi-Objective Evolutionary Algorithms (MOEADs) often converge to local optima, thereby limiting solution diversity. Despite its significance, this issue has received limited theoretical exploration. Through a comprehensive geometric analysis, we identify that the traditional method of Reference Point (RP) selection fundamentally contributes to this challenge. In response, we introduce an innovative RP selection strategy, the Weight Vector-Guided and Gaussian-Hybrid method, designed to overcome the local optima issue. This approach employs a novel RP type that aligns with weight vector directions and integrates a Gaussian distribution to combine three distinct RP categories. Our research comprises two main experimental components: an ablation study involving 14 algorithms within the MOEADs framework, spanning from 2014 to 2022, to validate our theoretical framework, and a series of empirical tests to evaluate the effectiveness of our proposed method against both traditional and cutting-edge alternatives. Results demonstrate that our method achieves remarkable improvements in both population diversity and convergence.
Updated: 2024-04-12 14:29:45
标题: 使用基于分解的进化算法分析和克服复杂多目标优化中的局部最优解
摘要: 在应对复杂的多目标优化问题,特别是具有非凸和非均匀帕累托前沿的问题时,基于分解的多目标进化算法(MOEADs)通常会收敛到局部最优解,从而限制了解决方案的多样性。尽管这个问题很重要,但它的理论探讨却受到了限制。通过全面的几何分析,我们确定传统的参考点(RP)选择方法在根本上造成了这一挑战。为此,我们引入了一种创新的RP选择策略,即权向量引导和高斯混合方法,旨在克服局部最优解问题。这种方法采用了一种新颖的RP类型,与权向量方向对齐,并集成了高斯分布以结合三种不同的RP类别。我们的研究包括两个主要的实验组成部分:一个包括14种算法的消融研究,涵盖了从2014年到2022年的MOEADs框架,以验证我们的理论框架;以及一系列经验测试,评估我们提出的方法对传统和尖端替代方法的有效性。结果表明,我们的方法在种群多样性和收敛方面取得了明显的改进。
更新时间: 2024-04-12 14:29:45
领域: cs.NE,cs.AI
On the Minimax Regret in Online Ranking with Top-k Feedback
In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top $k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top $k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates with the top $k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n.
Updated: 2024-04-12 14:28:39
Domains: cs.LG,stat.ML
Dataset Reset Policy Optimization for RLHF
Reinforcement Learning (RL) from human preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of reset, we propose a new RLHF algorithm with provable guarantees. Motivated by the fact that the offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution. In theory, we show that DR-PO learns to perform at least as well as any policy that is covered by the offline dataset under general function approximation with finite sample complexity. In experiments, we demonstrate that on both the TL;DR summarization and the Anthropic Helpful and Harmless (HH) datasets, the generation from DR-PO is better than that from Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), under the metric of GPT-4 win-rate. Code for this work can be found at https://github.com/Cornell-RL/drpo.
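A minimal sketch of the dataset-reset idea, assuming a hypothetical env.reset_to(state) API and a buffer of states extracted from the offline preference data; this is an illustration, not the authors' implementation:

    import random

    def sample_rollout_start(env, offline_states, p_reset: float = 0.5):
        """With probability p_reset, start the rollout from a state seen in the
        offline preference data instead of the initial state distribution."""
        if offline_states and random.random() < p_reset:
            state = random.choice(offline_states)
            return env.reset_to(state)   # hypothetical API: restore a saved state
        return env.reset()               # usual start from the initial distribution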
Updated: 2024-04-12 14:25:49
Domains: cs.LG,cs.AI,cs.CL
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning an mPLM with limited labeled multilingual data merely encapsulates the knowledge specific to the labeled data. Therefore, we introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM, eliminating the need for additional labeled multilingual data. Experiments show that ALSACE effectively mitigates language-level performance disparity across various mPLMs while showing competitive performance on different multilingual NLU tasks, ranging from full-resource to limited-resource settings. The code for our approach is available at https://github.com/pkunlp-icler/ALSACE.
Updated: 2024-04-12 14:19:16
Domains: cs.CL,cs.AI
Multimodal Learning for Materials
Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning efforts in materials science focus primarily on single-modality tasks, i.e., relationships between materials and a single physical property, thus not taking advantage of the rich and multimodal set of material properties. Here, we introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes: (i) MultiMat achieves state-of-the-art performance for challenging material property prediction tasks; (ii) MultiMat enables novel and accurate material discovery via latent space similarity, enabling screening for stable materials with desired properties; and (iii) MultiMat encodes interpretable emergent features that may provide novel scientific insights.
Updated: 2024-04-12 14:17:34
Domains: cs.LG,cond-mat.mtrl-sci
Semantic Communication for Cooperative Multi-Task Processing over Wireless Networks
In this paper, we expand semantic communication, currently limited to processing a single task, to a more general system that can handle multiple tasks concurrently. In pursuit of this, we first introduce our definition of the "semantic source", enabling the interpretation of multiple semantics based on a single observation. A semantic encoder design is then introduced, featuring the division of the encoder into a common unit and multiple specific units that enable cooperative multi-task processing. Simulation results demonstrate the effectiveness of the proposed semantic source and the system design. Our approach employs information maximization (infomax) and end-to-end design principles.
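A minimal sketch of an encoder split into a common unit and task-specific units, as described above; the layer sizes and the PyTorch framing are illustrative assumptions, not the paper's architecture:

    import torch
    import torch.nn as nn

    class MultiTaskSemanticEncoder(nn.Module):
        """Common unit shared by all tasks, plus one specific unit per task."""
        def __init__(self, in_dim=64, common_dim=32, out_dim=16, n_tasks=3):
            super().__init__()
            self.common = nn.Sequential(nn.Linear(in_dim, common_dim), nn.ReLU())
            self.specific = nn.ModuleList(
                [nn.Linear(common_dim, out_dim) for _ in range(n_tasks)])

        def forward(self, x):
            h = self.common(x)                           # shared semantic features
            return [unit(h) for unit in self.specific]   # one code per task

    # Usage: encoder = MultiTaskSemanticEncoder(); codes = encoder(torch.randn(8, 64))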
Updated: 2024-04-12 14:03:41
Domains: eess.SP,cs.IT,cs.LG,math.IT
A Quadratic Synchronization Rule for Distributed Deep Learning
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While $H$ has been viewed as a hyperparameter to trade optimization efficiency for communication cost, recent research indicates that setting a proper $H$ value can lead to generalization improvement. Yet, selecting a proper $H$ is elusive. This work proposes a theory-grounded method for determining $H$, named the Quadratic Synchronization Rule (QSR), which recommends dynamically setting $H$ in proportion to $\frac{1}{\eta^2}$ as the learning rate $\eta$ decays over time. Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. Compared with the standard data parallel training, QSR enables Local AdamW on ViT-B to cut the training time on 16 or 64 GPUs down from 26.7 to 20.2 hours or from 8.6 to 5.5 hours and, at the same time, achieves $1.16\%$ or $0.84\%$ higher top-1 validation accuracy.
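A minimal sketch of the QSR schedule described above, assuming a tunable constant alpha and clipping bounds that the abstract does not specify:

    def qsr_sync_interval(eta: float, alpha: float = 1.0,
                          h_min: int = 1, h_max: int = 1024) -> int:
        """Quadratic Synchronization Rule: H grows like alpha / eta**2
        as the learning rate eta decays (alpha is a tunable constant)."""
        return int(min(max(round(alpha / eta ** 2), h_min), h_max))

    # Example: eta = 0.1 -> H = 100 local steps between synchronizations.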
Updated: 2024-04-12 13:59:01
Domains: cs.LG
Decoding AI: The inside story of data analysis in ChatGPT
As a result of recent advancements in generative AI, the field of Data Science is undergoing significant change. This review critically examines the Data Analysis (DA) capabilities of ChatGPT, assessing its performance across a wide range of tasks. While DA provides researchers and practitioners with unprecedented analytical capabilities, it is far from perfect, and it is important to recognize and address its limitations.
Updated: 2024-04-12 13:57:30
Domains: cs.LG,cs.CL,stat.CO
Combining Statistical Depth and Fermat Distance for Uncertainty Quantification
We measure the out-of-domain uncertainty in the predictions of neural networks using a statistical notion called ``Lens Depth'' (LD) combined with Fermat Distance, which is able to capture precisely the ``depth'' of a point with respect to a distribution in feature space, without any assumption about the form of the distribution. Our method has no trainable parameters. The method is applicable to any classification model, as it is applied directly in feature space at test time and does not intervene in the training process. As such, it does not impact the performance of the original model. The proposed method gives excellent qualitative results on toy datasets and can give competitive or better uncertainty estimation on standard deep learning datasets compared to strong baseline methods.
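A minimal sketch of an empirical lens-depth estimate, here with the plain Euclidean metric in place of the paper's Fermat distance:

    import itertools
    import numpy as np

    def lens_depth(x, data, dist=None):
        """Empirical lens depth of point x w.r.t. a sample `data` of shape (n, d):
        x lies in the lens of (a, b) if it is within distance d(a, b) of both."""
        if dist is None:
            dist = lambda a, b: np.linalg.norm(a - b)
        inside = total = 0
        for i, j in itertools.combinations(range(len(data)), 2):
            d_ij = dist(data[i], data[j])
            inside += (dist(x, data[i]) <= d_ij) and (dist(x, data[j]) <= d_ij)
            total += 1
        return inside / total  # low depth suggests an out-of-domain point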
Updated: 2024-04-12 13:54:21
Domains: stat.ML,cs.AI,cs.LG,math.PR,stat.AP
Solving Parametric PDEs with Radial Basis Functions and Deep Neural Networks
We propose the POD-DNN, a novel algorithm leveraging deep neural networks (DNNs) along with radial basis functions (RBFs) in the context of the proper orthogonal decomposition (POD) reduced basis method (RBM), aimed at approximating the parametric mapping of parametric partial differential equations on irregular domains. The POD-DNN algorithm capitalizes on the low-dimensional characteristics of the solution manifold for parametric equations, alongside the inherent offline-online computational strategy of RBM and DNNs. In numerical experiments, POD-DNN demonstrates significantly accelerated computation speeds during the online phase. Compared to other algorithms that utilize RBF without integrating DNNs, POD-DNN substantially improves the computational speed in the online inference process. Furthermore, under reasonable assumptions, we have rigorously derived upper bounds on the complexity of approximating parametric mappings with POD-DNN, thereby providing a theoretical analysis of the algorithm's empirical performance.
Updated: 2024-04-12 13:47:07
Domains: math.NA,cs.LG,cs.NA
TSLANet: Rethinking Transformers for Time Series Representation Learning
Time series data, characterized by its intrinsic long and short-range dependencies, poses a unique challenge across analytical applications. While Transformer-based models excel at capturing long-range dependencies, they face limitations in noise sensitivity, computational efficiency, and overfitting with smaller datasets. In response, we introduce a novel Time Series Lightweight Adaptive Network (TSLANet), as a universal convolutional model for diverse time series tasks. Specifically, we propose an Adaptive Spectral Block, harnessing Fourier analysis to enhance feature representation and to capture both long-term and short-term interactions while mitigating noise via adaptive thresholding. Additionally, we introduce an Interactive Convolution Block and leverage self-supervised learning to refine the capacity of TSLANet for decoding complex temporal patterns and improve its robustness on different datasets. Our comprehensive experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection, showcasing its resilience and adaptability across a spectrum of noise levels and data sizes. The code is available at \url{https://github.com/emadeldeen24/TSLANet}
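A toy illustration of spectral filtering with an adaptive (here, quantile-based) threshold, in the spirit of the Adaptive Spectral Block; the paper's block is learned, so this is only a sketch:

    import torch

    def adaptive_spectral_filter(x: torch.Tensor, q: float = 0.5) -> torch.Tensor:
        """Denoise a batch of series x of shape (batch, length): FFT, zero out
        frequency bins whose magnitude falls below a data-dependent threshold
        (the q-quantile of per-bin magnitudes), then invert the FFT."""
        spec = torch.fft.rfft(x, dim=-1)
        mag = spec.abs()
        thresh = torch.quantile(mag, q, dim=-1, keepdim=True)  # adaptive threshold
        spec = spec * (mag >= thresh).to(mag.dtype)            # keep strong bins only
        return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)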
Updated: 2024-04-12 13:41:29
Domains: cs.LG,stat.ML
Identification of a replicable optical security element using laser speckle
An optical security element containing an area of random rough relief is proposed. It combines the low cost of mass replication inherent in traditional security holograms with the impossibility of holographic copying, in which the wave reconstructed by the hologram is recorded as a copy of that hologram. The proposed optical element is also protected from contact and photographic copying. Laboratory samples of optical elements were obtained by taking replicas of a rough surface. Identification of the authenticity of optical elements was demonstrated by calculating the cross-correlation of speckle patterns produced by coherent light scattered off different replicas. It is assumed that the proposed security elements can be mass-produced on standard equipment for embossing security holograms.
Updated: 2024-04-12 13:25:50
Domains: cs.CR,physics.optics
VADA: a Data-Driven Simulator for Nanopore Sequencing
Nanopore sequencing offers the ability for real-time analysis of long DNA sequences at a low cost, enabling new applications such as early detection of cancer. Due to the complex nature of nanopore measurements and the high cost of obtaining ground truth datasets, there is a need for nanopore simulators. Existing simulators rely on handcrafted rules and parameters and do not learn an internal representation that would allow for analysing underlying biological factors of interest. Instead, we propose VADA, a purely data-driven method for simulating nanopores based on an autoregressive latent variable model. We embed subsequences of DNA and introduce a conditional prior to address the challenge of a collapsing conditioning. We introduce an auxiliary regressor on the latent variable to encourage our model to learn an informative latent representation. We empirically demonstrate that our model achieves competitive simulation performance on experimental nanopore data. Moreover, we show we have learned an informative latent representation that is predictive of the DNA labels. We hypothesize that other biological factors of interest, beyond the DNA labels, can potentially be extracted from such a learned latent representation.
Updated: 2024-04-12 13:24:28
Domains: q-bio.QM,cs.LG
OTTER: Improving Zero-Shot Classification via Optimal Transport
Popular zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like Prior Matching -- often by significant margins -- in 17 out of 21 datasets.
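A minimal Sinkhorn-style sketch of rebalancing a prediction matrix toward an estimated downstream label distribution, in the spirit of the optimal-transport adjustment described above; the exact OTTER formulation may differ:

    import numpy as np

    def rebalance_predictions(probs, label_marginal, n_iters=200):
        """Rescale an (n, K) prediction matrix so that column sums match an
        estimated label distribution while rows stay normalized."""
        P = np.asarray(probs, dtype=float)
        n, _ = P.shape
        col_target = n * np.asarray(label_marginal, dtype=float)
        for _ in range(n_iters):
            P = P / P.sum(axis=1, keepdims=True)    # rows sum to 1
            P = P * (col_target / P.sum(axis=0))    # columns match the target
        return P / P.sum(axis=1, keepdims=True)

    # Predicted class per example: np.argmax(rebalance_predictions(P, r), axis=1)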
Updated: 2024-04-12 13:18:47
Domains: cs.LG,cs.AI
Unsupervised Learning of Group Invariant and Equivariant Representations
Equivariant neural networks, whose hidden features transform according to representations of a group G acting on the data, exhibit training efficiency and an improved generalisation performance. In this work, we extend group invariant and equivariant representation learning to the field of unsupervised deep learning. We propose a general learning strategy based on an encoder-decoder framework in which the latent representation is separated in an invariant term and an equivariant group action component. The key idea is that the network learns to encode and decode data to and from a group-invariant representation by additionally learning to predict the appropriate group action to align input and output pose to solve the reconstruction task. We derive the necessary conditions on the equivariant encoder, and we present a construction valid for any G, both discrete and continuous. We describe explicitly our construction for rotations, translations and permutations. We test the validity and the robustness of our approach in a variety of experiments with diverse data types employing different network architectures.
Updated: 2024-04-12 13:16:54
Domains: cs.LG
Beyond One-Size-Fits-All: Adapting Counterfactual Explanations to User Objectives
Explainable Artificial Intelligence (XAI) has emerged as a critical area of research aimed at enhancing the transparency and interpretability of AI systems. Counterfactual Explanations (CFEs) offer valuable insights into the decision-making processes of machine learning algorithms by exploring alternative scenarios where certain factors differ. Despite the growing popularity of CFEs in the XAI community, existing literature often overlooks the diverse needs and objectives of users across different applications and domains, leading to a lack of tailored explanations that adequately address the different use cases. In this paper, we advocate for a nuanced understanding of CFEs, recognizing the variability in desired properties based on user objectives and target applications. We identify three primary user objectives and explore the desired characteristics of CFEs in each case. By addressing these differences, we aim to design more effective and tailored explanations that meet the specific needs of users, thereby enhancing collaboration with AI systems.
Updated: 2024-04-12 13:11:55
Domains: cs.LG,cs.AI
On the Independence Assumption in Neurosymbolic Learning
State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder optimisation and prevent uncertainty quantification. We prove that loss functions bias conditionally independent neural networks to become overconfident in their predictions. As a result, they are unable to represent uncertainty over multiple valid options. Furthermore, we prove that these loss functions are difficult to optimise: they are non-convex, and their minima are usually highly disconnected. Our theoretical analysis gives the foundation for replacing the conditional independence assumption and designing more expressive neurosymbolic probabilistic models.
Updated: 2024-04-12 13:09:48
Domains: stat.ML,cs.AI,cs.LG
A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations
In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also on the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of the solution to a BSDE themselves satisfy another BSDE, thus resulting in a system of BSDEs. Such a formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right).$ All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the other the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments in up to $50$ dimensions are provided to demonstrate its high efficiency. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient compared to other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.
Updated: 2024-04-12 13:05:35
Domains: math.NA,cs.LG,cs.NA,q-fin.CP,65C30, 68T07, 60H07, 91G20
Lightweight Multi-System Multivariate Interconnection and Divergence Discovery
Identifying outlier behavior among sensors and subsystems is essential for discovering faults and facilitating diagnostics in large systems. At the same time, exploring large systems with numerous multivariate data sets is challenging. This study presents a lightweight interconnection and divergence discovery mechanism (LIDD) to identify abnormal behavior in multi-system environments. The approach employs a multivariate analysis technique that first estimates the similarity heatmaps among the sensors for each system and then applies information retrieval algorithms to provide relevant multi-level interconnection and discrepancy details. Our experiment on the readout systems of the Hadron Calorimeter of the Compact Muon Solenoid (CMS) experiment at CERN demonstrates the effectiveness of the proposed method. Our approach clusters readout systems and their sensors consistent with the expected calorimeter interconnection configurations, while capturing unusual behavior in divergent clusters and estimating their root causes.
Updated: 2024-04-12 13:02:33
Domains: cs.LG,cs.SY,eess.SY
Federated Optimization with Doubly Regularized Drift Correction
Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly improved communication-computation trade-offs over vanilla gradient descent. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by using doubly regularized drift correction.
Updated: 2024-04-12 12:57:43
Domains: cs.LG,math.OC
Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation
State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic refutation framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATE is non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose a neural refutation framework which performs partial identification of CATE or, equivalently, aims at estimating lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our refutation framework is of direct relevance in practice where the validity of CATE estimation is of importance.
Updated: 2024-04-12 12:57:40
Domains: stat.ML,cs.AI,cs.LG
Anti-Byzantine Attacks Enabled Vehicle Selection for Asynchronous Federated Learning in Vehicular Edge Computing
In vehicular edge computing (VEC), asynchronous federated learning (AFL) is used, where the edge receives a local model and updates the global model, effectively reducing the global aggregation latency. Because vehicles differ in the amount of local data, computing capability, and location, updating the global model with the same weight for every vehicle is inappropriate. These factors affect the local computation time and the upload time of the local model, and a vehicle may also suffer Byzantine attacks that corrupt its data. However, based on deep reinforcement learning (DRL), we can consider these factors comprehensively to eliminate poorly performing vehicles as far as possible and to exclude vehicles that have suffered Byzantine attacks before AFL. At the same time, when aggregating in AFL, we can focus on vehicles with better performance to improve the accuracy and safety of the system. In this paper, we propose a DRL-based vehicle selection scheme for VEC. The scheme accounts for vehicle mobility, time-varying channel conditions, time-varying computational resources, differing data amounts, the transmission channel status of vehicles, and Byzantine attacks. Simulation results show that the proposed scheme effectively improves the safety and accuracy of the global model.
Updated: 2024-04-12 12:56:16
Domains: cs.LG
Calibration of Continual Learning Models
Continual Learning (CL) focuses on maximizing the predictive performance of a model across a non-stationary stream of data. Unfortunately, CL models tend to forget previous knowledge, thus often underperforming when compared with an offline model trained jointly on the entire data stream. Given that any CL model will eventually make mistakes, it is of crucial importance to build calibrated CL models: models that can reliably tell their confidence when making a prediction. Model calibration is an active research topic in machine learning, yet to be properly investigated in CL. We provide the first empirical study of the behavior of calibration approaches in CL, showing that CL strategies do not inherently learn calibrated models. To mitigate this issue, we design a continual calibration approach that improves the performance of post-processing calibration methods over a wide range of different benchmarks and CL strategies. CL does not necessarily need perfect predictive models, but rather it can benefit from reliable predictive models. We believe our study on continual calibration represents a first step towards this direction.
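For concreteness, temperature scaling is a common post-processing calibration method of the kind the study builds on; a minimal sketch with a simple grid search (not the paper's continual calibration approach):

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
        """Pick the temperature T minimizing NLL on held-out data; dividing
        logits by T rescales confidence without changing predicted classes."""
        best_t, best_nll = 1.0, np.inf
        for t in grid:
            p = softmax(logits / t)
            nll = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
            if nll < best_nll:
                best_t, best_nll = t, nll
        return best_t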
Updated: 2024-04-12 12:33:26
Domains: cs.LG,cs.AI
An improved tabular data generator with VAE-GMM integration
The rising use of machine learning in various fields requires robust methods to create synthetic tabular data. Data should preserve key characteristics while addressing data scarcity challenges. Current approaches based on Generative Adversarial Networks, such as the state-of-the-art CTGAN model, struggle with the complex structures inherent in tabular data. These data often contain both continuous and discrete features with non-Gaussian distributions. Therefore, we propose a novel Variational Autoencoder (VAE)-based model that addresses these limitations. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. This avoids the limitations imposed by assuming a strictly Gaussian latent space, allowing for a more accurate representation of the underlying data distribution during data generation. Furthermore, our model offers enhanced flexibility by allowing the use of various differentiable distributions for individual features, making it possible to handle both continuous and discrete data types. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones, based on their resemblance and utility. This evaluation demonstrates significant outperformance against CTGAN and TVAE, establishing its potential as a valuable tool for generating synthetic tabular data in various domains, particularly in healthcare.
Updated: 2024-04-12 12:31:06
Domains: cs.LG,cs.AI,I.2.1
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation
Prompt Tuning is emerging as a scalable and cost-effective method to fine-tune Pretrained Language Models (PLMs), which are often referred to as Large Language Models (LLMs). This study benchmarks the performance and computational efficiency of Prompt Tuning and baselines for multi-label text classification. This is applied to the challenging task of classifying companies into an investment firm's proprietary industry taxonomy, supporting their thematic investment strategy. Text-to-text classification is frequently reported to outperform task-specific classification heads, but has several limitations when applied to a multi-label classification problem where each label consists of multiple tokens: (a) Generated labels may not match any label in the label taxonomy; (b) The fine-tuning process lacks permutation invariance and is sensitive to the order of the provided labels; (c) The model provides binary decisions rather than appropriate confidence scores. Limitation (a) is addressed by applying constrained decoding using Trie Search, which slightly improves classification performance. All limitations (a), (b), and (c) are addressed by replacing the PLM's language head with a classification head, which is referred to as Prompt Tuned Embedding Classification (PTEC). This improves performance significantly, while also reducing computational costs during inference. In our industrial application, the training data is skewed towards well-known companies. We confirm that the model's performance is consistent across both well-known and less-known companies. Our overall results indicate the continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of PLMs with strong generalization abilities. We release our codebase and a benchmarking dataset at https://github.com/EQTPartners/PTEC.
Updated: 2024-04-12 12:25:50
Domains: cs.CL,cs.AI,68T50,I.2.7; I.2.0
Adversarially Robust Spiking Neural Networks Through Conversion
Spiking neural networks (SNNs) provide an energy-efficient alternative to a variety of artificial neural network (ANN) based AI applications. As the progress in neuromorphic computing with SNNs expands their use in applications, the problem of adversarial robustness of SNNs becomes more pronounced. In contrast to the widely explored end-to-end adversarial training based solutions, we address the limited progress in scalable robust SNN training methods by proposing an adversarially robust ANN-to-SNN conversion algorithm. Our method provides an efficient approach to embrace various computationally demanding robust learning objectives that have been proposed for ANNs. During a post-conversion robust finetuning phase, our method adversarially optimizes both layer-wise firing thresholds and synaptic connectivity weights of the SNN to maintain transferred robustness gains from the pre-trained ANN. We perform experimental evaluations in a novel setting proposed to rigorously assess the robustness of SNNs, where numerous adaptive adversarial attacks that account for the spike-based operation dynamics are considered. Results show that our approach yields a scalable state-of-the-art solution for adversarially robust deep SNNs with low latency.
Updated: 2024-04-12 12:18:19
Domains: cs.NE,cs.AI,cs.LG
Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task
Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collaborative object categorization task with a physical robot. We introduce a hierarchical approach for interpreting user non-verbal cues, like hand gestures, body poses, and facial expressions and combining them with environment states and user verbal cues captured using an existing Automatic Speech Recognition (ASR) system. Our evaluation demonstrates the potential of LLMs to interpret non-verbal cues and to combine them with their context-understanding capabilities and real-world knowledge to support intention prediction during human-robot interaction.
Updated: 2024-04-12 12:15:14
Domains: cs.RO,cs.AI,cs.HC
SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Developing Economies
The outbreak of COVID-19 has highlighted the intricate interplay between public health and economic stability on a global scale. This study proposes a novel reinforcement learning framework designed to optimize health and economic outcomes during pandemics. The framework leverages the SIR model, integrating both lockdown measures (via a stringency index) and vaccination strategies to simulate disease dynamics. The stringency index, indicative of the severity of lockdown measures, influences both the spread of the disease and the economic health of a country. Developing nations, which bear a disproportionate economic burden under stringent lockdowns, are the primary focus of our study. By implementing reinforcement learning, we aim to optimize governmental responses and strike a balance between the competing costs associated with public health and economic stability. This approach also enhances transparency in governmental decision-making by establishing a well-defined reward function for the reinforcement learning agent. In essence, this study introduces an innovative and ethical strategy to navigate the challenge of balancing public health and economic stability amidst infectious disease outbreaks.
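A minimal sketch of an SIR step in which the contact rate is damped by a stringency index and vaccination removes susceptibles; the parameter names and the Euler discretization are illustrative assumptions, not the paper's exact model:

    def sir_step(s, i, r, beta, gamma, stringency, vacc_rate, dt=1.0):
        """One Euler step of an SIR model; s, i, r are population fractions,
        stringency in [0, 1] damps the contact rate beta, and vacc_rate moves
        susceptibles directly to the removed compartment."""
        eff_beta = beta * (1.0 - stringency)   # stricter lockdown -> fewer contacts
        new_inf = eff_beta * s * i
        new_rec = gamma * i
        new_vac = vacc_rate * s
        s += dt * (-new_inf - new_vac)
        i += dt * (new_inf - new_rec)
        r += dt * (new_rec + new_vac)
        return s, i, r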
Updated: 2024-04-12 12:11:51
Domains: cs.LG,physics.soc-ph,q-bio.PE
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
Large language models (LLMs) are increasingly capable of completing knowledge intensive tasks by recalling information from a static pretraining corpus. Here we are concerned with LLMs in the context of evolving data requirements. For instance: batches of new data that are introduced periodically; subsets of data with user-based access controls; or requirements on dynamic removal of documents with guarantees that associated knowledge cannot be recalled. We wish to satisfy these requirements while at the same time ensuring a model does not forget old information when new data becomes available. To address these issues, we introduce AdapterSwap, a training and inference scheme that organizes knowledge from a data collection into a set of low-rank adapters, which are dynamically composed during inference. Our experiments demonstrate AdapterSwap's ability to support efficient continual learning, while also enabling organizations to have fine-grained control over data access and deletion.
Updated: 2024-04-12 12:06:02
Domains: cs.LG,cs.AI,cs.CL
Contrastive Graph Pooling for Explainable Classification of Brain Networks
Functional magnetic resonance imaging (fMRI) is a commonly used technique to measure neural activation. Its application has been particularly important in identifying underlying neurodegenerative conditions such as Parkinson's, Alzheimer's, and Autism. Recent analysis of fMRI data models the brain as a graph and extracts features by graph neural networks (GNNs). However, the unique characteristics of fMRI data require a special design of GNN. Tailoring GNN to generate effective and domain-explainable features remains challenging. In this paper, we propose a contrastive dual-attention block and a differentiable graph pooling method called ContrastPool to better utilize GNN for brain networks, meeting fMRI-specific requirements. We apply our method to 5 resting-state fMRI brain network datasets of 3 diseases and demonstrate its superiority over state-of-the-art baselines. Our case study confirms that the patterns extracted by our method match the domain knowledge in neuroscience literature, and disclose direct and interesting insights. Our contributions underscore the potential of ContrastPool for advancing the understanding of brain networks and neurodegenerative conditions. The source code is available at https://github.com/AngusMonroe/ContrastPool.
Updated: 2024-04-12 12:05:57
Domains: q-bio.NC,cs.AI,cs.LG
Evolutionary Preference Sampling for Pareto Set Learning
Recently, Pareto Set Learning (PSL) has been proposed for learning the entire Pareto set using a neural network. PSL employs preference vectors to scalarize multiple objectives, facilitating the learning of mappings from preference vectors to specific Pareto optimal solutions. Previous PSL methods have shown their effectiveness in solving artificial multi-objective optimization problems (MOPs) with uniform preference vector sampling. The quality of the learned Pareto set is influenced by the sampling strategy of the preference vector, and the sampling of the preference vector needs to be decided based on the Pareto front shape. However, a fixed preference sampling strategy cannot simultaneously adapt the Pareto front of multiple MOPs. To address this limitation, this paper proposes an Evolutionary Preference Sampling (EPS) strategy to efficiently sample preference vectors. Inspired by evolutionary algorithms, we consider preference sampling as an evolutionary process to generate preference vectors for neural network training. We integrate the EPS strategy into five advanced PSL methods. Extensive experiments demonstrate that our proposed method has a faster convergence speed than baseline algorithms on 7 testing problems. Our implementation is available at https://github.com/rG223/EPS.
Updated: 2024-04-12 11:58:13
Domains: cs.NE,cs.AI
Enhancing MAP-Elites with Multiple Parallel Evolution Strategies
With the development of fast and massively parallel evaluations in many domains, Quality-Diversity (QD) algorithms, which have already proved promising in a wide range of applications, have seen their potential multiplied. However, we have yet to understand how to best use a large number of evaluations, as using them for random variations alone is not always effective. High-dimensional search spaces are a typical situation where random variations struggle to search effectively. Another situation is uncertain settings where solutions can appear better than they truly are, so naively evaluating more solutions might mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) designed to exploit fast parallel evaluations more effectively. MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and reset mechanism designed for QD optimisation, all on just a single GPU. We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its benefit across domains. Additionally, our approach outperforms sampling-based QD methods in uncertain domains when given the same evaluation budget. Overall, MEMES generates reproducible solutions that are high-performing and diverse through large-scale ES optimisation on easily accessible hardware.
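A minimal sketch of one ES update of the OpenAI-ES flavor; MEMES runs many such processes independently, each with its own objective and reset mechanism (omitted here as an illustration):

    import numpy as np

    def es_step(theta, fitness, sigma=0.1, lr=0.02, pop=32, rng=None):
        """One ES update: evaluate Gaussian perturbations around theta and
        move along the fitness-weighted average direction."""
        rng = rng if rng is not None else np.random.default_rng()
        eps = rng.standard_normal((pop, theta.size))
        scores = np.array([fitness(theta + sigma * e) for e in eps])
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)
        return theta + lr / (pop * sigma) * eps.T @ scores

    # Several independent ES processes, each with its own search point:
    # thetas = [es_step(t, fitness) for t in thetas]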
Updated: 2024-04-12 11:51:29
Domains: cs.NE,cs.AI,cs.LG,cs.RO
Deep Classifier Mimicry without Data Access
Access to pre-trained models has recently emerged as a standard across numerous machine learning domains. Unfortunately, access to the original data the models were trained on may not equally be granted. This makes it tremendously challenging to fine-tune, compress models, adapt continually, or to do any other type of data-driven update. We posit that original data access may however not be required. Specifically, we propose Contrastive Abductive Knowledge Extraction (CAKE), a model-agnostic knowledge distillation procedure that mimics deep classifiers without access to the original data. To this end, CAKE generates pairs of noisy synthetic samples and diffuses them contrastively toward a model's decision boundary. We empirically corroborate CAKE's effectiveness using several benchmark datasets and various architectural choices, paving the way for broad application.
Updated: 2024-04-12 11:50:26
Domains: cs.LG,cs.AI
Kernel-Based Testing for Single-Cell Differential Analysis
Single-cell technologies offer insights into molecular feature distributions, but comparing them poses challenges. We propose a kernel-testing framework for non-linear cell-wise distribution comparison, analyzing gene expression and epigenomic modifications. Our method allows feature-wise and global transcriptome/epigenome comparisons, revealing cell population heterogeneities. Using a classifier based on embedding variability, we identify transitions in cell states, overcoming limitations of traditional single-cell analysis. Applied to single-cell ChIP-Seq data, our approach identifies untreated breast cancer cells with an epigenomic profile resembling persister cells. This demonstrates the effectiveness of kernel testing in uncovering subtle population variations that might be missed by other methods.
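For concreteness, a standard kernel two-sample statistic (the unbiased squared MMD with a Gaussian kernel); the paper's kernel test may differ in detail:

    import numpy as np

    def gaussian_kernel(a, b, bandwidth):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))

    def mmd2_unbiased(x, y, bandwidth=1.0):
        """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d);
        large values suggest the two cell populations differ in distribution."""
        kxx = gaussian_kernel(x, x, bandwidth)
        kyy = gaussian_kernel(y, y, bandwidth)
        kxy = gaussian_kernel(x, y, bandwidth)
        n, m = len(x), len(y)
        np.fill_diagonal(kxx, 0.0)
        np.fill_diagonal(kyy, 0.0)
        return (kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1))
                - 2 * kxy.mean())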
Updated: 2024-04-12 11:48:03
Domains: stat.ML,cs.LG
PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction
Machine learning is increasingly used in fluid dynamics to expedite computation when solving forward and inverse problems of partial differential equations. Yet, a notable challenge with existing convolutional neural network (CNN)-based methods for data fidelity enhancement is their reliance on specific low-fidelity data patterns and distributions during the training phase. In addition, CNN-based methods essentially treat the flow reconstruction task as a computer vision task that prioritizes element-wise precision and lacks a physical and mathematical explanation. This dependence can dramatically affect the models' effectiveness in real-world scenarios, especially when the low-fidelity input deviates from the training data or contains noise not accounted for during training. The introduction of diffusion models in this context shows promise for improving performance and generalizability. Unlike direct mapping from a specific low-fidelity to a high-fidelity distribution, diffusion models learn to transition from any low-fidelity distribution towards a high-fidelity one. Our proposed model - Physics-informed Residual Diffusion - demonstrates the capability to elevate the quality of data from standard low-fidelity inputs, from low-fidelity inputs with injected Gaussian noise, and from randomly collected samples. By integrating physics-based insights into the objective function, it further refines the accuracy and the fidelity of the inferred high-quality data. Experimental results have shown that our approach can effectively reconstruct high-quality outcomes for two-dimensional turbulent flows from a range of low-fidelity input conditions without requiring retraining.
Updated: 2024-04-12 11:45:51
Domains: physics.flu-dyn,cs.AI
Incremental Learning with Concept Drift Detection and Prototype-based Embeddings for Graph Stream Classification
Data stream mining aims at extracting meaningful knowledge from continually evolving data streams, addressing the challenges posed by nonstationary environments, particularly, concept drift which refers to a change in the underlying data distribution over time. Graph structures offer a powerful modelling tool to represent complex systems, such as, critical infrastructure systems and social networks. Learning from graph streams becomes a necessity to understand the dynamics of graph structures and to facilitate informed decision-making. This work introduces a novel method for graph stream classification which operates under the general setting where a data generating process produces graphs with varying nodes and edges over time. The method uses incremental learning for continual model adaptation, selecting representative graphs (prototypes) for each class, and creating graph embeddings. Additionally, it incorporates a loss-based concept drift detection mechanism to recalculate graph prototypes when drift is detected.
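A minimal sketch of a loss-based drift detector of the kind described above, using a k-sigma rule over a sliding window; the thresholding scheme is an assumption, not the paper's mechanism:

    from collections import deque

    class LossDriftDetector:
        """Flag concept drift when the recent mean loss exceeds the long-run
        mean by k standard deviations."""
        def __init__(self, window=50, k=3.0):
            self.recent = deque(maxlen=window)
            self.n, self.mean, self.m2, self.k = 0, 0.0, 0.0, k

        def update(self, loss: float) -> bool:
            self.recent.append(loss)
            self.n += 1                         # Welford update of long-run stats
            delta = loss - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (loss - self.mean)
            std = (self.m2 / max(self.n - 1, 1)) ** 0.5
            recent_mean = sum(self.recent) / len(self.recent)
            return (self.n > len(self.recent)
                    and recent_mean > self.mean + self.k * std)

    # On a drift flag, the method recalculates the per-class graph prototypes.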
Updated: 2024-04-12 11:43:07
标题: 用于图流分类的带概念漂移检测与基于原型嵌入的增量学习
摘要: 数据流挖掘旨在从不断演变的数据流中提取有意义的知识,应对非平稳环境带来的挑战,特别是概念漂移,即底层数据分布随时间发生的变化。图结构提供了一个强大的建模工具,用于表示复杂系统,如关键基础设施系统和社交网络。从图流中学习对于理解图结构的动态并促进知情决策是必不可少的。本文介绍了一种新颖的图流分类方法,适用于数据生成过程随时间产生节点和边不断变化的图这一一般设置。该方法使用增量学习进行持续模型适应,为每个类别选择代表性图(原型),并创建图嵌入。此外,它还结合了基于损失的概念漂移检测机制,当检测到漂移时重新计算图原型。
更新时间: 2024-04-12 11:43:07
领域: cs.LG
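As a rough illustration of the loss-based drift detection described above, the following sketch flags drift when the recent average loss rises well above the long-run average; the window size, threshold factor, and the `recompute_prototypes` hook are arbitrary illustrative choices, not taken from the paper.

from collections import deque

class LossDriftDetector:
    # Flags concept drift when the mean loss over a recent window
    # exceeds the long-run mean loss by a multiplicative factor.
    def __init__(self, window=50, factor=1.5):
        self.recent = deque(maxlen=window)
        self.factor = factor
        self.history_mean = 0.0
        self.n = 0

    def update(self, loss):
        self.recent.append(loss)
        self.n += 1
        self.history_mean += (loss - self.history_mean) / self.n
        recent_mean = sum(self.recent) / len(self.recent)
        return recent_mean > self.factor * self.history_mean

# detector = LossDriftDetector()
# if detector.update(batch_loss):
#     recompute_prototypes()  # hypothetical hook into the prototype step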
Box Facets and Cut Facets of Lifted Multicut Polytopes
The lifted multicut problem is a combinatorial optimization problem whose feasible solutions relate one-to-one to the decompositions of a graph $G = (V, E)$. Given an augmentation $\widehat{G} = (V, E \cup F)$ of $G$ and given costs $c \in \mathbb{R}^{E \cup F}$, the objective is to minimize the sum of those $c_{uw}$ with $uw \in E \cup F$ for which $u$ and $w$ are in distinct components. For $F = \emptyset$, the problem specializes to the multicut problem, and for $E = \tbinom{V}{2}$ to the clique partitioning problem. We study a binary linear program formulation of the lifted multicut problem. More specifically, we contribute to the analysis of the associated lifted multicut polytopes: Firstly, we establish a necessary, sufficient and efficiently decidable condition for a lower box inequality to define a facet. Secondly, we show that deciding whether a cut inequality of the binary linear program defines a facet is NP-hard.
Updated: 2024-04-12 11:38:20
标题: 提升多切多面体的盒面与切割面
摘要: 提升多切问题是一个组合优化问题,其可行解与图$G = (V, E)$的分解一一对应。给定图$G$的增广$\widehat{G} = (V, E \cup F)$和成本$c \in \mathbb{R}^{E \cup F}$,目标是最小化那些满足$uw \in E \cup F$且$u$和$w$位于不同分量中的$c_{uw}$之和。当$F = \emptyset$时,该问题特化为多切问题;当$E = \tbinom{V}{2}$时,特化为团划分问题。我们研究了提升多切问题的0-1线性规划形式。具体地,我们为相关的提升多切多面体的分析做出了贡献:首先,我们为下界盒不等式定义面(facet)建立了一个必要、充分且可高效判定的条件。其次,我们证明了判定该0-1线性规划的切割不等式是否定义面是NP难的。
更新时间: 2024-04-12 11:38:20
领域: cs.DM,cs.LG
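For orientation, the multicut special case ($F = \emptyset$) of the binary linear program studied above can be written in its textbook form with cycle inequalities; this standard form is stated here for readers unfamiliar with the formulation, not quoted from the paper:

$$\min_{x \in \{0,1\}^{E}} \sum_{e \in E} c_e\, x_e \quad \text{s.t.} \quad x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'} \ \text{ for every cycle } C \text{ of } G \text{ and every } e \in C,$$

where $x_{uw} = 1$ encodes that $u$ and $w$ lie in distinct components; the cycle inequalities ensure the 0-1 labeling is consistent with some decomposition of $G$. The facet results above concern the lifted polytope with $F \neq \emptyset$, whose description additionally involves box and cut inequalities.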
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Mitigating hallucination issues is a key challenge that must be overcome to reliably deploy large language models (LLMs) in real-world scenarios. Recently, various methods have been proposed to detect and revise factual errors in LLM-generated texts, in order to reduce hallucination. In this paper, we propose Re-Ex, a method for post-editing LLM-generated responses. Re-Ex introduces a novel reasoning step dubbed the factual error explanation step. Re-Ex revises the initial response of an LLM in three steps: first, external tools are used to retrieve evidence of the factual errors in the initial LLM response; next, the LLM is instructed to explain the problematic parts of the response based on the gathered evidence; finally, the LLM revises the initial response using the explanations provided in the previous step. In addition to the explanation step, Re-Ex also incorporates new prompting techniques to reduce the token count and inference time required for the response revision process. Compared with existing methods including FacTool, CoVE, and RARR, Re-Ex provides better detection and revision performance with less inference time and fewer tokens in multiple benchmarks.
Updated: 2024-04-12 11:37:44
标题: Re-Ex:先解释后修订,减少LLM回复中的事实错误
摘要: 要在现实场景中可靠部署大型语言模型(LLMs),减轻幻觉问题是必须克服的关键挑战。最近,已经提出了各种方法来检测和修正LLM生成文本中的事实错误,以减少幻觉。在本文中,我们提出了一种用于后期编辑LLM生成响应的方法Re-Ex。Re-Ex引入了一种新颖的推理步骤,称为事实错误解释步骤。Re-Ex分三步修订LLM的初始响应:首先,使用外部工具检索初始LLM响应中事实错误的证据;接下来,指导LLM根据收集到的证据解释响应中有问题的部分;最后,LLM使用前一步提供的解释修订初始响应。除了解释步骤,Re-Ex还结合了新的提示技术,以减少响应修订过程所需的标记数量和推理时间。与现有方法(包括FacTool、CoVE和RARR)相比,Re-Ex在多个基准测试中提供了更好的检测和修订性能,且推理时间更短、标记数量更少。
更新时间: 2024-04-12 11:37:44
领域: cs.CL,cs.AI
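The three-step revision loop is easy to picture in code. Below is a minimal sketch; `call_llm` and `retrieve_evidence` are hypothetical stand-ins for an LLM API and an external fact-checking tool, and the prompts are illustrative, not the paper's.

def re_ex(question: str, initial_response: str) -> str:
    # Step 1: gather evidence on factual errors with external tools.
    evidence = retrieve_evidence(initial_response)  # hypothetical helper
    # Step 2: ask the LLM to explain the problematic parts.
    explanation = call_llm(  # hypothetical LLM API wrapper
        f"Question: {question}\nResponse: {initial_response}\n"
        f"Evidence: {evidence}\n"
        "Explain which claims in the response the evidence contradicts."
    )
    # Step 3: revise the initial response using the explanation.
    return call_llm(
        f"Question: {question}\nResponse: {initial_response}\n"
        f"Explanation of errors: {explanation}\n"
        "Rewrite the response, fixing the errors identified above."
    )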
Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning
Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a novel approach using deep graph learning called DGL-FB, constructing a large graph to efficiently extract information. In this graph, each seismic trace is represented as a node, connected by edges that reflect similarities. To manage the size of the graph, we develop a subgraph sampling technique to streamline model training and inference. Our proposed framework, DGL-FB, leverages deep graph learning for FB picking. It encodes subgraphs into global features using a deep graph encoder. Subsequently, the encoded global features are combined with local node signals and fed into a ResUNet-based 1D segmentation network for FB detection. Field survey evaluations of DGL-FB show superior accuracy and stability compared to a 2D U-Net-based benchmark method.
Updated: 2024-04-12 11:36:24
标题: 使用深度图学习在更高维度中进行地震初至拾取
摘要: 当代自动初至(FB)拾取方法通常分析1D信号、2D炮集或3D炮检道集。利用2D或3D等更高维度的数据可以融入全局特征,提高局部拾取的稳定性。尽管有这些好处,高维数据需要结构化输入,并增加了计算需求。为了解决这个问题,我们提出了一种使用深度图学习的新方法,称为DGL-FB,通过构建一个大图来高效提取信息。在这个图中,每个地震道被表示为一个节点,并通过反映相似性的边相连接。为了控制图的规模,我们开发了一种子图采样技术来简化模型训练和推断。我们提出的框架DGL-FB利用深度图学习进行初至拾取:它使用深度图编码器将子图编码为全局特征,随后将编码的全局特征与局部节点信号相结合,馈入基于ResUNet的1D分割网络进行初至检测。野外实测数据评估显示,DGL-FB的准确性和稳定性均优于基于2D U-Net的基准方法。
更新时间: 2024-04-12 11:36:24
领域: cs.LG,cs.AI,eess.SP,physics.geo-ph
Complexity of Probabilistic Reasoning for Neurosymbolic Classification Techniques
Neurosymbolic artificial intelligence is a growing field of research aiming to combine neural network learning capabilities with the reasoning abilities of symbolic systems. Informed multi-label classification is a sub-field of neurosymbolic AI which studies how to leverage prior knowledge to improve neural classification systems. A well-known family of neurosymbolic techniques for informed classification uses probabilistic reasoning to integrate this knowledge during learning, inference or both. Therefore, the asymptotic complexity of probabilistic reasoning is of cardinal importance to assess the scalability of such techniques. However, this topic is rarely tackled in the neurosymbolic literature, which can lead to a poor understanding of the limits of probabilistic neurosymbolic techniques. In this paper, we introduce a formalism for informed supervised classification tasks and techniques. We then build upon this formalism to define three abstract neurosymbolic techniques based on probabilistic reasoning. Finally, we show computational complexity results on several representation languages for prior knowledge commonly found in the neurosymbolic literature.
Updated: 2024-04-12 11:31:37
标题: 神经符号分类技术中概率推理的复杂性
摘要: 神经符号人工智能是一个不断发展的研究领域,旨在将神经网络的学习能力与符号系统的推理能力相结合。知情多标签分类是神经符号人工智能的一个子领域,研究如何利用先验知识来改进神经分类系统。一类众所周知的用于知情分类的神经符号技术使用概率推理,在学习、推断或两者的过程中整合这些知识。因此,概率推理的渐近复杂性对于评估此类技术的可扩展性至关重要。然而,这个主题在神经符号文献中很少被讨论,这可能导致对概率神经符号技术的局限性理解不足。在本文中,我们为知情监督分类任务和技术引入了一个形式化体系。然后,我们基于这个形式化体系定义了三种基于概率推理的抽象神经符号技术。最后,我们给出了神经符号文献中常见的若干先验知识表示语言上的计算复杂性结果。
更新时间: 2024-04-12 11:31:37
领域: cs.AI,cs.CC,cs.LG,cs.SC
Learning representations of learning representations
The ICLR conference is unique among the top machine learning conferences in that all submitted papers are openly available. Here we present the ICLR dataset consisting of abstracts of all 24 thousand ICLR submissions from 2017-2024 with meta-data, decision scores, and custom keyword-based labels. We find that on this dataset, bag-of-words representation outperforms most dedicated sentence transformer models in terms of $k$NN classification accuracy, and the top performing language models barely outperform TF-IDF. We see this as a challenge for the NLP community. Furthermore, we use the ICLR dataset to study how the field of machine learning has changed over the last seven years, finding some improvement in gender balance. Using a 2D embedding of the abstracts' texts, we describe a shift in research topics from 2017 to 2024 and identify hedgehogs and foxes among the authors with the highest number of ICLR submissions.
Updated: 2024-04-12 11:30:16
标题: 学习"学习表示"的表示
摘要: ICLR会议在顶级机器学习会议中独一无二,因为所有提交的论文都是公开可用的。在这里,我们介绍了ICLR数据集,包括2017年至2024年全部2.4万份ICLR投稿的摘要,以及元数据、决策分数和基于关键词的自定义标签。我们发现,在这个数据集上,词袋表示在$k$NN分类准确性方面优于大多数专用的句子Transformer模型,而表现最好的语言模型也仅略优于TF-IDF。我们认为这对自然语言处理社区是一个挑战。此外,我们使用ICLR数据集研究了过去七年机器学习领域的变化,发现性别平衡有所改善。通过对摘要文本的二维嵌入,我们描述了从2017年到2024年研究主题的变化,并在ICLR投稿数量最多的作者中识别出"刺猬"和"狐狸"。
更新时间: 2024-04-12 11:30:16
领域: cs.CL,cs.DL,cs.LG
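The bag-of-words baseline the abstract reports is straightforward to reproduce in spirit. A minimal scikit-learn sketch follows, assuming `train_abstracts`/`train_labels` and `test_abstracts`/`test_labels` have been loaded from the ICLR dataset (variable names and hyperparameters are illustrative, not the authors' configuration).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# TF-IDF bag-of-words features followed by cosine-distance kNN.
clf = make_pipeline(
    TfidfVectorizer(sublinear_tf=True, max_features=50_000),
    KNeighborsClassifier(n_neighbors=10, metric="cosine"),
)
clf.fit(train_abstracts, train_labels)
print(clf.score(test_abstracts, test_labels))  # kNN classification accuracy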
Calibration-Aware Bayesian Learning
Deep learning models, including modern systems like large language models, are well known to offer unreliable estimates of the uncertainty of their decisions. In order to improve the quality of the confidence levels, also known as calibration, of a model, common approaches entail the addition of either data-dependent or data-independent regularization terms to the training loss. Data-dependent regularizers have been recently introduced in the context of conventional frequentist learning to penalize deviations between confidence and accuracy. In contrast, data-independent regularizers are at the core of Bayesian learning, enforcing adherence of the variational distribution in the model parameter space to a prior density. The former approach is unable to quantify epistemic uncertainty, while the latter is severely affected by model misspecification. In light of the limitations of both methods, this paper proposes an integrated framework, referred to as calibration-aware Bayesian neural networks (CA-BNNs), that applies both regularizers while optimizing over a variational distribution as in Bayesian learning. Numerical results validate the advantages of the proposed approach in terms of expected calibration error (ECE) and reliability diagrams.
Updated: 2024-04-12 11:30:04
标题: 校准感知的贝叶斯学习
摘要: 众所周知,深度学习模型(包括大型语言模型等现代系统)对其决策不确定性的估计并不可靠。为了提高模型置信水平的质量(即校准),常见方法是在训练损失中添加数据相关或数据无关的正则化项。最近,在传统频率学派学习的背景下引入了数据相关的正则化器,以惩罚置信度和准确度之间的偏差。相反,数据无关的正则化器是贝叶斯学习的核心,强制模型参数空间中的变分分布贴合先验密度。前一种方法无法量化认知不确定性,而后者受模型误设的严重影响。鉴于两种方法的局限性,本文提出了一个集成框架,称为校准感知贝叶斯神经网络(CA-BNNs),像贝叶斯学习一样在变分分布上进行优化,同时应用两种正则化器。数值结果验证了所提方法在期望校准误差(ECE)和可靠性图方面的优势。
更新时间: 2024-04-12 11:30:04
领域: cs.LG,eess.SP
No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties
Broadcast sports field registration is traditionally addressed as a homography estimation task, mapping the visible image area to a planar field model, predominantly focusing on the main camera shot. Addressing the shortcomings of previous approaches, we propose a novel calibration pipeline enabling camera calibration using a 3D soccer field model and extending the process to assess the multiple-view nature of broadcast videos. Our approach begins with a keypoint generation pipeline derived from SoccerNet dataset annotations, leveraging the geometric properties of the court. Subsequently, we execute classical camera calibration through the DLT algorithm in a minimalist fashion, without further refinement. Through extensive experimentation on real-world soccer broadcast datasets such as SoccerNet-Calibration, WorldCup 2014 and TS-WorldCup, our method demonstrates superior performance in both multiple- and single-view 3D camera calibration while maintaining competitive results in homography estimation compared to state-of-the-art techniques.
Updated: 2024-04-12 11:15:15
标题: 无需铃声,只需哨声:利用几何属性进行体育场地配准
摘要: 广播体育场地配准传统上被视为一个单应性估计任务,将可见图像区域映射到平面场地模型,主要集中在主摄像机镜头上。为了解决先前方法的不足,我们提出了一种新颖的校准流程,使用3D足球场模型进行相机校准,并将该过程扩展到评估广播视频的多视角特性。我们的方法始于从SoccerNet数据集标注派生的关键点生成流程,利用球场的几何属性。随后,我们以极简的方式通过DLT算法执行经典相机校准,无需进一步细化。通过在真实世界足球广播数据集(如SoccerNet-Calibration、WorldCup 2014和TS-WorldCup)上的广泛实验,我们的方法在多视角和单视角3D相机校准方面均表现出卓越性能,同时在单应性估计方面保持与最先进技术相当的结果。
更新时间: 2024-04-12 11:15:15
领域: cs.CV,cs.AI
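The "minimalist" DLT step mentioned above is the classical direct linear transform for homography estimation. A generic NumPy sketch (a textbook implementation, not the authors' code) is:

import numpy as np

def dlt_homography(src, dst):
    # Estimate a 3x3 homography from >= 4 point pairs (src, dst are
    # (N, 2) arrays): stack two linear constraints per correspondence
    # and take the right singular vector of the smallest singular value.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalise so H[2, 2] = 1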
Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification
Learning an effective representation in multi-label text classification (MLTC) is a significant challenge in NLP. This challenge arises from the inherent complexity of the task, which is shaped by two key factors: the intricate connections between labels and the widespread long-tailed distribution of the data. To overcome this issue, one potential approach involves integrating supervised contrastive learning with classical supervised loss functions. Although contrastive learning has shown remarkable performance in multi-class classification, its impact in the multi-label framework has not been thoroughly investigated. In this paper, we conduct an in-depth study of supervised contrastive learning and its influence on representation in MLTC context. We emphasize the importance of considering long-tailed data distributions to build a robust representation space, which effectively addresses two critical challenges associated with contrastive learning that we identify: the "lack of positives" and the "attraction-repulsion imbalance". Building on this insight, we introduce a novel contrastive loss function for MLTC. It attains Micro-F1 scores that either match or surpass those obtained with other frequently employed loss functions, and demonstrates a significant improvement in Macro-F1 scores across three multi-label datasets.
Updated: 2024-04-12 11:12:16
标题: 探索用于长尾多标签文本分类的对比学习
摘要: 在自然语言处理中,为多标签文本分类(MLTC)学习有效的表示是一个重大挑战。这一挑战源于任务的内在复杂性,由两个关键因素决定:标签之间的复杂关联和数据普遍存在的长尾分布。为了克服这一问题,一个潜在的方法是将监督对比学习与经典监督损失函数相结合。虽然对比学习在多类别分类中表现出色,但其在多标签框架中的影响尚未得到深入研究。本文对监督对比学习及其在MLTC背景下对表示的影响进行了深入研究。我们强调考虑长尾数据分布以构建稳健表示空间的重要性,这有效地解决了我们所发现的与对比学习相关的两个关键挑战:"缺乏正样本"和"吸引-排斥不平衡"。基于这一洞察,我们为MLTC引入了一种新颖的对比损失函数。其Micro-F1得分与其他常用损失函数所得结果相当或更优,并且在三个多标签数据集上均显著提升了Macro-F1得分。
更新时间: 2024-04-12 11:12:16
领域: cs.LG,cs.CL,cs.IR
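For orientation, the standard supervised contrastive loss that such work builds on looks roughly as follows (single-label form for brevity; the paper's proposed loss modifies this family to handle multi-label, long-tailed data, which is not shown here):

import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    # features: (N, D) embeddings; labels: (N,) integer class ids.
    f = F.normalize(features, dim=1)
    sim = f @ f.t() / temperature
    n = f.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool, device=f.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    sim = sim.masked_fill(~not_self, float("-inf"))  # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return -(per_anchor / pos_mask.sum(1).clamp(min=1)).mean()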
Multi-Agent eXperimenter (MAX)
We present a novel multi-agent simulator named Multi-Agent eXperimenter (MAX) that is designed to simulate blockchain experiments involving large numbers of agents of different types acting in one or several environments. The architecture of MAX is highly modular, enabling easy addition of new models.
Updated: 2024-04-12 11:07:10
标题: 多智能体实验者(MAX)
摘要: 我们介绍了一个名为Multi-Agent eXperimenter(MAX)的新型多智能体模拟器,旨在模拟涉及大量不同类型智能体在一个或多个环境中行动的区块链实验。MAX的架构高度模块化,可以轻松添加新的模型。
更新时间: 2024-04-12 11:07:10
领域: cs.MA,cs.AI,cs.DC
Data-Driven Preference Sampling for Pareto Front Learning
Pareto front learning is a technique that introduces preference vectors in a neural network to approximate the Pareto front. Previous Pareto front learning methods have demonstrated high performance in approximating simple Pareto fronts. These methods often sample preference vectors from a fixed Dirichlet distribution. However, no fixed sampling distribution can be adapted to diverse Pareto fronts. Efficiently sampling preference vectors and accurately estimating the Pareto front is a challenge. To address this challenge, we propose a data-driven preference vector sampling framework for Pareto front learning. We utilize the posterior information of the objective functions to adjust the parameters of the sampling distribution flexibly. In this manner, the proposed method can sample preference vectors from the location of the Pareto front with a high probability. Moreover, we design the distribution of the preference vector as a mixture of Dirichlet distributions to improve the performance of the model in disconnected Pareto fronts. Extensive experiments validate the superiority of the proposed method compared with state-of-the-art algorithms.
Updated: 2024-04-12 11:06:22
标题: 数据驱动的偏好抽样用于帕累托前沿学习
摘要: 帕累托前沿学习是一种技术,它在神经网络中引入偏好向量来近似帕累托前沿。先前的帕累托前沿学习方法已经表现出在近似简单帕累托前沿方面的高性能。这些方法通常从固定的狄利克雷分布中采样偏好向量。然而,没有固定的采样分布能够适应多样的帕累托前沿。高效地采样偏好向量并准确估计帕累托前沿是一个挑战。为了解决这一挑战,我们提出了一个基于数据驱动的帕累托前沿学习的偏好向量采样框架。我们利用目标函数的后验信息来灵活调整采样分布的参数。通过这种方式,所提出的方法可以以较高的概率从帕累托前沿的位置采样偏好向量。此外,我们将偏好向量的分布设计为狄利克雷分布的混合,以提高模型在不连续帕累托前沿中的性能。大量实验证实了所提出方法相较于最先进算法的优越性。
更新时间: 2024-04-12 11:06:22
领域: cs.LG,68T05,I.2.6
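Sampling from a mixture of Dirichlet distributions, as the abstract describes for disconnected Pareto fronts, takes only a few lines of NumPy. The concentration parameters and weights below are illustrative; in the proposed method they would be adjusted from posterior information rather than fixed.

import numpy as np

rng = np.random.default_rng(0)

def sample_preferences(n, alphas, weights):
    # Draw n preference vectors from a mixture of Dirichlet
    # distributions: pick a component, then sample from it.
    comps = rng.choice(len(alphas), size=n, p=weights)
    return np.stack([rng.dirichlet(alphas[c]) for c in comps])

# e.g. two components biased toward opposite ends of a 2-objective front
prefs = sample_preferences(5, alphas=[[5, 1], [1, 5]], weights=[0.5, 0.5])
print(prefs)  # each row sums to 1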
NC-TTT: A Noise Contrastive Approach for Test-Time Training
Despite their exceptional performance in vision tasks, deep learning models often struggle when faced with domain shifts during testing. Test-Time Training (TTT) methods have recently gained popularity owing to their ability to enhance the robustness of models through the addition of an auxiliary objective that is jointly optimized with the main task. Being strictly unsupervised, this auxiliary objective is used at test time to adapt the model without any access to labels. In this work, we propose Noise-Contrastive Test-Time Training (NC-TTT), a novel unsupervised TTT technique based on the discrimination of noisy feature maps. By learning to classify noisy views of projected feature maps, and then adapting the model accordingly on new domains, classification performance can be recovered by an important margin. Experiments on several popular test-time adaptation baselines demonstrate the advantages of our method compared to recent approaches for this task. The code can be found at: https://github.com/GustavoVargasHakim/NCTTT.git
Updated: 2024-04-12 10:54:11
标题: NC-TTT:一种用于测试时训练的噪声对比方法
摘要: 尽管深度学习模型在视觉任务中表现出色,但在测试过程中面临领域转移时往往会遇到困难。最近,测试时训练(TTT)方法因其通过添加一个与主要任务联合优化的辅助目标来增强模型的鲁棒性而变得流行。这个严格无监督的辅助目标在测试时被用来适应模型,而不需要访问标签。在这项工作中,我们提出了一种基于区分噪声特征图的新颖无监督TTT技术,即Noise-Contrastive Test-Time Training(NC-TTT)。通过学习对投影特征图的噪声视图进行分类,然后根据新领域调整模型,分类性能可以得到明显提升。对几种流行的测试时适应基线进行实验表明,与最近的方法相比,我们的方法具有明显优势。源代码可在以下链接找到:https://github.com/GustavoVargasHakim/NCTTT.git
更新时间: 2024-04-12 10:54:11
领域: cs.CV,cs.LG
Explaining the Machine Learning Solution of the Ising Model
As powerful as machine learning (ML) techniques are in solving problems involving data with large dimensionality, explaining the results from the fitted parameters remains a challenging task of utmost importance, especially in physics applications. This work shows how this can be accomplished for the ferromagnetic Ising model, the main target of several ML studies in statistical physics. Here it is demonstrated that the successful unsupervised identification of the phases and order parameter by principal component analysis, a common method in those studies, detects that the magnetization per spin has its greatest variation with the temperature, the actual control parameter of the phase transition. Then, by using a neural network (NN) without hidden layers (the simplest possible) and informed by the symmetry of the Hamiltonian, an explanation is provided for the strategy used in finding the supervised learning solution for the critical temperature of the model's continuous phase transition. This allows the prediction of the minimal extension of the NN to solve the problem when the symmetry is not known, which becomes also explainable. These results pave the way to a physics-informed explainable generalized framework, enabling the extraction of physical laws and principles from the parameters of the models.
Updated: 2024-04-12 10:36:28
标题: 解释伊辛模型的机器学习解决方案
摘要: 尽管机器学习(ML)技术在解决涉及高维数据的问题方面非常强大,但从拟合参数中解释结果仍然是一项至关重要且具有挑战性的任务,尤其是在物理应用中。本研究展示了如何针对铁磁伊辛模型实现这一目标,该模型是统计物理学中多项ML研究的主要对象。文中证明,这些研究中常用的主成分分析在无监督识别相与序参量时,实际检测到的是每自旋磁化强度随温度(即相变的真实控制参数)变化最大。然后,通过使用没有隐藏层的神经网络(NN)(即最简单的结构)并借助哈密顿量的对称性,解释了在监督学习中求解该模型连续相变临界温度所采用的策略。这使得在对称性未知时能够预测求解该问题所需的NN最小扩展,而这一扩展同样是可解释的。这些结果为一个物理信息驱动的可解释通用框架铺平了道路,使得可以从模型参数中提取物理定律和原理。
更新时间: 2024-04-12 10:36:28
领域: cond-mat.dis-nn,cs.LG,physics.comp-ph
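The unsupervised part of the analysis is simple to emulate: run PCA on raw spin configurations and check that the leading component tracks the magnetisation per spin. A sketch, with the data loading left hypothetical:

import numpy as np
from sklearn.decomposition import PCA

# configs: (n_samples, n_spins) array of +/-1 Ising spin configurations
# sampled across a range of temperatures (loading step omitted).
pca = PCA(n_components=2)
z = pca.fit_transform(configs)

# The leading component is expected to track the magnetisation per spin.
m = configs.mean(axis=1)
corr = np.corrcoef(z[:, 0], m)[0, 1]
print(f"correlation of PC1 with magnetisation per spin: {corr:.3f}")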
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large language models (LLMs). One common way to evaluate the model response is to rank the candidate answers based on the log probability of the first token prediction. An alternative way is to examine the text output. Prior work has shown that first token probabilities lack robustness to changes in MCQ phrasing, and that first token probabilities do not match text answers for instruction-tuned models. Therefore, in this paper, we investigate the robustness of text answers. We show that the text answers are more robust to question perturbations than the first token probabilities, when the first token answers mismatch the text answers. The difference in robustness increases as the mismatch rate becomes greater. As the mismatch reaches over 50\%, the text answer is more robust to option order changes than the debiased first token probabilities using state-of-the-art debiasing methods such as PriDe. Our findings provide further evidence for the benefits of text answer evaluation over first token probability evaluation.
Updated: 2024-04-12 10:36:15
标题: 看文本:指令微调的语言模型是比你想象中更稳健的多选题选择器
摘要: 多项选择题(MCQs)通常用于评估大型语言模型(LLMs)的能力。评估模型响应的一种常见方法是根据第一个标记预测的对数概率对候选答案进行排名,另一种方法是检查文本输出。先前的研究表明,第一个标记的概率对MCQ措辞的变化缺乏稳健性,并且对于经过指令微调的模型,第一个标记的概率与文本答案并不一致。因此,在本文中,我们研究了文本答案的稳健性。我们发现,当第一个标记的答案与文本答案不匹配时,文本答案对问题扰动比第一个标记的概率更稳健,且不匹配率越高,稳健性差异越大。当不匹配率超过50%时,文本答案对选项顺序变化的稳健性甚至超过了使用PriDe等最先进去偏方法去偏后的第一个标记概率。我们的发现为文本答案评估优于第一个标记概率评估提供了进一步的证据。
更新时间: 2024-04-12 10:36:15
领域: cs.CL,cs.AI
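The first-token scoring scheme being critiqued can be sketched with a Hugging Face-style causal LM; `model` and `tokenizer` are assumed to be already loaded, and the snippet only illustrates the mechanism, not the paper's evaluation harness:

import torch

def first_token_scores(prompt, options=("A", "B", "C", "D")):
    # Assumes each option letter maps to a single token in the
    # tokenizer's vocabulary (a common but not universal property).
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token logits
    log_probs = logits.log_softmax(-1)
    return {
        o: log_probs[tokenizer(o, add_special_tokens=False).input_ids[0]].item()
        for o in options
    }

The text-answer alternative instead generates a full response and parses the chosen option letter from it, which is what the paper finds to be the more robust evaluation.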
Graph data augmentation with Gromov-Wasserstein Barycenters
Graphs are ubiquitous in various fields, and deep learning methods have been successfully applied to graph classification tasks. However, building large and diverse graph datasets for training can be expensive. While augmentation techniques exist for structured data like images or numerical data, the augmentation of graph data remains challenging. This is primarily due to the complex and non-Euclidean nature of graph data. In this paper, we propose a novel augmentation strategy for graphs that operates in a non-Euclidean space. This approach leverages graphon estimation, which models the generative mechanism of network sequences. Computational results demonstrate the effectiveness of the proposed augmentation framework in improving the performance of graph classification models. Additionally, using a non-Euclidean distance, specifically the Gromov-Wasserstein distance, results in better approximations of the graphon. This framework also provides a means to validate different graphon estimation approaches, particularly in real-world scenarios where the true graphon is unknown.
Updated: 2024-04-12 10:22:55
标题: 使用Gromov-Wasserstein重心进行图数据增强
摘要: 图在各个领域中无处不在,深度学习方法已成功应用于图分类任务。然而,为训练构建大型且多样化的图数据集可能成本高昂。虽然存在针对图像或数值数据等结构化数据的增强技术,但图数据的增强仍具挑战性,这主要是由于图数据的复杂性和非欧几里得性质。本文提出了一种在非欧几里得空间中运作的图增强策略。该方法利用graphon(图极限)估计来建模网络序列的生成机制。计算结果表明,所提出的增强框架能有效提升图分类模型的性能。此外,使用非欧几里得距离,特别是Gromov-Wasserstein距离,可以更好地逼近graphon。该框架还提供了一种验证不同graphon估计方法的手段,特别是在真实graphon未知的现实场景中。
更新时间: 2024-04-12 10:22:55
领域: cs.LG,cs.AI
Impacts of Color and Texture Distortions on Earth Observation Data in Deep Learning
Land cover classification and change detection are two important applications of remote sensing and Earth observation (EO) that have benefited greatly from the advances of deep learning. Convolutional and transformer-based U-net models are the state-of-the-art architectures for these tasks, and their performances have been boosted by an increased availability of large-scale annotated EO datasets. However, the influence of different visual characteristics of the input EO data on a model's predictions is not well understood. In this work we systematically examine model sensitivities with respect to several color- and texture-based distortions on the input EO data during inference, given models that have been trained without such distortions. We conduct experiments with multiple state-of-the-art segmentation networks for land cover classification and show that they are in general more sensitive to texture than to color distortions. Beyond revealing intriguing characteristics of widely used land cover classification models, our results can also be used to guide the development of more robust models within the EO domain.
Updated: 2024-04-12 10:15:45
标题: 深度学习中颜色和纹理失真对地球观测数据的影响
摘要: 土地覆盖分类和变化检测是遥感和地球观测(EO)的两个重要应用,极大地受益于深度学习的进展。卷积和基于Transformer的U-net模型是这些任务的最先进架构,其性能随着大规模标注EO数据集可用性的提高而进一步提升。然而,输入EO数据的不同视觉特征对模型预测的影响尚不明确。在这项工作中,我们在推理阶段系统地考察模型对输入EO数据中若干基于颜色和纹理的失真的敏感性,而这些模型在训练时并未接触过此类失真。我们对多个最先进的土地覆盖分类分割网络进行实验,结果显示它们通常对纹理失真比对颜色失真更敏感。除了揭示广泛使用的土地覆盖分类模型的有趣特性外,我们的结果还可用于指导EO领域内更稳健模型的开发。
更新时间: 2024-04-12 10:15:45
领域: cs.CV,cs.LG
Few-Shot Cross-System Anomaly Trace Classification for Microservice-based systems
Microservice-based systems (MSS) may experience failures in various fault categories due to their complex and dynamic nature. To effectively handle failures, AIOps tools utilize trace-based anomaly detection and root cause analysis. In this paper, we propose a novel framework for few-shot abnormal trace classification for MSS. Our framework comprises two main components: (1) Multi-Head Attention Autoencoder for constructing system-specific trace representations, which enables (2) Transformer Encoder-based Model-Agnostic Meta-Learning to perform effective and efficient few-shot learning for abnormal trace classification. The proposed framework is evaluated on two representative MSS, Trainticket and OnlineBoutique, with open datasets. The results show that our framework can adapt the learned knowledge to classify new, unseen abnormal traces of novel fault categories both within the same system it was initially trained on and even in the different MSS. Within the same MSS, our framework achieves an average accuracy of 93.26\% and 85.2\% across 50 meta-testing tasks for Trainticket and OnlineBoutique, respectively, when provided with 10 instances for each task. In a cross-system context, our framework gets an average accuracy of 92.19\% and 84.77\% for the same meta-testing tasks of the respective system, also with 10 instances provided for each task. Our work demonstrates the applicability of achieving few-shot abnormal trace classification for MSS and shows how it can enable cross-system adaptability. This opens an avenue for building more generalized AIOps tools that require less system-specific data labeling for anomaly detection and root cause analysis.
Updated: 2024-04-12 10:09:16
标题: 面向微服务系统的少样本跨系统异常迹线分类
摘要: 基于微服务的系统(MSS)可能由于其复杂和动态的特性而在各种故障类别中经历故障。为了有效处理故障,AIOps工具利用基于迹线的异常检测和根本原因分析。在本文中,我们提出了一个新颖的框架,用于MSS的少样本异常迹线分类。我们的框架包括两个主要组件:(1)用于构建系统特定迹线表示的多头注意力自动编码器,从而实现(2)基于Transformer编码器的模型无关元学习,以进行有效且高效的少样本学习,用于异常迹线分类。提出的框架在两个典型MSS,Trainticket和OnlineBoutique上进行了评估,使用开放数据集。结果表明,我们的框架可以适应学习到的知识,对新的、未见过的故障类别的异常迹线进行分类,无论是在最初训练的同一系统内,还是在不同的MSS内。在同一MSS内,当为每个任务提供10个实例时,我们的框架分别实现了Trainticket和OnlineBoutique的50个元测试任务的平均准确率为93.26%和85.2%。在跨系统的情况下,对于相应系统的相同元测试任务,我们的框架在每个任务提供10个实例的情况下,平均准确率分别为92.19%和84.77%。我们的工作展示了实现MSS的少样本异常迹线分类的适用性,并展示了它如何实现跨系统的适应性。这为构建更加通用的AIOps工具开辟了途径,这些工具对于异常检测和根本原因分析需要更少的系统特定数据标记。
更新时间: 2024-04-12 10:09:16
领域: cs.SE,cs.AI,cs.LG
Differentiable All-pole Filters for Time-varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within any audio system containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code available and provide the trained audio effect and synth models in a VST plugin at https://christhetree.github.io/all_pole_filters/.
Updated: 2024-04-12 09:58:58
标题: 可微分的全极点滤波器用于时变音频系统
摘要: 无限脉冲响应滤波器是许多时变音频系统的基本构建模块,如音频效果和合成器。然而,它们的递归结构阻碍了使用自动微分进行这些系统的端到端训练。尽管先前的工作中已经提出并广泛使用了类似频率采样和基于帧的处理等非递归滤波器逼近方法,但它们无法准确反映原始系统的梯度。我们通过重新表达时变全极点滤波器,使梯度能够通过自身反向传播,从而使滤波器的实现不受自动微分框架的技术限制。这种实现可以在包含极点滤波器的任何音频系统中使用,以进行高效的梯度评估。我们展示了它在相位器、时变减法合成器和前馈压缩器上对建模真实世界动态音频系统的训练效率和表现能力。我们提供了代码,并在VST插件中提供了经过训练的音频效果和合成器模型,网址为https://christhetree.github.io/all_pole_filters/。
更新时间: 2024-04-12 09:58:58
领域: eess.AS,cs.LG,cs.SD
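The recursion that blocks naive automatic differentiation is just the time-varying all-pole difference equation. Below is a plain NumPy reference implementation of the forward pass; the paper's contribution is an efficient way to backpropagate through this recursion, which is not shown here.

import numpy as np

def time_varying_all_pole(x, a):
    # Direct-form all-pole recursion with time-varying coefficients:
    #   y[n] = x[n] - sum_k a[n, k-1] * y[n - k]
    # x: (N,) input signal; a: (N, K) per-sample denominator coefficients.
    N, K = a.shape
    y = np.zeros(N)
    for n in range(N):
        acc = x[n]
        for k in range(1, K + 1):
            if n - k >= 0:
                acc -= a[n, k - 1] * y[n - k]
        y[n] = acc
    return y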
Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework
Feed recommendation is currently the mainstream mode for many real-world applications (e.g., TikTok, Dianping), it is usually necessary to model and predict user interests in multiple scenarios (domains) within and even outside the application. Multi-domain learning is a typical solution in this regard. While considerable efforts have been made in this regard, there are still two long-standing challenges: (1) Accurately depicting the differences among domains using domain features is crucial for enhancing the performance of each domain. However, manually designing domain features and models for numerous domains can be a laborious task. (2) Users typically have limited impressions in only a few domains. Extracting features automatically from other domains and leveraging them to improve the predictive capabilities of each domain has consistently posed a challenging problem. In this paper, we propose an Automatic Domain Feature Extraction and Personalized Integration (DFEI) framework for the large-scale multi-domain recommendation. The framework automatically transforms the behavior of each individual user into an aggregation of all user behaviors within the domain, which serves as the domain features. Unlike offline feature engineering methods, the extracted domain features are higher-order representations and directly related to the target label. Besides, by personalized integration of domain features from other domains for each user and the innovation in the training mode, the DFEI framework can yield more accurate conversion identification. Experimental results on both public and industrial datasets, consisting of over 20 domains, clearly demonstrate that the proposed framework achieves significantly better performance compared with SOTA baselines. Furthermore, we have released the source code of the proposed framework at https://github.com/xidongbo/DFEI.
Updated: 2024-04-12 09:57:17
标题: 大规模多领域推荐:自动领域特征提取和个性化整合框架
摘要: 信息流推荐目前是许多现实世界应用(例如TikTok、大众点评)的主流模式,通常需要对用户在应用内乃至应用外多个场景(领域)的兴趣进行建模和预测。多领域学习是这方面的典型解决方案。尽管已经付出了相当多的努力,但仍然存在两个长期存在的挑战:(1) 使用领域特征准确刻画领域之间的差异对于提升每个领域的性能至关重要。然而,为众多领域手动设计领域特征和模型可能是一项繁重的任务。(2) 用户通常只在少数几个领域中有有限的曝光。自动从其他领域提取特征并利用它们来提高每个领域的预测能力一直是一个具有挑战性的问题。在本文中,我们提出了一个用于大规模多领域推荐的自动领域特征提取和个性化整合(DFEI)框架。该框架将每个个体用户的行为自动转化为领域内所有用户行为的聚合,作为领域特征。与离线特征工程方法不同,提取的领域特征是高阶表示,并直接与目标标签相关。此外,通过为每个用户个性化整合来自其他领域的领域特征,并在训练模式上进行创新,DFEI框架可以产生更准确的转化识别。在包含20多个领域的公共和工业数据集上的实验结果清楚地表明,所提出的框架相比SOTA基线取得了显著更好的性能。此外,我们已经在https://github.com/xidongbo/DFEI 上发布了所提出框架的源代码。
更新时间: 2024-04-12 09:57:17
领域: cs.IR,cs.AI
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
In today's digital world, seeking answers to health questions on the Internet is a common practice. However, existing question answering (QA) systems often rely on using pre-selected and annotated evidence documents, thus making them inadequate for addressing novel questions. Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases. By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets. We modify different retrieval settings to observe their influence on the QA pipeline's performance, including the number of retrieved documents, sentence selection process, the publication year of articles, and their number of citations. Our results reveal that cutting down on the amount of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score up to 10%. We discuss the results, highlight interesting examples, and outline challenges for future research, like managing evidence disagreement and crafting user-friendly explanations.
Updated: 2024-04-12 09:56:12
标题: 使用可靠和时间感知的证据检索来改进健康问题回答
摘要: 在当今数字化世界中,在互联网上寻找健康问题的答案是一种常见做法。然而,现有的问答(QA)系统通常依赖于使用预先选择和注释的证据文档,因此使它们无法解决新问题。我们的研究重点放在开放域QA设置上,其中关键挑战是首先在大型知识库中发现相关证据。通过利用常见的检索-阅读QA流程和PubMed作为可信的医学研究文献收集,我们从三个不同的数据集回答健康问题。我们修改了不同的检索设置来观察它们对QA流程性能的影响,包括检索到的文档数量、句子选择过程、文章的发表年份和引用次数。我们的结果显示,减少检索到的文档数量并偏爱最近和被引用次数较多的文档可以将最终宏观F1分数提高多达10%。我们讨论了结果,突出了有趣的例子,并概述了未来研究的挑战,如管理证据分歧和制作用户友好的解释。
更新时间: 2024-04-12 09:56:12
领域: cs.CL,cs.AI,cs.IR
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Large-scale vision-and-language models, such as CLIP, are typically trained on web-scale data, which can introduce inappropriate content and lead to the development of unsafe and biased behavior. This, in turn, hampers their applicability in sensitive and trustworthy contexts and could raise significant concerns in their adoption. Our research introduces a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs. In particular, our methodology seeks to sever "toxic" linguistic and visual concepts, unlearning the linkage between unsafe linguistic or visual items and unsafe regions of the embedding space. We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator. We conduct extensive experiments on the resulting embedding space for cross-modal retrieval, text-to-image, and image-to-text generation, where we show that our model can be remarkably employed with pre-trained generative models. Our source code and trained models are available at: https://github.com/aimagelab/safe-clip.
Updated: 2024-04-12 09:37:37
标题: Safe-CLIP:从视觉-语言模型中移除NSFW概念
摘要: 大规模视觉-语言模型(如CLIP)通常在网络规模的数据上训练,这可能引入不当内容,并导致不安全和有偏见的行为。这反过来会妨碍它们在敏感和需要可信度的场景中的适用性,并可能在其采用过程中引发重大担忧。我们的研究引入了一种新颖的方法,通过降低视觉-语言模型对NSFW(不适宜工作场合)输入的敏感性来提高其安全性。具体而言,我们的方法旨在切断"有毒"的语言和视觉概念,使模型遗忘不安全的语言或视觉项目与嵌入空间中不安全区域之间的关联。我们展示了如何实现这一点:在合成数据上微调CLIP模型,这些数据来自一个经过训练、能在安全与不安全句子之间相互转换的大型语言模型,以及一个文本到图像生成器。我们对所得到的嵌入空间进行了广泛的实验,包括跨模态检索、文本到图像和图像到文本生成,结果显示我们的模型可以与预训练的生成模型非常有效地配合使用。我们的源代码和训练模型可在以下链接找到:https://github.com/aimagelab/safe-clip.
更新时间: 2024-04-12 09:37:37
领域: cs.CV,cs.AI,cs.CL,cs.MM
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Over the past decade, the dominance of deep learning has prevailed across various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While there have been remarkable improvements in model accuracy, deploying these models on lightweight devices, such as mobile phones and microcontrollers, is constrained by limited resources. In this survey, we provide comprehensive design guidance tailored for these devices, detailing the meticulous design of lightweight models, compression methods, and hardware acceleration strategies. The principal goal of this work is to explore methods and concepts for getting around hardware constraints without compromising the model's accuracy. Additionally, we explore two notable paths for lightweight deep learning in the future: deployment techniques for TinyML and Large Language Models. Although these paths undoubtedly have potential, they also present significant challenges, encouraging research into unexplored areas.
Updated: 2024-04-12 09:34:38
标题: 轻量级深度学习在资源受限环境中的应用:一项调研
摘要: 在过去的十年中,深度学习在人工智能的各个领域占据了主导地位,包括自然语言处理、计算机视觉和生物医学信号处理。虽然模型准确性有了显著提高,但在手机和微控制器等轻量级设备上部署这些模型仍受限于有限的资源。在这项调查中,我们提供了专为这些设备量身定制的全面设计指导,详细介绍了轻量级模型的精细设计、压缩方法和硬件加速策略。这项工作的主要目标是探索在不牺牲模型准确性的前提下绕过硬件约束的方法和概念。此外,我们还探讨了未来轻量级深度学习的两条值得关注的路径:TinyML和大型语言模型的部署技术。尽管这些路径无疑具有潜力,但也带来了重大挑战,促使人们对尚未探索的领域展开研究。
更新时间: 2024-04-12 09:34:38
领域: cs.CV,cs.LG
Self-Supervised k-Space Regularization for Motion-Resolved Abdominal MRI Using Neural Implicit k-Space Representation
Neural implicit k-space representations have shown promising results for dynamic MRI at high temporal resolutions. Yet, their exclusive training in k-space limits the application of common image regularization methods to improve the final reconstruction. In this work, we introduce the concept of parallel imaging-inspired self-consistency (PISCO), which we incorporate as novel self-supervised k-space regularization enforcing a consistent neighborhood relationship. At no additional data cost, the proposed regularization significantly improves neural implicit k-space reconstructions on simulated data. Abdominal in-vivo reconstructions using PISCO result in enhanced spatio-temporal image quality compared to state-of-the-art methods. Code is available at https://github.com/vjspi/PISCO-NIK.
Updated: 2024-04-12 09:31:11
标题: 基于神经隐式k空间表示的自监督k空间正则化用于运动分辨腹部MRI
摘要: 神经隐式k空间表示在高时间分辨率动态MRI中显示出有希望的结果。然而,它们仅在k空间中训练,限制了应用常见图像正则化方法来改进最终重建。在这项工作中,我们引入了受并行成像启发的自洽性(PISCO)概念,并将其作为新颖的自监督k空间正则化引入,以强制实现一致的邻域关系。在不增加数据成本的情况下,所提出的正则化显著改善了在模拟数据上的神经隐式k空间重建。使用PISCO进行的腹部在体重建,相比最先进的方法展现出更好的时空图像质量。代码可在https://github.com/vjspi/PISCO-NIK上找到。
更新时间: 2024-04-12 09:31:11
领域: eess.IV,cs.CV,cs.LG,eess.SP,physics.med-ph
Unraveling the Impact of Initial Choices and In-Loop Interventions on Learning Dynamics in Autonomous Scanning Probe Microscopy
The current focus in Autonomous Experimentation (AE) is on developing robust workflows to conduct the AE effectively. This entails the need for well-defined approaches to guide the AE process, including strategies for hyperparameter tuning and high-level human interventions within the workflow loop. This paper presents a comprehensive analysis of the influence of initial experimental conditions and in-loop interventions on the learning dynamics of Deep Kernel Learning (DKL) within the realm of AE in Scanning Probe Microscopy. We explore the concept of 'seed effect', where the initial experiment setup has a substantial impact on the subsequent learning trajectory. Additionally, we introduce an approach of the seed point interventions in AE allowing the operator to influence the exploration process. Using a dataset from Piezoresponse Force Microscopy (PFM) on PbTiO3 thin films, we illustrate the impact of the 'seed effect' and in-loop seed interventions on the effectiveness of DKL in predicting material properties. The study highlights the importance of initial choices and adaptive interventions in optimizing learning rates and enhancing the efficiency of automated material characterization. This work offers valuable insights into designing more robust and effective AE workflows in microscopy with potential applications across various characterization techniques. The analysis code that supports the findings is publicly available at https://github.com/Slautin/2024_Seed_effect_DKL_BO.
Updated: 2024-04-12 09:28:47
标题: 解开自主扫描探针显微镜中初始选择和循环干预对学习动态的影响
摘要: 自主实验(AE)当前的重点是开发稳健的工作流程,以有效开展AE。这需要明确定义的方法来指导AE过程,包括超参数调整策略以及工作流循环中的高层人工干预。本文全面分析了初始实验条件和循环内干预对扫描探针显微镜自主实验中深度核学习(DKL)学习动态的影响。我们探讨了"种子效应"的概念,即初始实验设置对后续学习轨迹产生重大影响。此外,我们介绍了AE中种子点干预的方法,允许操作者影响探索过程。通过使用PbTiO3薄膜上的压电响应力显微镜(PFM)数据集,我们阐明了"种子效应"和循环内种子干预对DKL预测材料性质有效性的影响。该研究强调了初始选择和自适应干预在优化学习速率和提高自动化材料表征效率方面的重要性。这项工作为在显微镜领域设计更稳健、更有效的AE工作流程提供了有价值的见解,并有望应用于各种表征技术。支持本文结论的分析代码已公开发布于https://github.com/Slautin/2024_Seed_effect_DKL_BO。
更新时间: 2024-04-12 09:28:47
领域: cs.LG,cond-mat.mtrl-sci
Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks
Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance(AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.
Updated: 2024-04-12 09:22:24
标题: 学习通过自适应屏蔽子网络来重新平衡多模态优化
摘要: 多模态学习旨在通过统一来自各种模态的模型来提高性能,但在真实数据中常常面临“模态不平衡”问题,导致对主导模态的偏见,忽视其他模态,从而限制其整体有效性。为了解决这一挑战,核心思想是平衡每个模态的优化,以实现联合最优。现有方法通常采用模态级控制机制来调整每个模态参数的更新。然而,这种全局更新机制忽略了每个参数的不同重要性。受子网络优化启发,我们探索了基于均匀抽样的优化策略,并发现其比全局更新更有效。根据研究结果,我们进一步提出了一种基于重要性抽样的元素级联合优化方法,称为考虑模态重要性的自适应掩码子网络方法(AMSS)。具体来说,我们将互信息率纳入确定模态重要性,并采用非均匀自适应抽样来选择每个模态的前景子网络进行参数更新,从而重新平衡多模态学习。此外,我们通过收敛分析证明了AMSS策略的可靠性。基于理论洞察,我们进一步利用无偏估计增强了多模态掩码子网络策略,称为AMSS+。广泛的实验显示了我们方法相对于比较方法的优越性。
更新时间: 2024-04-12 09:22:24
领域: cs.CV,cs.LG
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Transformers come with a high computational cost, yet their effectiveness in addressing problems in language and vision has sparked extensive research aimed at enhancing their efficiency. However, diverse experimental conditions, spanning multiple input domains, prevent a fair comparison based solely on reported results, posing challenges for model selection. To address this gap in comparability, we design a comprehensive benchmark of more than 30 models for image classification, evaluating key efficiency aspects, including accuracy, speed, and memory usage. This benchmark provides a standardized baseline across the landscape of efficiency-oriented transformers and our framework of analysis, based on Pareto optimality, reveals surprising insights. Despite claims of other models being more efficient, ViT remains Pareto optimal across multiple metrics. We observe that hybrid attention-CNN models exhibit remarkable inference memory- and parameter-efficiency. Moreover, our benchmark shows that using a larger model in general is more efficient than using higher resolution images. Thanks to our holistic evaluation, we provide a centralized resource for practitioners and researchers, facilitating informed decisions when selecting transformers or measuring progress of the development of efficient transformers.
Updated: 2024-04-12 09:21:33
标题: 哪种Transformer更值得青睐:视觉Transformer效率的比较分析
摘要: Transformer模型的计算成本很高,但其在语言和视觉问题上的有效性引发了旨在提高其效率的大量研究。然而,涵盖多个输入领域的多样化实验条件使得仅凭报告结果难以进行公平比较,给模型选择带来了困难。为了填补这一可比性差距,我们设计了一个涵盖30多个模型的全面图像分类基准,评估包括准确性、速度和内存使用在内的关键效率指标。这个基准为效率导向的Transformer领域提供了标准化基线,而我们基于帕累托最优性的分析框架揭示了令人惊讶的见解。尽管有说法称其他模型更高效,但ViT在多个指标上仍保持帕累托最优。我们观察到混合注意力-CNN模型展现出卓越的推理内存和参数效率。此外,我们的基准显示,一般而言,使用更大的模型比使用更高分辨率的图像更有效率。得益于这一整体评估,我们为从业者和研究人员提供了一个集中的资源,便于在选择Transformer或衡量高效Transformer研发进展时做出明智的决定。
更新时间: 2024-04-12 09:21:33
领域: cs.CV,cs.AI,cs.LG,68T07,I.4.0; I.2.10; I.5.1
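Pareto optimality over (accuracy, throughput, memory) is mechanical to check. A small helper in the spirit of the benchmark's analysis follows; the metric names and their orientation (accuracy and throughput up, memory down) are assumptions for illustration:

def pareto_optimal(models):
    # models: dict mapping name -> (accuracy, images_per_sec, memory_mb).
    # A model is Pareto optimal if no other model is at least as good
    # on every metric and strictly better on at least one.
    def dominates(a, b):
        return (a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
                and (a[0] > b[0] or a[1] > b[1] or a[2] < b[2]))
    return [m for m, v in models.items()
            if not any(dominates(w, v) for n, w in models.items() if n != m)]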
Is Complexity an Illusion?
Simplicity is held by many to be the key to general intelligence. Simpler models tend to "generalise", identifying the cause or generator of data with greater sample efficiency. The implications of the correlation between simplicity and generalisation extend far beyond computer science, addressing questions of physics and even biology. Yet simplicity is a property of form, while generalisation is of function. In interactive settings, any correlation between the two depends on interpretation. In theory there could be no correlation and yet in practice, there is. Previous theoretical work showed generalisation to be a consequence of "weak" constraints implied by function, not form. Experiments demonstrated choosing weak constraints over simple forms yielded a 110-500% improvement in generalisation rate. Here we measure the complexity of weak constraints, and show that if one does not presuppose an abstraction layer, then all have equal complexity. However, in the context of a spatially and temporally extended abstraction layer, efficiency demands weak constraints take simple forms, and simplicity becomes correlated with generalisation. Simplicity has no causal influence on generalisation, but appears to due to confounding.
Updated: 2024-04-12 09:08:35
标题: 复杂性是一种错觉吗?
摘要: 许多人认为简单性是通用智能的关键。更简单的模型往往能更好地"泛化",以更高的样本效率识别数据的成因或生成器。简单性与泛化之间相关性的影响远远超出了计算机科学,涉及物理学甚至生物学的问题。然而,简单性是形式的属性,而泛化是功能的属性。在交互式环境中,两者之间的任何相关性都取决于解释。理论上可能不存在相关性,但实践中却存在。先前的理论工作表明,泛化是功能所蕴含的"弱"约束的结果,而非形式。实验证明,选择弱约束而非简单形式可使泛化率提高110-500%。在本文中,我们度量弱约束的复杂性,并表明如果不预设一个抽象层,那么所有弱约束的复杂性都相同。然而,在一个时空延展的抽象层的背景下,效率要求弱约束采取简单形式,简单性才因此与泛化相关。简单性对泛化没有因果影响,这种看似存在的影响源于混杂因素。
更新时间: 2024-04-12 09:08:35
领域: cs.AI
Using Large Language Models to Understand Telecom Standards
The Third Generation Partnership Project (3GPP) has successfully introduced standards for global mobility. However, the volume and complexity of these standards has increased over time, thus complicating access to relevant information for vendors and service providers. Use of Generative Artificial Intelligence (AI) and in particular Large Language Models (LLMs), may provide faster access to relevant information. In this paper, we evaluate the capability of state-of-art LLMs to be used as Question Answering (QA) assistants for 3GPP document reference. Our contribution is threefold. First, we provide a benchmark and measuring methods for evaluating performance of LLMs. Second, we do data preprocessing and fine-tuning for one of these LLMs and provide guidelines to increase accuracy of the responses that apply to all LLMs. Third, we provide a model of our own, TeleRoBERTa, that performs on-par with foundation LLMs but with an order of magnitude less number of parameters. Results show that LLMs can be used as a credible reference tool on telecom technical documents, and thus have potential for a number of different applications from troubleshooting and maintenance, to network operations and software product development.
Updated: 2024-04-12 09:08:30
标题: 使用大型语言模型来理解电信标准
摘要: 第三代合作伙伴计划(3GPP)已成功推出了全球移动性标准。然而,这些标准的数量和复杂性随着时间的推移而增加,因此使供应商和服务提供商难以访问相关信息。使用生成式人工智能(AI),特别是大型语言模型(LLM),可能会提供更快的获取相关信息的途径。在本文中,我们评估了最先进的LLM用作3GPP文档参考的问答助手的能力。我们的贡献有三个方面。首先,我们提供了一个用于评估LLM性能的基准和测量方法。其次,我们对其中一个LLM进行了数据预处理和微调,并提供了适用于所有LLM的提高响应准确性的指南。第三,我们提供了我们自己的模型TeleRoBERTa,其性能与基础LLM相当,但参数数量少一个数量级。结果显示,LLM可以作为电信技术文档的可信参考工具,并因此在故障排除和维护、网络运营和软件产品开发等多种应用中具有潜力。
更新时间: 2024-04-12 09:08:30
领域: cs.CL,cs.AI
Toward a Theory of Tokenization in LLMs
While there has been a large body of research attempting to circumvent tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the current consensus is that it is a necessary initial step for designing state-of-the-art performant language models. In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes. When trained on data drawn from certain simple $k^{\text{th}}$-order Markov processes for $k > 1$, transformers exhibit a surprising phenomenon - in the absence of tokenization, they empirically fail to learn the right distribution and predict characters according to a unigram model (Makkuva et al., 2024). With the addition of tokenization, however, we empirically observe that transformers break through this barrier and are able to model the probabilities of sequences drawn from the source near-optimally, achieving small cross-entropy loss. With this observation as starting point, we study the end-to-end cross-entropy loss achieved by transformers with and without tokenization. With the appropriate tokenization, we show that even the simplest unigram models (over tokens) learnt by transformers are able to model the probability of sequences drawn from $k^{\text{th}}$-order Markov sources near optimally. Our analysis provides a justification for the use of tokenization in practice through studying the behavior of transformers on Markovian data.
Updated: 2024-04-12 09:01:14
标题: 迈向LLMs中的Tokenization理论
摘要: 虽然已有大量研究试图绕过语言建模中的标记化(Clark等人,2022;Xue等人,2022),但目前的共识是,标记化是设计最先进高性能语言模型的必要初始步骤。在本文中,我们通过研究Transformer在简单数据生成过程上的行为,从理论角度考察标记化。当在从某些简单的$k^{\text{th}}$阶马尔可夫过程($k > 1$)中抽取的数据上训练时,Transformer表现出一种令人惊讶的现象:在没有标记化的情况下,它们在经验上无法学习正确的分布,而是按照一元(unigram)模型预测字符(Makkuva等人,2024)。然而,加入标记化后,我们在经验上观察到,Transformer突破了这一障碍,能够近乎最优地建模从源中抽取序列的概率,实现较小的交叉熵损失。以此观察为起点,我们研究了Transformer在有无标记化情况下实现的端到端交叉熵损失。我们表明,在适当的标记化下,即使是Transformer学习到的最简单的(基于标记的)一元模型,也能够近乎最优地建模从$k^{\text{th}}$阶马尔可夫源中抽取序列的概率。我们的分析通过研究Transformer在马尔可夫数据上的行为,为实践中使用标记化提供了依据。
更新时间: 2024-04-12 09:01:14
领域: cs.CL,cs.LG
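The phenomenon is easy to probe numerically: generate data from an order-2 binary Markov chain and compare the unigram cross-entropy (what an untokenized transformer empirically falls back to) with the conditional entropy of the source (the optimum). The transition probabilities below are arbitrary illustrations, not the paper's:

import numpy as np

rng = np.random.default_rng(0)

# Order-2 binary Markov chain: P(next = 1 | last two symbols).
p_one = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.1}

seq = [0, 1]
for _ in range(100_000):
    seq.append(int(rng.random() < p_one[(seq[-2], seq[-1])]))
seq = np.asarray(seq)

# Unigram (character-level) cross-entropy in bits per character.
p = seq.mean()
h_unigram = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Conditional entropy given the order-2 context: the optimum that a
# well-tokenized model can approach.
h_cond = 0.0
for (a, b), q in p_one.items():
    freq = np.mean((seq[:-2] == a) & (seq[1:-1] == b))
    h_cond -= freq * (q * np.log2(q) + (1 - q) * np.log2(1 - q))

print(f"unigram: {h_unigram:.3f} bits/char, conditional: {h_cond:.3f} bits/char")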
Properties of Discrete Sliced Wasserstein Losses
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
Updated: 2024-04-12 08:51:55
标题: 离散切片瓦瑟斯坦损失的特性
摘要: 切片Wasserstein(SW)距离已成为Wasserstein距离在比较概率测度时的一种流行替代。其广泛应用包括图像处理、领域适应和生成建模,在这些场景中通常需要优化某些参数以最小化SW,它充当离散概率测度之间的损失函数(因为具有密度的测度在数值上无法实现)。所有这些优化问题都包含同一个子问题,即最小化切片Wasserstein能量。本文研究$\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$的性质,即两个点数相同的均匀离散测度之间的SW距离,将其视为其中一个测度的支撑$Y \in \mathbb{R}^{n \times d}$的函数。我们研究了该能量的正则性和优化性质,以及它的蒙特卡洛近似$\mathcal{E}_p$(仅使用$p$个样本来估计SW中的期望),并给出了$\mathcal{E}_p$的临界点向$\mathcal{E}$的临界点收敛的结果,以及关于过程$\mathcal{E}_p(Y)$的几乎必然一致收敛和一致中心极限结果。最后,我们表明,在某种意义上,最小化$\mathcal{E}$和$\mathcal{E}_p$的随机梯度下降方法会收敛到这些能量的(Clarke)临界点。
更新时间: 2024-04-12 08:51:55
领域: stat.ML,cs.LG,math.OC,math.PR
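The Monte-Carlo estimator $\mathcal{E}_p$ takes only a few lines of NumPy for uniform discrete measures with equal numbers of points, since the 1D Wasserstein-2 distance between such measures is a mean of squared gaps between sorted projections. A sketch:

import numpy as np

def sw2_squared(Y, Z, p=100, seed=0):
    # Monte-Carlo estimate of SW_2^2 between two uniform discrete
    # measures with n points each, Y and Z of shape (n, d): average
    # over p random directions of the squared 1D W_2 distance.
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    theta = rng.standard_normal((p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    py = np.sort(Y @ theta.T, axis=0)  # (n, p) sorted projections
    pz = np.sort(Z @ theta.T, axis=0)
    return np.mean((py - pz) ** 2)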
CGS-Mask: Making Time Series Predictions Intuitive for All
Artificial intelligence (AI) has immense potential in time series prediction, but most explainable tools have limited capabilities in providing a systematic understanding of important features over time. These tools typically rely on evaluating a single time point, overlook the time ordering of inputs, and neglect the time-sensitive nature of time series applications. These factors make it difficult for users, particularly those without domain knowledge, to comprehend AI model decisions and obtain meaningful explanations. We propose CGS-Mask, a post-hoc and model-agnostic cellular genetic strip mask-based saliency approach to address these challenges. CGS-Mask uses consecutive time steps as a cohesive entity to evaluate the impact of features on the final prediction, providing binary and sustained feature importance scores over time. Our algorithm optimizes the mask population iteratively to obtain the optimal mask in a reasonable time. We evaluated CGS-Mask on synthetic and real-world datasets, and it outperformed state-of-the-art methods in elucidating the importance of features over time. According to our pilot user study via a questionnaire survey, CGS-Mask is the most effective approach in presenting easily understandable time series prediction results, enabling users to comprehend the decision-making process of AI models with ease.
Updated: 2024-04-12 08:44:25
标题: CGS-Mask:使时间序列预测对所有人都直观化
摘要: 人工智能(AI)在时间序列预测方面具有巨大潜力,但大多数可解释工具在提供重要特征的系统理解方面能力有限。这些工具通常依赖于评估单个时间点,忽视输入的时间顺序,并忽视时间序列应用的时间敏感性质。这些因素使得用户,特别是那些没有领域知识的用户,很难理解AI模型的决策并获得有意义的解释。我们提出了CGS-Mask,这是一种基于细胞遗传条带蒙版的事后和模型无关的显著性方法,旨在解决这些挑战。CGS-Mask使用连续的时间步作为一个连贯的实体来评估特征对最终预测的影响,提供随时间变化的二进制和持续的特征重要性评分。我们的算法通过迭代优化蒙版种群,以在合理时间内获得最佳蒙版。我们在合成和真实世界数据集上评估了CGS-Mask,并在阐明随时间推移特征重要性方面胜过最先进方法。根据我们通过问卷调查进行的初步用户研究,CGS-Mask是呈现易于理解的时间序列预测结果最有效的方法,使用户能够轻松理解AI模型的决策过程。
更新时间: 2024-04-12 08:44:25
领域: cs.AI
Be Bayesian by Attachments to Catch More Uncertainty
Bayesian Neural Networks (BNNs) have become one of the promising approaches for uncertainty estimation due to their solid theoretical foundations. However, the performance of BNNs is affected by their ability to catch uncertainty. Instead of only seeking the distribution of neural network weights by in-distribution (ID) data, in this paper, we propose a new Bayesian Neural Network with an Attached structure (ABNN) to catch more uncertainty from out-of-distribution (OOD) data. We first construct a mathematical description for the uncertainty of OOD data according to the prior distribution, and then develop an attached Bayesian structure to integrate the uncertainty of OOD data into the backbone network. ABNN is composed of an expectation module and several distribution modules. The expectation module is a backbone deep network which focuses on the original task, and the distribution modules are mini Bayesian structures which serve as attachments of the backbone. In particular, the distribution modules aim at extracting the uncertainty from both ID and OOD data. We further provide theoretical analysis for the convergence of ABNN, and experimentally validate its superiority by comparing with some state-of-the-art uncertainty estimation methods. Code will be made available.
Updated: 2024-04-12 08:37:18
标题: 通过附加结构实现贝叶斯化以捕捉更多不确定性
摘要: 凭借扎实的理论基础,贝叶斯神经网络(BNNs)已成为不确定性估计的有前途的方法之一。然而,BNNs的性能受其捕捉不确定性能力的影响。本文提出了一种新的带附加结构的贝叶斯神经网络(ABNN),以从分布外(OOD)数据中捕获更多不确定性,而不仅仅是通过分布内(ID)数据寻找神经网络权重的分布。我们首先根据先验分布为OOD数据的不确定性构建数学描述,然后开发一个附加的贝叶斯结构,将OOD数据的不确定性整合到骨干网络中。ABNN由一个期望模块和若干分布模块组成。期望模块是专注于原始任务的骨干深度网络,而分布模块是作为骨干附件的小型贝叶斯结构。特别地,分布模块旨在同时从ID和OOD数据中提取不确定性。我们进一步对ABNN的收敛性进行了理论分析,并通过与一些最先进的不确定性估计方法进行比较,实验验证了其优越性。代码将公开提供。
更新时间: 2024-04-12 08:37:18
领域: cs.LG,cs.AI
Uncertainty Aware Tropical Cyclone Wind Speed Estimation from Satellite Data
Deep neural networks (DNNs) have been successfully applied to earth observation (EO) data and opened new research avenues. Despite the theoretical and practical advances of these techniques, DNNs are still considered black box tools and by default are designed to give point predictions. However, the majority of EO applications demand reliable uncertainty estimates that can support practitioners in critical decision making tasks. This work provides a theoretical and quantitative comparison of existing uncertainty quantification methods for DNNs applied to the task of wind speed estimation in satellite imagery of tropical cyclones. We provide a detailed evaluation of predictive uncertainty estimates from state-of-the-art uncertainty quantification (UQ) methods for DNNs. We find that predictive uncertainties can be utilized to further improve accuracy and analyze the predictive uncertainties of different methods across storm categories.
Updated: 2024-04-12 08:35:38
标题: 基于卫星数据的不确定性感知热带气旋风速估计
摘要: 深度神经网络(DNNs)已成功应用于地球观测(EO)数据,并开辟了新的研究途径。尽管这些技术在理论和实践上均取得了进展,DNNs仍被视为黑箱工具,且默认设计为只给出点预测。然而,大多数EO应用需要可靠的不确定性估计,以支持从业者完成关键决策任务。本文对应用于热带气旋卫星图像风速估计任务的DNN现有不确定性量化方法进行了理论和定量比较。我们对来自最先进不确定性量化(UQ)方法的预测不确定性估计进行了详细评估。我们发现,预测不确定性可用于进一步提高准确性,并分析了不同方法在各风暴类别上的预测不确定性。
更新时间: 2024-04-12 08:35:38
领域: physics.ao-ph,cs.LG,stat.AP
Short vs. Long-term Coordination of Drones: When Distributed Optimization Meets Deep Reinforcement Learning
Swarms of autonomous interactive drones, with the support of recharging technology, can provide compelling sensing capabilities in Smart Cities, such as traffic monitoring and disaster response. This paper aims to deliver a novel coordination solution for the cost-effective navigation, sensing, and recharging of drones. Existing approaches, such as deep reinforcement learning (DRL), offer long-term adaptability, but lack energy efficiency, resilience, and flexibility in dynamic environments. Therefore, this paper proposes a novel approach where each drone independently determines its flying direction and recharging place using DRL, while adapting navigation and sensing through distributed optimization, which improves energy-efficiency during sensing tasks. Furthermore, drones efficiently exchange information while retaining decision-making autonomy via a structured tree communication model. Extensive experimentation with datasets generated from realistic urban mobility underscores an outstanding performance of the proposed solution compared to state-of-the-art methods. Significant new insights show that long-term methods optimize scarce drone resource for traffic management, while the integration of short-term methods is crucial for advising on charging policies and maintaining battery safety.
Updated: 2024-04-12 08:32:58
Categories: cs.RO,cs.LG,cs.MA
BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting
From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: locally estimating the pairwise similarities between documents followed by globally grouping these documents into appropriate clusters. However, such a decoupled approach often inhibits optimal information exchange between these intertwined tasks. Therefore, we present BOND, which bootstraps the local and global informative signals to promote each other in an end-to-end regime. Specifically, BOND harnesses local pairwise similarities to drive global clustering, subsequently generating pseudo-clustering labels. These global signals further refine local pairwise characterizations. The experimental results establish BOND's superiority, outperforming other advanced baselines by a substantial margin. Moreover, an enhanced version, BOND+, incorporating ensemble and post-match techniques, rivals the top methods in the WhoIsWho competition.
Updated: 2024-04-12 08:28:52
Categories: cs.SI,cs.AI,H.3.7; H.3.3
Multi-Step Traffic Prediction for Multi-Period Planning in Optical Networks
A multi-period planning framework is proposed that exploits multi-step ahead traffic predictions to address service overprovisioning and improve adaptability to traffic changes, while ensuring the necessary quality-of-service (QoS) levels. An encoder-decoder deep learning model is initially leveraged for multi-step ahead prediction by analyzing real-traffic traces. This information is then exploited by multi-period planning heuristics to efficiently utilize available network resources while minimizing undesired service disruptions (caused due to lightpath re-allocations), with these heuristics outperforming a single-step ahead prediction approach.
Updated: 2024-04-12 08:20:01
Categories: cs.NI,cs.LG
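A minimal encoder-decoder sketch for multi-step-ahead prediction in PyTorch (the GRU choice and dimensions are assumptions; the paper's traffic model may differ):

import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    # Encode a traffic history, then decode H future steps autoregressively.
    def __init__(self, d_hidden=32):
        super().__init__()
        self.encoder = nn.GRU(1, d_hidden, batch_first=True)
        self.decoder = nn.GRUCell(1, d_hidden)
        self.head = nn.Linear(d_hidden, 1)
    def forward(self, history, horizon):
        _, h = self.encoder(history)          # history: (B, T, 1)
        h = h.squeeze(0)
        y = history[:, -1, 0].unsqueeze(1)
        outputs = []
        for _ in range(horizon):              # feed each prediction back in
            h = self.decoder(y, h)
            y = self.head(h)
            outputs.append(y)
        return torch.stack(outputs, dim=1)    # (B, horizon, 1)

model = Seq2SeqForecaster()
future = model(torch.randn(4, 24, 1), horizon=6)   # 6-step-ahead prediction

The multi-step output is what the planning heuristics consume; a single-step baseline would instead call the model with horizon=1 repeatedly on re-observed data.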
The Integration of Semantic and Structural Knowledge in Knowledge Graph Entity Typing
The Knowledge Graph Entity Typing (KGET) task aims to predict missing type annotations for entities in knowledge graphs. Recent works only utilize the structural knowledge in the local neighborhood of entities, disregarding semantic knowledge in the textual representations of entities, relations, and types that are also crucial for type inference. Additionally, we observe that the interaction between semantic and structural knowledge can be utilized to address the false-negative problem. In this paper, we propose a novel Semantic and Structure-aware KG Entity Typing (SSET) framework, which is composed of three modules. First, the Semantic Knowledge Encoding module encodes factual knowledge in the KG with a Masked Entity Typing task. Then, the Structural Knowledge Aggregation module aggregates knowledge from the multi-hop neighborhood of entities to infer missing types. Finally, the Unsupervised Type Re-ranking module utilizes the inference results from the two models above to generate type predictions that are robust to false-negative samples. Extensive experiments show that SSET significantly outperforms existing state-of-the-art methods.
Updated: 2024-04-12 08:17:44
Categories: cs.CL,cs.AI
The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs remained mostly unsettled regardless of the uncertainty set in use. It was unclear if distributional robustness bears any statistical consequences when benchmarked against standard RL. Assuming access to a generative model that draws samples based on the nominal MDP, we characterize the sample complexity of RMDPs when the uncertainty set is specified via either the total variation (TV) distance or $\chi^2$ divergence. The algorithm studied here is a model-based method called distributionally robust value iteration, which is shown to be near-optimal for the full range of uncertainty levels. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs. The statistical consequence incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set: in the case w.r.t. the TV distance, the minimax sample complexity of RMDPs is always smaller than that of standard MDPs; in the case w.r.t. the $\chi^2$ divergence, the sample complexity of RMDPs can often far exceed the standard MDP counterpart.
Updated: 2024-04-12 08:09:33
Categories: cs.LG,cs.IT,math.IT,math.ST,stat.TH
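A small numpy sketch of distributionally robust value iteration with a TV uncertainty set. The inner worst case over a TV ball of radius delta has a simple form: shift up to delta of probability mass from the highest-value next states onto the lowest-value one (a standard construction; the toy MDP below is ours):

import numpy as np

def worst_case_expectation(p0, v, delta):
    # inf over {p : TV(p, p0) <= delta} of E_p[v].
    order = np.argsort(v)[::-1]          # indices from highest to lowest value
    p, budget, lowest = p0.copy(), delta, np.argmin(v)
    for i in order:
        if i == lowest or budget <= 0:
            break
        take = min(p[i], budget)
        p[i] -= take
        p[lowest] += take
        budget -= take
    return p @ v

def robust_value_iteration(P, R, gamma=0.9, delta=0.1, iters=200):
    # P: (A, S, S) nominal transition kernels, R: (S, A) rewards.
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * worst_case_expectation(P[a, s], V, delta)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return V

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # 2 actions, 3 states
R = rng.random((3, 2))
print(robust_value_iteration(P, R))

Setting delta=0 recovers standard value iteration, which makes the paper's TV-vs-standard sample-complexity comparison easy to probe empirically.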
Manifest V3 Unveiled: Navigating the New Era of Browser Extensions
Introduced over a decade ago, Chrome extensions now exceed 200,000 in number. In 2020, Google announced a shift in extension development with Manifest Version 3 (V3), aiming to replace the previous Version 2 (V2) by January 2023. This deadline was later extended to January 2025. The company's decision is grounded in enhancing three main pillars: privacy, security, and performance. This paper presents a comprehensive analysis of the Manifest V3 ecosystem. We start by investigating the adoption rate of V3, detailing the percentage of adoption from its announcement up until 2024. Our findings indicate that, prior to the 2023 pause, less than 5% of all extensions had transitioned to V3, despite the looming deadline for the complete removal of V2, while currently nine out of ten new extensions are being uploaded in Manifest V3. Furthermore, we compare the security and privacy enhancements between V2 and V3 and we evaluate the improved security attributable to V3's safer APIs, examining how certain APIs, which were vulnerable or facilitated malicious behavior, have been deprecated or removed in V3. We dynamically execute 517 confirmed malicious extensions and we see an 87.8% removal of APIs related to malicious behavior due to the improvements of V3. We discover that only 154 (29.8%) of these extensions remain functional post-conversion. This analysis leads to the conclusion that V3 reduces the avenues for abuse of such APIs. However, despite the reduction in APIs associated with malicious activities, the new Manifest V3 protocol is not immune to such behavior. Our research demonstrates, through a proof of concept, the adaptability of malicious activities to V3. After the proof of concept changes are applied, we show that 290 (56%) of the examined malicious extensions retain their capability to conduct harmful activities within the V3 framework.
Updated: 2024-04-12 08:09:26
Categories: cs.CR
Subtoxic Questions: Dive Into Attitude Change of LLM's Response in Jailbreak Attempts
As prompt jailbreaking of Large Language Models (LLMs) receives more and more attention, it is of great significance to establish a generalized research paradigm for evaluating attack strengths and a basic model for conducting subtler experiments. In this paper, we propose a novel approach by focusing on a set of target questions that are inherently more sensitive to jailbreak prompts, aiming to circumvent the limitations posed by enhanced LLM security. Through designing and analyzing these sensitive questions, this paper reveals a more effective method of identifying vulnerabilities in LLMs, thereby contributing to the advancement of LLM security. This research not only challenges existing jailbreaking methodologies but also fortifies LLMs against potential exploits.
Updated: 2024-04-12 08:08:44
Categories: cs.CR,cs.AI,cs.CL
Performance Analysis of Decentralized Physical Infrastructure Networks and Centralized Clouds
The advent of Decentralized Physical Infrastructure Networks (DePIN) represents a shift in the digital infrastructure of today's Internet. While Centralized Service Providers (CSP) monopolize cloud computing, DePINs aim to enhance data sovereignty and confidentiality and increase resilience against a single point of failure. Due to the novelty of the emerging field of DePIN, this work focuses on the potential of DePINs to disrupt traditional centralized architectures by taking advantage of Internet of Things (IoT) devices and crypto-economic design in combination with blockchains. This combination yields Acurast, a more distributed, resilient, and user-centric physical infrastructure deployment. Through comparative analysis with centralized systems, particularly in serverless computing contexts, this work seeks to take the first steps toward scientifically evaluating DePINs and quantitatively comparing them in terms of efficiency and effectiveness in real-world applications. The findings suggest DePINs' potential to (i) reduce trust assumptions and physically decentralize infrastructure, (ii) increase efficiency and performance simultaneously, and (iii) improve the computation's confidentiality and verifiability.
Updated: 2024-04-12 08:00:38
Categories: cs.CR,cs.DC
A Large Scale Survey of Motivation in Software Development and Analysis of its Validity
Context: Motivation is known to improve performance. In software development in particular, there has been considerable interest in the motivation of contributors to open source. Objective: We identify 11 motivators from the literature (enjoying programming, ownership of code, learning, self use, etc.), and evaluate their relative effect on motivation. Since motivation is an internal subjective feeling, we also analyze the validity of the answers. Method: We conducted a survey with 66 questions on motivation which was completed by 521 developers. Most of the questions used an 11 point scale. We evaluated the validity of the answers by comparing related questions, comparing answers to actual behavior on GitHub, and comparing with the same developer's responses in a follow-up survey. Results: Validity problems include moderate correlations between answers to related questions, as well as self promotion and mistakes in the answers. Despite these problems, predictive analysis, investigating how diverse motivators influence the probability of high motivation, provided valuable insights. The correlations between the different motivators are low, implying their independence. High values in all 11 motivators predict increased probability of high motivation. In addition, improvement analysis shows that an increase in most motivators predicts an increase in general motivation.
Updated: 2024-04-12 07:51:21
Categories: cs.SE,cs.LG
Viewing the process of generating counterfactuals as a source of knowledge: a new approach for explaining classifiers
There are now many explainable AI methods for understanding the decisions of a machine learning model. Among these are those based on counterfactual reasoning, which involve simulating feature changes and observing the impact on the prediction. This article proposes to view this simulation process as a source of creating a certain amount of knowledge that can be stored to be used, later, in different ways. This process is illustrated in the additive model and, more specifically, in the case of the naive Bayes classifier, whose interesting properties for this purpose are shown.
Updated: 2024-04-12 07:49:57
Categories: cs.LG
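A compact illustration of the idea with scikit-learn: simulate feature changes on a naive Bayes classifier and store each prediction flip as a reusable piece of knowledge (the grid of shifts is an arbitrary choice for the sketch):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB().fit(X, y)

knowledge = []                       # store (feature, shift, old_class, new_class)
x = X[0].copy()
base = clf.predict([x])[0]
for j in range(X.shape[1]):
    for shift in np.linspace(-2, 2, 9):        # simulate feature changes
        x_cf = x.copy()
        x_cf[j] += shift
        new = clf.predict([x_cf])[0]
        if new != base:
            knowledge.append((j, float(shift), int(base), int(new)))

# Each stored tuple is a unit of knowledge, "changing feature j by shift flips
# the prediction from base to new", reusable later to explain the classifier.
print(knowledge[:5])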
Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty
With the surge in mobile gaming, accurately predicting user spending on newly downloaded games has become paramount for maximizing revenue. However, the inherently unpredictable nature of user behavior poses significant challenges in this endeavor. To address this, we propose a robust model training and evaluation framework aimed at standardizing spending data to mitigate label variance and extremes, ensuring stability in the modeling process. Within this framework, we introduce a collaborative-enhanced model designed to predict user game spending without relying on user IDs, thus ensuring user privacy and enabling seamless online training. Our model adopts a unique approach by separately representing user preferences and game features before merging them as input to the spending prediction module. Through rigorous experimentation, our approach demonstrates notable improvements over production models, achieving a remarkable 17.11% enhancement on offline data and an impressive 50.65% boost in an online A/B test. In summary, our contributions underscore the importance of stable model training frameworks and the efficacy of collaborative-enhanced models in predicting user spending behavior in mobile gaming.
Updated: 2024-04-12 07:47:02
Categories: cs.IR,cs.LG
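A sketch of the ID-free, separately-represented design in PyTorch (the feature dimensions and merge-by-concatenation are our assumptions):

import torch
import torch.nn as nn

class SpendingPredictor(nn.Module):
    def __init__(self, d_user=12, d_game=8, d_hidden=32):
        super().__init__()
        # Separate representations, no user IDs involved (privacy-friendly).
        self.user_tower = nn.Sequential(nn.Linear(d_user, d_hidden), nn.ReLU())
        self.game_tower = nn.Sequential(nn.Linear(d_game, d_hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * d_hidden, d_hidden), nn.ReLU(),
                                  nn.Linear(d_hidden, 1))
    def forward(self, user_prefs, game_feats):
        z = torch.cat([self.user_tower(user_prefs),
                       self.game_tower(game_feats)], dim=-1)
        return self.head(z).squeeze(-1)        # predicted (log-)spend

model = SpendingPredictor()
spend = model(torch.randn(16, 12), torch.randn(16, 8))

Consistent with the framework above, spending labels would typically be standardized (e.g., log-transformed and clipped) before training to tame label variance and extremes.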
Neural Likelihood Approximation for Integer Valued Time Series Data
Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is challenging due to intractability of the likelihood. To work at all, current simulation based inference methods require the generation of realisations of the model conditional on the data, which can be both tricky to implement and computationally expensive. In this paper we instead construct a neural likelihood approximation that can be trained using unconditional simulation of the underlying model, which is much simpler. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups compared to current best methods.
Updated: 2024-04-12 07:45:49
Categories: stat.ML,cs.LG
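A toy version of the core idea, assuming a Poisson observation head (the paper's architecture may differ): train a network on unconditional simulations of an integer-valued process so that it approximates the intractable conditional likelihood:

import torch
import torch.nn as nn

# Unconditional simulation of a toy integer-valued autoregressive process.
def simulate(T=200):
    x = [5.0]
    for _ in range(T - 1):
        rate = 0.8 * x[-1] + 1.0
        x.append(float(torch.poisson(torch.tensor(rate))))
    return torch.tensor(x)

# The network maps a history window to a Poisson rate; training maximizes the
# likelihood of the next observation, giving a neural likelihood surrogate.
net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(300):
    x = simulate()
    windows = torch.stack([x[i:i + 3] for i in range(len(x) - 3)])
    target = x[3:]
    rate = net(windows).squeeze(-1) + 1e-6
    nll = (rate - target * rate.log()).mean()   # Poisson negative log-likelihood
    opt.zero_grad()
    nll.backward()
    opt.step()

Because the training data are unconditional simulations, no conditioning on observed data is needed, which is the simplification the abstract highlights.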
Study of Emotion Concept Formation by Integrating Vision, Physiology, and Word Information using Multilayered Multimodal Latent Dirichlet Allocation
How are emotions formed? Through extensive debate and the promulgation of diverse theories, the theory of constructed emotion has become prevalent in recent research on emotions. According to this theory, an emotion concept refers to a category formed by interoceptive and exteroceptive information associated with a specific emotion. An emotion concept stores past experiences as knowledge and can predict unobserved information from acquired information. Therefore, in this study, we attempted to model the formation of emotion concepts using a constructionist approach from the perspective of the constructed emotion theory. Particularly, we constructed a model using multilayered multimodal latent Dirichlet allocation, which is a probabilistic generative model. We then trained the model for each subject using vision, physiology, and word information obtained from multiple people who experienced different visual emotion-evoking stimuli. To evaluate the model, we verified whether the formed categories matched human subjectivity and determined whether unobserved information could be predicted via categories. The verification results exceeded chance level, suggesting that emotion concept formation can be explained by the proposed model.
Updated: 2024-04-12 07:34:46
Categories: cs.AI,cs.HC,cs.LG,cs.RO,cs.SC
State-Space Systems as Dynamic Generative Models
A probabilistic framework to study the dependence structure induced by deterministic discrete-time state-space systems between input and output processes is introduced. General sufficient conditions are formulated under which output processes exist and are unique once an input process has been fixed, a property that in the deterministic state-space literature is known as the echo state property. When those conditions are satisfied, the given state-space system becomes a generative model for probabilistic dependences between two sequence spaces. Moreover, those conditions guarantee that the output depends continuously on the input when using the Wasserstein metric. The output processes whose existence is proved are shown to be causal in a specific sense and to generalize those studied in purely deterministic situations. The results in this paper constitute a significant stochastic generalization of sufficient conditions for the deterministic echo state property to hold, in the sense that the stochastic echo state property can be satisfied under contractivity conditions that are strictly weaker than those in deterministic situations. This means that state-space systems can induce a purely probabilistic dependence structure between input and output sequence spaces even when there is no functional relation between those two spaces.
Updated: 2024-04-12 07:32:57
Categories: stat.ML,cs.LG,math.DS,math.PR,math.ST,stat.TH,37H05, 37N35, 62M10, 68T05
A Survey of Neural Network Robustness Assessment in Image Recognition
In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models. Researchers have dedicated efforts to evaluate robustness in diverse perturbation conditions for image recognition tasks. Robustness assessment encompasses two main techniques: robustness verification/certification for deliberate adversarial attacks and robustness testing for random data corruptions. In this survey, we present a detailed examination of both adversarial robustness (AR) and corruption robustness (CR) in neural network assessment. Analyzing current research papers and standards, we provide an extensive overview of robustness assessment in image recognition. Three essential aspects are analyzed: concepts, metrics, and assessment methods. We investigate the perturbation metrics and range representations used to measure the degree of perturbations on images, as well as the robustness metrics specifically for the robustness conditions of classification models. The strengths and limitations of the existing methods are also discussed, and some potential directions for future research are provided.
Updated: 2024-04-12 07:19:16
Categories: cs.CV,cs.AI,cs.SY,eess.SY
Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example
Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists' expertise and the time-consuming nature of achieving accurate breast pathological image classification, this paper introduces an approach utilizing convolutional neural networks (CNNs) for the rapid categorization of pathological images, aiming to enhance the efficiency of breast pathological image detection. The approach enables the rapid and automatic classification of pathological images into benign and malignant groups. The methodology involves a convolutional neural network (CNN) model leveraging the Inceptionv3 architecture and a transfer learning algorithm for extracting features from pathological images, followed by a neural network with fully connected layers employing the SoftMax function for image classification. Additionally, the concept of image partitioning is introduced to handle high-resolution images. To achieve the ultimate classification outcome, the classification probabilities of each image block are aggregated using three algorithms: summation, product, and maximum. Experimental validation was conducted on the BreaKHis public dataset, resulting in accuracy rates surpassing 0.92 across all four magnification coefficients (40X, 100X, 200X, and 400X). This demonstrates that the proposed method effectively enhances the accuracy of classifying pathological images of breast cancer.
Updated: 2024-04-12 07:08:05
Categories: eess.IV,cs.CV,cs.LG
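The aggregation step is easy to make concrete; a numpy sketch with hypothetical per-patch probabilities (the upstream Inceptionv3 feature extraction is omitted):

import numpy as np

# Hypothetical per-patch [P(benign), P(malignant)] from the CNN for one slide
# partitioned into four high-resolution blocks.
patch_probs = np.array([[0.7, 0.3],
                        [0.2, 0.8],
                        [0.4, 0.6],
                        [0.9, 0.1]])

agg_sum  = patch_probs.sum(axis=0)           # summation rule
agg_prod = patch_probs.prod(axis=0)          # product rule
agg_max  = patch_probs.max(axis=0)           # maximum rule

for name, score in [("sum", agg_sum), ("product", agg_prod), ("max", agg_max)]:
    print(name, "->", ["benign", "malignant"][int(np.argmax(score))])

The three rules can disagree on borderline slides, which is presumably why all three are evaluated in the study.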
Struggle with Adversarial Defense? Try Diffusion
Adversarial attacks induce misclassification by introducing subtle perturbations. Recently, diffusion models are applied to the image classifiers to improve adversarial robustness through adversarial training or by purifying adversarial noise. However, diffusion-based adversarial training often encounters convergence challenges and high computational expenses. Additionally, diffusion-based purification inevitably causes data shift and is deemed susceptible to stronger adaptive attacks. To tackle these issues, we propose the Truth Maximization Diffusion Classifier (TMDC), a generative Bayesian classifier that builds upon pre-trained diffusion models and the Bayesian theorem. Unlike data-driven classifiers, TMDC, guided by Bayesian principles, utilizes the conditional likelihood from diffusion models to determine the class probabilities of input images, thereby insulating against the influences of data shift and the limitations of adversarial training. Moreover, to enhance TMDC's resilience against more potent adversarial attacks, we propose an optimization strategy for diffusion classifiers. This strategy involves post-training the diffusion model on perturbed datasets with ground-truth labels as conditions, guiding the diffusion model to learn the data distribution and maximizing the likelihood under the ground-truth labels. The proposed method achieves state-of-the-art performance on the CIFAR10 dataset against heavy white-box attacks and strong adaptive attacks. Specifically, TMDC achieves robust accuracies of 82.81% against $l_{\infty}$ norm-bounded perturbations and 86.05% against $l_{2}$ norm-bounded perturbations, respectively, with $\epsilon=0.05$.
Updated: 2024-04-12 06:52:40
Categories: cs.CV,cs.CR
Transfer Learning Study of Motion Transformer-based Trajectory Predictions
Trajectory planning in autonomous driving is highly dependent on predicting the emergent behavior of other road users. Learning-based methods are currently showing impressive results in simulation-based challenges, with transformer-based architectures technologically leading the way. Ultimately, however, predictions are needed in the real world. In addition to the shifts from simulation to the real world, many vehicle- and country-specific shifts, i.e. differences in sensor systems, fusion and perception algorithms as well as traffic rules and laws, are on the agenda. Since models that can cover all system setups and design domains at once are not yet foreseeable, model adaptation plays a central role. Therefore, a simulation-based study on transfer learning techniques is conducted on basis of a transformer-based model. Furthermore, the study aims to provide insights into possible trade-offs between computational time and performance to support effective transfers into the real world.
Updated: 2024-04-12 06:50:32
Categories: cs.LG,cs.RO
PrivAgE: A Toolchain for Privacy-Preserving Distributed Aggregation on Edge-Devices
Valuable insights, such as frequently visited environments in the wake of the COVID-19 pandemic, can oftentimes only be gained by analyzing sensitive data spread across edge-devices like smartphones. To facilitate such an analysis, we present a toolchain called PrivAgE for a distributed, privacy-preserving aggregation of local data by taking the limited resources of edge-devices into account. The distributed aggregation is based on secure summation and simultaneously satisfies the notion of differential privacy. In this way, other parties can neither learn the sensitive data of single clients nor a single client's influence on the final result. We perform an evaluation of the power consumption, the running time and the bandwidth overhead on real as well as simulated devices and demonstrate the flexibility of our toolchain by presenting an extension of the summation of histograms to distributed clustering.
Updated: 2024-04-12 06:44:29
Categories: cs.CR
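One standard way to combine secure summation with differential privacy, sketched with additive secret sharing and Laplace noise (this illustrates the primitives only; it is not PrivAgE's actual protocol):

import numpy as np

rng = np.random.default_rng(1)
MOD = 2**32
clients = [np.array([3, 0, 1, 2]), np.array([1, 1, 0, 4]), np.array([0, 2, 2, 1])]

# Secure summation: each client splits its local histogram into random shares
# that sum to the true values mod MOD; no single share reveals anything.
def share(hist, n_parties):
    shares = rng.integers(0, MOD, size=(n_parties - 1, hist.size), dtype=np.uint64)
    last = (hist.astype(np.uint64) - shares.sum(0, dtype=np.uint64)) % MOD
    return np.vstack([shares, last])

all_shares = [share(h, len(clients)) for h in clients]
# Party i sums the i-th share of every client; combining the party totals
# recovers only the aggregate, never an individual client's histogram.
party_totals = [sum(s[i] for s in all_shares) % MOD for i in range(len(clients))]
aggregate = sum(party_totals) % MOD

# Differential privacy: noise calibrated to sensitivity/epsilon on the aggregate.
epsilon, sensitivity = 1.0, 1.0
noisy = aggregate.astype(np.int64) + rng.laplace(0, sensitivity / epsilon, size=4)
print(noisy)

On edge devices the share generation and modular sums are cheap, which is the resource consideration the toolchain targets.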
Graph Neural Networks in Vision-Language Image Understanding: A Survey
2D image understanding is a complex problem within computer vision, but it holds the key to providing human-level scene comprehension. It goes further than identifying the objects in an image, and instead, it attempts to understand the scene. Solutions to this problem form the underpinning of a range of tasks, including image captioning, visual question answering (VQA), and image retrieval. Graphs provide a natural way to represent the relational arrangement between objects in an image, and thus, in recent years graph neural networks (GNNs) have become a standard component of many 2D image understanding pipelines, becoming a core architectural component, especially in the VQA group of tasks. In this survey, we review this rapidly evolving field and we provide a taxonomy of graph types used in 2D image understanding approaches, a comprehensive list of the GNN models used in this domain, and a roadmap of future potential developments. To the best of our knowledge, this is the first comprehensive survey that covers image captioning, visual question answering, and image retrieval techniques that focus on using GNNs as the main part of their architecture.
Updated: 2024-04-12 06:42:47
Categories: cs.CV,cs.LG
Efficient Graph Laplacian Estimation by Proximal Newton
The Laplacian-constrained Gaussian Markov Random Field (LGMRF) is a common multivariate statistical model for learning a weighted sparse dependency graph from given data. This graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix, subject to Laplacian structural constraints, with a sparsity-inducing penalty term. This paper aims to solve this learning problem accurately and efficiently. First, since the commonly used $\ell_1$-norm penalty is inappropriate in this setting and may lead to a complete graph, we employ the nonconvex minimax concave penalty (MCP), which promotes sparse solutions with lower estimation bias. Second, as opposed to existing first-order methods for this problem, we develop a second-order proximal Newton approach to obtain an efficient solver, utilizing several algorithmic features, such as using Conjugate Gradients, preconditioning, and splitting to active/free sets. Numerical experiments demonstrate the advantages of the proposed method in terms of both computational complexity and graph learning accuracy compared to existing methods.
Updated: 2024-04-12 06:38:32
Categories: cs.LG,math.OC
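The MCP penalty and its proximal operator have standard closed forms (unit step size, gamma > 1), which show why MCP shrinks small entries like the $\ell_1$ norm but leaves large entries unbiased:

import numpy as np

def mcp(x, lam=1.0, gamma=3.0):
    a = np.abs(x)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2 * gamma),   # concave ramp near zero
                    0.5 * gamma * lam**2)           # flat beyond gamma * lam

def mcp_prox(x, lam=1.0, gamma=3.0):
    # Soft-threshold and rescale inside the concave region, identity outside,
    # hence the lower estimation bias compared with the l1 proximal operator.
    a = np.abs(x)
    soft = np.sign(x) * np.maximum(a - lam, 0) / (1 - 1 / gamma)
    return np.where(a <= gamma * lam, soft, x)

x = np.linspace(-5, 5, 11)
print(mcp_prox(x))   # large entries pass through unshrunk, unlike the l1 prox

Inside the proposed proximal Newton solver, this operator is applied to Newton steps restricted to the free set of the precision matrix.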
FedAgg: Adaptive Federated Learning with Aggregated Gradients
Federated Learning (FL) has emerged as a pivotal paradigm within distributed model training, facilitating collaboration among multiple devices to refine a shared model, harnessing their respective datasets as orchestrated by a central server, while ensuring the localization of private data. Nonetheless, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the incessant information exchange among participants may markedly impede training efficacy and retard the convergence rate. In this paper, we refine the conventional stochastic gradient descent (SGD) methodology by introducing aggregated gradients at each local training epoch and propose an adaptive learning rate iterative algorithm that concerns the divergence between local and average parameters. To surmount the obstacle of acquiring other clients' local information, we introduce the mean-field approach by leveraging two mean-field terms to approximately estimate the average local parameters and gradients over time in a manner that precludes the need for local information exchange among clients, and we design a decentralized adaptive learning rate for each client. Through meticulous theoretical analysis, we provide a robust convergence guarantee for our proposed algorithm and ensure its wide applicability. Our numerical experiments substantiate the superiority of our framework in comparison with existing state-of-the-art FL strategies for enhancing model performance and accelerating convergence rate under IID and Non-IID data distributions.
Updated: 2024-04-12 06:26:04
Categories: cs.LG,cs.DC
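A hedged sketch of the overall loop: each client runs local steps whose learning rate shrinks as its parameters diverge from the global model, and the server averages the results. The inverse-divergence rule below is a simple stand-in for the paper's mean-field learning-rate derivation, which is not reproduced here:

import copy
import torch
import torch.nn as nn

def local_step(model, x, y, lr):
    loss = nn.functional.mse_loss(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= lr * g

global_model = nn.Linear(10, 1)
clients = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]

for rnd in range(20):
    local_models = []
    for x, y in clients:
        m = copy.deepcopy(global_model)
        for _ in range(5):                     # local training epochs
            # Divergence between local and global parameters drives the step
            # size: clients that drift far take smaller steps (our heuristic).
            div = sum(float((p - q).norm())
                      for p, q in zip(m.parameters(), global_model.parameters()))
            local_step(m, x, y, lr=0.1 / (1.0 + div))
        local_models.append(m)
    with torch.no_grad():                      # server aggregates by averaging
        for name, p in global_model.named_parameters():
            p.copy_(torch.stack([dict(m.named_parameters())[name]
                                 for m in local_models]).mean(0))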
ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease
Alzheimer's Disease (AD) and related dementia are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated learning architecture that can accurately detect digital biomarkers in a privacy-preserving manner. Our approach collectively addresses several major real-world challenges, such as limited data labels, data heterogeneity, and limited computing resources. We built a compact multi-modality hardware system and deployed it in a four-week clinical trial involving 91 elderly participants. The results indicate that ADMarker can accurately detect a comprehensive set of digital biomarkers with up to 93.8% accuracy and identify early AD with an average of 88.9% accuracy. ADMarker offers a new platform that can allow AD clinicians to characterize and track the complex correlation between multidimensional interpretable digital biomarkers, demographic factors of patients, and AD diagnosis in a longitudinal manner.
Updated: 2024-04-12 06:25:43
Categories: cs.LG
Relational Prompt-based Pre-trained Language Models for Social Event Detection
Social Event Detection (SED) aims to identify significant events from social streams, and has a wide application ranging from public opinion analysis to risk management. In recent years, Graph Neural Network (GNN) based solutions have achieved state-of-the-art performance. However, GNN-based methods often struggle with noisy and missing edges between messages, affecting the quality of learned message embedding. Moreover, these methods statically initialize node embedding before training, which, in turn, limits the ability to learn from message texts and relations simultaneously. In this paper, we approach social event detection from a new perspective based on Pre-trained Language Models (PLMs), and present RPLM_SED (Relational prompt-based Pre-trained Language Models for Social Event Detection). We first propose a new pairwise message modeling strategy to construct social messages into message pairs with multi-relational sequences. Secondly, a new multi-relational prompt-based pairwise message learning mechanism is proposed to learn more comprehensive message representation from message pairs with multi-relational prompts using PLMs. Thirdly, we design a new clustering constraint to optimize the encoding process by enhancing intra-cluster compactness and inter-cluster dispersion, making the message representation more distinguishable. We evaluate the RPLM_SED on three real-world datasets, demonstrating that the RPLM_SED model achieves state-of-the-art performance in offline, online, low-resource, and long-tail distribution scenarios for social event detection tasks.
Updated: 2024-04-12 06:23:07
Categories: cs.CL,cs.AI,cs.LG,cs.SI
Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain
Several previous studies have considered language- and domain-specific large language models (LLMs) as separate topics. This study explores the combination of a non-English language and a high-demand industry domain, focusing on a Japanese business-specific LLM. This type of a model requires expertise in the business domain, strong language skills, and regular updates of its knowledge. We trained a 13-billion-parameter LLM from scratch using a new dataset of business texts and patents, and continually pretrained it with the latest business documents. Further we propose a new benchmark for Japanese business domain question answering (QA) and evaluate our models on it. The results show that our pretrained model improves QA accuracy without losing general knowledge, and that continual pretraining enhances adaptation to new information. Our pretrained model and business domain benchmark are publicly available.
Updated: 2024-04-12 06:21:48
Categories: cs.CL,cs.AI,68T50
Combating Advanced Persistent Threats: Challenges and Solutions
The rise of advanced persistent threats (APTs) has marked a significant cybersecurity challenge, characterized by sophisticated orchestration, stealthy execution, extended persistence, and targeting valuable assets across diverse sectors. Provenance graph-based kernel-level auditing has emerged as a promising approach to enhance visibility and traceability within intricate network environments. However, it still faces challenges including reconstructing complex lateral attack chains, detecting dynamic evasion behaviors, and defending smart adversarial subgraphs. To bridge the research gap, this paper proposes an efficient and robust APT defense scheme leveraging provenance graphs, including a network-level distributed audit model for cost-effective lateral attack reconstruction, a trust-oriented APT evasion behavior detection strategy, and a hidden Markov model based adversarial subgraph defense approach. Through prototype implementation and extensive experiments, we validate the effectiveness of our system. Lastly, crucial open research directions are outlined in this emerging field.
Updated: 2024-04-12 06:10:43
Categories: cs.CR
Practical Region-level Attack against Segment Anything Models
Segment Anything Models (SAM) have made significant advancements in image segmentation, allowing users to segment target portions of an image with a single click (i.e., user prompt). Given its broad applications, the robustness of SAM against adversarial attacks is a critical concern. While recent works have explored adversarial attacks against a pre-defined prompt/click, their threat model is not yet realistic: (1) they often assume the user-click position is known to the attacker (point-based attack), and (2) they often operate under a white-box setting with limited transferability. In this paper, we propose a more practical region-level attack where attackers do not need to know the precise user prompt. The attack remains effective as the user clicks on any point on the target object in the image, hiding the object from SAM. Also, by adapting a spectrum transformation method, we make the attack more transferable under a black-box setting. Both control experiments and testing against real-world SAM services confirm its effectiveness.
Updated: 2024-04-12 06:09:24
Categories: cs.CV,cs.CR
Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models
Diffusion models have emerged as a robust framework for various generative tasks, such as image and audio synthesis, and have also demonstrated a remarkable ability to generate mixed-type tabular data comprising both continuous and discrete variables. However, current approaches to training diffusion models on mixed-type tabular data tend to inherit the imbalanced distributions of features present in the training dataset, which can result in biased sampling. In this research, we introduce a fair diffusion model designed to generate balanced data on sensitive attributes. We present empirical evidence demonstrating that our method effectively mitigates the class imbalance in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data in terms of performance and fairness.
Updated: 2024-04-12 06:08:43
Categories: cs.LG
Adaptive Federated Learning via New Entropy Approach
Federated Learning (FL) has emerged as a prominent distributed machine learning framework that enables geographically discrete clients to train a global model collaboratively while preserving their privacy-sensitive data. However, due to the non-independent-and-identically-distributed (Non-IID) data generated by heterogeneous clients, the performances of the conventional federated optimization schemes such as FedAvg and its variants deteriorate, requiring the design to adaptively adjust specific model parameters to alleviate the negative influence of heterogeneity. In this paper, by leveraging entropy as a new metric for assessing the degree of system disorder, we propose an adaptive FEDerated learning algorithm based on ENTropy theory (FedEnt) to alleviate the parameter deviation among heterogeneous clients and achieve fast convergence. Nevertheless, given the data disparity and parameter deviation of heterogeneous clients, determining the optimal dynamic learning rate for each client becomes a challenging task as there is no communication among participating clients during the local training epochs. To enable a decentralized learning rate for each participating client, we first introduce the mean-field terms to estimate the components associated with other clients' local parameters. Furthermore, we provide rigorous theoretical analysis on the existence and determination of the mean-field estimators. Based on the mean-field estimators, the closed-form adaptive learning rate for each client is derived by constructing the Hamilton equation. Moreover, the convergence rate of our proposed FedEnt is proved. The extensive experimental results on the real-world datasets (i.e., MNIST, EMNIST-L, CIFAR10, and CIFAR100) show that our FedEnt algorithm surpasses FedAvg and its variants (i.e., FedAdam, FedProx, and FedDyn) under Non-IID settings and achieves a faster convergence rate.
Updated: 2024-04-12 06:04:55
Categories: cs.DC,cs.LG
A Systematic Construction Approach for All $4\times 4$ Involutory MDS Matrices
Maximum distance separable (MDS) matrices play a crucial role not only in coding theory but also in the design of block ciphers and hash functions. Of particular interest are involutory MDS matrices, which facilitate the use of a single circuit for both encryption and decryption in hardware implementations. In this article, we present several characterizations of involutory MDS matrices of even order. Additionally, we introduce a new matrix form for obtaining all involutory MDS matrices of even order and compare it with other matrix forms available in the literature. We then propose a technique to systematically construct all $4 \times 4$ involutory MDS matrices over a finite field $\mathbb{F}_{2^m}$. This method significantly reduces the search space by focusing on involutory MDS class representative matrices, leading to the generation of all such matrices within a substantially smaller set compared to considering all $4 \times 4$ involutory matrices. Specifically, our approach involves searching for these representative matrices within a set of cardinality $(2^m-1)^5$. Through this method, we provide an explicit enumeration of the total number of $4 \times 4$ involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=3,4,\ldots,8$.
Updated: 2024-04-12 05:37:42
Categories: cs.CR
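The two defining checks are easy to state in code. The sketch below works over GF(2^4) rather than a general GF(2^m), uses the Hadamard form had(a, b, c, d) (which is involutory whenever a + b + c + d = 1 in characteristic 2), and brute-forces the MDS condition that every square submatrix is nonsingular; the paper's representative-based search-space reduction is not reproduced:

import itertools

M, POLY = 4, 0b10011             # GF(2^4) with reduction polynomial x^4 + x + 1

def gf_mul(a, b):
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def mat_mul(A, B):
    n = len(A)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            v = 0
            for k in range(n):
                v ^= gf_mul(A[i][k], B[k][j])
            out[i][j] = v
    return out

def det(A):                       # Laplace expansion; char 2, so no signs
    if len(A) == 1:
        return A[0][0]
    d = 0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        d ^= gf_mul(A[0][j], det(minor))
    return d

def is_mds(A):                    # every square submatrix must be nonsingular
    n = len(A)
    return all(det([[A[i][j] for j in cols] for i in rows]) != 0
               for k in range(1, n + 1)
               for rows in itertools.combinations(range(n), k)
               for cols in itertools.combinations(range(n), k))

def hadamard(a, b, c, d):
    return [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
for quad in itertools.permutations(range(1, 16), 4):
    if quad[0] ^ quad[1] ^ quad[2] ^ quad[3] != 1:
        continue                  # sum = 1 makes the Hadamard matrix involutory
    H = hadamard(*quad)
    if mat_mul(H, H) == I4 and is_mds(H):
        print("involutory MDS:", H)
        break

The search space here is all Hadamard-type candidates; the paper's class-representative technique reduces the search far more aggressively, to a set of size $(2^m-1)^5$.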
Agile and versatile bipedal robot tracking control through reinforcement learning
The remarkable athletic intelligence displayed by humans in complex dynamic movements such as dancing and gymnastics suggests that the balance mechanism in biological beings is decoupled from specific movement patterns. This decoupling allows for the execution of both learned and unlearned movements under certain constraints while maintaining balance through minor whole-body coordination. To replicate this balance ability and body agility, this paper proposes a versatile controller for bipedal robots. This controller achieves ankle and body trajectory tracking across a wide range of gaits using a single small-scale neural network, which is based on a model-based IK solver and reinforcement learning. We consider a single step as the smallest control unit and design a universally applicable control input form suitable for any single-step variation. Highly flexible gait control can be achieved by combining these minimal control units with high-level policy through our extensible control interface. To enhance the trajectory-tracking capability of our controller, we utilize a three-stage training curriculum. After training, the robot can move freely between target footholds at varying distances and heights. The robot can also maintain static balance without repeated stepping to adjust posture. Finally, we evaluate the tracking accuracy of our controller on various bipedal tasks, and the effectiveness of our control framework is verified in the simulation environment.
Updated: 2024-04-12 05:25:03
Categories: cs.RO,cs.LG
RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning
Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging in limited function evaluations. Although existing works strike the balance of exploration and exploitation through hand-crafted adaptive strategies, they require certain expert knowledge, hence inflexible to deal with MMOP with different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent for flexibly adjusting individual-level searching strategies to match the up-to-date optimization status, hence boosting the search performance on MMOP. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to advance population information sharing. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. The experimental results on the CEC2013 MMOP benchmark underscore the competitive optimization performance of RLEMMO against several strong baselines.
Updated: 2024-04-12 05:02:49
Categories: cs.NE,cs.AI
Securing Monolithic Kernels using Compartmentalization
Monolithic operating systems, where all kernel functionality resides in a single, shared address space, are the foundation of most mainstream computer systems. However, a single flaw, even in a non-essential part of the kernel (e.g., device drivers), can cause the entire operating system to fall under an attacker's control. Kernel hardening techniques might prevent certain types of vulnerabilities, but they fail to address a fundamental weakness: the lack of intra-kernel security that safely isolates different parts of the kernel. We survey kernel compartmentalization techniques that define and enforce intra-kernel boundaries and propose a taxonomy that allows the community to compare and discuss future work. We also identify factors that complicate comparisons among compartmentalized systems, suggest new ways to compare future approaches with existing work meaningfully, and discuss emerging research directions.
Updated: 2024-04-12 04:55:13
标题: 使用分区技术保护单体内核
摘要: 单体操作系统,其中所有内核功能都存在于单一共享的地址空间中,是大多数主流计算机系统的基础。然而,即使是内核的非必要部分(例如设备驱动程序)存在一个缺陷,也可能导致整个操作系统落入攻击者控制之下。内核加固技术可能可以防止某些类型的漏洞,但它们未能解决一个根本性的弱点:内核内部缺乏能够安全隔离内核不同部分的安全机制。我们调查了定义和执行内核内部边界的内核分隔技术,并提出了一个分类法,让社区能够比较和讨论未来的工作。我们还确定了使分隔系统之间的比较复杂化的因素,建议新的方法来有意义地比较未来的方法与现有工作,并讨论新兴的研究方向。
更新时间: 2024-04-12 04:55:13
领域: cs.CR,cs.OS
Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets
While parameter efficient tuning (PET) methods have shown great potential with transformer architectures on Natural Language Processing (NLP) tasks, their effectiveness with large-scale ConvNets is still under-studied on Computer Vision (CV) tasks. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is light-weight, domain-transferable, and architecture-agnostic, with generalized performance on different tasks. When transferring to downstream tasks, Conv-Adapter learns task-specific feature modulation of the intermediate representations of backbones while keeping the pre-trained parameters frozen. It introduces only a tiny amount of learnable parameters, e.g., only 3.5% of the full fine-tuning parameters of ResNet50, and can also be applied to transformer-based backbones. Conv-Adapter outperforms previous PET baseline methods and achieves comparable or surpasses the performance of full fine-tuning on 23 classification tasks of various domains. It also presents superior performance on few-shot classification with an average margin of 3.39%. Beyond classification, Conv-Adapter can generalize to detection and segmentation tasks with more than 50% reduction of parameters but comparable performance to traditional full fine-tuning.
Updated: 2024-04-12 04:48:48
标题: Conv-Adapter:探索卷积神经网络的参数高效迁移学习
摘要: 虽然参数高效调整(PET)方法在自然语言处理(NLP)任务中的变压器架构上显示出巨大潜力,但它们在计算机视觉(CV)任务中大规模卷积神经网络上的有效性仍未得到充分研究。本文提出了Conv-Adapter,一个专为卷积神经网络设计的PET模块。Conv-Adapter轻量级、领域可迁移,且与架构无关,在不同任务上具有良好的泛化性能。在迁移到下游任务时,Conv-Adapter学习针对特定任务的特征调制,作用于骨干网络的中间表示,同时保持预训练参数冻结。它仅引入少量可学习参数(例如仅为ResNet50完全微调参数的3.5%),也可应用于基于变压器的骨干网络。Conv-Adapter优于先前的PET基线方法,并在23个不同领域的分类任务中实现了与完全微调相媲美甚至超越的性能。它还在少样本分类方面表现出优越性能,平均优势为3.39%。除了分类,Conv-Adapter还可以推广到检测和分割任务,参数减少超过50%,但性能与传统完全微调相媲美。
更新时间: 2024-04-12 04:48:48
领域: cs.CV,cs.AI
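To make the adapter idea in this entry concrete, here is a minimal PyTorch sketch of a small trainable bottleneck attached to a frozen convolutional backbone; the reduction ratio, the placement after ResNet50's layer1, and the exact layer composition are illustrative assumptions, not the paper's Conv-Adapter design.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class ConvAdapter(nn.Module):
    """Hypothetical adapter: channel down-projection, depthwise conv,
    nonlinearity, up-projection, added residually to the input."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.down = nn.Conv2d(channels, hidden, kernel_size=1)
        self.dw = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.up = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        return x + self.up(self.act(self.dw(self.down(x))))

backbone = resnet50(weights=None)          # pretrained weights would be loaded here
for p in backbone.parameters():
    p.requires_grad = False                # keep the backbone frozen

adapter = ConvAdapter(channels=256)        # 256 matches the output of ResNet50's layer1
backbone.layer1 = nn.Sequential(backbone.layer1, adapter)

optim = torch.optim.SGD([p for p in backbone.parameters() if p.requires_grad], lr=1e-2)
out = backbone(torch.randn(2, 3, 224, 224))

Only the adapter's parameters receive gradients, which is what keeps the transfer parameter-efficient.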
Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning
Evolutionary computation (EC) algorithms, renowned as powerful black-box optimizers, leverage a group of individuals to cooperatively search for the optimum. The exploration-exploitation tradeoff (EET) plays a crucial role in EC, which, however, has traditionally been governed by manually designed rules. In this paper, we propose a deep reinforcement learning-based framework that autonomously configures and adapts the EET throughout the EC search process. The framework allows different individuals of the population to selectively attend to the global and local exemplars based on the current search state, maximizing the cooperative search outcome. Our proposed framework is characterized by its simplicity, effectiveness, and generalizability, with the potential to enhance numerous existing EC algorithms. To validate its capabilities, we apply our framework to several representative EC algorithms and conduct extensive experiments on the augmented CEC2021 benchmark. The results demonstrate significant improvements in the performance of the backbone algorithms, as well as favorable generalization across diverse problem classes, dimensions, and population sizes. Additionally, we provide an in-depth analysis of the EET issue by interpreting the learned behaviors of EC.
Updated: 2024-04-12 04:48:32
标题: 通过深度强化学习在进化计算中自动配置探索-开发权衡
摘要: 进化计算(EC)算法被誉为强大的黑盒优化器,利用一组个体协同搜索最优解。探索-开发权衡(EET)在EC中起着至关重要的作用,然而,传统上由人工设计的规则来管理。本文提出了一个基于深度强化学习的框架,自动配置和调整EC搜索过程中的EET。该框架允许种群中的不同个体根据当前搜索状态选择性地关注全局和局部示例,最大化协同搜索结果。我们提出的框架具有简单性、有效性和泛化能力,有潜力增强许多现有的EC算法。为了验证其能力,我们将我们的框架应用于几种代表性的EC算法,并在增强的CEC2021基准上进行了大量实验。结果显示骨干算法性能显著提升,同时在不同问题类别、维度和种群大小上具有良好的泛化能力。此外,我们通过解释EC的学习行为,对EET问题进行了深入分析。
更新时间: 2024-04-12 04:48:32
领域: cs.NE,cs.AI
IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer
Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability in the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve an interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with the additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly promotes the interpretability in deep fixed-length representations-based fingerprint matching.
Updated: 2024-04-12 04:44:11
标题: IFViT:通过视觉Transformer进行指纹匹配的可解释固定长度表示
摘要: 在构建用于准确匹配的深度固定长度表示中确定指纹上的密集特征点,特别是在像素级别,具有重要意义。为了探索指纹匹配的可解释性,我们提出了一个多阶段可解释的指纹匹配网络,即通过Vision Transformer(IFViT)实现指纹匹配的可解释固定长度表示,它由两个主要模块组成。第一个模块是一个可解释的密集配准模块,建立了一个基于Vision Transformer(ViT)的孪生网络,以捕获指纹对中的长距离依赖性和全局上下文。它提供了用于指纹对齐的可解释的密集像素级特征点对应,并增强了后续匹配阶段的可解释性。第二个模块考虑了对齐指纹对的局部和全局表示,以实现可解释的固定长度表示的提取和匹配。它利用第一个模块中训练的ViTs,并添加完全连接层,重新训练它们,以同时产生具有辨别性的固定长度表示和可解释的密集像素级特征点对应。对多样化的公开可用指纹数据库进行的大量实验结果表明,所提出的框架不仅在密集配准和匹配方面表现出优越性,而且显著提升了基于深度固定长度表示的指纹匹配的可解释性。
更新时间: 2024-04-12 04:44:11
领域: cs.CV,cs.AI
Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning
Hyperparameter optimization plays a key role in the machine learning domain. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments in their learning trajectories. To cater to this dynamicity, the Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advancements. To mitigate the limitations of PBT, we present the Generalized Population-Based Training (GPBT), a refined framework designed for enhanced granularity and flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of merely focusing on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating the capabilities of GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only the conventional PBT but also its Bayesian-optimized variant.
Updated: 2024-04-12 04:23:20
标题: 强化学习中超参数优化的广义基于人口的训练
摘要: 超参数优化在机器学习领域起着关键作用。其在强化学习(RL)中的重要性尤为突出,代理不断与环境互动并适应,需要动态调整学习轨迹。为了满足这种动态性,引入了基于种群的训练(PBT),利用同时学习的一群代理的集体智慧。然而,PBT倾向于偏爱表现出色的代理,可能忽视处于重大进展边缘的代理的探索潜力。为了缓解PBT的局限性,我们提出了广义基于种群的训练(GPBT),这是一个精细设计的框架,用于增强超参数适应性的粒度和灵活性。为了补充GPBT,我们进一步引入了成对学习(PL)。PL不仅仅关注精英代理,而是采用全面的成对策略来识别性能差异,并为表现不佳的代理提供全面指导。通过整合GPBT和PL的能力,我们的方法在适应性和计算效率方面显著优于传统PBT。在一系列RL基准测试中进行的严格实证评估证实,我们的方法不仅在性能上优于传统的PBT,还优于其贝叶斯优化变体。
更新时间: 2024-04-12 04:23:20
领域: cs.LG,cs.AI,cs.NE
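The pairwise idea in the GPBT/PL entry above can be caricatured in a few lines: rank the population, pair each underperformer with a stronger partner, and let it inherit a perturbed copy of the partner's hyperparameters. This is a toy sketch under an assumed pairing rule and perturbation scale, not the authors' algorithm.

import random

def pairwise_update(population):
    """population: list of dicts with 'score' and 'hparams' (name -> float).
    Toy pairwise step: sort by score, pair the top half with the bottom half,
    and let each underperformer inherit perturbed hyperparameters."""
    ranked = sorted(population, key=lambda a: a["score"], reverse=True)
    half = len(ranked) // 2
    for good, bad in zip(ranked[:half], ranked[half:]):
        for name, value in good["hparams"].items():
            # multiplicative jitter keeps exploration alive (scale is an assumption)
            bad["hparams"][name] = value * random.uniform(0.8, 1.2)
    return population

# usage: agents train for a while, report scores, then hyperparameters are mixed
agents = [{"score": random.random(), "hparams": {"lr": 1e-3, "gamma": 0.99}} for _ in range(8)]
agents = pairwise_update(agents)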
Navigating Quantum Security Risks in Networked Environments: A Comprehensive Study of Quantum-Safe Network Protocols
The emergence of quantum computing poses a formidable security challenge to network protocols traditionally safeguarded by classical cryptographic algorithms. This paper provides an exhaustive analysis of vulnerabilities introduced by quantum computing in a diverse array of widely utilized security protocols across the layers of the TCP/IP model, including TLS, IPsec, SSH, PGP, and more. Our investigation focuses on precisely identifying vulnerabilities susceptible to exploitation by quantum adversaries at various migration stages for each protocol while also assessing the associated risks and consequences for secure communication. We delve deep into the impact of quantum computing on each protocol, emphasizing potential threats posed by quantum attacks and scrutinizing the effectiveness of post-quantum cryptographic solutions. Through carefully evaluating vulnerabilities and risks that network protocols face in the post-quantum era, this study provides invaluable insights to guide the development of appropriate countermeasures. Our findings contribute to a broader comprehension of quantum computing's influence on network security and offer practical guidance for protocol designers, implementers, and policymakers in addressing the challenges stemming from the advancement of quantum computing. This comprehensive study is a crucial step toward fortifying the security of networked environments in the quantum age.
Updated: 2024-04-12 04:20:05
标题: 在网络环境中导航量子安全风险:量子安全网络协议全面研究
摘要: 量子计算的出现给传统上由经典密码算法保护的网络协议带来了严峻的安全挑战。本文对量子计算在TCP/IP模型的各层中广泛使用的安全协议(包括TLS、IPsec、SSH、PGP等)引入的漏洞进行了详尽的分析。我们的研究重点在于精确定位每个协议在不同迁移阶段容易受到量子对手利用的漏洞,同时评估安全通信的相关风险和后果。我们深入探讨了量子计算对每个协议的影响,强调了量子攻击可能带来的潜在威胁,并审视了后量子密码解决方案的有效性。通过仔细评估网络协议在后量子时代面临的漏洞和风险,本研究为指导制定适当的对策提供了宝贵的见解。我们的发现有助于更广泛地理解量子计算对网络安全的影响,并为协议设计者、实施者和政策制定者提供实用指导,以应对量子计算进步带来的挑战。这项全面研究是加强量子时代网络环境安全的关键一步。
更新时间: 2024-04-12 04:20:05
领域: cs.CR
Evaluation Framework for Quantum Security Risk Assessment: A Comprehensive Study for Quantum-Safe Migration
The rise of large-scale quantum computing poses a significant threat to traditional cryptographic security measures. Quantum attacks undermine current asymmetric cryptographic algorithms, rendering them ineffective. Even symmetric key cryptography is vulnerable, albeit to a lesser extent, suggesting longer keys or extended hash functions for security. Thus, current cryptographic solutions are inadequate against emerging quantum threats. Organizations must transition to quantum-safe environments with robust continuity plans and meticulous risk management. This study explores the challenges of migrating to quantum-safe cryptographic states, introducing a comprehensive security risk assessment framework. We propose a security risk assessment framework that examines vulnerabilities across algorithms, certificates, and protocols throughout the migration process (pre-migration, during migration, post-migration). We link these vulnerabilities to the STRIDE threat model to assess their impact and likelihood. Then, we discuss practical mitigation strategies for critical components like algorithms, public key infrastructures, and protocols. Our study not only identifies potential attacks and vulnerabilities at each layer and migration stage but also suggests possible countermeasures and alternatives to enhance system resilience, empowering organizations to construct a secure infrastructure for the quantum era. Through these efforts, we establish the foundation for enduring security in networked systems amid the challenges of the quantum era.
Updated: 2024-04-12 04:18:58
标题: 量子安全风险评估的评估框架:量子安全迁移的综合研究
摘要: 大规模量子计算的兴起对传统的加密安全措施构成了重大威胁。量子攻击破坏了当前的非对称加密算法,使其失效。即使对称密钥加密也是脆弱的,尽管程度较小,建议采用更长的密钥或扩展哈希函数以提高安全性。因此,当前的加密解决方案无法应对新兴的量子威胁。组织必须转向具有强大连续性计划和细致风险管理的量子安全环境。本研究探讨了迁移到量子安全加密状态的挑战,引入了综合的安全风险评估框架。我们提出了一个安全风险评估框架,通过整个迁移过程(迁移前、迁移中、迁移后)检查算法、证书和协议的漏洞。我们将这些漏洞与STRIDE威胁模型联系起来,评估其影响和可能性。然后,我们讨论了关键组件如算法、公钥基础设施和协议的实际缓解策略。我们的研究不仅在每个层次和迁移阶段识别潜在的攻击和漏洞,还提出了可能的对策和替代方案,以增强系统的韧性,使组织能够为量子时代构建安全的基础设施。通过这些努力,我们为在量子时代的挑战中建立网络系统的持久安全奠定了基础。
更新时间: 2024-04-12 04:18:58
领域: cs.CR
Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality
This paper considers the need for generalizable bias mitigation techniques in machine learning due to the growing concerns of fairness and discrimination in data-driven decision-making procedures across a range of industries. While many existing methods for mitigating bias in machine learning have succeeded in specific cases, they often lack generalizability and cannot be easily applied to different data types or models. Additionally, the trade-off between accuracy and fairness remains a fundamental tension in the field. To address these issues, we propose a bias mitigation method based on multi-task learning, utilizing the concept of Monte-Carlo dropout and Pareto optimality from multi-objective optimization. This method optimizes accuracy and fairness while improving the model's explainability without using sensitive information. We test this method on three datasets from different domains and show how it can deliver the most desired trade-off between model fairness and performance. This allows for tuning in specific domains where one metric may be more important than another. With the framework we introduce in this paper, we aim to enhance the fairness-performance trade-off and offer a solution to bias mitigation methods' generalizability issues in machine learning.
Updated: 2024-04-12 04:17:50
标题: 提高机器学习模型的公平性和性能:一种基于蒙特卡洛Dropout和帕累托最优性的多任务学习方法
摘要: 本文考虑了机器学习中对可推广的偏见缓解技术的需求,这是由于在各行各业的数据驱动决策过程中对公平性和歧视的日益关注。虽然许多现有的机器学习中减少偏见的方法已经在特定案例中取得成功,但它们经常缺乏泛化能力,不能轻松地应用于不同的数据类型或模型。此外,准确性和公平性之间的权衡仍然是该领域的一个根本性矛盾。为了解决这些问题,我们提出了一种基于多任务学习的偏见缓解方法,利用了蒙特卡洛Dropout和多目标优化中的帕累托最优性概念。该方法在不使用敏感信息的情况下优化准确性和公平性,同时提高模型的可解释性。我们在来自不同领域的三个数据集上测试了这种方法,并展示了它如何可以提供模型公平性和性能之间最理想的权衡。这使得可以在某些特定领域进行调整,其中一个指标可能比另一个更重要。通过本文介绍的框架,我们旨在增强公平性和性能之间的权衡,并为机器学习中偏见缓解方法的泛化问题提供解决方案。
更新时间: 2024-04-12 04:17:50
领域: cs.LG,cs.CY
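Pareto optimality over the accuracy/fairness trade-off in the entry above reduces to a non-dominated filter; the sketch below, with made-up metric values, keeps only the candidate models that no other model beats on both axes.

def pareto_front(models):
    """models: list of (accuracy, fairness) tuples, both to be maximized.
    Returns the non-dominated subset (the Pareto front)."""
    front = []
    for i, (acc_i, fair_i) in enumerate(models):
        dominated = any(
            acc_j >= acc_i and fair_j >= fair_i and (acc_j > acc_i or fair_j > fair_i)
            for j, (acc_j, fair_j) in enumerate(models) if j != i
        )
        if not dominated:
            front.append((acc_i, fair_i))
    return front

candidates = [(0.91, 0.70), (0.89, 0.80), (0.92, 0.60), (0.88, 0.78)]
print(pareto_front(candidates))   # (0.88, 0.78) is dominated by (0.89, 0.80)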
Differentially Private Log-Location-Scale Regression Using Functional Mechanism
This article introduces differentially private log-location-scale (DP-LLS) regression models, which incorporate differential privacy into LLS regression through the functional mechanism. The proposed models are established by injecting noise into the log-likelihood function of LLS regression for perturbed parameter estimation. We will derive the sensitivities utilized to determine the magnitude of the injected noise and prove that the proposed DP-LLS models satisfy $\epsilon$-differential privacy. In addition, we will conduct simulations and case studies to evaluate the performance of the proposed models. The findings suggest that predictor dimension, training sample size, and privacy budget are three key factors impacting the performance of the proposed DP-LLS regression models. Moreover, the results indicate that a sufficiently large training dataset is needed to simultaneously ensure decent performance of the proposed models and achieve a satisfactory level of privacy protection.
Updated: 2024-04-12 04:14:08
标题: 使用功能机制的差分隐私对数-位置-尺度回归
摘要: 这篇文章介绍了差分隐私对数-位置-尺度(DP-LLS)回归模型,通过功能机制将差分隐私融入LLS回归中。提出的模型通过向LLS回归的对数似然函数注入噪声来进行扰动参数估计。我们将推导用于确定注入噪声量的灵敏度,并证明所提出的DP-LLS模型满足ε-差分隐私。此外,我们将进行模拟和案例研究,以评估所提出模型的性能。研究结果表明,预测变量维度、训练样本大小和隐私预算是影响所提出DP-LLS回归模型性能的三个关键因素。此外,结果表明,需要足够大的训练数据集才能同时确保所提出模型的良好性能并实现令人满意的隐私保护水平。
更新时间: 2024-04-12 04:14:08
领域: stat.ML,cs.CR,cs.LG,stat.AP
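The functional mechanism used above can be illustrated on ordinary least squares, whose loss is a polynomial in the parameters: Laplace noise is added to the polynomial's coefficients before minimizing. A minimal numpy sketch follows; the sensitivity constant is a placeholder rather than the value derived in the paper, and LLS regression would use its own log-likelihood expansion.

import numpy as np

rng = np.random.default_rng(0)

def functional_mechanism_ols(X, y, epsilon, sensitivity=1.0):
    """Perturb the coefficients of the quadratic loss
    L(w) = w^T (X^T X) w - 2 (X^T y)^T w + const,
    then minimize the noisy loss in closed form.
    The sensitivity here is a placeholder, not the paper's derived value."""
    A = X.T @ X                       # coefficients of the second-order terms
    b = X.T @ y                       # coefficients of the first-order terms
    scale = sensitivity / epsilon     # Laplace scale for epsilon-DP
    A_noisy = A + rng.laplace(0.0, scale, A.shape)
    b_noisy = b + rng.laplace(0.0, scale, b.shape)
    A_noisy = (A_noisy + A_noisy.T) / 2           # keep the quadratic form symmetric
    # small ridge term guards against a noise-degraded, near-singular system
    return np.linalg.solve(A_noisy + 1e-6 * np.eye(A.shape[0]), b_noisy)

X = rng.normal(size=(200, 3)); y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
print(functional_mechanism_ols(X, y, epsilon=1.0))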
Increasing Trust in Language Models through the Reuse of Verified Circuits
Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a transformer model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify a model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
Updated: 2024-04-12 03:57:24
标题: 通过重复使用经过验证的电路提高对语言模型的信任
摘要: 语言模型(LMs)越来越被广泛用于各种预测任务,但它们的训练往往会忽略罕见的边缘情况,降低它们的可靠性。在这里,我们定义了一个严格的可靠性标准,即任务算法和电路实现必须经过验证,考虑边缘情况,没有已知的故障模式。我们展示了如果使用数学和逻辑规范的框架构建,可以训练一个变压器模型以满足这一标准。在本文中,我们完全验证了一个n位整数加法模型。为了展示经过验证模块的可重用性,我们将训练好的整数加法模型插入一个未经训练的模型中,并训练组合模型执行加法和减法。我们发现加法电路在两个任务中都得到了广泛重用,简化了更复杂减法模型的验证。我们讨论了如何将经过验证的任务模块插入LMs中,利用模型重用来提高使用它们构建的语言模型的可验证性和可靠性。经过验证电路的重用减少了验证更复杂的复合模型的工作量,我们认为这是向语言模型安全性迈出的重要一步。
更新时间: 2024-04-12 03:57:24
领域: cs.LG,cs.CL
Machine Learning-based Approach for Ex-post Assessment of Community Risk and Resilience Based on Coupled Human-infrastructure Systems Performance
There is a gap in the literature on data-driven analyses for the ex-post evaluation of community risk and resilience, particularly analyses using features related to the performance of coupled human-infrastructure systems. To address this gap, in this study we created a machine learning-based method for the ex-post assessment of community risk and resilience and their interplay based on features related to coupled human-infrastructure systems performance. Utilizing feature groups related to population protective actions, infrastructure/building performance features, and recovery features, we examined the risk and resilience performance of communities in the context of the 2017 Hurricane Harvey in Harris County, Texas. These features related to the coupled human-infrastructure systems performance were processed using the K-means clustering method to classify census block groups into four distinct clusters; then, based on feature analysis, these clusters were labeled and designated into four quadrants of risk-resilience archetypes. Finally, we analyzed the disparities in risk-resilience status of spatial areas across different clusters as well as different income groups. The findings unveil the risk-resilience status of spatial areas shaped by their coupled human-infrastructure systems performance and their interactions. The results also identify features that contribute to high resilience in high-risk areas. For example, the results indicate that in high-risk areas, evacuation rates contributed to greater resilience, while in low-risk areas, preparedness contributed to greater resilience.
Updated: 2024-04-12 03:46:38
标题: 基于耦合人类基础设施系统绩效的社区风险和韧性后评估的机器学习方法
摘要: 在数据驱动分析的文献中存在一个局限性,即关于社区风险和韧性的事后评估,特别是使用与耦合的人类基础设施系统性能相关的特征。为了填补这一空白,在这项研究中,我们创建了一种基于机器学习的方法,用于基于与耦合的人类基础设施系统性能相关的特征对社区风险和韧性进行事后评估以及它们之间的相互作用。利用与人口保护行动、基础设施/建筑性能特征和恢复特征相关的特征组,我们在2017年德克萨斯州哈里斯县飓风哈维的背景下研究了社区的风险和韧性表现。这些与耦合的人类基础设施系统性能相关的特征使用K均值聚类方法进行处理,将人口普查区块组划分为四个不同的簇,然后基于特征分析,这些簇被标记并指定为四个风险-韧性典型象限。最后,我们分析了不同簇和不同收入群体之间空间区域的风险-韧性状态的差异。研究结果揭示了由耦合的人类基础设施系统性能和它们之间的互动塑造的空间区域的风险-韧性状态。结果还提供了有助于高风险地区高韧性的特征。例如,结果表明在高风险地区,撤离率有助于更大的韧性,而在低风险地区,准备工作有助于更大的韧性。
更新时间: 2024-04-12 03:46:38
领域: cs.CY,cs.LG
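The clustering step described in this entry is straightforward to reproduce in outline with scikit-learn; the feature matrix below is synthetic and its columns are hypothetical stand-ins for the paper's protective-action, infrastructure-performance, and recovery feature groups.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# hypothetical feature matrix: rows are census block groups, columns are
# protective-action, infrastructure-performance, and recovery features
features = rng.normal(size=(500, 6))

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# the four clusters would then be interpreted and mapped onto
# risk-resilience quadrants by inspecting their mean feature profiles
for k in range(4):
    print(k, X[labels == k].mean(axis=0).round(2))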
Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions
Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent systems (MASs) is challenging because: (i) each agent has access to only limited information; (ii) issues on convergence or computational complexity emerge due to the curse of dimensionality. In this paper, we propose a general computationally efficient distributed framework for cooperative multi-agent reinforcement learning (MARL) by utilizing the structures of graphs involved in this problem. We introduce three coupling graphs describing three types of inter-agent couplings in MARL, namely, the state graph, the observation graph and the reward graph. By further considering a communication graph, we propose two distributed RL approaches based on local value-functions derived from the coupling graphs. The first approach is able to reduce sample complexity significantly under specific conditions on the aforementioned four graphs. The second approach provides an approximate solution and can be efficient even for problems with dense coupling graphs. Here there is a trade-off between minimizing the approximation error and reducing the computational complexity. Simulations show that our RL algorithms have a significantly improved scalability to large-scale MASs compared with centralized and consensus-based distributed RL algorithms.
Updated: 2024-04-12 03:41:09
标题: 基于图诱导的局部值函数的分布式多智能体强化学习
摘要: 实现大规模协作多智能体系统的分布式强化学习(RL)具有挑战性,因为:(i)每个智能体只能访问有限的信息;(ii)由于维度诅咒,收敛或计算复杂性问题浮现。在本文中,我们提出了一种利用涉及问题中的图结构的计算高效的分布式协作多智能体强化学习(MARL)框架。我们引入了三种描述MARL中三种类型智能体耦合的耦合图,即状态图、观测图和奖励图。通过进一步考虑通信图,我们提出了两种基于耦合图导出的局部值函数的分布式RL方法。第一种方法能够在前述四个图的特定条件下显著降低样本复杂性。第二种方法提供了一个近似解,即使对于具有密集耦合图的问题也很高效。在最小化近似误差和降低计算复杂性之间存在权衡。模拟结果表明,与集中式和基于共识的分布式RL算法相比,我们的RL算法在大规模MASs中具有显著提高的可扩展性。
更新时间: 2024-04-12 03:41:09
领域: cs.LG,cs.AI,cs.MA
HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominantly reconstruction-based and predictive in nature. However, they typically concentrate on a single-dimensional instance level, thereby not fully harnessing the complex associations inherent in industrial MTS. To address this issue, we propose a novel self-supervised hierarchical contrastive consistency learning method for detecting anomalies in MTS, named HCL-MTSAD. It innovatively leverages data consistency at multiple levels inherent in industrial MTS, systematically capturing consistent associations across four latent levels: measurement, sample, channel, and process. By developing a multi-layer contrastive loss, HCL-MTSAD can extensively mine data consistency and spatio-temporal association, resulting in more informative representations. Subsequently, an anomaly discrimination module, grounded in self-supervised hierarchical contrastive learning, is designed to detect timestamp-level anomalies by calculating multi-scale data consistency. Extensive experiments conducted on six diverse MTS datasets retrieved from real cyber-physical systems and server machines, in comparison with 20 baselines, indicate that HCL-MTSAD's anomaly detection capability outperforms the state-of-the-art benchmark models by an average of 1.8% in terms of F1 score.
Updated: 2024-04-12 03:39:33
标题: HCL-MTSAD:用于准确检测工业多变量时间序列异常的分层对比一致性学习
摘要: 多变量时间序列(MTS)异常检测着重于发现偏离标准运行模式的样本,这对于确保工业应用的安全与安保至关重要。该领域的主要挑战在于开发能够有效识别异常的表示。文献中用于异常检测的主流方法主要是基于重建和预测的。然而,它们通常集中在单维实例级别,因此没有充分利用工业MTS中固有的复杂关联。为了解决这个问题,我们提出了一种用于检测MTS中异常的新型自监督分层对比一致性学习方法,命名为HCL-MTSAD。它创新地利用工业MTS中多个层次的数据一致性,系统地捕获跨测量、样本、通道和过程四个潜在层次的一致关联。通过开发多层对比损失,HCL-MTSAD可以广泛挖掘数据一致性和时空关联,从而产生更具信息性的表示。随后,基于自监督分层对比学习的异常判别模块被设计用于通过计算多尺度数据一致性来检测时间戳级别的异常。在从真实网络物理系统和服务器机器采集的六个不同MTS数据集上进行的大量实验(与20个基线进行比较)表明,HCL-MTSAD的异常检测能力在F1分数方面平均优于现有最先进的基准模型1.8%。
更新时间: 2024-04-12 03:39:33
领域: cs.LG,cs.AI,cs.CR,cs.IT,cs.SY,eess.SY,math.IT
Probabilistic Survival Analysis by Approximate Bayesian Inference of Neural Networks
Predicting future events always comes with uncertainty, but traditional non-probabilistic methods cannot distinguish certain from uncertain predictions. In survival analysis, probabilistic methods applied to state-of-the-art solutions in the healthcare and biomedical field are still novel, and their implications have not been fully evaluated. In this paper, we study the benefits of modeling uncertainty in deep neural networks for survival analysis with a focus on prediction and calibration performance. For this, we present a Bayesian deep learning framework that consists of three probabilistic network architectures, which we train by optimizing the Cox partial likelihood and combining input-dependent aleatoric uncertainty together with epistemic uncertainty. This enables us to provide uncertainty estimates as credible intervals when predicting the survival curve or as a probability density function over the predicted median survival times. For our empirical analyses, we evaluated our proposed method on four benchmark datasets and found that our method demonstrates prediction performance comparable to the state-of-the-art based on the concordance index and outperforms all other Cox-based approaches in terms of the mean absolute error. Our work explicitly compares the extent to which different Bayesian approximation techniques differ from each other and improves the prediction over traditional non-probabilistic alternatives.
Updated: 2024-04-12 03:27:02
标题: 通过神经网络的近似贝叶斯推断进行概率生存分析
摘要: 预测未来事件总是伴随着不确定性,但传统的非概率方法无法区分确定与不确定的预测。在生存分析中,应用于医疗和生物医学领域最先进解决方案的概率方法仍然是新颖的,它们的影响尚未得到充分评估。在本文中,我们研究了在深度神经网络中建模不确定性对生存分析的益处,重点关注预测和校准性能。为此,我们提出了一个贝叶斯深度学习框架,包括三个概率网络架构,通过优化Cox部分似然并结合输入相关的偶然不确定性和认知不确定性进行训练。这使我们能够在预测生存曲线时以可信区间的形式提供不确定性估计,或在预测中位生存时间时提供概率密度函数。在实证分析中,我们在四个基准数据集上评估了所提出的方法,发现基于一致性指数,我们的方法的预测性能与最先进技术相当,并且在平均绝对误差方面优于所有其他基于Cox的方法。我们的工作明确比较了不同贝叶斯近似技术之间的差异程度,并相对于传统的非概率替代方案改进了预测。
更新时间: 2024-04-12 03:27:02
领域: cs.LG
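A minimal PyTorch sketch of the two ingredients named in the abstract above: a negative Cox partial log-likelihood (assuming no tied event times) and repeated stochastic forward passes that yield a distribution over predictions. The paper's three architectures and its exact uncertainty decomposition are not reproduced here.

import torch
import torch.nn as nn

def neg_cox_partial_loglik(risk, time, event):
    """risk: (n,) model scores; time: (n,) follow-up times; event: (n,) 1=event.
    Assumes no ties; the risk set of subject i is everyone with time >= time_i."""
    order = torch.argsort(time, descending=True)   # so cumulative sums give risk sets
    risk, event = risk[order], event[order]
    log_cum = torch.logcumsumexp(risk, dim=0)
    return -((risk - log_cum) * event).sum() / event.sum().clamp(min=1)

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

x = torch.randn(128, 10)
time = torch.rand(128); event = (torch.rand(128) < 0.7).float()
loss = neg_cox_partial_loglik(net(x).squeeze(-1), time, event)
loss.backward()

# stochastic forward passes at test time give a distribution over risk scores,
# from which credible intervals can be read off
net.train()                                        # keep dropout active
samples = torch.stack([net(x).squeeze(-1) for _ in range(100)])
lo, hi = samples.quantile(0.025, dim=0), samples.quantile(0.975, dim=0)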
A Technique for Classifying Static Gestures Using UWB Radar
Our paper presents a robust framework for UWB-based static gesture recognition, leveraging proprietary UWB radar sensor technology. Extensive data collection efforts were undertaken to compile datasets containing five commonly used gestures. Our approach involves a comprehensive data pre-processing pipeline that encompasses outlier handling, aspect ratio-preserving resizing, and false-color image transformation. Both CNN and MobileNet models were trained on the processed images. Remarkably, our best-performing model achieved an accuracy of 96.78%. Additionally, we developed a user-friendly GUI framework to assess the model's system resource usage and processing times, which revealed low memory utilization and real-time task completion in under one second. This research marks a significant step towards enhancing static gesture recognition using UWB technology, promising practical applications in various domains.
Updated: 2024-04-12 03:14:34
标题: 一种利用UWB雷达对静态手势进行分类的技术
摘要: 我们的论文提出了一个基于UWB技术的静态手势识别的稳健框架,利用了专有的UWB雷达传感器技术。我们进行了大量的数据收集工作,编制了包含五种常用手势的数据集。我们的方法涉及一个全面的数据预处理流程,包括异常值处理、保持长宽比的调整大小和伪彩色图像转换。我们在处理后的图像上训练了CNN和MobileNet模型。值得注意的是,我们表现最佳的模型实现了96.78%的准确率。此外,我们开发了一个用户友好的GUI框架,用于评估模型的系统资源使用情况和处理时间,显示出低内存利用率和在不到一秒内完成实时任务。这项研究标志着利用UWB技术增强静态手势识别的重要一步,有望在各个领域取得实际应用。
更新时间: 2024-04-12 03:14:34
领域: cs.CV,cs.AI
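The preprocessing pipeline described above (outlier handling, aspect-ratio-preserving resizing, false-color transformation) might look roughly as follows; the percentile cutoffs, target size, and colormap are illustrative choices, not the paper's settings.

import numpy as np
from PIL import Image
from matplotlib import cm

def preprocess_uwb_frame(frame, size=96):
    """frame: 2-D numpy array of raw radar intensities.
    Clip outliers, resize with the aspect ratio preserved (zero-padding the
    rest), and map intensities to a false-color image."""
    lo, hi = np.percentile(frame, [1, 99])
    frame = np.clip(frame, lo, hi)
    frame = (frame - lo) / (hi - lo + 1e-8)                 # normalize to [0, 1]

    h, w = frame.shape
    scale = size / max(h, w)
    img = Image.fromarray((frame * 255).astype(np.uint8))
    img = img.resize((int(w * scale), int(h * scale)))
    canvas = Image.new("L", (size, size))                   # zero-pad to square
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))

    rgb = cm.viridis(np.asarray(canvas) / 255.0)[..., :3]   # false-color transform
    return (rgb * 255).astype(np.uint8)

print(preprocess_uwb_frame(np.random.rand(40, 64)).shape)  # (96, 96, 3)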
Label-based Graph Augmentation with Metapath for Graph Anomaly Detection
Graph anomaly detection has attracted considerable attention from various domains ranging from network security to finance in recent years. Because labeling is very costly, existing methods are predominantly developed in an unsupervised manner. However, the detected anomalies may turn out to be uninteresting instances due to the absence of prior knowledge about the anomalies being sought. This issue may be solved by using a few labeled anomalies as prior knowledge. In real-world scenarios, we can easily obtain a few labeled anomalies. Efficiently leveraging labeled anomalies as prior knowledge is crucial for graph anomaly detection; however, this process remains challenging due to the inherently limited number of anomalies available. To address the problem, we propose a novel approach that leverages metapaths to embed actual connectivity patterns between anomalous and normal nodes. To further efficiently exploit context information from the metapath-based anomaly subgraph, we present a new framework, Metapath-based Graph Anomaly Detection (MGAD), incorporating GCN layers in both the dual encoders and decoders to efficiently propagate context information between abnormal and normal nodes. Specifically, MGAD employs a GNN-based graph autoencoder as its backbone network. Moreover, dual encoders capture the complex interactions and metapath-based context information between labeled and unlabeled nodes both globally and locally. Through a comprehensive set of experiments conducted on seven real-world networks, this paper demonstrates the superiority of the MGAD method compared to state-of-the-art techniques. The code is available at https://github.com/missinghwan/MGAD.
Updated: 2024-04-12 03:10:27
标题: 基于标签的元路径图增强用于图异常检测
摘要: 图形异常检测近年来引起了各个领域的广泛关注,从网络安全到金融。由于标记成本非常高昂,现有方法主要是以无监督的方式开发的。然而,由于缺乏有关寻找异常的先前知识,检测到的异常可能会被认为是无趣的实例。这个问题可以通过使用少量标记的异常作为先前知识来解决。在现实场景中,我们可以轻松获得少量标记的异常。有效地利用标记的异常作为先前知识对于图形异常检测至关重要;然而,由于可用异常数量本质上有限,这个过程仍然具有挑战性。为了解决这个问题,我们提出了一种新颖的方法,利用元路径来嵌入异常节点和正常节点之间的实际连接模式。为了进一步有效地利用基于元路径的异常子图的上下文信息,我们提出了一个新的框架,基于元路径的图异常检测(MGAD),将GCN层同时纳入双编码器和解码器中,以便在异常节点和正常节点之间有效传播上下文信息。具体来说,MGAD采用基于GNN的图自编码器作为其骨干网络。此外,双编码器在全局和局部同时捕获标记和未标记节点之间的复杂交互作用和基于元路径的上下文信息。通过对七个真实网络进行的一系列实验,本文展示了MGAD方法相对于最先进技术的优越性。代码可在https://github.com/missinghwan/MGAD 上找到。
更新时间: 2024-04-12 03:10:27
领域: cs.LG,cs.AI
Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection
The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notably, our CVLA not only operates on raw signals across various modal channels but also yields an appropriate multi-modal representation by aligning the video and language components within a consistent semantic space. The experimental results on two humor detection datasets, including DY11k and UR-FUNNY, demonstrate that CVLA dramatically outperforms state-of-the-art and several competitive baseline approaches. Our dataset, code and model release at https://github.com/yliu-cs/CVLA.
Updated: 2024-04-12 02:51:45
标题: 通过对比式预训练对短视频幽默检测进行评论辅助的视频-语言对齐
摘要: 随着短视频在社交媒体平台上的影响力不断扩大,多模态幽默检测在情感计算中的重要性日益增加。本文提出了一种新颖的用于短视频幽默检测(SVHD)的两分支层次模型,命名为评论辅助视频-语言对齐(CVLA),通过数据增强的多模态对比预训练实现。值得注意的是,我们的CVLA不仅可以操作各种模态通道上的原始信号,还能通过将视频和语言组件在一致的语义空间中对齐而生成适当的多模态表示。在包括DY11k和UR-FUNNY在内的两个幽默检测数据集上的实验结果表明,CVLA远远优于最先进和几种竞争性基线方法。我们的数据集、代码和模型发布在https://github.com/yliu-cs/CVLA。
更新时间: 2024-04-12 02:51:45
领域: cs.CV,cs.AI
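The alignment objective at the heart of contrastive pre-training like the above is typically a symmetric InfoNCE loss over paired embeddings; this generic sketch, with an assumed temperature, stands in for whatever specific formulation CVLA uses.

import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE between paired video and language embeddings.
    Matched pairs sit on the diagonal of the similarity matrix; the
    temperature is a common default, not necessarily the paper's."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature
    targets = torch.arange(len(v), device=v.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

loss = contrastive_alignment_loss(torch.randn(32, 256), torch.randn(32, 256))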
Optimizing Cyber Response Time on Temporal Active Directory Networks Using Decoys
Microsoft Active Directory (AD) is the default security management system for Window domain network. We study the problem of placing decoys in AD network to detect potential attacks. We model the problem as a Stackelberg game between an attacker and a defender on AD attack graphs where the defender employs a set of decoys to detect the attacker on their way to Domain Admin (DA). Contrary to previous works, we consider time-varying (temporal) attack graphs. We proposed a novel metric called response time, to measure the effectiveness of our decoy placement in temporal attack graphs. Response time is defined as the duration from the moment attackers trigger the first decoy to when they compromise the DA. Our goal is to maximize the defender's response time to the worst-case attack paths. We establish the NP-hard nature of the defender's optimization problem, leading us to develop Evolutionary Diversity Optimization (EDO) algorithms. EDO algorithms identify diverse sets of high-quality solutions for the optimization problem. Despite the polynomial nature of the fitness function, it proves experimentally slow for larger graphs. To enhance scalability, we proposed an algorithm that exploits the static nature of AD infrastructure in the temporal setting. Then, we introduce tailored repair operations, ensuring the convergence to better results while maintaining scalability for larger graphs.
Updated: 2024-04-12 02:45:07
标题: 利用诱饵优化时间活跃目录网络上的网络响应时间
摘要: Microsoft Active Directory(AD)是Windows域网络的默认安全管理系统。我们研究在AD网络中放置诱饵以检测潜在攻击的问题。我们将该问题建模为攻击者和防御者在AD攻击图上的斯塔克伯格博弈,防御者利用一组诱饵来检测攻击者在前往域管理员(DA)途中的活动。与先前的工作不同,我们考虑时变(时态)攻击图。我们提出了一种称为响应时间的新颖度量标准,用于衡量我们在时态攻击图中放置诱饵的有效性。响应时间定义为从攻击者触发第一个诱饵到他们攻陷DA的持续时间。我们的目标是最大化防御者对最坏情况攻击路径的响应时间。我们确定了防御者优化问题的NP难性质,因此开发了进化多样性优化(EDO)算法。EDO算法为该优化问题识别出多样化的高质量解集合。尽管适应度函数具有多项式性质,但实验表明它对于较大的图计算速度较慢。为了增强可扩展性,我们提出了一种利用时态设置中AD基础设施静态性质的算法。然后,我们引入定制的修复操作,确保在保持对较大图的可扩展性的同时收敛到更好的结果。
更新时间: 2024-04-12 02:45:07
领域: cs.CR,cs.GT,cs.NE
Self-Supervised Dataset Distillation for Transfer Learning
Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is \textit{biased} due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the \textit{self-supervised target model}. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.
Updated: 2024-04-12 01:53:33
标题: 自监督数据集精炼用于迁移学习
摘要: 数据集精简方法在将大型数据集提炼为一小组代表性样本方面取得了显著成功。然而,它们并不是为了产生一个可以有效用于促进自监督预训练的精简数据集而设计的。为此,我们提出了一个新颖的问题,即将一个未标记数据集提炼为一组小型合成样本,以便进行高效的自监督学习(SSL)。我们首先证明,在朴素的双层优化中,合成样本相对于SSL目标的梯度由于数据增强或掩码引起的随机性而是有偏的。为解决这一问题,我们提出最小化模型对合成示例的表示和其对应的可学习目标特征表示之间的均方误差(MSE)作为内部目标,这不会引入任何随机性。我们的主要动机是,通过所提出的内部优化获得的模型可以模仿自监督目标模型。为实现这一目标,我们还在外部优化中引入了内部模型表示和自监督目标模型在原始完整数据集上的MSE。最后,假设特征提取器固定,我们仅优化特征提取器之上的线性头部,这使我们能够降低计算成本并通过核岭回归获得头部的闭合形式解。我们在涉及迁移学习的各种应用中从经验上验证了我们方法的有效性。
更新时间: 2024-04-12 01:53:33
领域: cs.LG
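The closed-form head mentioned at the end of the entry above can be sketched directly: with the feature extractor frozen, kernel ridge regression gives alpha = (K + lambda I)^{-1} Y with no iterative training. The RBF kernel and hyperparameters below are illustrative assumptions.

import numpy as np

def kernel_ridge_head(features, targets, lam=1e-3, gamma=1.0):
    """Closed-form kernel ridge regression head on top of frozen features.
    The kernel choice and hyperparameters are illustrative assumptions."""
    sq = np.sum(features**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * features @ features.T))
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), targets)
    def predict(new):
        sq_new = np.sum(new**2, axis=1)
        K_new = np.exp(-gamma * (sq_new[:, None] + sq[None, :] - 2 * new @ features.T))
        return K_new @ alpha
    return predict

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 16))           # features of the small synthetic set
Y = rng.normal(size=(50, 10))               # e.g., one-hot labels of a downstream task
predict = kernel_ridge_head(feats, Y)
print(predict(rng.normal(size=(5, 16))).shape)   # (5, 10)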
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLM) have taken the world by storm, without eliminating or at least reducing hallucinations, real-world GenAI systems may face challenges in user adoption. In the process of deploying an enterprise application that produces workflows based on natural language requirements, we devised a system leveraging Retrieval Augmented Generation (RAG) to greatly improve the quality of the structured output that represents such workflows. Thanks to our implementation of RAG, our proposed system significantly reduces hallucinations in the output and improves the generalization of our LLM in out-of-domain settings. In addition, we show that using a small, well-trained retriever encoder can reduce the size of the accompanying LLM, thereby making deployments of LLM-based systems less resource-intensive.
Updated: 2024-04-12 01:42:09
标题: 通过检索增强生成减少结构化输出中的幻觉
摘要: 生成式人工智能(GenAI)的一个普遍且基本的限制是其易产生幻觉。尽管大型语言模型(LLM)风靡全球,但如果不消除或至少减少幻觉,现实世界中的GenAI系统可能面临用户采用的挑战。在部署一个基于自然语言需求生成工作流的企业应用程序的过程中,我们设计了一个利用检索增强生成(RAG)的系统,大大提高了代表这些工作流的结构化输出的质量。由于我们实施了RAG,我们提出的系统显著减少了输出中的幻觉,并改善了我们的LLM在领域外设置中的泛化能力。此外,我们表明使用一个小型、经过良好训练的检索器编码器可以减少相应LLM的大小,从而使基于LLM的系统的部署更少资源密集。
更新时间: 2024-04-12 01:42:09
领域: cs.LG,cs.AI,cs.CL,cs.IR
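A generic retrieve-then-prompt skeleton illustrates the mechanism behind RAG systems like the one above (not the authors' system): embed() and generate() are placeholders for a retriever encoder and an LLM, not real APIs, and the prompt wording is invented.

import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    """Cosine-similarity retrieval; doc_vecs rows are document embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    top = np.argsort(D @ q)[::-1][:k]
    return [docs[i] for i in top]

def rag_prompt(question, retrieved):
    """Ground the generator in retrieved snippets to curb hallucination."""
    context = "\n".join(f"- {snippet}" for snippet in retrieved)
    return (
        "Answer using ONLY the context below; output valid JSON for the workflow.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )

# embed() and generate() stand in for a retriever encoder and an LLM:
# prompt = rag_prompt(q, retrieve(embed(q), doc_vecs, docs))
# workflow_json = generate(prompt)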
Large Language Model for Causal Decision Making
Large Language Models (LLMs) have shown their success in language understanding and reasoning on general topics. However, their capability to perform inference based on user-specified structured data and knowledge in corpus-rare concepts, such as causal decision-making is still limited. In this work, we explore the possibility of fine-tuning an open-sourced LLM into LLM4Causal, which can identify the causal task, execute a corresponding function, and interpret its numerical results based on users' queries and the provided dataset. Meanwhile, we propose a data generation process for more controllable GPT prompting and present two instruction-tuning datasets: (1) Causal-Retrieval-Bench for causal problem identification and input parameter extraction for causal function calling and (2) Causal-Interpret-Bench for in-context causal interpretation. By conducting end-to-end evaluations and two ablation studies, we showed that LLM4Causal can deliver end-to-end solutions for causal problems and provide easy-to-understand answers, which significantly outperforms the baselines.
Updated: 2024-04-12 01:30:55
标题: 大型语言模型用于因果决策制定
摘要: 大型语言模型(LLMs)已经展示了它们在一般主题的语言理解和推理中的成功。然而,它们在基于用户指定的结构化数据和语料库中稀有概念(如因果决策)进行推理的能力仍然有限。在这项工作中,我们探讨了将开源LLM微调为LLM4Causal的可能性,该模型可以识别因果任务,执行相应的函数,并根据用户的查询和提供的数据集解释其数值结果。同时,我们提出了一个用于更可控GPT提示的数据生成过程,并提出了两个指令微调数据集:(1)用于因果问题识别和输入参数提取的Causal-Retrieval-Bench,用于因果函数调用,以及(2)用于上下文因果解释的Causal-Interpret-Bench。通过进行端到端评估和两项消融研究,我们展示了LLM4Causal可以为因果问题提供端到端解决方案并提供易于理解的答案,明显优于基线模型。
更新时间: 2024-04-12 01:30:55
领域: cs.CL,cs.AI,stat.ML
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. Extensive experimental results on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, achieving a significant improvement of more than 10% in CEFR prediction accuracy.
Updated: 2024-04-12 01:22:47
标题: 一种有效的自动化口语评估方法,用于缓解数据稀缺和分布不平衡问题
摘要: 自动化口语评估(ASA)通常涉及自动语音识别(ASR)和从学习者语音的ASR转录中手工提取特征。最近,自监督学习(SSL)已经表现出与传统方法相比的出色性能。然而,基于SSL的ASA系统面临至少三个与数据相关的挑战:有限的标注数据,学习者水平不均匀的分布以及不同CEFR熟练水平之间的非均匀分数间隔。为了解决这些挑战,我们探讨了两种新颖的建模策略:基于度量的分类和损失重加权,利用不同的SSL嵌入特征。对ICNALE基准数据集的大量实验结果表明,我们的方法可以比现有的强基线方法表现出更大的优势,CEFR预测准确度显著提高超过10%。
更新时间: 2024-04-12 01:22:47
领域: cs.SD,cs.AI,eess.AS
Multi-Scale Subgraph Contrastive Learning
Graph-level contrastive learning, aiming to learn the representations for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply assume that a graph and its augmented graph as a positive pair, otherwise as a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, will the previous assumption still hold in reality? By an experimental analysis, we discover the semantic information of an augmented graph structure may be not consistent as original graph structure, and whether two augmented graphs are positive or negative pairs is highly related with the multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize the fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parametric analyzes on eight graph classification real-world datasets well demonstrate the effectiveness of the proposed method.
Updated: 2024-04-12 01:15:01
标题: 多尺度子图对比学习
摘要: 图级对比学习旨在通过对比两个增强图来学习每个图的表示,吸引了相当多的关注。先前的研究通常简单地假定一个图及其增强图为正对,否则为负对。然而,众所周知,图结构总是复杂且多尺度的,这引发了一个基本问题:在图增强之后,先前的假设在现实中是否仍然成立?通过实验分析,我们发现增强图结构的语义信息可能与原始图结构不一致,而两个增强图是正对还是负对与多尺度结构密切相关。基于这一发现,我们提出了一个能够表征细粒度语义信息的多尺度子图对比学习架构。具体地,我们基于子图采样在不同尺度上生成全局和局部视图,并根据它们的语义关联构建多个对比关系,以提供更丰富的自监督信号。对八个图分类真实数据集的广泛实验和参数分析充分证明了所提方法的有效性。
更新时间: 2024-04-12 01:15:01
领域: cs.AI
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Learning from preference-based feedback has recently gained traction as a promising approach to align language models with human interests. While these aligned generative models have demonstrated impressive capabilities across various tasks, their dependence on high-quality human preference data poses a bottleneck in practical applications. Specifically, noisy (incorrect and ambiguous) preference pairs in the dataset might restrict the language models from capturing human intent accurately. While practitioners have recently proposed heuristics to mitigate the effect of noisy preferences, a complete theoretical understanding of their workings remains elusive. In this work, we aim to bridge this gap by introducing a general framework for policy optimization in the presence of random preference flips. We focus on the direct preference optimization (DPO) algorithm in particular since it assumes that preferences adhere to the Bradley-Terry-Luce (BTL) model, raising concerns about the impact of noisy data on the learned policy. We design a novel loss function, which de-biases the effect of noise on average, making a policy trained by minimizing that loss robust to the noise. Under log-linear parameterization of the policy class and assuming good feature coverage of the SFT policy, we prove that the sub-optimality gap of the proposed robust DPO (rDPO) policy compared to the optimal policy is of the order $O(\frac{1}{1-2\epsilon}\sqrt{\frac{d}{n}})$, where $\epsilon < 1/2$ is the flip rate of labels, $d$ is the policy parameter dimension and $n$ is the size of the dataset. Our experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is robust to noise in preference labels compared to vanilla DPO and other heuristics proposed by practitioners.
Updated: 2024-04-12 01:09:37
标题: 可证明鲁棒的DPO:将语言模型与嘈杂反馈对齐
摘要: 最近,从基于偏好的反馈中学习已经成为一种有前途的方法,可以使语言模型与人类兴趣保持一致。尽管这些对齐的生成模型在各种任务中展示了令人印象深刻的能力,但它们对高质量人类偏好数据的依赖在实际应用中构成了瓶颈。具体来说,数据集中的嘈杂(错误和模糊)偏好对语言模型准确捕捉人类意图可能造成限制。尽管从业者最近提出了一些启发式方法来减轻嘈杂偏好的影响,但对其工作原理的完整理论理解仍然是一个难题。 在这项工作中,我们旨在通过引入一个在随机偏好翻转存在的情况下进行策略优化的通用框架来弥合这一差距。我们特别关注直接偏好优化(DPO)算法,因为它假设偏好符合布拉德利-特里-卢斯(BTL)模型,引发了对学习策略受噪声数据影响的担忧。我们设计了一种新颖的损失函数,该函数通过平均去除噪声对策略的影响,使通过最小化该损失进行训练的策略对噪声具有鲁棒性。在对策略类别进行对数线性参数化和假设SFT策略具有良好特征覆盖的情况下,我们证明了所提出的鲁棒DPO(rDPO)策略相对于最优策略的次优性差距为$O(\frac{1}{1-2\epsilon}\sqrt{\frac{d}{n}})$,其中$\epsilon < 1/2$是标签翻转率,$d$是策略参数维度,$n$是数据集大小。我们在IMDb情感生成和Anthropic的有用-无害数据集上的实验表明,与普通DPO和其他从业者提出的启发式方法相比,rDPO对偏好标签中的噪声具有鲁棒性。
更新时间: 2024-04-12 01:09:37
领域: cs.LG,cs.CL
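The de-biased loss described in the rDPO abstract admits a compact sketch: weight the DPO loss on the observed pair against the loss on the flipped pair so that, in expectation under flip rate epsilon, the clean loss is recovered. Treat the exact weighting below as a reading of the abstract rather than the paper's verbatim formula.

import torch
import torch.nn.functional as F

def dpo_loss(logratio_w, logratio_l, beta=0.1):
    """Standard DPO loss; logratio_* are log pi_theta/pi_ref for the
    chosen (w) and rejected (l) responses."""
    return -F.logsigmoid(beta * (logratio_w - logratio_l))

def robust_dpo_loss(logratio_w, logratio_l, eps=0.1, beta=0.1):
    """De-biased loss under label flips with known rate eps < 1/2.
    E[(1-eps)*clean - eps*flipped] / (1-2*eps) equals the clean loss,
    so minimizing it is robust to the flips on average."""
    clean = dpo_loss(logratio_w, logratio_l, beta)
    flipped = dpo_loss(logratio_l, logratio_w, beta)
    return ((1 - eps) * clean - eps * flipped) / (1 - 2 * eps)

loss = robust_dpo_loss(torch.randn(8), torch.randn(8)).mean()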
BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development
Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we pioneer an ensemble knowledge distillation approach and apply it on MLFFs to improve the stability of MD simulations. Finally, we propose the density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvents and salt combinations. Our current model, trained on more than 15 chemical species, achieves the average density error of 0.01 g/cm$^3$ on various compositions compared with experimental data. Moreover, our model demonstrates transferability to molecules not included in the quantum mechanical dataset. We envision this work as paving the way to a "universal MLFF" capable of simulating properties of common organic liquids.
Updated: 2024-04-12 01:08:34
标题: 竹子:一种用于液体电解质开发的预测性和可转移的机器学习力场框架
摘要: 尽管机器学习力场(MLFF)在固体和小分子上的广泛应用,但在复杂液体电解质中应用MLFF存在明显的差距。在这项工作中,我们介绍了BAMBOO(字节跳动人工智能分子模拟增强器),这是一个新颖的分子动力学(MD)模拟框架,并展示了其在液体电解质(锂电池)背景下的能力。我们设计了一个受物理启发的图等变换器架构作为BAMBOO的骨干,以从量子机械模拟中学习。此外,我们开创了一种集成知识蒸馏方法,并将其应用于MLFF以提高MD模拟的稳定性。最后,我们提出了密度对准算法,以将BAMBOO与实验测量对准。BAMBOO在预测关键电解质性质(如密度、粘度和离子电导率)方面表现出最先进的准确性,涵盖各种溶剂和盐组合。我们当前的模型在超过15种化学物种上训练,与实验数据相比,在各种组合上平均密度误差为0.01 g/cm$^3$。此外,我们的模型展示了对未包含在量子机械数据集中的分子的可转移性。我们设想这项工作为铺平道路,开发出能够模拟常见有机液体性质的“通用MLFF”。
更新时间: 2024-04-12 01:08:34
领域: cond-mat.mtrl-sci,cs.LG,physics.comp-ph
Introducing Graph Learning over Polytopic Uncertain Graph
This extended abstract introduces a class of graph learning applicable to cases where the underlying graph has polytopic uncertainty, i.e., the graph is not exactly known, but its parameters or properties vary within a known range. By incorporating this assumption that the graph lies in a polytopic set into two established graph learning frameworks, we find that our approach yields better results with less computation.
Updated: 2024-04-12 00:55:07
标题: 引入多面体不确定图上的图学习
摘要: 这个扩展摘要介绍了一类适用于底层图具有多面体不确定性情形的图学习方法,即图并非确切已知,而是其参数或属性在已知范围内变化。通过将图位于多面体集合中这一假设纳入两个已建立的图学习框架中,我们发现我们的方法能够以更少的计算得到更好的结果。
更新时间: 2024-04-12 00:55:07
领域: eess.SP,cs.LG
UMBCLU at SemEval-2024 Task 1A and 1C: Semantic Textual Relatedness with and without machine translation
The aim of SemEval-2024 Task 1, "Semantic Textual Relatedness for African and Asian Languages" is to develop models for identifying semantic textual relatedness (STR) between two sentences using multiple languages (14 African and Asian languages) and settings (supervised, unsupervised, and cross-lingual). Large language models (LLMs) have shown impressive performance on several natural language understanding tasks such as multilingual machine translation (MMT), semantic similarity (STS), and encoding sentence embeddings. Using a combination of LLMs that perform well on these tasks, we developed two STR models, $\textit{TranSem}$ and $\textit{FineSem}$, for the supervised and cross-lingual settings. We explore the effectiveness of several training methods and the usefulness of machine translation. We find that direct fine-tuning on the task is comparable to using sentence embeddings and that translating to English leads to better performance for some languages. In the supervised setting, our model performance is better than the official baseline for 3 languages, with the remaining 4 performing on par. In the cross-lingual setting, our model performance is better than the baseline for 3 languages (leading to $1^{st}$ place for Afrikaans and $2^{nd}$ place for Indonesian), is on par for 2 languages and performs poorly on the remaining 7 languages. Our code is publicly available at https://github.com/dipta007/SemEval24-Task8.
Updated: 2024-04-12 00:53:29
标题: UMBCLU在SemEval-2024任务1A和1C中的表现:使用和不使用机器翻译的语义文本相关性
摘要: SemEval-2024任务1“非洲和亚洲语言的语义文本相关性”的目标是开发模型,用于识别两个句子之间的语义文本相关性(STR),使用多种语言(14种非洲和亚洲语言)和设置(监督、无监督和跨语言)。大型语言模型(LLMs)在多种自然语言理解任务上表现出色,如多语言机器翻译(MMT)、语义相似度(STS)和编码句子嵌入。通过结合在这些任务上表现良好的LLMs,我们为监督和跨语言设置开发了两个STR模型,$\textit{TranSem}$和$\textit{FineSem}$。我们探讨了几种训练方法的有效性和机器翻译的用处。我们发现,直接在任务上进行微调与使用句子嵌入相当,而翻译成英文可提高某些语言的性能。在监督设置中,我们的模型性能优于3种语言的官方基准,其余4种语言表现相当。在跨语言设置中,我们的模型性能优于3种语言的基准(南非荷兰语获得第1名,印尼语获得第2名),在2种语言上表现相当,并且在其余7种语言上表现不佳。我们的代码可在https://github.com/dipta007/SemEval24-Task8 上公开获取。
更新时间: 2024-04-12 00:53:29
领域: cs.CL,cs.AI,cs.LG
RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors; surfaces may introduce measurement inaccuracies due to their polished textures, extended distances, and oblique incidence angles from the sensor. The presence of incomplete depth maps imposes significant challenges for subsequent vision applications, prompting the development of numerous depth completion techniques to mitigate this problem. Numerous methods excel at reconstructing dense depth maps from sparse samples, but they often falter when faced with extensive contiguous regions of missing depth values, a prevalent and critical challenge in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, to regress the local dense depth values from the raw depth map. The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps while ensuring high fidelity through cycle consistency. We fuse the two branches via adaptive fusion modules named W-AdaIN and train the model with the help of pseudo depth maps. Comprehensive evaluations on NYU-Depth V2 and SUN RGB-D datasets show that our method significantly enhances depth completion performance particularly in realistic indoor settings.
Updated: 2024-04-12 00:52:35
标题: RDFC-GAN:用于室内深度补全的RGB-深度融合CycleGAN
摘要: 在室内场景中捕获的原始深度图像经常出现大量缺失值,这是由于传感器和环境的固有限制。例如,透明材料经常逃避深度传感器的检测;表面可能由于其光滑的纹理、延伸距离和从传感器倾斜的入射角而引入测量不准确性。不完整深度图的存在给后续视觉应用带来了重大挑战,促使开发大量深度补全技术以缓解此问题。许多方法擅长从稀疏样本重建密集深度图,但当面对大量连续缺失深度值的区域时,它们经常失败,这是室内环境中普遍且关键的挑战。为了克服这些挑战,我们设计了一种名为RDFC-GAN的新型双分支端到端融合网络,它以一对RGB和不完整深度图像作为输入,预测出密集和完整的深度图。第一个分支采用编码器-解码器结构,遵循曼哈顿世界假设,并利用RGB-D信息中的法线图作为指导,从原始深度图中回归出局部密集深度值。另一个分支应用了RGB-depth融合CycleGAN,能够将RGB图像翻译成详细的纹理深度图,并通过循环一致性确保高保真度。我们通过自适应融合模块W-AdaIN将这两个分支融合起来,并借助伪深度图训练模型。对NYU-Depth V2和SUN RGB-D数据集的全面评估表明,我们的方法在逼真的室内环境中显著提高了深度补全性能。
更新时间: 2024-04-12 00:52:35
领域: cs.CV,cs.AI
HICO-DET-SG and V-COCO-SG: New Data Splits for Evaluating the Systematic Generalization Performance of Human-Object Interaction Detection Models
Human-Object Interaction (HOI) detection is a task to localize humans and objects in an image and predict the interactions in human-object pairs. In real-world scenarios, HOI detection models need systematic generalization, i.e., generalization to novel combinations of objects and interactions, because the train data are expected to cover a limited portion of all possible combinations. To evaluate the systematic generalization performance of HOI detection models, we created two new sets of HOI detection data splits named HICO-DET-SG and V-COCO-SG based on the HICO-DET and V-COCO datasets, respectively. When evaluated on the new data splits, HOI detection models with various characteristics performed much more poorly than when evaluated on the original splits. This shows that systematic generalization is a challenging goal in HOI detection. By analyzing the evaluation results, we also gain insights for improving the systematic generalization performance and identify four possible future research directions. We hope that our new data splits and presented analysis will encourage further research on systematic generalization in HOI detection.
Updated: 2024-04-12 00:46:26
标题: HICO-DET-SG和V-COCO-SG:用于评估人-物体交互检测模型系统化泛化性能的新数据拆分
摘要: 人-物交互(HOI)检测是定位图像中的人和物体,并预测人-物体对之间交互的任务。在现实世界的场景中,HOI检测模型需要系统化概括,即对新颖物体和交互组合的概括,因为训练数据预计只覆盖所有可能组合的有限部分。为了评估HOI检测模型的系统化概化性能,我们基于HICO-DET和V-COCO数据集分别创建了两组新的HOI检测数据拆分,命名为HICO-DET-SG和V-COCO-SG。在新数据拆分上评估时,具有不同特征的HOI检测模型表现得比在原始拆分上要差得多。这表明系统化概括在HOI检测中是一个具有挑战性的目标。通过分析评估结果,我们还可以获得改善系统化概化性能并确定四个可能的未来研究方向的见解。我们希望我们的新数据拆分和提出的分析将鼓励进一步研究HOI检测中的系统化概括。
更新时间: 2024-04-12 00:46:26
领域: cs.CV,cs.AI
Optimal Universal Quantum Encoding for Statistical Inference
Optimal encoding of classical data for statistical inference using quantum computing is investigated. A universal encoder is sought that is optimal for a wide array of statistical inference tasks. Accuracy of any statistical inference is shown to be upper bounded by a term that is proportional to maximal quantum leakage from the classical data, i.e., the input to the inference model, through its quantum encoding. This demonstrates that the maximal quantum leakage is a universal measure of the quality of the encoding strategy for statistical inference as it only depends on the quantum encoding of the data and not the inference task itself. The optimal universal encoding strategy, i.e., the encoding strategy that maximizes the maximal quantum leakage, is proved to be attained by pure states. When there are enough qubits, basis encoding is proved to be universally optimal. An iterative method for numerically computing the optimal universal encoding strategy is presented.
Updated: 2024-04-12 00:39:53
标题: 统计推断的最佳通用量子编码
摘要: 这项研究探讨了使用量子计算进行统计推断时经典数据的最佳编码。我们寻求一种对各类统计推断任务都最优的通用编码器。结果表明,任何统计推断的准确性都被一个项从上方限制,该项与经典数据(即推断模型的输入)经由其量子编码产生的最大量子泄漏成正比。这表明,最大量子泄漏是统计推断编码策略质量的通用度量,因为它仅取决于数据的量子编码,而不取决于推断任务本身。最佳的通用编码策略,即最大化最大量子泄漏的编码策略,被证明可由纯态实现。当有足够的量子比特时,基础编码被证明是普遍最优的。本文还提出了一种用于数值计算最佳通用编码策略的迭代方法。
更新时间: 2024-04-12 00:39:53
领域: quant-ph,cs.LG,eess.SP,math.ST,stat.TH
Systematically Assessing the Security Risks of AI/ML-enabled Connected Healthcare Systems
The adoption of machine-learning-enabled systems in the healthcare domain is on the rise. While the use of ML in healthcare has several benefits, it also expands the threat surface of medical systems. We show that the use of ML in medical systems, particularly connected systems that involve interfacing the ML engine with multiple peripheral devices, has security risks that might cause life-threatening damage to a patient's health in case of adversarial interventions. These new risks arise due to security vulnerabilities in the peripheral devices and communication channels. We present a case study where we demonstrate an attack on an ML-enabled blood glucose monitoring system by introducing adversarial data points during inference. We show that an adversary can achieve this by exploiting a known vulnerability in the Bluetooth communication channel connecting the glucose meter with the ML-enabled app. We further show that state-of-the-art risk assessment techniques are not adequate for identifying and assessing these new risks. Our study highlights the need for novel risk analysis methods for analyzing the security of AI-enabled connected health devices.
Updated: 2024-04-12 00:33:58
标题: 系统评估AI/ML支持的连接医疗系统的安全风险
摘要: 在医疗领域,采用机器学习技术的系统日益普及。虽然在医疗领域使用机器学习具有多种好处,但也扩大了医疗系统的威胁面。我们指出,在医疗系统中使用机器学习,特别是涉及将机器学习引擎与多个外围设备进行接口的连接系统,存在在遭受对抗性干预时可能对患者健康造成危及生命损害的安全风险。这些新风险源于外围设备和通信通道中的安全漏洞。我们提供了一个案例研究,展示了通过在推断过程中引入对抗性数据点来攻击机器学习血糖监测系统的过程。我们展示了攻击者可以通过利用连接血糖仪与机器学习应用的蓝牙通信通道中的已知漏洞来实现这一点。此外,我们还表明,目前最先进的风险评估技术不足以识别和评估这些新风险。我们的研究突出了需要为分析人工智能连接健康设备的安全性提供新颖的风险分析方法。
更新时间: 2024-04-12 00:33:58
领域: cs.CR,cs.CY,cs.LG
Reinforcement Learning with Non-Cumulative Objective
In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
Updated: 2024-04-12 00:32:08
标题: 使用非累积目标的强化学习
摘要: 在强化学习中,目标几乎总是定义为沿着过程中奖励的累积函数。然而,在各种应用领域中,特别是在通信和网络领域中,存在许多最优控制和强化学习问题,其中目标并非自然地表达为奖励的总和。在本文中,我们意识到在各种问题中非累积目标的普遍存在,并提出了一种修改现有算法以优化此类目标的方法。具体来说,我们深入研究了许多最优控制和强化学习算法的基本构建模块:贝尔曼最优性方程。为了优化非累积目标,我们将贝尔曼更新规则中的原始求和操作替换为与目标对应的广义操作。此外,我们提供了关于广义操作形式的充分条件以及对马尔可夫决策过程的假设,在这些条件下可以保证广义贝尔曼更新的全局最优收敛性。我们以瓶颈目标(即由过程中最小奖励决定的目标)为例,在经典的最优控制和强化学习任务以及两个最大化流速的网络路由问题上,通过实验展示了这一想法。
更新时间: 2024-04-12 00:32:08
领域: cs.LG,cs.AI,cs.NI,math.OC,stat.ML
Conformal Prediction via Regression-as-Classification
Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent the challenges by converting regression to a classification problem and then use CP for classification to obtain CP sets for regression. To preserve the ordering of the continuous-output space, we design a new loss function and make necessary modifications to the CP classification techniques. Empirical results on many benchmarks show that this simple approach gives surprisingly good results on many practical problems.
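A hedged sketch of the regression-as-classification pipeline under split conformal prediction. Note the paper designs a new ordering-aware loss; this sketch substitutes a vanilla classifier over quantile bins to show the mechanics only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.2 + 0.2 * (X[:, 0] > 0))  # heteroscedastic

# Discretize the continuous target into K ordered bins ("classes").
K, alpha = 20, 0.1
edges = np.quantile(y, np.linspace(0, 1, K + 1))
bins = np.clip(np.searchsorted(edges, y, side='right') - 1, 0, K - 1)

X_tr, b_tr = X[:800], bins[:800]
X_cal, b_cal = X[800:1000], bins[800:1000]

clf = RandomForestClassifier(random_state=0).fit(X_tr, b_tr)

def probs(Xs):
    """predict_proba columns mapped back to all K bin indices."""
    p = np.zeros((len(Xs), K))
    p[:, clf.classes_] = clf.predict_proba(Xs)
    return p

# Split-conformal calibration: nonconformity score = 1 - p(true bin).
scores = 1.0 - probs(X_cal)[np.arange(len(b_cal)), b_cal]
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method='higher')

# Prediction set = bins with score <= qhat; read off the covered interval.
for p in probs(X[1000:])[:3]:
    keep = np.where(1.0 - p <= qhat)[0]
    if keep.size == 0:                      # guard: fall back to argmax bin
        keep = np.array([p.argmax()])
    print(f'[{edges[keep.min()]:.2f}, {edges[keep.max() + 1]:.2f}]')
```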
Updated: 2024-04-12 00:21:30
标题: 通过将回归作为分类进行一致性预测
摘要: 回归的一致性预测(CP)可能具有挑战性,特别是当输出分布是异方差、多峰或偏斜时。一些问题可以通过估计输出分布来解决,但实际上,这种方法可能对估计误差敏感,并产生不稳定的区间。在这里,我们通过将回归转换为分类问题,然后使用用于分类的CP来获取回归的CP集,从而绕过这些挑战。为了保持连续输出空间的顺序,我们设计了一个新的损失函数,并对CP分类技术进行了必要的修改。在许多基准测试中的实证结果表明,这种简单方法在许多实际问题上产生了令人惊讶的好结果。
更新时间: 2024-04-12 00:21:30
领域: cs.LG,stat.ML
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models
Parameter Efficient Fine-Tuning (PEFT) methods have been extensively utilized in Large Language Models (LLMs) to improve performance on downstream tasks without the cost of fine-tuning the whole LLM. Recent studies have shown how to effectively use PEFT for fine-tuning LLMs in ranking tasks with convincing performance; however, some limitations remain, including the learned prompt being fixed for different documents, overfitting to specific tasks, and low adaptation ability. In this paper, we introduce a query-dependent parameter efficient fine-tuning (Q-PEFT) approach for text reranking that leaks information about the true queries to the LLM, making it much easier to generate the true queries from the input documents. Specifically, we utilize the query to extract the top-k tokens from the concatenated documents, serving as contextual clues. We further augment Q-PEFT by substituting the retrieval mechanism with a multi-head attention layer to achieve end-to-end training and cover all the tokens in the documents, guiding the LLMs to generate more document-specific synthetic queries, thereby further improving the reranking performance. Extensive experiments are conducted on four public datasets, demonstrating the effectiveness of our proposed approach.
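As a hypothetical illustration of the first stage only (names, embeddings, and the similarity choice are assumptions, not the paper's implementation): score every token of the concatenated documents against the query and keep the top-k as contextual clues. The paper's multi-head-attention variant replaces this retrieval step with a learned, end-to-end layer.

```python
import numpy as np

def topk_contextual_clues(query_emb, doc_token_embs, doc_tokens, k=8):
    """Keep the k document tokens most similar to the query (cosine
    similarity), preserving their original document order."""
    q = query_emb / np.linalg.norm(query_emb)
    D = doc_token_embs / np.linalg.norm(doc_token_embs, axis=1, keepdims=True)
    top = np.argsort(-(D @ q))[:k]
    return [doc_tokens[i] for i in sorted(top)]

# Hypothetical usage with random stand-in embeddings.
rng = np.random.default_rng(0)
tokens = [f'tok_{i}' for i in range(50)]
clues = topk_contextual_clues(rng.normal(size=16),
                              rng.normal(size=(50, 16)), tokens)
print(clues)  # these clues would be prepended to the reranking prompt
```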
Updated: 2024-04-12 00:18:06
标题: Q-PEFT:基于查询的参数高效微调,用于使用大型语言模型进行文本重新排序
摘要: 参数高效微调(PEFT)方法已被广泛应用于大型语言模型(LLMs)中,以改善下游任务的性能,而无需对整个LLMs进行微调。最近的研究表明如何有效地利用PEFT来微调LLMs以在排名任务中获得令人信服的性能;然而存在一些限制,包括学习的提示对不同文档固定、过度拟合特定任务和适应能力低等问题。在本文中,我们引入了一种基于查询的参数高效微调(Q-PEFT)方法,用于文本重新排序,将真实查询的信息泄漏给LLMs,从而使LLMs更容易从输入文档中生成真实查询。具体来说,我们利用查询从连接文档中提取前k个标记,作为上下文线索。我们进一步通过用多头注意力层替换检索机制来增强Q-PEFT,以实现端到端训练并覆盖文档中的所有标记,引导LLMs生成更多文档特定的合成查询,从而进一步提高重新排序性能。我们在四个公共数据集上进行了大量实验,证明了我们提出的方法的有效性。
更新时间: 2024-04-12 00:18:06
领域: cs.CL,cs.AI,cs.IR,cs.LG
Transfer Learning with Reconstruction Loss
In most applications of utilizing neural networks for mathematical optimization, a dedicated model is trained for each specific optimization objective. However, in many scenarios, several distinct yet correlated objectives or tasks often need to be optimized on the same set of problem inputs. Instead of independently training a different neural network for each problem separately, it would be more efficient to exploit the correlations between these objectives and to train multiple neural network models with shared model parameters and feature representations. To achieve this, this paper first establishes the concept of common information: the shared knowledge required for solving the correlated tasks, then proposes a novel approach for model training by adding to the model an additional reconstruction stage associated with a new reconstruction loss. This loss is for reconstructing the common information starting from a selected hidden layer in the model. The proposed approach encourages the learned features to be general and transferable, and therefore can be readily used for efficient transfer learning. For numerical simulations, three applications are studied: transfer learning on classifying MNIST handwritten digits, device-to-device wireless network power allocation, and multiple-input-single-output network downlink beamforming and localization. Simulation results suggest that the proposed approach is highly efficient in data and model complexity, is resilient to over-fitting, and achieves competitive performance.
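A minimal PyTorch sketch of the training setup, under the simplifying assumption that the common information is the input itself (the paper defines it as the shared knowledge across the correlated tasks); all layer sizes and the weight lam are illustrative:

```python
import torch
import torch.nn as nn

class SharedModelWithReconstruction(nn.Module):
    """A task head plus a reconstruction head branching off a chosen
    hidden layer. The reconstruction target here is the input (an
    assumption for illustration)."""
    def __init__(self, in_dim=32, hidden=64, out_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.task_head = nn.Linear(hidden, out_dim)
        self.recon_head = nn.Linear(hidden, in_dim)  # reconstruction stage

    def forward(self, x):
        h = self.encoder(x)              # the selected hidden layer
        return self.task_head(h), self.recon_head(h)

model = SharedModelWithReconstruction()
x = torch.randn(16, 32)
target = torch.randint(0, 10, (16,))
logits, recon = model(x)
lam = 0.5  # weight of the reconstruction term (a free hyperparameter)
loss = nn.functional.cross_entropy(logits, target) \
     + lam * nn.functional.mse_loss(recon, x)
loss.backward()  # gradients flow through both heads into the shared encoder
```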
Updated: 2024-04-12 00:16:43
标题: 具有重建损失的迁移学习
摘要: 在大多数利用神经网络进行数学优化的应用中,通常会为每个特定的优化目标训练一个专用模型。然而,在许多情况下,需要在同一组问题输入上优化几个不同但相关的目标或任务。与为每个问题单独训练不同的神经网络相比,更有效的方法是利用这些目标之间的相关性,并训练具有共享模型参数和特征表示的多个神经网络模型。为了实现这一点,本文首先建立了共同信息的概念:解决相关任务所需的共享知识,然后提出了一种新颖的模型训练方法,即在模型中添加一个与新的重建损失相关联的额外重建阶段。该损失用于从模型中选定的隐藏层开始重建共同信息。所提出的方法鼓励学到的特征具有普遍性和可转移性,因此可以轻松用于高效的迁移学习。在数值模拟中,研究了三个应用:在分类MNIST手写数字上的迁移学习、设备对设备无线网络功率分配,以及多输入单输出网络下行波束成形和定位。模拟结果表明,所提出的方法在数据和模型复杂性方面非常高效,对过拟合具有鲁棒性,并具有竞争力的性能。
更新时间: 2024-04-12 00:16:43
领域: cs.LG,cs.AI,cs.NI,stat.ML
Gauging Public Acceptance of Conditionally Automated Vehicles in the United States
Public acceptance of conditionally automated vehicles is a crucial step in the realization of smart cities. Prior research in Europe has shown that the factors of hedonic motivation, social influence, and performance expectancy, in decreasing order of importance, influence acceptance. Moreover, a generally positive acceptance of the technology was reported. However, there is a lack of information regarding the public acceptance of conditionally automated vehicles in the United States. In this study, we carried out a web-based experiment where participants were provided information regarding the technology and then completed a questionnaire on their perceptions. The collected data was analyzed using PLS-SEM to examine the factors that may lead to public acceptance of the technology in the United States. Our findings showed that social influence, performance expectancy, effort expectancy, hedonic motivation, and facilitating conditions determine conditionally automated vehicle acceptance. Additionally, certain factors were found to influence the perception of how useful the technology is, the effort required to use it, and the facilitating conditions for its use. By integrating the insights gained from this study, stakeholders can better facilitate the adoption of autonomous vehicle technology, contributing to safer, more efficient, and user-friendly transportation systems in the future that help realize the vision of the smart city.
Updated: 2024-04-12 00:11:32
标题: 在美国评估公众对有条件自动驾驶车辆的接受程度
摘要: 公众对有条件自动驾驶车辆的接受度是实现智慧城市的关键一步。先前在欧洲进行的研究表明,享乐动机、社会影响和绩效期望这些因素按重要性递减的顺序影响接受度。此外,研究报告了公众对该技术普遍持积极接受态度。然而,关于美国公众对有条件自动驾驶车辆的接受度,目前缺乏相关信息。在这项研究中,我们进行了一项基于网络的实验,向参与者提供了有关该技术的信息,然后请他们填写了关于其感知的问卷。我们使用PLS-SEM分析收集的数据,以研究可能促使该技术在美国获得公众接受的因素。我们的研究结果显示,社会影响、绩效期望、努力期望、享乐动机和便利条件决定了有条件自动驾驶车辆的接受度。此外,某些因素被发现会影响人们对该技术实用性、使用所需努力以及使用便利条件的感知。通过整合从这项研究中获得的见解,利益相关者可以更好地促进自动驾驶车辆技术的采纳,有助于建设未来更安全、更高效、更用户友好的交通系统,从而实现智慧城市的愿景。
更新时间: 2024-04-12 00:11:32
领域: cs.CY,cs.AI
Lightweight Cryptanalysis of IoT Encryption Algorithms : Is Quota Sampling the Answer?
Rapid growth in the number of small sensor devices known as the Internet of Things (IoT) has driven the development of lightweight encryption algorithms. Two well-known lightweight algorithms are SIMON and SIMECK, which have been specifically designed for use on resource-constrained IoT devices. These lightweight encryption algorithms are based on the efficient Feistel block structure, which is known to exhibit vulnerabilities to differential cryptanalysis. Consequently, it is necessary to test these algorithms for resilience against such attacks. While existing state-of-the-art research has demonstrated novel heuristic methods of differential cryptanalysis that improve time efficiency over previous techniques, the large state sizes of these encryption algorithms inhibit cryptanalysis time efficiency. In this paper, we introduce the Versatile Investigative Sampling Technique for Advanced Cryptanalysis (VISTA-CRYPT), a time-efficient enhancement of differential cryptanalysis of lightweight encryption algorithms. The proposed technique introduces a simple framework of quota sampling that produces state-of-the-art results with time reductions of up to 76% over existing techniques. Further, we present a preliminary graph-based analysis of the output differentials for the identification of relationships within the data and future research opportunities to further enhance the performance of differential cryptanalysis. The code designed for this work and associated datasets will be available at https://github.com/johncook1979/simon-cryptanalysis.
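The VISTA-CRYPT framework itself is not reproduced here; the following toy sketch only illustrates the quota-sampling idea, on a 16-bit SIMON-like round function: candidate input differences are stratified by Hamming weight, and each stratum receives a fixed trial quota instead of an exhaustive search.

```python
import numpy as np

MASK = 0xFFFF  # toy 16-bit words; SIMON-like round function for illustration
rotl = lambda x, r: ((x << r) | (x >> (16 - r))) & MASK
f = lambda x: (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)

def top_diff_prob(din, n_pairs, rng):
    """Empirical probability of the most frequent output difference of f
    for input difference din, over n_pairs random inputs."""
    xs = rng.integers(0, 1 << 16, size=n_pairs)
    douts = np.array([f(int(x)) ^ f(int(x) ^ din) for x in xs])
    return np.unique(douts, return_counts=True)[1].max() / n_pairs

rng = np.random.default_rng(0)
# Quota sampling: stratify candidate differences by Hamming weight and
# spend a fixed trial quota per stratum instead of exhaustive search.
quota, best = 40, (0.0, None)
for hw in (1, 2, 3):
    stratum = [d for d in range(1, 1 << 16) if bin(d).count('1') == hw]
    picks = rng.choice(stratum, size=min(quota, len(stratum)), replace=False)
    for din in picks:
        p = top_diff_prob(int(din), n_pairs=256, rng=rng)
        if p > best[0]:
            best = (p, int(din))
print('best differential found:', hex(best[1]), 'prob ~', best[0])
```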
Updated: 2024-04-12 00:08:39
标题: 物联网加密算法的轻量级密码分析:配额抽样是否是答案?
摘要: 被称为物联网(IoT)的小型传感器设备数量迅速增长,推动了轻量级加密算法的发展。两个知名的轻量级算法是SIMON和SIMECK,它们专门为资源受限的物联网设备设计。这些轻量级加密算法基于高效的费斯特尔块结构,而该结构已知对差分密码分析存在漏洞。因此,有必要测试这些算法抵御此类攻击的能力。尽管现有的最新研究展示了改进差分密码分析时间效率的新颖启发式方法,但这些加密算法的大状态规模阻碍了密码分析的时间效率。在本文中,我们介绍了面向高级密码分析的多用途调查抽样技术(Versatile Investigative Sampling Technique for Advanced Cryptanalysis,VISTA-CRYPT),这是一种针对轻量级加密算法差分密码分析的时间高效增强技术。所提出的技术引入了一种简单的配额抽样框架,能够产生最先进的结果,同时比现有技术将时间减少高达76%。此外,我们提出了一种对输出差分的初步基于图的分析,用于识别数据中的关系以及进一步提高差分密码分析性能的未来研究机会。本研究设计的代码和相关数据集将在https://github.com/johncook1979/simon-cryptanalysis 上提供。
更新时间: 2024-04-12 00:08:39
领域: cs.CR
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
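One plausible instantiation of such a country-conditioned similarity metric (an assumption for illustration, not necessarily the paper's exact definition): one minus the Jensen-Shannon distance between answer-option distributions for a question.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def opinion_similarity(p_model, p_country):
    """Similarity between an LLM's answer distribution and a country's
    human answer distribution for one survey question, computed as
    1 - Jensen-Shannon distance (values in [0, 1], higher = closer)."""
    return 1.0 - jensenshannon(p_model, p_country, base=2)

# Hypothetical 4-option question: model vs. two countries' response shares.
p_model = np.array([0.70, 0.20, 0.05, 0.05])
p_us    = np.array([0.65, 0.25, 0.05, 0.05])
p_other = np.array([0.10, 0.20, 0.30, 0.40])
print(opinion_similarity(p_model, p_us))     # high similarity
print(opinion_similarity(p_model, p_other))  # low similarity
```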
Updated: 2024-04-12 00:05:53
标题: 朝向衡量语言模型中主观全球观点的表达
摘要: 大型语言模型(LLMs)可能无法公平地代表全球在社会问题上的多元化观点。在本文中,我们开发了一个定量框架,用于评估模型生成的回答与哪些人群的意见更为相似。我们首先构建了一个数据集GlobalOpinionQA,其由跨国调查中的问题和答案组成,这些调查旨在捕捉不同国家对全球问题的多元化意见。接下来,我们定义了一个以国家为条件、量化LLM生成的调查回答与人类回答之间相似性的度量标准。借助该框架,我们对一个使用Constitutional AI训练为有益、诚实且无害的LLM进行了三个实验。默认情况下,LLM的回答往往更类似于某些人群的意见,例如来自美国以及一些欧洲和南美洲国家的人群,这凸显了潜在的偏见。当我们提示模型考虑特定国家的观点时,回答会变得更类似于被提示人群的意见,但也可能反映出有害的文化刻板印象。当我们将GlobalOpinionQA问题翻译成目标语言时,模型的回答并不一定变得最类似于那些语言使用者的意见。我们发布了我们的数据集供他人使用和进一步研究。我们的数据位于https://huggingface.co/datasets/Anthropic/llm_global_opinions。我们还提供了一个交互式可视化网站,网址为https://llmglobalvalues.anthropic.com。
更新时间: 2024-04-12 00:05:53
领域: cs.CL,cs.AI
Language Model Prompt Selection via Simulation Optimization
With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. While existing methods for prompt selection rely on human labor, we consider facilitating this selection through simulation optimization, aiming to maximize a pre-defined score for the selected prompt. Specifically, we propose a two-stage framework. In the first stage, we determine a feasible set of prompts in sufficient numbers, where each prompt is represented by a moderate-dimensional vector. In the subsequent stage for evaluation and selection, we construct a surrogate model of the score regarding the moderate-dimensional vectors that represent the prompts. We propose sequentially selecting the prompt for evaluation based on this constructed surrogate model. We prove the consistency of the sequential evaluation procedure in our framework. We also conduct numerical experiments to demonstrate the efficacy of our proposed framework, providing practical instructions for implementation.
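A hedged sketch of the two-stage framework with a Gaussian-process surrogate and an upper-confidence-bound selection rule (one common acquisition choice; the paper proves consistency for its own sequential procedure). The prompt embeddings and the score oracle below are synthetic stand-ins for actual LLM evaluations:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Stage 1 (assumed given): a feasible set of prompts, each embedded as a
# moderate-dimensional vector.
prompts = rng.normal(size=(200, 8))

def noisy_score(v):
    """Stand-in for one noisy simulation/LLM evaluation of a prompt."""
    return -np.sum((v - 0.5) ** 2) + rng.normal(0, 0.1)

# Stage 2: fit a surrogate of the score over prompt vectors and select
# the next prompt to evaluate by an upper-confidence-bound rule.
evaluated = [int(rng.integers(200))]
scores = [noisy_score(prompts[evaluated[0]])]
for _ in range(30):
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(prompts[evaluated], np.array(scores))
    mu, sd = gp.predict(prompts, return_std=True)
    ucb = mu + 2.0 * sd
    ucb[evaluated] = -np.inf                 # never re-select a prompt
    evaluated.append(int(np.argmax(ucb)))
    scores.append(noisy_score(prompts[evaluated[-1]]))
print('best prompt index:', evaluated[int(np.argmax(scores))])
```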
Updated: 2024-04-12 00:03:56
标题: 通过模拟优化选择语言模型提示
摘要: 随着生成式语言模型的发展,近年来提示的选择受到了广泛关注。提示是用户提供的指令或描述,作为生成式语言模型进行内容生成的指导。尽管现有的提示选择方法基于人力,我们考虑通过模拟优化来促进这一选择,旨在最大化所选提示的预定义分数。具体而言,我们提出了一个两阶段框架。在第一阶段,我们确定了数量充足的可行提示集,其中每个提示由一个中等维度的向量表示。在随后的评估和选择阶段,我们构建了一个关于表示提示的中等维度向量的分数代理模型。我们建议基于这个构建的代理模型依次选择要评估的提示。我们证明了我们框架中顺序评估程序的一致性。我们还进行了数值实验,以展示我们提出的框架的有效性,并提供实施的实际指导。
更新时间: 2024-04-12 00:03:56
领域: stat.ML,cs.AI,cs.CL,cs.LG