The Dilemma of Uncertainty Estimation for General Purpose AI in the EU AI Act
The AI Act is the European Union-wide regulation of AI systems. It includes specific provisions for general-purpose AI models, which, however, need to be further interpreted in terms of technical standards and state-of-the-art studies to ensure practical compliance solutions. This paper examines the AI Act requirements for providers and deployers of general-purpose AI and further proposes uncertainty estimation as a suitable measure for legal compliance and quality assurance in the training of such models. We argue that uncertainty estimation should be a required component for deploying models in the real world and that, under the EU AI Act, it could fulfill several requirements for transparency, accuracy, and trustworthiness. However, uncertainty estimation methods generally increase the amount of computation, producing a dilemma: the added computation might push a model over the threshold ($10^{25}$ FLOPs) at which it is classified as a model with systemic risk, which bears a heavier regulatory burden.
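To make the compute dilemma concrete, the following Python sketch applies the common $C \approx 6ND$ training-FLOPs approximation (roughly 6 FLOPs per parameter per training token); this rule of thumb, the model size, the token count, and the 15x uncertainty-estimation overhead below are all illustrative assumptions, not figures from the Act or from this abstract.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

# EU AI Act presumption threshold for systemic risk, in FLOPs.
SYSTEMIC_RISK_THRESHOLD = 1e25

# Hypothetical 70B-parameter model trained on 2T tokens:
base = training_flops(70e9, 2e12)  # ~8.4e23 FLOPs, below the threshold

# A hypothetical 15x compute overhead (e.g., training many ensemble
# members for uncertainty estimation) pushes the same run over it:
with_uncertainty = 15 * base       # ~1.26e25 FLOPs, above the threshold
```

Under these assumed numbers, the base run stays more than an order of magnitude under the threshold, while the uncertainty-augmented run crosses it, which is exactly the dilemma the abstract describes.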
Updated: 2024-08-20 23:59:51
Categories: cs.AI,cs.CY
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited number of source videos, which hampers the effective training of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooking the contemporary need for multimodal video content summarization. Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT). However, the textual summaries in previous multimodal datasets are inadequate. To address these issues, we introduce Instruct-V2Xum, a cross-modal video summarization dataset featuring 30,000 diverse videos sourced from YouTube, with lengths ranging from 40 to 940 seconds and an average summarization ratio of 16.39%. Each video summary in Instruct-V2Xum is paired with a textual summary that references specific frame indexes, facilitating the generation of aligned video and textual summaries. In addition, we propose a new video summarization framework named V2Xum-LLM. V2Xum-LLM, specifically V2Xum-LLaMA in this study, is the first framework that unifies different video summarization tasks into one large language model's (LLM) text decoder and achieves task-controllable video summarization with temporal prompts and task instructions. Experiments show that V2Xum-LLaMA outperforms strong baseline models on multiple video summarization tasks. Furthermore, we propose an enhanced evaluation metric for V2V and V2VT summarization tasks.
Updated: 2024-08-20 23:47:02
Categories: cs.CV,cs.AI
Asymmetric Graph Error Control with Low Complexity in Causal Bandits
In this paper, the causal bandit problem is investigated, in which the objective is to select an optimal sequence of interventions on nodes in a causal graph. It is assumed that the graph is governed by linear structural equations; it is further assumed that both the causal topology and the distribution of interventions are unknown. By exploiting the causal relationships between the nodes whose signals contribute to the reward, interventions are optimized. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed, which strongly reduces sample complexity relative to the prior art by learning sub-graphs. Under the assumption of Gaussian exogenous inputs and minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound based intervention selection to optimize the reward. To cope with non-stationary bandits, a sub-graph change detection mechanism is proposed, with high sample efficiency. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in both stationary and non-stationary settings. Compared to existing approaches, the proposed scheme takes 67% fewer samples to learn the causal structure and achieves an average reward gain of 85%.
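The upper-confidence-bound intervention selection described above can be sketched generically in Python; the exploration bonus below is the textbook UCB1 form, used only for illustration, not the paper's causal-graph-tailored uncertainty bound.

```python
import math

def ucb_select(emp_means, counts, t, c=2.0):
    """Pick the intervention with the highest upper confidence bound.

    emp_means[i]: empirical mean reward of intervention i
    counts[i]:    number of times intervention i has been tried (all > 0)
    t:            current round index
    c:            exploration constant (generic UCB1-style bonus; the
                  paper derives a tighter, problem-specific bound)
    """
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(emp_means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal visit counts the arm with the best empirical mean wins, while a rarely tried arm receives a large bonus and gets explored, which is the behavior the uncertainty bound drives in the paper's method.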
Updated: 2024-08-20 23:37:08
Categories: cs.LG,eess.SP
A Little Confidence Goes a Long Way
We introduce a group of related methods for binary classification tasks using probes of the hidden state activations in large language models (LLMs). Performance is on par with the largest and most advanced LLMs currently available, while requiring orders of magnitude fewer computational resources and no labeled data. This approach involves translating class labels into a semantically rich description, spontaneous symmetry breaking of multilayer perceptron probes for unsupervised learning and inference, training probes to generate confidence scores (prior probabilities) from hidden state activations subject to known constraints via entropy maximization, and selecting the most confident probe model from an ensemble for prediction. These techniques are evaluated on four datasets using five base LLMs.
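The final step above, selecting the most confident probe from an ensemble, can be sketched as follows; scoring confidence as one minus the normalized entropy of a probe's class probabilities is an assumption made here for illustration, not necessarily the paper's exact criterion.

```python
import numpy as np

def probe_confidence(probs):
    """Confidence of one probe's binary prediction: 1 - normalized entropy.
    probs: length-2 array, a probe's class-probability output."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum()
    return 1.0 - entropy / np.log(2.0)

def select_most_confident(probe_outputs):
    """Return (probe index, predicted class) of the most confident
    probe in the ensemble."""
    confs = [probe_confidence(p) for p in probe_outputs]
    i = int(np.argmax(confs))
    return i, int(np.argmax(probe_outputs[i]))
```

A near-uniform output (0.5, 0.5) scores close to zero confidence, while a sharply peaked output such as (0.95, 0.05) scores high and its probe is chosen.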
Updated: 2024-08-20 23:36:00
Categories: cs.LG,cs.AI,cs.CL,cs.IT,cs.NE,math.IT
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. Most existing OOD detection methods address uni-modal inputs, such as images or texts. In the context of multi-modal documents, there is a notable lack of extensive research on the performance of these methods, which have primarily been developed with a focus on computer vision tasks. We propose a novel methodology termed attention head masking (AHM) for multi-modal OOD tasks in document classification systems. Our empirical results demonstrate that the proposed AHM method outperforms all state-of-the-art approaches and significantly decreases the false positive rate (FPR) by up to 7.5% compared to existing solutions. This methodology generalizes well to multi-modal data, such as documents, where visual and textual information are modeled under the same Transformer architecture. To address the scarcity of high-quality publicly available document datasets and encourage further research on OOD detection for documents, we introduce FinanceDocs, a new document AI dataset. Our code and dataset are publicly available.
Updated: 2024-08-20 23:30:00
Categories: cs.AI,cs.CL,cs.CV,cs.LG
Unified Deep Learning Model for Global Prediction of Aboveground Biomass, Canopy Height and Cover from High-Resolution, Multi-Sensor Satellite Imagery
Regular measurement of carbon stock in the world's forests is critical for carbon accounting and reporting under national and international climate initiatives, and for scientific research, but has been largely limited in scalability and temporal resolution due to a lack of ground-based assessments. Increasing efforts have been made to address these challenges by incorporating remotely sensed data. We present a new methodology which uses multi-sensor, multi-spectral imagery at a resolution of 10 meters and a deep learning based model which unifies the prediction of above ground biomass density (AGBD), canopy height (CH), canopy cover (CC) as well as uncertainty estimations for all three quantities. The model is trained on millions of globally sampled GEDI-L2/L4 measurements. We validate the capability of our model by deploying it over the entire globe for the year 2023 as well as annually from 2016 to 2023 over selected areas. The model achieves a mean absolute error for AGBD (CH, CC) of 26.1 Mg/ha (3.7 m, 9.9 %) and a root mean squared error of 50.6 Mg/ha (5.4 m, 15.8 %) on a globally sampled test dataset, demonstrating a significant improvement over previously published results. We also report the model performance against independently collected ground measurements published in the literature, which show a high degree of correlation across varying conditions. We further show that our pre-trained model facilitates seamless transferability to other GEDI variables due to its multi-head architecture.
Updated: 2024-08-20 23:15:41
Categories: cs.CV,cs.AI,cs.LG
OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis
Optical coherence tomography (OCT) has become critical for diagnosing retinal diseases as it enables 3D images of the retina and optic nerve. OCT acquisition is fast, non-invasive, affordable, and scalable. Due to its broad applicability, massive numbers of OCT images have been accumulated in routine exams, making it possible to train large-scale foundation models that can generalize to various diagnostic tasks using OCT images. Nevertheless, existing foundation models for OCT only consider 2D image slices, overlooking the rich 3D structure. Here, we present OCTCube, a 3D foundation model pre-trained on 26,605 3D OCT volumes encompassing 1.62 million 2D OCT images. OCTCube is developed based on 3D masked autoencoders and exploits FlashAttention to reduce the larger GPU memory usage caused by modeling 3D volumes. OCTCube outperforms 2D models when predicting 8 retinal diseases in both inductive and cross-dataset settings, indicating that utilizing the 3D structure in the model instead of 2D data results in significant improvement. OCTCube further shows superior performance on cross-device prediction and when predicting systemic diseases, such as diabetes and hypertension, further demonstrating its strong generalizability. Finally, we propose a contrastive-self-supervised-learning-based OCT-IR pre-training framework (COIP) for cross-modality analysis on OCT and infrared retinal (IR) images, where the OCT volumes are embedded using OCTCube. We demonstrate that COIP enables accurate alignment between OCT and IR en face images. Collectively, OCTCube, a 3D OCT foundation model, demonstrates significantly better performance against 2D models on 27 out of 29 tasks and comparable performance on the other two tasks, paving the way for AI-based retinal disease diagnosis.
Updated: 2024-08-20 22:55:19
Categories: eess.IV,cs.AI,cs.CV
CoDi: Conversational Distillation for Grounded Question Answering
Distilling conversational skills into Small Language Models (SLMs) with approximately 1 billion parameters presents significant challenges. Firstly, SLMs have limited capacity in their model parameters to learn extensive knowledge compared to larger models. Secondly, high-quality conversational datasets are often scarce, small, and domain-specific. Addressing these challenges, we introduce a novel data distillation framework named CoDi (short for Conversational Distillation, pronounced "Cody"), allowing us to synthesize large-scale, assistant-style datasets in a steerable and diverse manner. Specifically, while our framework is task agnostic at its core, we explore and evaluate the potential of CoDi on the task of conversational grounded reasoning for question answering. This is a typical on-device scenario for specialist SLMs, allowing for open-domain model responses, without requiring the model to "memorize" world knowledge in its limited weights. Our evaluations show that SLMs trained with CoDi-synthesized data achieve performance comparable to models trained on human-annotated data in standard metrics. Additionally, when using our framework to generate larger datasets from web data, our models surpass larger, instruction-tuned models in zero-shot conversational grounded reasoning tasks.
Updated: 2024-08-20 22:35:47
Categories: cs.CL,cs.AI
Revisiting Min-Max Optimization Problem in Adversarial Training
The rise of computer vision applications in the real world puts the security of deep neural networks at risk. Recent works demonstrate that convolutional neural networks are susceptible to adversarial examples - where the input images look similar to the natural images but are classified incorrectly by the model. To address this problem, we propose a new method to build robust deep neural networks against adversarial attacks by reformulating the saddle point optimization problem in \cite{madry2017towards}. Our proposed method offers significant resistance and a concrete security guarantee against multiple adversaries. The goal of this paper is to serve as a stepping stone toward a new variation of deep learning models that would lead to fully robust deep learning models.
Updated: 2024-08-20 22:31:19
Categories: cs.CV,cs.LG
Discovery of Generalizable TBI Phenotypes Using Multivariate Time-Series Clustering
Traumatic Brain Injury (TBI) presents a broad spectrum of clinical presentations and outcomes due to its inherent heterogeneity, leading to diverse recovery trajectories and varied therapeutic responses. While many studies have delved into TBI phenotyping for distinct patient populations, identifying TBI phenotypes that consistently generalize across various settings and populations remains a critical research gap. Our research addresses this by employing multivariate time-series clustering to unveil TBI's dynamic intricacies. Utilizing a self-supervised learning-based approach to clustering multivariate time-series data with missing values (SLAC-Time), we analyzed both the research-centric TRACK-TBI and the real-world MIMIC-IV datasets. Remarkably, the optimal hyperparameters of SLAC-Time and the ideal number of clusters remained consistent across these datasets, underscoring SLAC-Time's stability across heterogeneous datasets. Our analysis revealed three generalizable TBI phenotypes ($\alpha$, $\beta$, and $\gamma$), each exhibiting distinct non-temporal features during emergency department visits, and temporal feature profiles throughout ICU stays. Specifically, phenotype $\alpha$ represents mild TBI with a remarkably consistent clinical presentation. In contrast, phenotype $\beta$ signifies severe TBI with diverse clinical manifestations, and phenotype $\gamma$ represents a moderate TBI profile in terms of severity and clinical diversity. Age is a significant determinant of TBI outcomes, with older cohorts recording higher mortality rates. Importantly, while certain features varied by age, the core characteristics of TBI manifestations tied to each phenotype remain consistent across diverse populations.
Updated: 2024-08-20 22:12:44
Categories: cs.LG,q-bio.QM,stat.AP
Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network
Computing the proximal operator of the $\ell_\infty$ norm, $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$, generally requires a sort of the input data, or at least a partial sort similar to quicksort. In order to avoid using a sort, we present an $O(m)$ approximation of $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ using a neural network. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach. We show that the network outperforms a "vanilla neural network" that does not use feature selection. We also present an algorithm with corresponding theory to calculate $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation.
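The exact, sort-based computation via the Moreau decomposition mentioned above can be sketched as follows: since the convex conjugate of $\|\cdot\|_\infty$ is the indicator of the $\ell_1$ unit ball, $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x}) = \mathbf{x} - \Pi_{\|\cdot\|_1 \le \alpha}(\mathbf{x})$, where $\Pi$ is Euclidean projection onto the $\ell_1$ ball of radius $\alpha$. The sort-based projection routine below is the standard one; it is given only as an illustrative baseline, not the paper's $O(m)$ neural approximation.

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of v onto the l1 ball of radius z,
    via the standard sort-and-threshold routine."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = idx[u - (css - z) / idx > 0][-1]
    theta = (css[rho - 1] - z) / rho      # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(x, alpha):
    """prox of alpha * ||.||_inf via the Moreau decomposition:
    prox_{alpha ||.||_inf}(x) = x - P_{||.||_1 <= alpha}(x)."""
    return x - project_l1_ball(x, alpha)
```

Note that if $\|\mathbf{x}\|_1 \le \alpha$ the projection is the identity and the prox is exactly zero, matching the known behavior of the $\ell_\infty$ prox for small inputs.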
Updated: 2024-08-20 22:12:30
Categories: math.NA,cs.LG,cs.NA,65K10, 68T07
ComTraQ-MPC: Meta-Trained DQN-MPC Integration for Trajectory Tracking with Limited Active Localization Updates
Optimal decision-making for trajectory tracking in partially observable, stochastic environments where the number of active localization updates -- the process by which the agent obtains its true state information from the sensors -- are limited, presents a significant challenge. Traditional methods often struggle to balance resource conservation, accurate state estimation and precise tracking, resulting in suboptimal performance. This problem is particularly pronounced in environments with large action spaces, where the need for frequent, accurate state data is paramount, yet the capacity for active localization updates is restricted by external limitations. This paper introduces ComTraQ-MPC, a novel framework that combines Deep Q-Networks (DQN) and Model Predictive Control (MPC) to optimize trajectory tracking with constrained active localization updates. The meta-trained DQN ensures adaptive active localization scheduling, while the MPC leverages available state information to improve tracking. The central contribution of this work is their reciprocal interaction: DQN's update decisions inform MPC's control strategy, and MPC's outcomes refine DQN's learning, creating a cohesive, adaptive system. Empirical evaluations in simulated and real-world settings demonstrate that ComTraQ-MPC significantly enhances operational efficiency and accuracy, providing a generalizable and approximately optimal solution for trajectory tracking in complex partially observable environments.
Updated: 2024-08-20 21:52:51
Categories: cs.RO,cs.AI,cs.SY,eess.SY
Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval
Artificial intelligence (AI) hiring tools have revolutionized resume screening, and large language models (LLMs) have the potential to do the same. However, given the biases which are embedded within LLMs, it is unclear whether they can be used in this scenario without disadvantaging groups based on their protected attributes. In this work, we investigate the possibilities of using LLMs in a resume screening setting via a document retrieval framework that simulates job candidate selection. Using that framework, we then perform a resume audit study to determine whether a selection of Massive Text Embedding (MTE) models are biased in resume screening scenarios. We simulate this for nine occupations, using a collection of over 500 publicly available resumes and 500 job descriptions. We find that the MTEs are biased, significantly favoring White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases, with a minority of cases showing no statistically significant differences. Further analyses show that Black males are disadvantaged in up to 100% of cases, replicating real-world patterns of bias in employment settings, and validate three hypotheses of intersectionality. We also find an impact of document length as well as the corpus frequency of names in the selection of resumes. These findings have implications for widely used AI tools that are automating employment, fairness, and tech policy.
Updated: 2024-08-20 21:49:26
Categories: cs.CY,cs.AI,cs.CL,cs.LG,K.4.2
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Self-supervised learning has driven significant progress in learning from single-subject, iconic images. However, there are still unanswered questions about the use of minimally-curated, naturalistic video data, which contain dense scenes with many independent objects, imbalanced class distributions, and varying object sizes. In this paper, we propose a novel approach that combines an invariance-based SSL objective on pooled representations with a dense SSL objective that enforces equivariance to optical flow warping. Our findings indicate that a unified objective applied at multiple feature scales is essential for learning effective image representations from high-resolution, naturalistic videos. We validate our approach on the BDD100K driving video dataset and the Walking Tours first-person video dataset, demonstrating its ability to capture spatial understanding from a dense objective and semantic understanding via a pooled representation objective.
Updated: 2024-08-20 21:40:48
Categories: cs.CV,cs.LG
Quantum Inverse Contextual Vision Transformers (Q-ICVT): A New Frontier in 3D Object Detection for AVs
The field of autonomous vehicles (AVs) predominantly leverages multi-modal integration of LiDAR and camera data to achieve better performance compared to using a single modality. However, the fusion process encounters challenges in detecting distant objects due to the disparity between the high resolution of cameras and the sparse data from LiDAR. Insufficient integration of global perspectives with local-level details results in sub-optimal fusion performance. To address this issue, we have developed an innovative two-stage fusion process called Quantum Inverse Contextual Vision Transformers (Q-ICVT). This approach leverages adiabatic computing in quantum concepts to create a novel reversible vision transformer known as the Global Adiabatic Transformer (GAT). GAT aggregates sparse LiDAR features with semantic features in dense images for cross-modal integration in a global form. Additionally, the Sparse Expert of Local Fusion (SELF) module maps the sparse LiDAR 3D proposals and encodes position information of the raw point cloud onto the dense camera feature space using a gating point fusion approach. Our experiments show that Q-ICVT achieves an mAPH of 82.54 for L2 difficulties on the Waymo dataset, improving by 1.88% over current state-of-the-art fusion methods. We also analyze GAT and SELF in ablation studies to highlight the impact of Q-ICVT. Our code is available at https://github.com/sanjay-810/Qicvt
Updated: 2024-08-20 21:36:57
Categories: cs.CV,cs.AI
A Practical Solver for Scalar Data Topological Simplification
This paper presents a practical approach for the optimization of topological simplification, a central pre-processing step for the analysis and visualization of scalar data. Given an input scalar field f and a set of "signal" persistence pairs to maintain, our approach produces an output field g that is close to f and which optimizes (i) the cancellation of "non-signal" pairs, while (ii) preserving the "signal" pairs. In contrast to pre-existing simplification algorithms, our approach is not restricted to persistence pairs involving extrema and can thus address a larger class of topological features, in particular saddle pairs in three-dimensional scalar data. Our approach leverages recent generic persistence optimization frameworks and extends them with tailored accelerations specific to the problem of topological simplification. Extensive experiments report substantial accelerations over these frameworks, thereby making topological simplification optimization practical for real-life datasets. Our approach enables a direct visualization and analysis of the topologically simplified data, e.g., via isosurfaces of simplified topology (fewer components and handles). We apply our approach to the extraction of prominent filament structures in three-dimensional data. Specifically, we show that our pre-simplification of the data leads to practical improvements over standard topological techniques for removing filament loops. We also show how our approach can be used to repair genus defects in surface processing. Finally, we provide a C++ implementation for reproducibility purposes.
Updated: 2024-08-20 21:27:00
Categories: cs.LG,cs.CG,cs.CV,cs.GR
Detecting Fraudulent Services on Quantum Cloud Platforms via Dynamic Fingerprinting
Noisy Intermediate-Scale Quantum (NISQ) devices, while accessible via cloud platforms, face challenges due to limited availability and suboptimal quality. These challenges raise the risk of cloud providers offering fraudulent services. This emphasizes the need for users to detect such fraud to protect their investments and ensure computational integrity. This study introduces a novel dynamic fingerprinting method for detecting fraudulent service provision on quantum cloud platforms, specifically targeting machine substitution and profile fabrication attacks. The dynamic fingerprint is constructed using a \textit{single} probing circuit to capture the unique error characteristics of quantum devices, making this approach practical because of its trivial computational costs. When the user examines the service, the execution results of the probing circuit act as the device-side fingerprint of the quantum device providing the service. The user then generates the user-side fingerprint by estimating the expected execution result, assuming the correct device is in use. We propose an algorithm for users to construct the user-side fingerprint with linear complexity. By comparing the device-side and user-side fingerprints, users can effectively detect fraudulent services. Our experiments on the IBM Quantum platform, involving seven devices with varying capabilities, confirm the method's effectiveness.
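The final comparison step can be illustrated by treating the two fingerprints as outcome distributions of the probing circuit; using total variation distance with a fixed acceptance threshold is an assumed choice for this sketch, not necessarily the paper's exact statistic or threshold.

```python
def total_variation(p, q):
    """Total variation distance between two outcome distributions,
    each given as a dict {bitstring: probability}."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def is_fraudulent(device_fp, user_fp, threshold=0.1):
    """Flag the service if the device-side fingerprint (measured outcome
    frequencies of the probing circuit) deviates from the user-side
    fingerprint (frequencies estimated for the claimed device) by more
    than the assumed threshold."""
    return total_variation(device_fp, user_fp) > threshold
```

If the provider runs the circuit on the claimed device, the two distributions should nearly coincide; a substituted machine with different error characteristics shifts the outcome frequencies and trips the check.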
Updated: 2024-08-20 21:26:59
Fields: cs.CR,quant-ph
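In the simplest reading, the fingerprint check described above comes down to comparing two outcome distributions for the probing circuit: the measured device-side counts against the user-side estimate for the claimed device. A minimal sketch of that comparison (the function names, the total-variation statistic, and the numbers are our assumptions, not the paper's actual test):

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two outcome distributions."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def is_fraudulent(device_counts, user_estimate, threshold=0.1):
    """Flag the service when the device-side fingerprint (measured outcome
    frequencies of the probing circuit) deviates from the user-side
    fingerprint (estimated frequencies for the claimed device) by more
    than a tolerance. All names and the threshold are hypothetical."""
    shots = sum(device_counts.values())
    outcomes = sorted(set(device_counts) | set(user_estimate))
    p = [device_counts.get(o, 0) / shots for o in outcomes]
    q = [user_estimate.get(o, 0.0) for o in outcomes]
    return tv_distance(p, q) > threshold

# User-side fingerprint: the claimed device's profile predicts mostly
# correlated outcomes with small, known error rates.
user_fp = {"00": 0.47, "01": 0.03, "10": 0.03, "11": 0.47}
honest = {"00": 470, "01": 28, "10": 32, "11": 470}     # matches the profile
swapped = {"00": 350, "01": 150, "10": 150, "11": 350}  # noisier substitute
```

A real check would calibrate the threshold to the claimed device's error profile and the shot count.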
Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB remains unexplored in the relevant literature. Typical OPE/L methods such as regression and importance sampling can be applied to the CCB problem, however, they face significant challenges due to high bias or variance, exacerbated by the exponential growth in the number of available subsets. To address these challenges, we introduce a concept of factored action space, which allows us to decompose each subset into binary indicators. This formulation allows us to distinguish between the ''main effect'' derived from the main actions, and the ''residual effect'', originating from the supplemental actions, facilitating more effective OPE. Specifically, our estimator, called OPCB, leverages an importance sampling-based approach to unbiasedly estimate the main effect, while employing regression-based approach to deal with the residual effect with low variance. OPCB achieves substantial variance reduction compared to conventional importance sampling methods and bias reduction relative to regression methods under certain conditions, as illustrated in our theoretical analysis. Experiments demonstrate OPCB's superior performance over typical methods in both OPE and OPL.
Updated: 2024-08-20 21:25:04
Fields: stat.ML,cs.AI,cs.LG
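As a point of reference for the variance problem the abstract raises, here is the vanilla subset-level importance sampling (IPS) estimator in a toy CCB with three items, where each subset is a bitmask and the logging policy is uniform; the whole setup is a hypothetical illustration, not the paper's OPCB estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CCB: 3 items, actions are subsets encoded as bitmasks 0..7.
n_items, n = 3, 20000
subsets = rng.integers(0, 2 ** n_items, size=n)      # uniform logging policy
logged_p = np.full(n, 1 / 2 ** n_items)              # logging propensities
reward = np.array([bin(s).count("1") for s in subsets], float)

# Deterministic target policy that always picks the subset {item0, item2}.
target = 0b101
target_p = (subsets == target).astype(float)

# Subset-level IPS: unbiased, but the weights scale with the number of
# subsets (2^n_items here), which is the variance blow-up OPCB is
# designed to avoid via its factored action space.
ips = float(np.mean(target_p / logged_p * reward))
```

With a deterministic target and a noise-free reward, the estimate sits near the target subset's reward of 2; the weight magnitude on matching samples (2^n_items) is what grows exponentially in realistic action spaces.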
UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library
In this work, we present a GPU-accelerated library for the underlying components of Kolmogorov-Arnold Networks (KANs), along with an algorithm to eliminate bounded grids in KANs. The GPU-accelerated library reduces the computational complexity of Basis Spline (B-spline) evaluation by a factor of $\mathcal{O}$(grid size) compared to existing codes, enabling batch computation for large-scale learning. To overcome the limitations of traditional KANs, we introduce Unbounded KANs (UKANs), which eliminate the need for a bounded grid and a fixed number of B-spline coefficients. To do so, we replace the KAN parameters (B-spline coefficients) with a coefficient generator (CG) model. The inputs to the CG model are designed based on the idea of an infinite symmetric grid extending from negative infinity to positive infinity. The positional encoding of grid group, a sequential collection of B-spline grid indexes, is fed into the CG model, and coefficients are consumed by the efficient implementation (matrix representations) of B-spline functions to generate outputs. We perform several experiments on regression, classification, and generative tasks, which are promising. In particular, UKAN does not require data normalization or a bounded domain for evaluation. Additionally, our benchmarking results indicate the superior memory and computational efficiency of our library compared to existing codes.
Updated: 2024-08-20 21:20:38
Fields: cs.LG
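The B-spline evaluation that the library accelerates has a standard unaccelerated reference form, the Cox-de Boor recursion; a plain NumPy sketch (the grid and degree are illustrative, and none of this reflects the library's actual batched GPU implementation):

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at a
    scalar x via the Cox-de Boor recursion."""
    knots = np.asarray(knots, float)
    # Degree-0 bases: indicators of the knot intervals.
    B = np.array([float(knots[i] <= x < knots[i + 1])
                  for i in range(len(knots) - 1)])
    for d in range(1, degree + 1):
        B_next = np.zeros(len(knots) - d - 1)
        for i in range(len(B_next)):
            left = right = 0.0
            if knots[i + d] != knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * B[i]
            if knots[i + d + 1] != knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * B[i + 1])
            B_next[i] = left + right
        B = B_next
    return B

# Uniform grid of 8 knots, degree-2 (quadratic) bases evaluated at x = 3.5.
basis = bspline_basis(3.5, np.arange(8.0), 2)
```

Inside the valid span the bases are non-negative and sum to one, a quick correctness check. The $\mathcal{O}$(grid size) saving comes from evaluating only the `degree + 1` bases that are non-zero at each x, which this naive version does not exploit.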
EPiC: Cost-effective Search-based Prompt Engineering of LLMs for Code Generation
Large Language Models (LLMs) have seen increasing use in various software development tasks, especially in code generation. The most advanced recent methods attempt to incorporate feedback from code execution into prompts to help guide LLMs in generating correct code, in an iterative process. While effective, these methods could be costly and time-consuming due to numerous interactions with the LLM and the extensive token usage. To address this issue, we propose an alternative approach named Evolutionary Prompt Engineering for Code (EPiC), which leverages a lightweight evolutionary algorithm to evolve the original prompts toward better ones that produce high-quality code, with minimal interactions with LLM. Our evaluation against state-of-the-art (SOTA) LLM-based code generation models shows that EPiC outperforms all the baselines in terms of cost-effectiveness.
Updated: 2024-08-20 21:15:36
Fields: cs.SE,cs.AI,cs.NE
Robust Topology Optimization Using Multi-Fidelity Variational Autoencoders
Robust topology optimization (RTO), as a class of topology optimization problems, identifies a design with the best average performance while reducing the response sensitivity to input uncertainties, e.g. load uncertainty. Solving RTO is computationally challenging as it requires repetitive finite element solutions for different candidate designs and different samples of random inputs. To address this challenge, a neural network method is proposed that offers computational efficiency because (1) it builds and explores a low dimensional search space which is parameterized using deterministically optimal designs corresponding to different realizations of random inputs, and (2) the probabilistic performance measure for each design candidate is predicted by a neural network surrogate. This method bypasses the numerous finite element response evaluations that are needed in the standard RTO approaches and with minimal training can produce optimal designs with better performance measures compared to those observed in the training set. Moreover, a multi-fidelity framework is incorporated to the proposed approach to further improve the computational efficiency. Numerical application of the method is shown on the robust design of L-bracket structure with single point load as well as multiple point loads.
Updated: 2024-08-20 21:03:31
Fields: cs.LG
Proposal of an Electronic Auditing System Applied to the Brazilian Electronic Voting Machine
A new system, called SELA -- Auditing Electronic System, has been developed to be applied to the Brazilian Electronic Voting Machine. The SELA was designed to use open hardware and software, making it widely known by society. The security of the auditing process is guaranteed by the application of a Fingerprint Algorithm, a Hash Function. This system is robust and requires minimal modifications to the Electronic Voting Machine. In this paper, SELA is described, and its use during the election process is analyzed. A comparison between SELA and the use of thermal printers as a secondary voting record system is also presented. The authors recommend a pilot implementation of SELA for the 2002 Brazilian Elections.
Updated: 2024-08-20 21:03:06
Fields: cs.CR
Active Learning of Molecular Data for Task-Specific Objectives
Active learning (AL) has shown promise for being a particularly data-efficient machine learning approach. Yet, its performance depends on the application and it is not clear when AL practitioners can expect computational savings. Here, we carry out a systematic AL performance assessment for three diverse molecular datasets and two common scientific tasks: compiling compact, informative datasets and targeted molecular searches. We implemented AL with Gaussian processes (GP) and used the many-body tensor as molecular representation. For the first task, we tested different data acquisition strategies, batch sizes and GP noise settings. AL was insensitive to the acquisition batch size and we observed the best AL performance for the acquisition strategy that combines uncertainty reduction with clustering to promote diversity. However, for optimal GP noise settings, AL did not outperform randomized selection of data points. Conversely, for targeted searches, AL outperformed random sampling and achieved data savings up to 64%. Our analysis provides insight into this task-specific performance difference in terms of target distributions and data collection strategies. We established that the performance of AL depends on the relative distribution of the target molecules in comparison to the total dataset distribution, with the largest computational savings achieved when their overlap is minimal.
Updated: 2024-08-20 20:50:29
Fields: cs.LG,physics.data-an
Reading with Intent
Retrieval augmented generation (RAG) systems augment the knowledge of language models by integrating external information sources such as Wikipedia, internal documents, scientific papers, or the open internet. RAG systems that rely on the open internet as their knowledge source have to contend with the complexities of human-generated content. Human communication extends much deeper than just the words rendered as text. Intent, tonality, and connotation can all change the meaning of what is being conveyed. Recent real-world deployments of RAG systems have shown some difficulty in understanding these nuances of human communication. One significant challenge for these systems lies in processing sarcasm. Though the Large Language Models (LLMs) that make up the backbone of these RAG systems are able to detect sarcasm, they currently do not always use these detections for the subsequent processing of text. To address these issues, in this paper, we synthetically generate sarcastic passages from Natural Questions' Wikipedia retrieval corpus. We then test the impact of these passages on the performance of both the retriever and reader portion of the RAG pipeline. We introduce a prompting system designed to enhance the model's ability to interpret and generate responses in the presence of sarcasm, thus improving overall system performance. Finally, we conduct ablation studies to validate the effectiveness of our approach, demonstrating improvements in handling sarcastic content within RAG systems.
Updated: 2024-08-20 20:47:27
Fields: cs.CL,cs.AI,cs.IR,cs.LG
Optimization of Multi-Agent Flying Sidekick Traveling Salesman Problem over Road Networks
The mixed truck-drone delivery systems have attracted increasing attention for last-mile logistics, but real-world complexities demand a shift from single-agent, fully connected graph models to multi-agent systems operating on actual road networks. We introduce the multi-agent flying sidekick traveling salesman problem (MA-FSTSP) on road networks, extending the single truck-drone model to multiple trucks, each carrying multiple drones while considering full road networks for truck restrictions and flexible drone routes. We propose a mixed-integer linear programming model and an efficient three-phase heuristic algorithm for this NP-hard problem. Our approach decomposes MA-FSTSP into manageable subproblems of one truck with multiple drones. Then, it computes the routes for trucks without drones in subproblems, which are used in the final phase as heuristics to help optimize drone and truck routes simultaneously. Extensive numerical experiments on Manhattan and Boston road networks demonstrate our algorithm's superior effectiveness and efficiency, significantly outperforming both column generation and variable neighborhood search baselines in solution quality and computation time. Notably, our approach scales to more than 300 customers within a 5-minute time limit, showcasing its potential for large-scale, real-world logistics applications.
Updated: 2024-08-20 20:44:18
Fields: cs.RO,cs.AI,cs.MA
Autonomous Negotiation Using Comparison-Based Gradient Estimation
Negotiation is useful for resolving conflicts in multi-agent systems. We explore autonomous negotiation in a setting where two self-interested rational agents sequentially trade items from a finite set of categories. Each agent has a utility function that depends on the amount of items it possesses in each category. The offering agent makes trade offers to improve its utility without knowing the responding agent's utility function, and the responding agent accepts offers that improve its utility. We present a comparison-based algorithm for the offering agent that generates offers through previous acceptance or rejection responses without extensive information sharing. The algorithm estimates the responding agent's gradient by leveraging the rationality assumption and rejected offers to prune the space of potential gradients. After the algorithm makes a finite number of consecutively rejected offers, the responding agent is at a near-optimal state, or the agents' preferences are closely aligned. Additionally, we facilitate negotiations with humans by representing natural language feedback as comparisons that can be integrated into the proposed algorithm. We compare the proposed algorithm against random search baselines in integer and fractional trading scenarios and show that it improves the societal benefit with fewer offers.
Updated: 2024-08-20 20:42:41
Fields: cs.MA,cs.AI,math.OC
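The core estimation step above — recovering gradient information from accept/reject responses alone — can be illustrated with a simple averaging scheme: each accepted small offer reveals that the responder's utility gradient has positive inner product with the offer direction. A hypothetical sketch (the averaging rule is our simplification; the paper instead prunes the space of gradients consistent with the rejections):

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_gradient(utility, x, n_offers=200, step=1e-2):
    """Estimate the responder's utility-gradient direction from binary
    accept/reject feedback: an accepted offer x -> x + step*d reveals
    grad(u) . d > 0 (for small step), so sign-weighted averaging of the
    offer directions points roughly along the gradient."""
    g_hat = np.zeros_like(x)
    for _ in range(n_offers):
        d = rng.standard_normal(x.shape)
        d /= np.linalg.norm(d)
        accepted = utility(x + step * d) > utility(x)  # responder's rule
        g_hat += d if accepted else -d
    return g_hat / np.linalg.norm(g_hat)

# Hypothetical responder utility: quadratic peak at (2, -1).
u = lambda z: -np.sum((z - np.array([2.0, -1.0])) ** 2)
direction = estimate_gradient(u, np.zeros(2))
```

For the quadratic utility used here, the recovered direction aligns closely with the true gradient direction after a few hundred offers.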
CRACKS: Crowdsourcing Resources for Analysis and Categorization of Key Subsurface faults
Crowdsourcing annotations has created a paradigm shift in the availability of labeled data for machine learning. Availability of large datasets has accelerated progress in common knowledge applications involving visual and language data. However, specialized applications that require expert labels lag in data availability. One such application is fault segmentation in subsurface imaging. Detecting, tracking, and analyzing faults has broad societal implications in predicting fluid flows, earthquakes, and storing excess atmospheric CO$_2$. However, delineating faults with current practices is a labor-intensive activity that requires precise analysis of subsurface imaging data by geophysicists. In this paper, we propose the $\texttt{CRACKS}$ dataset to detect and segment faults in subsurface images by utilizing crowdsourced resources. We leverage Amazon Mechanical Turk to obtain fault delineations from sections of the Netherlands North Sea subsurface images from (i) $26$ novices who have no exposure to subsurface data and were shown a video describing and labeling faults, (ii) $8$ practitioners who have previously interacted and worked on subsurface data, (iii) one geophysicist to label $7636$ faults in the region. Note that all novices, practitioners, and the expert segment faults on the same subsurface volume with disagreements between and among the novices and practitioners. Additionally, each fault annotation is equipped with the confidence level of the annotator. The paper provides benchmarks on detecting and segmenting the expert labels, given the novice and practitioner labels. Additional details along with the dataset links and codes are available at $\href{https://alregib.ece.gatech.edu/cracks-crowdsourcing-resources-for-analysis-and-categorization-of-key-subsurface-faults/}{link}$.
Updated: 2024-08-20 20:40:11
Fields: cs.LG,cs.CV
Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Neural Carrier Articles
Jailbreak attacks on Large Language Models (LLMs) entail crafting prompts aimed at exploiting the models to generate malicious content. This paper proposes a new type of jailbreak attack which shifts the attention of the LLM by inserting a prohibited query into a carrier article. The proposed attack leverages a knowledge graph and a composer LLM to automatically generate a carrier article that is similar in topic to the prohibited query but does not violate the LLM's safeguards. By inserting the malicious query into the carrier article, the assembled attack payload can successfully jailbreak the LLM. To evaluate the effectiveness of our method, we leverage 4 popular categories of ``harmful behaviors'' adopted by related research to attack 6 popular LLMs. Our experiment results show that the proposed attacking method can successfully jailbreak all the target LLMs with a high success rate, except for Claude-3.
Updated: 2024-08-20 20:35:04
Fields: cs.CR,cs.AI
A Full DAG Score-Based Algorithm for Learning Causal Bayesian Networks with Latent Confounders
Causal Bayesian networks (CBN) are popular graphical probabilistic models that encode causal relations among variables. Learning their graphical structure from observational data has received a lot of attention in the literature. When there exists no latent (unobserved) confounder, i.e., no unobserved direct common cause of some observed variables, learning algorithms can be divided essentially into two classes: constraint-based and score-based approaches. The latter are often thought to be more robust than the former and to produce better results. However, to the best of our knowledge, when variables are discrete, no score-based algorithm is capable of dealing with latent confounders. This paper introduces the first fully score-based structure learning algorithm searching the space of DAGs (directed acyclic graphs) that is capable of identifying the presence of some latent confounders. It is justified mathematically and experiments highlight its effectiveness.
Updated: 2024-08-20 20:25:56
Fields: cs.LG,cs.AI
SubgoalXL: Subgoal-based Expert Learning for Theorem Proving
Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical challenges: the scarcity of specialized mathematics and theorem-proving data, and the need for improved multi-step reasoning abilities in LLMs. By optimizing data efficiency and employing subgoal-level supervision, SubgoalXL extracts richer information from limited human-generated proofs. The framework integrates subgoal-oriented proof strategies with an expert learning system, iteratively refining formal statement, proof, and subgoal generators. Leveraging the Isabelle environment's advantages in subgoal-based proofs, SubgoalXL achieves a new state-of-the-art performance of 56.1\% in Isabelle on the standard miniF2F dataset, marking an absolute improvement of 4.9\%. Notably, SubgoalXL successfully solves 41 AMC12, 9 AIME, and 3 IMO problems from miniF2F. These results underscore the effectiveness of maximizing limited data utility and employing targeted guidance for complex reasoning in formal theorem proving, contributing to the ongoing advancement of AI reasoning capabilities. The implementation is available at \url{https://github.com/zhaoxlpku/SubgoalXL}.
Updated: 2024-08-20 20:10:53
Fields: cs.LG,cs.AI,cs.CL,cs.LO
Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
In this paper, we discuss approaches for integrating Computational Creativity (CC) with research in large language and vision models (LLVMs) to address a key limitation of these models, i.e., creative problem solving. We present preliminary experiments showing how CC principles can be applied to address this limitation through augmented prompting. With this work, we hope to foster discussions of Computational Creativity in the context of ML algorithms for creative problem solving in LLVMs. Our code is at: https://github.com/lnairGT/creative-problem-solving-LLMs
Updated: 2024-08-20 20:03:14
Fields: cs.AI,cs.LG
The Ensemble Epanechnikov Mixture Filter
In the high-dimensional setting, Gaussian mixture kernel density estimates become increasingly suboptimal. In this work we aim to show that it is practical to instead use the optimal multivariate Epanechnikov kernel. We make use of this optimal Epanechnikov mixture kernel density estimate for the sequential filtering scenario through what we term the ensemble Epanechnikov mixture filter (EnEMF). We provide a practical implementation of the EnEMF that is as cost efficient as the comparable ensemble Gaussian mixture filter. We show on a static example that the EnEMF is robust to growth in dimension, and also that the EnEMF has a significant reduction in error per particle on the 40-variable Lorenz '96 system.
Updated: 2024-08-20 19:50:59
Fields: stat.ML,cs.LG,cs.NA,math.NA,math.OC,stat.ME
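The multivariate Epanechnikov kernel at the heart of the filter has a closed form: K(u) = c_d (1 - ||u||^2) on the unit ball and zero outside, with c_d = (d+2)/(2 V_d) and V_d the volume of the unit d-ball. A sketch of the resulting kernel density estimate (the filter's ensemble update itself is not shown, and the function name is our own):

```python
import numpy as np
from math import gamma, pi

def epanechnikov_kde(x, samples, h):
    """Kernel density estimate at x with the multivariate Epanechnikov
    kernel K(u) = c_d * (1 - ||u||^2) on the unit ball, zero outside."""
    x, samples = np.atleast_1d(x), np.atleast_2d(samples)
    n, d = samples.shape
    ball_vol = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    c_d = (d + 2) / (2 * ball_vol)               # normalizing constant
    u2 = np.sum(((samples - x) / h) ** 2, axis=1)
    k = np.where(u2 <= 1, c_d * (1 - u2), 0.0)
    return float(k.sum() / (n * h ** d))
```

In one dimension this reduces to the familiar 0.75(1 - x^2) kernel.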
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making
Recently, large language models (LLMs) have expanded into various domains. However, there remains a need to evaluate how these models perform when prompted with commonplace queries compared to domain-specific queries, which may be useful for benchmarking prior to fine-tuning for domain-specific downstream tasks. This study evaluates LLMs, specifically Gemma-2B and Gemma-7B, across diverse domains, including cybersecurity, medicine, and finance, compared to common knowledge queries. This study utilizes a comprehensive methodology to assess foundational models, which includes problem formulation, data analysis, and the development of ThroughCut, a novel outlier detection technique that automatically identifies response throughput outliers based on their conciseness. This methodological rigor enhances the credibility of the presented evaluation frameworks. This study focused on assessing inference time, response length, throughput, quality, and resource utilization and investigated the correlations between these factors. The results indicate that model size and types of prompts used for inference significantly influenced response length and quality. In addition, common prompts, which include various types of queries, generate diverse and inconsistent responses at irregular intervals. In contrast, domain-specific prompts consistently generate concise responses within a reasonable time. Overall, this study underscores the need for comprehensive evaluation frameworks to enhance the reliability of benchmarking procedures in multidomain AI research.
Updated: 2024-08-20 19:17:58
Fields: cs.CL,cs.AI,cs.LG,cs.PF
A Roadmap to Pluralistic Alignment
With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution. We also formalize and discuss three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks, 2) Trade-off steerable benchmarks, which incentivize models to steer to arbitrary trade-offs, and 3) Jury-pluralistic benchmarks which explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
Updated: 2024-08-20 19:14:31
Fields: cs.AI,cs.CL,cs.IR
Swim till You Sink: Computing the Limit of a Game
During 2023, two interesting results were proven about the limit behavior of game dynamics: First, it was shown that there is a game for which no dynamics converges to the Nash equilibria. Second, it was shown that the sink equilibria of a game adequately capture the limit behavior of natural game dynamics. These two results have created a need and opportunity to articulate a principled computational theory of the meaning of the game that is based on game dynamics. Given any game in normal form, and any prior distribution of play, we study the problem of computing the asymptotic behavior of a class of natural dynamics called the noisy replicator dynamics as a limit distribution over the sink equilibria of the game. When the prior distribution has pure strategy support, we prove this distribution can be computed efficiently, in near-linear time to the size of the best-response graph. When the distribution can be sampled -- for example, if it is the uniform distribution over all mixed strategy profiles -- we show through experiments that the limit distribution of reasonably large games can be estimated quite accurately through sampling and simulation.
Updated: 2024-08-20 19:09:21
Categories: cs.GT,cs.LG,econ.TH
Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models, including operator learning models. The proposed method accounts for uncertainty in the observations and PDE and surrogate models. First, we use the surrogate model to formulate a minimization problem in the reduced space for the maximum a posteriori (MAP) inverse solution. Then, we randomize the MAP objective function and obtain samples of the posterior distribution by minimizing different realizations of the objective function. We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation with an unknown space-dependent diffusion coefficient. Among other problems, this equation describes groundwater flow in an unconfined aquifer. Depending on the training dataset and ensemble sizes, the proposed method provides similar or more descriptive posteriors of the parameters and states than the iterative ensemble smoother method. Deep ensembling underestimates uncertainty and provides less informative posteriors than the other two methods.
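For a linear-Gaussian toy problem, the "randomize the MAP objective, then re-minimize each realization" recipe above can be sketched in a few lines. This is our own illustration under simplifying assumptions (linear surrogate, Gaussian noise), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "surrogate": d = A @ u + noise. We randomize the MAP objective
# (perturb data and prior mean) and re-minimize to draw posterior samples.
A = np.array([[1.0, 0.5], [0.2, 1.5], [1.0, 1.0]])
u_true = np.array([0.3, -0.7])
sigma, lam = 0.1, 1.0                       # obs noise std, prior std
d = A @ u_true + sigma * rng.normal(size=3)
u_prior = np.zeros(2)

def randomized_map_sample():
    # One posterior sample = MAP solution of a randomized objective.
    d_pert = d + sigma * rng.normal(size=d.shape)
    u_pert = u_prior + lam * rng.normal(size=u_prior.shape)
    # Normal equations of the perturbed least-squares objective.
    H = A.T @ A / sigma**2 + np.eye(2) / lam**2
    g = A.T @ d_pert / sigma**2 + u_pert / lam**2
    return np.linalg.solve(H, g)

samples = np.array([randomized_map_sample() for _ in range(2000)])
# samples.mean(axis=0) approximates the posterior mean, near u_true
```

In this linear-Gaussian special case the samples follow the exact posterior; the paper's contribution lies in applying the idea with nonlinear deep-learning surrogates.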
Updated: 2024-08-20 19:06:02
Categories: cs.LG
Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models
As the performance of larger, newer Large Language Models continues to improve for strategic Theory of Mind (ToM) tasks, the demand for these state-of-the-art models increases commensurately. However, their deployment is costly both in terms of processing power and time. In this paper, we investigate the feasibility of creating smaller, highly-performing specialized algorithms by way of fine-tuning. To do this, we first present a large pre-trained model with 20 unique scenarios that combine different social contexts with games of varying social dilemmas, record its answers, and use them for Q&A fine-tuning on a smaller model of the same family. Our focus is on in-context game-theoretic decision-making, a domain in which human interaction occurs and which requires both a theory of mind (or a semblance thereof) and an understanding of social dynamics. The smaller model is therefore trained not just on the answers provided, but also on the motivations provided by the larger model, which should contain advice and guidelines to navigate both strategic dilemmas and social cues. We find that the fine-tuned smaller language model consistently bridged the gap in performance between the smaller pre-trained version of the model and its larger relative, and that its improvements extended to areas and contexts beyond those provided in the training examples, including out-of-sample scenarios with completely different game structures. On average for all games, through fine-tuning, the smaller model showed a 46% improvement measured as alignment towards the behavior of the larger model, with 100% representing indistinguishable behavior. When presented with out-of-sample social contexts and games, the fine-tuned model still displays remarkable levels of alignment, reaching an improvement of 18% and 28% respectively.
Updated: 2024-08-20 18:58:00
Categories: cs.CL,cs.AI,cs.CY,cs.ET,cs.GT
Beyond the Typical: Modeling Rare Plausible Patterns in Chemical Reactions by Leveraging Sequential Mixture-of-Experts
Reaction prediction, a critical task in synthetic chemistry, is to predict the outcome of a reaction based on given reactants. Generative models like Transformer and VAE have typically been employed to predict the reaction product. However, these likelihood-maximization models overlooked the inherent stochastic nature of chemical reactions, such as the multiple ways electrons can be redistributed among atoms during the reaction process. In scenarios where similar reactants could follow different electron redistribution patterns, these models typically predict the most common outcomes, neglecting less frequent but potentially crucial reaction patterns. These overlooked patterns, though rare, can lead to innovative methods for designing synthetic routes and significantly advance synthesis techniques. To break the limits of previous approaches, we propose organizing the mapping space between reactants and electron redistribution patterns in a divide-and-conquer manner. We address the reaction problem by training multiple expert models, each specializing in capturing a type of electron redistribution pattern in reaction. These experts enhance the prediction process by considering both typical and other less common electron redistribution manners. In the inference stage, a dropout strategy is applied to each expert to improve the electron redistribution diversity. The most plausible products are finally predicted through a ranking stage designed to integrate the predictions from multiple experts. Experimental results on the largest reaction prediction benchmark USPTO-MIT show the superior performance of our proposed method compared to baselines.
Updated: 2024-08-20 18:52:56
Categories: cs.LG,physics.chem-ph
MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data
Generative adversarial networks (GANs) have made impressive advances in image generation, but they often require large-scale training data to avoid degradation caused by discriminator overfitting. To tackle this issue, we investigate the challenge of training GANs with limited data, and propose a novel regularization method based on the idea of the renormalization group (RG) in physics. We observe that in the limited data setting, the gradient pattern that the generator obtains from the discriminator becomes more aggregated over time. In the RG context, this aggregated pattern exhibits a high discrepancy from its coarse-grained versions, which implies a high-capacity and sensitive system, prone to overfitting and collapse. To address this problem, we introduce a multi-scale structural self-dissimilarity (MS$^3$D) regularization, which constrains the gradient field to have a consistent pattern across different scales, thereby fostering a more redundant and robust system. We show that our method can effectively enhance the performance and stability of GANs under limited data scenarios, and even allow them to generate high-quality images with very little data.
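As a rough illustration of the idea, one RG coarse-graining step can be block-averaging of the generator's gradient field, with the penalty measuring dissimilarity between each scale and its coarse-grained, re-expanded version. The sketch below is our own hedged reading; the function names and details are assumptions, not the paper's implementation:

```python
import numpy as np

def block_avg(x, k=2):
    # Coarse-grain a 2-D field by k x k block averaging (one RG step).
    h, w = x.shape
    return x[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def ms3d_penalty(grad_field, scales=3):
    """Illustrative multi-scale structural self-dissimilarity penalty:
    sum of (1 - cosine similarity) between each scale and its coarse-grained,
    nearest-neighbor-upsampled version. A perfectly self-similar field
    scores ~0; aggregated, scale-inconsistent patterns score higher."""
    x = grad_field
    total = 0.0
    for _ in range(scales):                     # field must stay >= 2x2 here
        coarse = block_avg(x)
        up = np.kron(coarse, np.ones((2, 2)))   # expand back to finer grid
        a = x[: up.shape[0], : up.shape[1]].ravel()
        b = up.ravel()
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        total += 1.0 - cos
        x = coarse
    return total
```

A constant (maximally self-similar) gradient field incurs essentially zero penalty, while a noisy, scale-inconsistent field is penalized, which is the qualitative behavior the regularizer targets.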
Updated: 2024-08-20 18:37:37
Categories: cs.LG,cs.AI
Towards the Unmanned Aerial Vehicle Traffic Management Systems (UTMs): Security Risks and Challenges
Every aspect of our life depends on the ability to communicate effectively. Organizations that manage to establish communication routines, protocols, and means thrive. An aerial traffic management system operates much like such an organization, but in a far stricter manner. Third-party agencies ensure several aspects of its functionality, the foremost being safety. Many people take safety for granted, yet it is one of the most difficult parts of our daily operations. Thus, while absorbing the new habits of this new era, we must simultaneously ensure safety in every part of it. Indeed, the more data we produce, the more information we create, and the more specialization we must introduce in order to remain effective within a reasonable time frame. An Unmanned Aircraft System Traffic Management (UTM) system consists of miscellaneous modules, each of which requires its own safety considerations. In other words, UTMs are state-of-the-art systems that demand high-quality services and specialization if they are to be considered reliable.
Updated: 2024-08-20 18:25:34
Categories: cs.CR,cs.DC
Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction
Inductive link prediction -- where entities during training and inference stages can be different -- has shown great potential for completing evolving knowledge graphs in an entity-independent manner. Many popular methods mainly focus on modeling graph-level features, while the edge-level interactions -- especially the semantic correlations between relations -- have been less explored. However, we notice that a desirable property of semantic correlations between relations is that they are inherently edge-level and entity-independent. This implies the great potential of the semantic correlations for the entity-independent inductive link prediction task. Inspired by this observation, we propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations that are highly correlated to their topological structures within subgraphs. Specifically, we prove that semantic correlations between any two relations can be categorized into seven topological patterns, and then propose the Relational Correlation Network (RCN) to learn the importance of each pattern. To further exploit the potential of RCN, we propose a Complete Common Neighbor induced subgraph that can effectively preserve complete topological patterns within the subgraph. Extensive experiments demonstrate that TACO effectively unifies the graph-level information and edge-level interactions to jointly perform reasoning, leading to a superior performance over existing state-of-the-art methods for the inductive link prediction task.
Updated: 2024-08-20 18:24:32
Categories: cs.AI
DOMBA: Double Model Balancing for Access-Controlled Language Models via Minimum-Bounded Aggregation
The utility of large language models (LLMs) depends heavily on the quality and quantity of their training data. Many organizations possess large data corpora that could be leveraged to train or fine-tune LLMs tailored to their specific needs. However, these datasets often come with access restrictions that are based on user privileges and enforced by access control mechanisms. Training LLMs on such datasets could result in exposure of sensitive information to unauthorized users. A straightforward approach for preventing such exposure is to train a separate model for each access level. This, however, may result in low utility models due to the limited amount of training data per model compared to the amount in the entire organizational corpus. Another approach is to train a single LLM on all the data while limiting the exposure of unauthorized information. However, current exposure-limiting methods for LLMs are ineffective for access-controlled data, where sensitive information appears frequently across many training examples. We propose DOMBA - double model balancing - a simple approach for training and deploying LLMs that provides high utility and access-control functionality with security guarantees. DOMBA aggregates the probability distributions of two models, each trained on documents with (potentially many) different access levels, using a "min-bounded" average function (a function that is bounded by the smaller value, e.g., harmonic mean). A detailed mathematical analysis and extensive evaluation show that DOMBA safeguards restricted information while offering utility comparable to non-secure models.
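The min-bounded aggregation can be made concrete: with the harmonic mean, a token's aggregated probability is at most twice its probability under either submodel, so text that only one access level's model finds likely is suppressed. A toy sketch (ours, not the authors' code):

```python
import numpy as np

def domba_aggregate(p1, p2, eps=1e-12):
    """Aggregate two next-token distributions with a min-bounded average.
    The harmonic mean 2ab/(a+b) is at most 2*min(a, b), so a token must be
    probable under BOTH submodels to retain probability mass. Sketch only."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    h = 2.0 * p1 * p2 / (p1 + p2 + eps)   # elementwise harmonic mean
    return h / h.sum()                    # renormalize to a distribution

p_a = [0.5, 0.5, 0.0]   # model trained on one access level (illustrative)
p_b = [0.9, 0.0, 0.1]   # model trained on another access level
p = domba_aggregate(p_a, p_b)
# token 0 survives; tokens likely under only one submodel are zeroed out
```

The design choice is the point: an arithmetic mean would leak a token that one submodel assigns high probability, while the min-bounded average bounds leakage by the less-informed model.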
Updated: 2024-08-20 18:23:38
Categories: cs.LG,cs.AI,cs.CL,cs.CR
Post-Quantum Secure UE-to-UE Communications
The rapid development of quantum computing poses a significant threat to the security of current cryptographic systems, including those used in User Equipment (UE) for mobile communications. Conventional cryptographic algorithms such as Rivest-Shamir-Adleman (RSA) and Elliptic curve cryptography (ECC) are vulnerable to quantum computing attacks, which could jeopardize the confidentiality, integrity, and availability of sensitive data transmitted by UEs. This demo paper proposes the integration of Post-Quantum Cryptography (PQC) in TLS for UE Communication to mitigate the risks of quantum attacks. We present our setup and explain each of the components used. We also provide the entire workflow of the demo for other researchers to replicate the same setup. By addressing the implementation of PQC within a 5G network to secure UE-to-UE communication, this research aims to pave the way for developing quantum-resistant mobile devices and securing the future of wireless communications.
Updated: 2024-08-20 18:19:39
Categories: cs.CR,cs.NI
Experimentation, deployment and monitoring Machine Learning models: Approaches for applying MLOps
In recent years, Data Science has become increasingly relevant as a support tool for industry, significantly enhancing decision-making in a way never seen before. In this context, the MLOps discipline emerges as a solution to automate the life cycle of Machine Learning models, ranging from experimentation to monitoring in production environments. Research results show that MLOps is a constantly evolving discipline, with challenges and solutions for integrating development and production environments, publishing models in production environments, and monitoring models throughout the end-to-end development lifecycle. This paper contributes to the understanding of MLOps techniques and their diverse applications.
Updated: 2024-08-20 18:11:17
Categories: cs.LG
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
The loss functions of many learning problems contain multiple additive terms that can disagree and yield conflicting update directions. For Physics-Informed Neural Networks (PINNs), loss terms on initial/boundary conditions and physics equations are particularly interesting as they are well-established as highly difficult tasks. To improve learning the challenging multi-objective task posed by PINNs, we propose the ConFIG method, which provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient. It also maintains consistent optimization rates for all loss terms and dynamically adjusts gradient magnitudes based on conflict levels. We additionally leverage momentum to accelerate optimizations by alternating the back-propagation of different loss terms. The proposed method is evaluated across a range of challenging PINN scenarios, consistently showing superior performance and runtime compared to baseline methods. We also test the proposed method in a classic multi-task benchmark, where the ConFIG method likewise exhibits a highly promising performance. Source code is available at \url{https://tum-pbs.github.io/ConFIG}.
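One concrete way to realize the positive-dot-product property, roughly following the paper's construction, is to solve for a direction with equal unit projection onto every normalized loss gradient via a pseudoinverse (the magnitude rescaling and momentum acceleration are omitted here). A sketch with illustrative names:

```python
import numpy as np

def conflict_free_direction(grads):
    """Direction with equal, positive projection onto every loss gradient.
    Sketch of the ConFIG idea: stack unit-normalized loss gradients as rows
    of G_hat and solve G_hat @ x = 1 with the pseudoinverse."""
    G_hat = np.stack([g / (np.linalg.norm(g) + 1e-12) for g in grads])
    return np.linalg.pinv(G_hat) @ np.ones(len(grads))

g1 = np.array([1.0, 0.0])
g2 = np.array([-0.5, 1.0])   # conflicts with g1 (negative dot product)
d = conflict_free_direction([g1, g2])
# d has a positive dot product with both g1 and g2
```

Because the projections onto the unit gradients are equal by construction, no single loss term dominates the update, which is the "consistent optimization rates" property the abstract describes.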
Updated: 2024-08-20 18:00:20
Categories: cs.LG,68T07
Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors
Deep learning-based face recognition (FR) systems pose significant privacy risks by tracking users without their consent. While adversarial attacks can protect privacy, they often produce visible artifacts compromising user experience. To mitigate this issue, recent facial privacy protection approaches advocate embedding adversarial noise into natural-looking makeup styles. However, these methods require training on large-scale makeup datasets that are not always readily available. In addition, these approaches also suffer from dataset bias. For instance, training on makeup data that predominantly contains female faces could compromise protection efficacy for male faces. To handle these issues, we propose a test-time optimization approach that solely optimizes an untrained neural network to transfer makeup style from a reference to a source image in an adversarial manner. We introduce two key modules: a correspondence module that aligns regions between reference and source images in latent space, and a decoder with conditional makeup layers. The untrained decoder, optimized via carefully designed structural and makeup consistency losses, generates a protected image that resembles the source but incorporates adversarial makeup to deceive FR models. As our approach does not rely on training with makeup face datasets, it avoids potential male/female dataset biases while providing effective protection. We further extend the proposed approach to videos by leveraging temporal correlations. Experiments on benchmark datasets demonstrate superior performance in face verification and identification tasks and effectiveness against commercial FR systems. Our code and models will be available at https://github.com/fahadshamshad/deep-facial-privacy-prior
Updated: 2024-08-20 17:59:39
Categories: cs.CV,cs.LG
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency
We propose sorting patch representations across views as a novel self-supervised learning signal to improve pretrained representations. To this end, we introduce NeCo: Patch Neighbor Consistency, a novel training loss that enforces patch-level nearest neighbor consistency across a student and teacher model, relative to reference batches. Our method leverages a differentiable sorting method applied on top of pretrained representations, such as DINOv2-registers, to bootstrap the learning signal and further improve upon them. This dense post-pretraining leads to superior performance across various models and datasets, despite requiring only 19 hours on a single GPU. We demonstrate that this method generates high-quality dense feature encoders and establish several new state-of-the-art results: +5.5% and +6% for non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, and +7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff.
Updated: 2024-08-20 17:58:59
Categories: cs.CV,cs.AI
Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks
The application of large-language models (LLMs) to digital hardware code generation is an emerging field. Most LLMs are primarily trained on natural language and software code. Hardware code, such as Verilog, represents only a small portion of the training data and few hardware benchmarks exist. To address this gap, the open-source VerilogEval benchmark was released in 2023, providing a consistent evaluation framework for LLMs on code completion tasks. It was tested on state-of-the-art models at the time including GPT-4. However, VerilogEval and other Verilog generation benchmarks lack failure analysis and, in present form, are not conducive to exploring prompting techniques. Also, since VerilogEval's release, both commercial and open-source models have seen continued development. In this work, we evaluate new commercial and open-source models of varying sizes against an improved VerilogEval benchmark suite. We enhance VerilogEval's infrastructure and dataset by automatically classifying failures, introduce new prompts for supporting in-context learning (ICL) examples, and extend the supported tasks to specification-to-RTL translation. We find a measurable improvement in commercial state-of-the-art models, with GPT-4 Turbo achieving a 59% pass rate on spec-to-RTL tasks. We also study the performance of open-source and domain-specific models that have emerged, and demonstrate that models can benefit substantially from ICL. We find that recently-released Llama 3.1 405B achieves a pass rate of 58%, effectively matching that of GPT-4 Turbo, and that the much smaller domain-specific RTL-Coder 6.7B models achieve an impressive 37% pass rate. However, prompt engineering is key to achieving good pass rates, and varies widely with model and task. A benchmark infrastructure that allows for prompt engineering and failure analysis is key to continued model development and deployment.
Updated: 2024-08-20 17:58:56
Categories: cs.AR,cs.AI
Accelerating Goal-Conditioned RL Algorithms and Research
Self-supervision has the potential to transform reinforcement learning (RL), paralleling the breakthroughs it has enabled in other areas of machine learning. While self-supervised learning in other domains aims to find patterns in a fixed dataset, self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment. However, these methods have failed to see similar success, both due to a lack of data from slow environments as well as a lack of stable algorithms. We take a step toward addressing both of these issues by releasing a high-performance codebase and benchmark JaxGCRL for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU. The key to this performance is a combination of GPU-accelerated environments and a stable, batched version of the contrastive reinforcement learning algorithm, based on an infoNCE objective, that effectively makes use of this increased data throughput. With this approach, we provide a foundation for future research in self-supervised GCRL, enabling researchers to quickly iterate on new ideas and evaluate them in a diverse set of challenging environments. Website + Code: https://github.com/MichalBortkiewicz/JaxGCRL
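The contrastive objective at the core of such methods is the batched infoNCE loss, where each state's achieved goal is the positive and the other goals in the batch serve as negatives. A numpy sketch of that loss (our illustration; JaxGCRL's actual implementation is in JAX):

```python
import numpy as np

def info_nce_loss(state_emb, goal_emb, temperature=1.0):
    """Batched infoNCE as used in contrastive RL (sketch): row i of
    state_emb is paired with row i of goal_emb as its positive; all other
    rows in the batch act as negatives."""
    logits = state_emb @ goal_emb.T / temperature     # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # diagonal = positives
```

Large batches make this estimator effective, which is one reason the GPU-accelerated data throughput described above matters: more environment steps per second means more (state, goal) pairs per contrastive batch.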
Updated: 2024-08-20 17:58:40
Categories: cs.LG,cs.AI
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments
Large Language Models (LLMs) have demonstrated potential in Vision-and-Language Navigation (VLN) tasks, yet current applications face challenges. While LLMs excel in general conversation scenarios, they struggle with specialized navigation tasks, yielding suboptimal performance compared to specialized VLN models. We introduce FLAME (FLAMingo-Architected Embodied Agent), a novel Multimodal LLM-based agent and architecture designed for urban VLN tasks that efficiently handles multiple observations. Our approach implements a three-phase tuning technique for effective adaptation to navigation tasks, including single perception tuning for street view description, multiple perception tuning for trajectory summarization, and end-to-end training on VLN datasets. The augmented datasets are synthesized automatically. Experimental results demonstrate FLAME's superiority over existing methods, surpassing state-of-the-art methods by a 7.3% increase in task completion rate on Touchdown dataset. This work showcases the potential of Multimodal LLMs (MLLMs) in complex navigation tasks, representing an advancement towards practical applications of MLLMs in embodied AI. Project page: https://flame-sjtu.github.io
Updated: 2024-08-20 17:57:46
Categories: cs.CV,cs.AI,cs.CL,cs.RO
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast while precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and, thereby, enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thus enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.
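The optimal-transport view of finger placement can be illustrated at toy scale: with one unit of "mass" per finger and per key, OT reduces to the assignment problem of matching fingers to keys with minimal total travel. A brute-force 1-D sketch with illustrative names (the dataset pipeline solves this far more efficiently and in higher dimensions):

```python
from itertools import permutations

def assign_fingers(finger_pos, key_pos):
    """Toy sketch of finger placement as one-to-one optimal transport:
    choose the finger-to-key matching that minimizes total travel distance.
    Brute force over permutations; fine for a handful of fingers."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(key_pos))):
        cost = sum(abs(finger_pos[f] - key_pos[k]) for f, k in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost

fingers = [0.0, 2.0, 4.0]   # current 1-D finger positions (illustrative)
keys = [3.9, 0.2, 2.1]      # target key positions for the next chord
match, cost = assign_fingers(fingers, keys)
# match[f] is the key index assigned to finger f
```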
Updated: 2024-08-20 17:56:52
Categories: cs.RO,cs.AI,cs.LG
Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such as contrastive learning. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate tradeoff among bias, variance, and oracle cost. We systematically study their total sample and computational complexities for strongly convex, convex, and nonconvex objectives and demonstrate their superiority over the widely used biased stochastic gradient method. When combined with the variance reduction techniques like SPIDER, these MLMC gradient methods can further reduce the complexity in the nonconvex regime. Our results imply that a series of stochastic optimization problems with biased oracles, previously considered to be more challenging, is fundamentally no harder than the classical stochastic optimization with unbiased oracles. We also delineate the boundary conditions under which these problems become more difficult. Moreover, MLMC gradient methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and shortfall risk optimization. Our extensive numerical experiments on distributionally robust optimization, pricing and staffing scheduling problems, and contrastive learning demonstrate the superior performance of MLMC gradient methods.
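The backbone of these methods is the MLMC telescoping identity E[f_L] = E[f_0] + sum_{l=1}^{L} E[f_l - f_{l-1}], estimated with many cheap low-level samples and few expensive high-level ones. A toy numpy sketch with a synthetic biased oracle (our illustration, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

def biased_oracle(x, level):
    # Hypothetical oracle: multiplicative bias shrinking as 2^{-level};
    # higher levels stand in for costlier, less-biased evaluations.
    return x * (1.0 + 2.0 ** -level)

def mlmc_estimate(sample, L, n_per_level):
    """Telescoping MLMC estimator: E[f_L] = E[f_0] + sum_l E[f_l - f_{l-1}].
    Coupling: the same draws feed both levels of each difference term, so
    the difference terms have small variance and need few samples."""
    xs = sample(n_per_level[0])
    est = np.mean(biased_oracle(xs, 0))
    for l in range(1, L + 1):
        xs = sample(n_per_level[l])
        est += np.mean(biased_oracle(xs, l) - biased_oracle(xs, l - 1))
    return est

mu = 2.0
est = mlmc_estimate(lambda n: rng.normal(mu, 1.0, size=n), L=6,
                    n_per_level=[4000, 2000, 1000, 500, 250, 125, 64])
# est approximates mu * (1 + 2**-6), the mean of the least-biased level
```

The sample schedule decays with the level, capturing the bias/variance/cost tradeoff the abstract describes: the expensive, low-bias levels contribute only small corrections, so they need far fewer samples.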
Updated: 2024-08-20 17:56:16
Subjects: math.OC,cs.LG
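The core MLMC idea is a telescoping sum: write the expensive low-bias oracle as the cheap high-bias one plus a series of level differences, and spend fewer samples at the costlier levels. The toy below estimates the hypothetical target $(\mathbb{E}[X])^2 = 0.25$ for $X \sim \mathrm{Unif}(0,1)$, where the level-$l$ oracle averages $2^l$ inner samples (in practice the two oracles in each difference would share randomness for variance reduction; here they are drawn independently for simplicity).

```python
# Toy MLMC estimator of (E[X])^2 with a biased level-l oracle that
# averages 2^l inner samples. Target and oracle are illustrative, not
# from the paper.
import random

random.seed(0)

def biased_oracle(level):
    """(mean of 2^l inner samples)^2 -- the bias shrinks as the level grows."""
    n = 2 ** level
    m = sum(random.random() for _ in range(n)) / n
    return m * m

def mlmc_estimate(max_level, n0=4000):
    # Telescoping: E[f_L] = E[f_0] + sum_{l=1}^{L} E[f_l - f_{l-1}]
    est = sum(biased_oracle(0) for _ in range(n0)) / n0
    for level in range(1, max_level + 1):
        n_l = max(n0 // 2 ** level, 100)   # fewer samples at costlier levels
        diff = sum(biased_oracle(level) - biased_oracle(level - 1)
                   for _ in range(n_l)) / n_l
        est += diff
    return est

estimate = mlmc_estimate(max_level=6)   # true value: 0.25
```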
What is in Your Safe Data? Identifying Benign Data that Breaks Safety
Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking. Some have found that just further fine-tuning an aligned model with benign data (i.e., data without harmful content) surprisingly leads to substantial degradation in safety. We delve into the data-centric aspects of why benign fine-tuning inadvertently contributes to jailbreaking. First, we represent fine-tuning data through two lenses: representation and gradient spaces. Additionally, we propose a bi-directional anchoring method that, during the selection process, prioritizes data points that are close to harmful examples and far from benign ones. Our approach effectively identifies subsets of benign data that are more likely to degrade the model's safety after fine-tuning. Training on just 100 of these seemingly benign datapoints surprisingly leads to the fine-tuned model affirmatively responding to >70% of tested harmful requests, compared to <20% after fine-tuning on randomly selected data. We also observe that the selected data frequently appear as lists, bullet points, or math questions, indicating a systematic pattern in fine-tuning data that contributes to jailbreaking.
Updated: 2024-08-20 17:54:08
Subjects: cs.LG,cs.AI,cs.CL,cs.CR
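The bi-directional anchoring selection described in the abstract can be sketched as a scoring rule: rank each candidate datapoint by its similarity to harmful anchor examples minus its similarity to benign anchors, then keep the top scorers. The 2-D "embeddings" below are made-up stand-ins for representation- or gradient-space features.

```python
# Sketch of bi-directional anchoring: score = closeness to harmful
# anchors minus closeness to benign anchors (toy 2-D feature vectors,
# not the paper's actual representation or gradient features).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def anchor_score(x, harmful, benign):
    # close to harmful examples AND far from benign ones -> high score
    return max(cosine(x, h) for h in harmful) - max(cosine(x, b) for b in benign)

harmful_anchors = [(1.0, 0.1)]
benign_anchors = [(0.1, 1.0)]
candidates = {"a": (0.9, 0.2), "b": (0.2, 0.9), "c": (0.7, 0.7)}

ranked = sorted(candidates,
                key=lambda k: anchor_score(candidates[k], harmful_anchors, benign_anchors),
                reverse=True)
```

Here "a" ranks first because it lies near the harmful anchor and far from the benign one.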
Unified Domain Adaptive Semantic Segmentation
Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain. The majority of existing UDA-SS works typically consider images whilst recent attempts have extended further to tackle videos by modeling the temporal dimension. Although the two lines of research share the major challenges -- overcoming the underlying domain distribution shift, their studies are largely independent, resulting in fragmented insights, a lack of holistic understanding, and missed opportunities for cross-pollination of ideas. This fragmentation prevents the unification of methods, leading to redundant efforts and suboptimal knowledge transfer across image and video domains. Under this observation, we advocate unifying the study of UDA-SS across video and image scenarios, enabling a more comprehensive understanding, synergistic advancements, and efficient knowledge sharing. To that end, we explore the unified UDA-SS from a general data augmentation perspective, serving as a unifying conceptual framework, enabling improved generalization, and potential for cross-pollination of ideas, ultimately contributing to the overall progress and practical impact of this field of research. Specifically, we propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies through four-directional paths for intra- and inter-domain mixing in a feature space. To deal with temporal shifts with videos, we incorporate optical flow-guided feature aggregation across spatial and temporal dimensions for fine-grained domain alignment. Extensive experiments show that our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks. Our source code and models will be released at \url{https://github.com/ZHE-SAPI/UDASS}.
Updated: 2024-08-20 17:53:39
Subjects: cs.CV,cs.AI,eess.IV
Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research
Qualitative data collection and analysis approaches, such as those employing interviews and focus groups, provide rich insights into customer attitudes, sentiment, and behavior. However, manually analyzing qualitative data requires extensive time and effort to identify relevant topics and thematic insights. This study proposes a novel approach to address this challenge by leveraging Retrieval Augmented Generation (RAG) based Large Language Models (LLMs) for analyzing interview transcripts. The novelty of this work lies in strategizing the research inquiry as one that is augmented by an LLM that serves as a novice research assistant. This research explores the mental model of LLMs to serve as novice qualitative research assistants for researchers in the talent management space. A RAG-based LLM approach is extended to enable topic modeling of semi-structured interview data, showcasing the versatility of these models beyond their traditional use in information retrieval and search. Our findings demonstrate that the LLM-augmented RAG approach can successfully extract topics of interest, with significant coverage compared to manually generated topics from the same dataset. This establishes the viability of employing LLMs as novice qualitative research assistants. Additionally, the study recommends that researchers leveraging such models lean heavily on quality criteria used in traditional qualitative research to ensure rigor and trustworthiness of their approach. Finally, the paper presents key recommendations for industry practitioners seeking to reconcile the use of LLMs with established qualitative research paradigms, providing a roadmap for the effective integration of these powerful, albeit novice, AI tools in the analysis of qualitative datasets within the talent management domain.
Updated: 2024-08-20 17:49:51
Subjects: cs.CY,cs.AI
GraphFSA: A Finite State Automaton Framework for Algorithmic Learning on Graphs
Many graph algorithms can be viewed as sets of rules that are iteratively applied, with the number of iterations dependent on the size and complexity of the input graph. Existing machine learning architectures often struggle to represent these algorithmic decisions as discrete state transitions. Therefore, we propose a novel framework: GraphFSA (Graph Finite State Automaton). GraphFSA is designed to learn a finite state automaton that runs on each node of a given graph. We test GraphFSA on cellular automata problems, showcasing its abilities in a straightforward algorithmic setting. For a comprehensive empirical evaluation of our framework, we create a diverse range of synthetic problems. As our main application, we then focus on learning more elaborate graph algorithms. Our findings suggest that GraphFSA exhibits strong generalization and extrapolation abilities, presenting an alternative approach to represent these algorithms.
Updated: 2024-08-20 17:49:47
Subjects: cs.AI
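The GraphFSA idea — every node runs the same finite state automaton, transitioning on its own state and an aggregate of its neighbors' states — can be illustrated with a tiny hand-written automaton. The 2-state "majority" rule below is a hypothetical example in the cellular-automaton spirit of the paper's experiments, not a learned transition table.

```python
# Toy GraphFSA step: each node applies one shared transition table keyed
# on (own state, aggregated neighbor signal). The majority rule here is
# a hypothetical example, not a learned automaton.
def graphfsa_step(states, adjacency, transition):
    new_states = {}
    for node, nbrs in adjacency.items():
        ones = sum(states[n] for n in nbrs)
        signal = "majority_1" if 2 * ones > len(nbrs) else "majority_0"
        new_states[node] = transition[(states[node], signal)]
    return new_states

# Adopt the majority state of the neighborhood.
transition = {
    (0, "majority_1"): 1, (0, "majority_0"): 0,
    (1, "majority_1"): 1, (1, "majority_0"): 0,
}
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
states = {0: 1, 1: 1, 2: 0}
states = graphfsa_step(states, adjacency, transition)
```

Because the state space and transitions are discrete, iterating this step generalizes naturally to larger graphs and more iterations, which is the extrapolation property the paper highlights.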
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. We pretrain multiple Transfusion models up to 7B parameters from scratch on a mixture of text and image data, establishing scaling laws with respect to a variety of uni- and cross-modal benchmarks. Our experiments show that Transfusion scales significantly better than quantizing images and training a language model over discrete image tokens. By introducing modality-specific encoding and decoding layers, we can further improve the performance of Transfusion models, and even compress each image to just 16 patches. We further demonstrate that scaling our Transfusion recipe to 7B parameters and 2T multi-modal tokens produces a model that can generate images and text on a par with similar scale diffusion models and language models, reaping the benefits of both worlds.
Updated: 2024-08-20 17:48:20
Subjects: cs.AI,cs.CV
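The combined objective described in the abstract — next-token cross-entropy on text positions plus a diffusion loss on image latents — can be sketched as a simple weighted sum. The numbers and the weighting coefficient `lam` below are illustrative, not Transfusion's actual hyperparameters.

```python
# Back-of-the-envelope sketch of a mixed-modality objective: language
# modeling cross-entropy on text tokens plus a diffusion-style MSE on
# image-patch noise predictions (toy values, hypothetical weight lam).
import math

def transfusion_style_loss(text_probs, noise_pred, noise_true, lam=5.0):
    # text_probs: model probability assigned to each target token
    ce = -sum(math.log(p) for p in text_probs) / len(text_probs)
    # diffusion loss: regress the noise added to image-patch latents
    mse = sum((p - t) ** 2 for p, t in zip(noise_pred, noise_true)) / len(noise_true)
    return ce + lam * mse

loss = transfusion_style_loss(text_probs=[0.5, 0.25],
                              noise_pred=[0.1, 0.2],
                              noise_true=[0.0, 0.0])
```

In the real model both terms are computed by one transformer over a mixed-modality sequence; this sketch only shows how the two losses combine.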
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Ethical concerns are raised because current models are prone to generating false positives and providing malicious responses. To contribute to this issue, we have curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to aid in aligning language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a unique and novel fine-tuning approach that utilizes ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that only uses labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an ethical-unethical classification task and a reason-generation task. Our proposed fine-tuning strategy notably outperforms the others in both tasks, achieving significantly higher accuracy scores in the classification task and lower misalignment rates in the reason-generation task. The increase in classification accuracies and decrease in misalignment rates indicate that the L+R fine-tuned models align more with human ethics. Hence, this study illustrates that injecting reasons has substantially improved the alignment of LLMs, resulting in more human-like responses. We have made the DFAR dataset and corresponding codes publicly available at https://github.com/apurba-nsu-rnd-lab/DFAR.
Updated: 2024-08-20 17:44:51
Subjects: cs.CL,cs.AI,cs.LG
Atmospheric Transport Modeling of CO$_2$ with Neural Networks
Accurately describing the distribution of CO$_2$ in the atmosphere with atmospheric tracer transport models is essential for greenhouse gas monitoring and verification support systems to aid implementation of international climate agreements. Large deep neural networks are poised to revolutionize weather prediction, which requires 3D modeling of the atmosphere. While similar in this regard, atmospheric transport modeling is subject to new challenges. Both, stable predictions for longer time horizons and mass conservation throughout need to be achieved, while IO plays a larger role compared to computational costs. In this study we explore four different deep neural networks (UNet, GraphCast, Spherical Fourier Neural Operator and SwinTransformer) which have proven as state-of-the-art in weather prediction to assess their usefulness for atmospheric tracer transport modeling. For this, we assemble the CarbonBench dataset, a systematic benchmark tailored for machine learning emulators of Eulerian atmospheric transport. Through architectural adjustments, we decouple the performance of our emulators from the distribution shift caused by a steady rise in atmospheric CO$_2$. More specifically, we center CO$_2$ input fields to zero mean and then use an explicit flux scheme and a mass fixer to assure mass balance. This design enables stable and mass conserving transport for over 6 months with all four neural network architectures. In our study, the SwinTransformer displays particularly strong emulation skill (90-day $R^2 > 0.99$), with physically plausible emulation even for forward runs of multiple years. This work paves the way forward towards high resolution forward and inverse modeling of inert trace gases with neural networks.
Updated: 2024-08-20 17:33:20
Subjects: cs.LG,cs.CV,physics.ao-ph
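The mass fixer mentioned in the abstract amounts to rescaling the emulated CO$_2$ field so that total mass is conserved across a model step. A minimal sketch, with made-up field values and assuming zero net surface flux over the step:

```python
# Minimal mass fixer: after a (hypothetical) neural transport step,
# rescale the CO2 field so the global mass matches the pre-step mass.
def mass_fix(field_before, field_after):
    mass_before = sum(field_before)   # total tracer mass before the step
    mass_after = sum(field_after)     # emulator output may have drifted
    scale = mass_before / mass_after
    return [v * scale for v in field_after]

before = [1.0, 2.0, 3.0]   # tracer mass per grid cell (toy values)
after = [1.2, 2.1, 3.3]    # emulator output; mass drifted from 6.0 to 6.6
fixed = mass_fix(before, after)
```

Real implementations apply this per grid box with area and pressure weighting, and account for surface fluxes; the principle of restoring the global budget is the same.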
Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and different disability levels as stroke patients.
Updated: 2024-08-20 17:21:43
Subjects: cs.RO,cs.LG
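The paper fits a one-class support vector machine to joint-space exploration data to learn the range-of-motion (RoM) boundary. As a dependency-free stand-in for that one-class classifier, the sketch below marks a query joint configuration as inside the RoM if it lies within a radius of any recorded sample (joint angles and radius are made-up numbers).

```python
# Dependency-free stand-in for a one-class RoM boundary: a query joint
# configuration is "inside" if it is near any recorded exploration sample.
# (The paper uses a one-class SVM; this nearest-sample rule is only a sketch.)
import math

def inside_rom(query, samples, radius=10.0):
    """query/samples: joint-angle tuples in degrees (hypothetical values)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(dist(query, s) for s in samples) <= radius

recorded = [(0.0, 0.0), (30.0, 10.0), (60.0, 20.0)]  # exploration motions
ok = inside_rom((28.0, 12.0), recorded)    # near a recorded pose
bad = inside_rom((120.0, 90.0), recorded)  # far outside the explored space
```

A learned one-class SVM replaces the hard radius with a smooth decision function, which also captures inter-joint dependencies that a per-joint box constraint misses.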
Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Due to emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of tools available to them. As the capabilities of the agents expand, ensuring their safety and trustworthiness becomes more imperative. In this study, we introduce the Athena framework which leverages the concept of verbal contrastive learning where past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while fulfilling a given task. The framework also incorporates a critiquing mechanism to guide the agent to prevent risky actions at every step. Furthermore, due to the lack of existing benchmarks on the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates verbal contrastive learning and interaction-level critiquing improve the safety rate significantly.
Updated: 2024-08-20 17:21:10
Subjects: cs.CL,cs.AI,cs.MA
An Overlooked Role of Context-Sensitive Dendrites
To date, most dendritic studies have predominantly focused on the apical zone of pyramidal two-point neurons (TPNs) receiving only feedback (FB) connections from higher perceptual layers and using them for learning. Recent cellular neurophysiology and computational neuroscience studies suggest that the apical input (context), coming from feedback and lateral connections, is multifaceted and far more diverse, with greater implications for ongoing learning and processing in the brain than previously realized. In addition to the FB, the apical tuft receives signals from neighboring cells of the same network as proximal (P) context, other parts of the brain as distal (D) context, and overall coherent information across the network as universal (U) context. The integrated context (C) amplifies and suppresses the transmission of coherent and conflicting feedforward (FF) signals, respectively. Specifically, we show that complex context-sensitive (CS)-TPNs flexibly integrate C moment-by-moment with the FF somatic current at the soma such that the somatic current is amplified when both feedforward (FF) and C are coherent; otherwise, it is attenuated. This generates the event only when the FF and C currents are coherent, which is then translated into a singlet or a burst based on the FB information. Spiking simulation results show that this flexible integration of somatic and contextual currents enables the propagation of more coherent signals (bursts), making learning faster with fewer neurons. Similar behavior is observed when this functioning is used in conventional artificial networks, where orders of magnitude fewer neurons are required to process vast amounts of heterogeneous real-world audio-visual (AV) data trained using backpropagation (BP). The computational findings presented here demonstrate the universality of CS-TPNs, suggesting a dendritic narrative that was previously overlooked.
Updated: 2024-08-20 17:18:54
Subjects: q-bio.NC,cs.AI,cs.LG
Multiwinner Temporal Voting with Aversion to Change
We study two-stage committee elections where voters have dynamic preferences over candidates; at each stage, a committee is chosen under a given voting rule. We are interested in identifying a winning committee for the second stage that overlaps as much as possible with the first-stage committee. We show a full complexity dichotomy for the class of Thiele rules: this problem is tractable for Approval Voting (AV) and hard for all other Thiele rules (including, in particular, Proportional Approval Voting and the Chamberlin-Courant rule). We extend this dichotomy to the greedy variants of Thiele rules. We also explore this problem from a parameterized complexity perspective for several natural parameters. We complement the theory with experimental analysis: e.g., we investigate the average number of changes in the committee as a function of changes in voters' preferences and the role of ties.
Updated: 2024-08-20 17:16:54
Subjects: cs.GT,cs.AI,cs.CC
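The tractable Approval Voting case from the abstract has a simple flavor: select the second-stage committee by approval score, breaking score ties in favor of first-stage members to maximize overlap. The ballots below are toy data illustrating that rule, not an example from the paper.

```python
# Toy AV second-stage selection with overlap-friendly tie-breaking:
# highest approval scores win; among tied candidates, members of the
# first-stage committee are preferred (toy ballots).
def av_with_overlap(ballots, k, first_committee):
    scores = {}
    for ballot in ballots:
        for cand in ballot:
            scores[cand] = scores.get(cand, 0) + 1
    # sort by score desc, then prefer first-stage members, then by name
    order = sorted(scores, key=lambda c: (-scores[c], c not in first_committee, c))
    return set(order[:k])

ballots = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "c"}]
committee = av_with_overlap(ballots, k=2, first_committee={"a", "b"})
```

Here "c" wins outright with three approvals, and the tie between "a" and "b" (two approvals each) is resolved lexicographically since both sit in the first committee; the hardness results in the paper show that no comparably simple rule exists for the other Thiele rules.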
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral" study of LLMs to benchmark their capability in generating causal arguments. Across a wide range of tasks, we find that LLMs can generate text corresponding to correct causal arguments with high probability, surpassing the best-performing existing methods. Algorithms based on GPT-3.5 and 4 outperform existing algorithms on a pairwise causal discovery task (97%, 13 points gain), counterfactual reasoning task (92%, 20 points gain) and event causality (86% accuracy in determining necessary and sufficient causes in vignettes). We perform robustness checks across tasks and show that the capabilities cannot be explained by dataset memorization alone, especially since LLMs generalize to novel datasets that were created after the training cutoff date. That said, LLMs exhibit unpredictable failure modes, and we discuss the kinds of errors that may be improved and what are the fundamental limits of LLM-based answers. Overall, by operating on the text metadata, LLMs bring capabilities so far understood to be restricted to humans, such as using collected knowledge to generate causal graphs or identifying background causal context from natural language. As a result, LLMs may be used by human domain experts to save effort in setting up a causal analysis, one of the biggest impediments to the widespread adoption of causal methods. Given that LLMs ignore the actual data, our results also point to a fruitful research direction of developing algorithms that combine LLMs with existing causal techniques. Code and datasets are available at https://github.com/py-why/pywhy-llm.
Updated: 2024-08-20 17:16:20
Subjects: cs.AI,cs.CL,cs.CY,cs.HC,cs.LG,stat.ME
Disparate Impact on Group Accuracy of Linearization for Private Inference
Ensuring privacy-preserving inference on cryptographically secure data is a well-known computational challenge. To alleviate the bottleneck of costly cryptographic computations in non-linear activations, recent methods have suggested linearizing a targeted portion of these activations in neural networks. This technique results in significantly reduced runtimes with often negligible impacts on accuracy. In this paper, we demonstrate that such computational benefits may lead to increased fairness costs. Specifically, we find that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups. To explain these observations, we provide a mathematical interpretation under restricted assumptions about the nature of the decision boundary, while also showing the prevalence of this problem across widely used datasets and architectures. Finally, we show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.
Updated: 2024-08-20 17:08:53
Subjects: cs.LG,cs.CR,cs.CY
Does GPT Really Get It? A Hierarchical Scale to Quantify Human vs AI's Understanding of Algorithms
As Large Language Models (LLMs) perform (and sometimes excel at) more and more complex cognitive tasks, a natural question is whether AI really understands. The study of understanding in LLMs is in its infancy, and the community has yet to incorporate well-trodden research in philosophy, psychology, and education. We initiate this, specifically focusing on understanding algorithms, and propose a hierarchy of levels of understanding. We use the hierarchy to design and conduct a study with human subjects (undergraduate and graduate students) as well as large language models (generations of GPT), revealing interesting similarities and differences. We expect that our rigorous criteria will be useful to keep track of AI's progress in such cognitive domains.
Updated: 2024-08-20 17:08:13
Subjects: cs.AI,I.2.m; F.1.1
Representation Matters for Mastering Chess: Improved Feature Representation in AlphaZero Outperforms Switching to Transformers
While transformers have gained recognition as a versatile tool for artificial intelligence (AI), an unexplored challenge arises in the context of chess - a classical AI benchmark. Here, incorporating Vision Transformers (ViTs) into AlphaZero is insufficient for chess mastery, mainly due to ViTs' computational limitations. The attempt to optimize their efficiency by combining MobileNet and NextViT outperformed AlphaZero by about 30 Elo. However, we propose a practical improvement that involves a simple change in the input representation and value loss functions. As a result, we achieve a significant performance boost of up to 180 Elo points beyond what is currently achievable with AlphaZero in chess. In addition to these improvements, our experimental results using the Integrated Gradient technique confirm the effectiveness of the newly introduced features.
Updated: 2024-08-20 16:49:43
Subjects: cs.AI
Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
A "match cut" is a common video editing technique where a pair of shots that have a similar composition transition fluidly from one to another. Although match cuts are often visual, certain match cuts involve the fluid transition of audio, where sounds from different sources merge into one indistinguishable transition between two shots. In this paper, we explore the ability to automatically find and create "audio match cuts" within videos and movies. We create a self-supervised audio representation for audio match cutting and develop a coarse-to-fine audio match pipeline that recommends matching shots and creates the blended audio. We further annotate a dataset for the proposed audio match cut task and compare the ability of multiple audio representations to find audio match cut candidates. Finally, we evaluate multiple methods to blend two matching audio candidates with the goal of creating a smooth transition. Project page and examples are available at: https://denfed.github.io/audiomatchcut/
Updated: 2024-08-20 16:46:54
Subjects: cs.SD,cs.LG,eess.AS
Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform
Let $\Omega\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ with error measured in the $L_q(\Omega)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
Updated: 2024-08-20 16:43:45
Subjects: stat.ML,cs.LG,cs.NA,math.NA,62M45, 41A25, 41A30
Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting
To address the challenge of automating knowledge discovery from a vast volume of literature, in this paper, we introduce a novel framework based on large language models (LLMs) that combines a progressive ontology prompting (POP) algorithm with a dual-agent system, named LLM-Duo, designed to enhance the automation of knowledge extraction from scientific articles. The POP algorithm utilizes a prioritized breadth-first search (BFS) across a predefined ontology to generate structured prompt templates and action orders, thereby guiding LLMs to discover knowledge in an automatic manner. Additionally, our LLM-Duo employs two specialized LLM agents: an explorer and an evaluator. These two agents work collaboratively and adversarially to enhance the reliability of the discovery and annotation processes. Experiments demonstrate that our method outperforms advanced baselines, enabling more accurate and complete annotations. To validate the effectiveness of our method in real-world scenarios, we employ our method in a case study of speech-language intervention discovery. Our method identifies 2,421 interventions from 64,177 research articles in the speech-language therapy domain. We curate these findings into a publicly accessible intervention knowledge base that holds significant potential to benefit the speech-language therapy community.
Updated: 2024-08-20 16:42:23
Subjects: cs.CL,cs.AI
Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection
Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training time and high energy costs. In addition, potential label noise in the training data undermines the robustness of QAT. We propose two metrics based on analysis of the loss and gradient of quantized weights: the error vector score and the disagreement score, to quantify the importance of each sample during training. Guided by these two metrics, we propose a quantization-aware Adaptive Coreset Selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2, RetinaNet), datasets (CIFAR-10, CIFAR-100, ImageNet-1K, COCO), and quantization settings. Specifically, our method achieves an accuracy of 68.39% for 4-bit quantized ResNet-18 on the ImageNet-1K dataset with only a 10% subset, an absolute gain of 4.24% over the baseline. Our method can also improve the robustness of QAT by removing noisy samples from the training set.
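One plausible reading of the two sample-importance metrics, sketched in NumPy (the exact definitions, weighting, and per-epoch selection schedule are assumptions on my part; see the paper for the actual formulation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def error_vector_score(logits_q, labels, num_classes):
    """L2 norm of (quantized softmax output - one-hot label): larger = harder sample."""
    p = softmax(logits_q)
    onehot = np.eye(num_classes)[labels]
    return np.linalg.norm(p - onehot, axis=1)

def disagreement_score(logits_q, logits_fp):
    """KL divergence between full-precision and quantized predictions per sample."""
    p, q = softmax(logits_fp), softmax(logits_q)
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)

def select_coreset(logits_q, logits_fp, labels, num_classes, fraction=0.1):
    """Keep the top-`fraction` samples ranked by combined importance."""
    score = error_vector_score(logits_q, labels, num_classes) \
            + disagreement_score(logits_q, logits_fp)
    k = max(1, int(fraction * len(labels)))
    return np.argsort(-score)[:k]
```

In this sketch, samples the quantized model gets badly wrong, or where it diverges from the full-precision model, are prioritized for the epoch's training subset.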
Updated: 2024-08-20 16:37:12
Subjects: cs.LG,cs.AI,cs.CV
Denoising Plane Wave Ultrasound Images Using Diffusion Probabilistic Models
Ultrasound plane wave imaging is a cutting-edge technique that enables high frame-rate imaging. However, one challenge associated with high frame-rate ultrasound imaging is the high level of noise in the resulting images, which hinders their wider adoption. The development of a denoising method therefore becomes imperative to improve the quality of plane wave images. Drawing inspiration from Denoising Diffusion Probabilistic Models (DDPMs), our proposed solution aims to enhance plane wave image quality. Specifically, the method treats the distinction between low-angle and high-angle compounded plane waves as noise and effectively eliminates it by adapting a DDPM to beamformed radiofrequency (RF) data. The method was trained using only 400 simulated images. In addition, our approach employs natural image segmentation masks as intensity maps for the generated images, resulting in accurate denoising for various anatomy shapes. The proposed method was assessed across simulation, phantom, and in vivo images. The evaluations indicate that our approach enhances image quality not only on simulated data but also on phantom and in vivo data. Comparative analysis with other methods underscores the superiority of our proposed method across various evaluation metrics. The source code and trained model will be released along with the dataset at: http://code.sonography.ai
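For reference, the standard DDPM reverse (denoising) step that such methods build on, in the usual notation with $\epsilon_\theta$ the learned noise predictor (how the beamformed-RF conditioning enters is specific to the paper):

\[
x_{t-1} \;=\; \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}} \, \epsilon_\theta(x_t, t) \right) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I).
\]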
Updated: 2024-08-20 16:31:31
Subjects: eess.IV,cs.AI,cs.CV
NeR-VCP: A Video Content Protection Method Based on Implicit Neural Representation
With the popularity of video applications, the security of video content has emerged as a pressing issue that demands urgent attention. Most video content protection methods rely mainly on encryption technology, which must be manually designed or implemented based on experience. To address this problem, we propose an automatic encryption technique for video content protection based on implicit neural representation. We design a key-controllable module, which serves as a key for encryption and decryption. NeR-VCP first pre-distributes the key-controllable module trained by the sender to the recipients, then uses Implicit Neural Representation (INR) with the (pre-distributed) key-controllable module to encrypt the plain video as an implicit neural network, and legitimate recipients use their pre-distributed key-controllable module to decrypt this cipher neural network (the corresponding implicit neural network). Under the guidance of the key-controllable design, our method can improve the security of video content and provides a novel video encryption scheme. Moreover, using model compression techniques, the method can achieve video content protection while effectively reducing the amount of encrypted data transferred. We experimentally find that it has superior performance in terms of visual representation, imperceptibility to illegal users, and security from a cryptographic viewpoint.
Updated: 2024-08-20 16:23:51
Subjects: cs.CR,cs.CV
Kernel-Based Differentiable Learning of Non-Parametric Directed Acyclic Graphical Models
Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model. This model selection problem can be challenging due to its large combinatorial search space, particularly when dealing with non-parametric causal models. Recent research has sought to bypass the combinatorial search by reformulating causal discovery as a continuous optimization problem, employing constraints that ensure the acyclicity of the graph. In non-parametric settings, existing approaches typically rely on finite-dimensional approximations of the relationships between nodes, resulting in a score-based continuous optimization problem with a smooth acyclicity constraint. In this work, we develop an alternative approximation method by utilizing reproducing kernel Hilbert spaces (RKHS) and applying general sparsity-inducing regularization terms based on partial derivatives. Within this framework, we introduce an extended RKHS representer theorem. To enforce acyclicity, we advocate the log-determinant formulation of the acyclicity constraint and show its stability. Finally, we assess the performance of our proposed RKHS-DAGMA procedure through simulations and illustrative data analyses.
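The log-determinant acyclicity constraint advocated here is, in the DAGMA-style formulation (stated for a weighted adjacency matrix $W$; that the paper's RKHS version replaces the entries with partial-derivative-based quantities is an assumption on my part):

\[
h^{s}(W) \;=\; -\log\det\!\big(sI - W \circ W\big) + d \log s,
\qquad s > \rho(W \circ W),
\]

where $\circ$ is the Hadamard product and $\rho(\cdot)$ the spectral radius; $h^{s}(W) = 0$ if and only if the graph of $W$ is acyclic.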
Updated: 2024-08-20 16:09:40
Subjects: stat.ML,cs.LG
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems that integrate data from images, text, and other sources using modality fusion techniques. This introduces new challenges to the existing LLM-based recommendation paradigm, which relies solely on text modality information. Moreover, although Multimodal Large Language Models (MLLMs) capable of processing multi-modal inputs have emerged, how to equip MLLMs with multi-modal recommendation capabilities remains largely unexplored. To this end, in this paper, we propose the Multimodal Large Language Model-enhanced Multimodal Sequential Recommendation (MLLM-MSR) model. To capture dynamic user preferences, we design a two-stage user preference summarization method. Specifically, we first utilize an MLLM-based item-summarizer to extract image features from each item and convert the image into text. Then, we employ a recurrent user preference summarization paradigm, driven by an LLM-based user-summarizer, to capture the dynamic changes in user preferences. Finally, to enable the MLLM for the multi-modal recommendation task, we propose to fine-tune an MLLM-based recommender using Supervised Fine-Tuning (SFT) techniques. Extensive evaluations across various datasets validate the effectiveness of MLLM-MSR, showcasing its superior ability to capture and adapt to the evolving dynamics of user preferences.
Updated: 2024-08-20 16:09:33
Subjects: cs.IR,cs.AI
Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control
An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work has demonstrated that a class of hybrid state-space model known as recurrent switching linear dynamical systems (rSLDS) discover meaningful behavioural units via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). Furthermore, they model how the underlying continuous states drive these discrete mode switches. We propose that the rich representations formed by an rSLDS can provide useful abstractions for planning and control. We present a novel hierarchical model-based algorithm inspired by Active Inference in which a discrete MDP sits above a low-level linear-quadratic controller. The recurrent transition dynamics learned by the rSLDS allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space allowing us to exploit information-theoretic exploration bonuses and (3) `cache' the approximate solutions to low-level problems in the discrete planner. We successfully apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and non-trivial planning through the delineation of abstract sub-goals.
Updated: 2024-08-20 16:02:54
Subjects: cs.AI,cs.SY,eess.SY
KeySpace: Public Key Infrastructure Considerations in Interplanetary Networks
As satellite networks grow larger and begin to incorporate interplanetary communication, there is an increasing interest in the unsolved problem of how to approach PKI in these conditions. In this paper we explore the goals and requirements for implementing key management systems in satellite networks, focusing on megaconstellations and interplanetary networks. We design a set of standardized experiments which can be used to compare systems against one another for particular network topologies. Using these, we demonstrate that terrestrial PKI techniques are feasible in highly distributed interplanetary networks, showing that it is possible to configure PKI systems to achieve efficient low-latency connection establishment, and minimize the impact of attacks through effective revocations. We evaluate this by building the Deep Space Network Simulator (DSNS), a novel network simulator aimed at efficient simulation of large space networks. We run simulations evaluating connection establishment and key revocation under a wide range of PKI configurations. Finally, we propose and evaluate two additional configuration options: OCSP Hybrid, and the use of relay nodes as a firewall. Together these minimize the extent of the network an attacker can reach with a compromised key, and reduce the attacker's load on interplanetary relay links.
Updated: 2024-08-20 16:00:17
Subjects: cs.CR,cs.NI
Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling
Storm-scale convection-allowing models (CAMs) are an important tool for predicting the evolution of thunderstorms and mesoscale convective systems that result in damaging extreme weather. By explicitly resolving convective dynamics within the atmosphere, they afford meteorologists the nuance needed to provide outlooks on hazards. Deep learning models have thus far not proven skilful at km-scale atmospheric simulation, despite being competitive with state-of-the-art global, medium-range weather forecasting at coarser resolutions. We present a generative diffusion model called StormCast, which emulates the High-Resolution Rapid Refresh (HRRR) model, NOAA's state-of-the-art 3km operational CAM. StormCast autoregressively predicts 99 state variables at km scale using a 1-hour time step, with dense vertical resolution in the atmospheric boundary layer, conditioned on 26 synoptic variables. We present evidence of successfully learnt km-scale dynamics, including competitive 1-6 hour forecast skill for composite radar reflectivity alongside physically realistic convective cluster evolution, moist updrafts, and cold pool morphology. StormCast predictions maintain realistic power spectra for multiple predicted variables across multi-hour forecasts. Together, these results establish the potential for autoregressive ML to emulate CAMs, opening up new km-scale frontiers for regional ML weather prediction and future climate hazard dynamical downscaling.
Updated: 2024-08-20 15:56:01
Subjects: physics.ao-ph,cs.LG
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Retrieval-augmented generation (RAG) has shown promising potential to enhance the accuracy and factuality of language models (LMs). However, imperfect retrievers or noisy corpora can introduce misleading or even erroneous information to the retrieved contents, posing a significant challenge to the generation quality. Existing RAG methods typically address this challenge by directly predicting final answers despite potentially noisy inputs, resulting in an implicit denoising process that is difficult to interpret and verify. On the other hand, the acquisition of explicit denoising supervision is often costly, involving significant human efforts. In this work, we propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales -- First, we instruct the LM to explain how the ground-truth answer is derived from retrieved documents. Then, these rationales can be used either as demonstrations for in-context learning of explicit denoising or as supervised fine-tuning data to train the model. Compared to standard RAG approaches, InstructRAG requires no additional supervision, allows for easier verification of the predicted answers, and effectively improves generation accuracy. Experiments show InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios, achieving a relative improvement of 8.3% over the best baseline method on average across five knowledge-intensive benchmarks. Extensive analysis indicates that InstructRAG scales well with increased numbers of retrieved documents and consistently exhibits robust denoising ability even in out-of-domain datasets, demonstrating strong generalizability.
Updated: 2024-08-20 15:48:49
Subjects: cs.CL,cs.LG
Extracting Signal out of Chaos: Advancements on MAGI for Bayesian Analysis of Dynamical Systems
This work builds off the manifold-constrained Gaussian process inference (MAGI) method for Bayesian parameter inference and trajectory reconstruction of ODE-based dynamical systems, focusing primarily on sparse and noisy data conditions. First, we introduce Pilot MAGI (pMAGI), a novel methodological upgrade on the base MAGI method that confers significantly-improved numerical stability, parameter inference, and trajectory reconstruction. Second, we demonstrate, for the first time to our knowledge, how one can combine MAGI-based methods with dynamical systems theory to provide probabilistic classifications of whether a system is stable or chaotic. Third, we demonstrate how pMAGI performs favorably in many settings against much more computationally-expensive and overparameterized methods. Fourth, we introduce Pilot MAGI Sequential Prediction (PMSP), a novel method building upon pMAGI that allows one to predict the trajectory of ODE-based dynamical systems multiple time steps into the future, given only sparse and noisy observations. We show that PMSP can output accurate future predictions even on chaotic dynamical systems and significantly outperform PINN-based methods. Overall, we contribute to the literature two novel methods, pMAGI and PMSP, that serve as Bayesian, uncertainty-quantified competitors to the Physics-Informed Neural Network.
Updated: 2024-08-20 15:47:06
Subjects: stat.CO,cs.LG,math.DS,stat.ML
Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting
Data augmentation is important for improving machine learning model performance when faced with limited real-world data. In time series forecasting (TSF), where accurate predictions are crucial in fields like finance, healthcare, and manufacturing, traditional augmentation methods for classification tasks are insufficient to maintain temporal coherence. This research introduces two augmentation approaches using the discrete wavelet transform (DWT) to adjust frequency elements while preserving temporal dependencies in time series data. Our methods, Wavelet Masking (WaveMask) and Wavelet Mixing (WaveMix), are evaluated against established baselines across various forecasting horizons. To the best of our knowledge, this is the first study to conduct extensive experiments on multivariate time series using Discrete Wavelet Transform as an augmentation technique. Experimental results demonstrate that our techniques achieve competitive results with previous methods. We also explore cold-start forecasting using downsampled training datasets, comparing outcomes to baseline methods.
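A minimal single-level Haar-wavelet version of the masking idea (a sketch only; the paper's WaveMask/WaveMix operate on multi-level DWT coefficients of multivariate series, and the exact masking scheme here is an illustrative assumption):

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT of an even-length 1-D signal: (approx, detail)."""
    s2 = np.sqrt(2.0)
    return (x[0::2] + x[1::2]) / s2, (x[0::2] - x[1::2]) / s2

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: perfectly reconstructs the original signal."""
    s2 = np.sqrt(2.0)
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / s2
    x[1::2] = (approx - detail) / s2
    return x

def wave_mask(x, drop_prob=0.2, rng=None):
    """Augment a series by randomly zeroing detail (high-frequency) coefficients,
    preserving the low-frequency trend that carries temporal structure."""
    if rng is None:
        rng = np.random.default_rng(0)
    approx, detail = haar_dwt(x)
    keep = rng.random(len(detail)) >= drop_prob   # Bernoulli keep-mask
    return haar_idwt(approx, detail * keep)
```

A WaveMix-style variant would instead swap masked coefficients with those of a second training series rather than zeroing them.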
Updated: 2024-08-20 15:42:10
Subjects: cs.LG,cs.AI
Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
With the recent advances of large language models (LLMs), it is no longer infeasible to build an automated debate system that helps people to synthesise persuasive arguments. Previous work attempted this task by integrating multiple components. In our work, we introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate, which covers the tasks of claim and evidence identification (Task 1 ED), evidence convincingness ranking (Task 2 ECR), argumentative essay summarisation and human preference ranking (Task 3 ASR) and metric learning for automated evaluation of resulting essays, based on human feedback along argument quality dimensions (Task 4 SQE). Our dataset contains 14k examples of claims that are fully annotated with the various properties supporting the aforementioned tasks. We evaluate multiple generative baselines for each of these tasks, including representative LLMs. We find, that while they show promising results on individual tasks in our benchmark, their end-to-end performance on all four tasks in succession deteriorates significantly, both in automated measures as well as in human-centred evaluation. This challenge presented by our proposed dataset motivates future research on end-to-end argument mining and summarisation. The repository of this project is available at https://github.com/HaoBytes/ArgSum-Datatset
Updated: 2024-08-20 15:41:27
Subjects: cs.CL,cs.LG
GAIM: Attacking Graph Neural Networks via Adversarial Influence Maximization
Recent studies show that well-devised perturbations on graph structures or node features can mislead trained Graph Neural Network (GNN) models. However, these methods often overlook practical assumptions, over-rely on heuristics, or separate vital attack components. In response, we present GAIM, an integrated adversarial attack method conducted on a node feature basis while considering the strict black-box setting. Specifically, we define an adversarial influence function to theoretically assess the adversarial impact of node perturbations, thereby reframing the GNN attack problem into the adversarial influence maximization problem. In our approach, we unify the selection of the target node and the construction of feature perturbations into a single optimization problem, ensuring a unique and consistent feature perturbation for each target node. We leverage a surrogate model to transform this problem into a solvable linear programming task, streamlining the optimization process. Moreover, we extend our method to accommodate label-oriented attacks, broadening its applicability. Thorough evaluations on five benchmark datasets across three popular models underscore the effectiveness of our method in both untargeted and label-oriented targeted attacks. Through comprehensive analysis and ablation studies, we demonstrate the practical value and efficacy inherent to our design choices.
Updated: 2024-08-20 15:41:20
Subjects: cs.LG,cs.AI
Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity
Recent research demonstrates that GNNs are vulnerable to model stealing attacks, a nefarious endeavor geared towards duplicating the target model via query permissions. However, existing attacks mainly focus on node classification tasks, neglecting the potential threats within the domain of graph classification. Furthermore, their practicality is questionable due to unreasonable assumptions, specifically the large data requirements and extensive model knowledge they presume. To this end, we advocate strict settings with limited real data and hard-label awareness to generate synthetic data, thereby facilitating the stealing of the target model. Specifically, following important data generation principles, we introduce three model stealing attacks adapted to different practical scenarios: MSA-AU is inspired by active learning and emphasizes uncertainty to enhance the query value of generated samples; MSA-AD introduces diversity based on a Mixup augmentation strategy to alleviate the query inefficiency caused by the over-similar samples generated by MSA-AU; MSA-AUD combines the two strategies to seamlessly integrate the authenticity, uncertainty, and diversity of the generated samples. Finally, extensive experiments consistently demonstrate the superiority of the proposed methods in terms of concealment, query efficiency, and stealing performance.
Updated: 2024-08-20 15:41:10
Subjects: cs.LG,cs.CR
Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models
Teachers are important for imparting knowledge and guiding learners, and the role of large language models (LLMs) as potential educators is emerging as an important area of study. Recognizing LLMs' capability to generate educational content can lead to advances in automated and personalized learning. While LLMs have been tested for their comprehension and problem-solving skills, their capability in teaching remains largely unexplored. In teaching, questioning is a key skill that guides students to analyze, evaluate, and synthesize core concepts and principles. Therefore, our research introduces a benchmark to evaluate LLMs' questioning capability as educators by assessing their generated educational questions, utilizing Anderson and Krathwohl's taxonomy across general, monodisciplinary, and interdisciplinary domains. We shift the focus from LLMs as learners to LLMs as educators, assessing their teaching capability by guiding them to generate questions. We apply four metrics, including relevance, coverage, representativeness, and consistency, to evaluate the educational quality of LLMs' outputs. Our results indicate that GPT-4 demonstrates significant potential in teaching general, humanities, and science courses; Claude2 appears more apt as an interdisciplinary teacher. Furthermore, the automatic scores align with human perspectives.
Updated: 2024-08-20 15:36:30
领域: cs.AI,cs.CL,cs.CY
Large Language Model Driven Recommendation
While previous chapters focused on recommendation systems (RSs) based on standardized, non-verbal user feedback such as purchases, views, and clicks -- the advent of LLMs has unlocked the use of natural language (NL) interactions for recommendation. This chapter discusses how LLMs' abilities for general NL reasoning present novel opportunities to build highly personalized RSs -- which can effectively connect nuanced and diverse user preferences to items, potentially via interactive dialogues. To begin this discussion, we first present a taxonomy of the key data sources for language-driven recommendation, covering item descriptions, user-system interactions, and user profiles. We then proceed to fundamental techniques for LLM recommendation, reviewing the use of encoder-only and autoregressive LLM recommendation in both tuned and untuned settings. Afterwards, we move to multi-module recommendation architectures in which LLMs interact with components such as retrievers and RSs in multi-stage pipelines. This brings us to architectures for conversational recommender systems (CRSs), in which LLMs facilitate multi-turn dialogues where each turn presents an opportunity not only to make recommendations, but also to engage with the user in interactive preference elicitation, critiquing, and question-answering.
Updated: 2024-08-20 15:36:24
Categories: cs.AI
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
High-resolution Vision-Language Models (VLMs) have been widely used in multimodal tasks to enhance accuracy by preserving detailed image information. However, these models often generate excessive visual tokens due to encoding multiple partitions of the input image. Processing these excessive visual tokens is computationally challenging, especially in resource-constrained environments with commodity GPUs. To support high-resolution images while meeting resource constraints, we propose High-Resolution Early Dropping (HiRED), a token-dropping scheme that operates within a fixed token budget before the Large Language Model (LLM) stage. HiRED can be integrated with existing high-resolution VLMs in a plug-and-play manner, as it requires no additional training while still maintaining superior accuracy. We strategically use the vision encoder's attention in the initial layers to assess the visual content of each image partition and allocate the token budget accordingly. Then, using the attention in the final layer, we select the most important visual tokens from each partition within the allocated budget, dropping the rest. Empirically, when applied to LLaVA-Next-7B on an NVIDIA TESLA P40 GPU, HiRED with a 20% token budget increases token generation throughput by 4.7x, reduces first-token generation latency by 15 seconds, and saves 2.3 GB of GPU memory for a single inference.
Updated: 2024-08-20 15:34:27
Categories: cs.CV,cs.AI
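The two-stage scheme above (budget allocation across partitions, then top-k token selection within each) can be sketched in plain Python. The scores and sizes here are toy stand-ins for the encoder's attention maps, not the paper's implementation:

```python
def allocate_budget(partition_scores, total_budget):
    """Split a fixed token budget across image partitions in proportion
    to their attention mass (assessed in the encoder's initial layers)."""
    total = sum(partition_scores)
    budgets = [int(total_budget * s / total) for s in partition_scores]
    # hand out tokens lost to integer rounding, highest-scoring partitions first
    leftover = total_budget - sum(budgets)
    for i in sorted(range(len(budgets)), key=lambda i: -partition_scores[i])[:leftover]:
        budgets[i] += 1
    return budgets

def select_tokens(token_scores, budget):
    """Keep the `budget` highest-scoring tokens (final-layer attention),
    drop the rest; returns kept token indices in original order."""
    ranked = sorted(range(len(token_scores)), key=lambda i: -token_scores[i])
    return sorted(ranked[:budget])

# toy example: 3 partitions sharing a budget of 4 tokens,
# then selecting 2 of 6 tokens inside one partition
budgets = allocate_budget([0.5, 0.3, 0.2], 4)
kept = select_tokens([0.1, 0.9, 0.4, 0.8, 0.2, 0.05], 2)
```

Because both stages run before the LLM, the dropped tokens never enter the expensive decoding phase, which is where the reported throughput and memory savings would come from.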
Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations
Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To deal with this, current approaches try to retain some style information by tuning the degree of invariance to some particular task, such as ImageNet object classification. However, prior work has shown that such task-specific tuning can lead to significant performance degradation on other tasks that rely on the discarded style. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and individual style variables. We empirically demonstrate the benefits of our approach on both synthetic and real-world data.
Updated: 2024-08-20 15:33:12
Categories: cs.LG,cs.AI,cs.CV,stat.ML
Robust Regression with Ensembles Communicating over Noisy Channels
As machine-learning models grow in size, their implementation requirements cannot be met by a single computer system. This observation motivates distributed settings, in which intermediate computations are performed across a network of processing units, while the central node only aggregates their outputs. However, distributing inference tasks across low-precision or faulty edge devices, operating over a network of noisy communication channels, gives rise to serious reliability challenges. We study the problem of an ensemble of devices, implementing regression algorithms, that communicate through additive noisy channels in order to collaboratively perform a joint regression task. We define the problem formally, and develop methods for optimizing the aggregation coefficients for the parameters of the noise in the channels, which can potentially be correlated. Our results apply to the leading state-of-the-art ensemble regression methods: bagging and gradient boosting. We demonstrate the effectiveness of our algorithms on both synthetic and real-world datasets.
Updated: 2024-08-20 15:32:47
Categories: cs.LG,cs.DC,cs.IT,math.IT
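A simplified special case of the aggregation problem above: when ensemble members' outputs are unbiased and arrive through independent additive-noise channels, the variance-minimizing linear combination uses inverse-variance weights. This is an illustrative sketch only; the paper also optimizes coefficients under correlated noise and for bagging/boosting ensembles:

```python
def inverse_variance_weights(noise_vars):
    """Aggregation coefficients minimizing the output variance of a linear
    combination of unbiased member outputs received over independent
    additive-noise channels (a simplified special case)."""
    inv = [1.0 / v for v in noise_vars]
    s = sum(inv)
    return [w / s for w in inv]

def aggregate(received, weights):
    """Central node: linearly combine the noisy received outputs."""
    return sum(w * z for w, z in zip(weights, received))

w = inverse_variance_weights([1.0, 4.0])  # the cleaner channel gets more weight
est = aggregate([2.1, 1.5], w)
```

The key design point is that the weights depend on the channel noise parameters rather than only on the members' training accuracy, which is what distinguishes this setting from standard ensemble averaging.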
Normalise for Fairness: A Simple Normalisation Technique for Fairness in Regression Machine Learning Problems
Algorithms and Machine Learning (ML) are increasingly affecting everyday life and several decision-making processes, where ML has an advantage due to scalability or superior performance. Fairness in such applications is crucial: models should not discriminate in their results based on race, gender, or other protected groups. This is especially crucial for models affecting very sensitive topics, like interview invitation or recidivism prediction. Fairness is not commonly studied for regression problems compared to binary classification problems; hence, we present a simple, yet effective method based on normalisation (FaiReg), which minimises the impact of unfairness in regression problems, especially that due to labelling bias. We present a theoretical analysis of the method, in addition to an empirical comparison against two standard methods for fairness, namely data balancing and adversarial training. We also include a hybrid formulation (FaiRegH), merging the presented method with data balancing, in an attempt to face labelling and sampling biases simultaneously. The experiments are conducted on the multimodal dataset First Impressions (FI) with various labels, namely Big-Five personality prediction and interview screening score. The results show that FaiReg diminishes the effects of unfairness better than data balancing, without deteriorating the performance on the original problem as much as adversarial training does. Fairness is evaluated based on the Equal Accuracy (EA) and Statistical Parity (SP) constraints. The experiments present a setup that enhances the fairness for several protected variables simultaneously.
Updated: 2024-08-20 15:31:45
Categories: cs.LG,cs.CY
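One plausible reading of a normalisation-based remedy for labelling bias is to z-score the regression labels within each protected group, so every group's labels share the same mean and spread before training. This is a hedged illustration of the idea; the exact FaiReg formulation in the paper may differ:

```python
from collections import defaultdict
from statistics import mean, pstdev

def groupwise_normalise(labels, groups):
    """Z-score each label within its protected group, removing
    between-group differences in label mean and scale (a sketch of
    normalisation-based fairness, not the paper's exact procedure)."""
    by_group = defaultdict(list)
    for y, g in zip(labels, groups):
        by_group[g].append(y)
    # per-group mean and population std; guard against zero spread
    stats = {g: (mean(ys), pstdev(ys) or 1.0) for g, ys in by_group.items()}
    return [(y - stats[g][0]) / stats[g][1] for y, g in zip(labels, groups)]

# group "a" labels are systematically higher than group "b" labels;
# after normalisation both groups have identical label distributions
z = groupwise_normalise([3.0, 5.0, 1.0, 2.0], ["a", "a", "b", "b"])
```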
A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection
Current methods for low- and few-shot object detection have primarily focused on enhancing model performance for detecting objects. One common approach to achieve this is by combining model finetuning with data augmentation strategies. However, little attention has been given to the energy efficiency of these approaches in data-scarce regimes. This paper seeks to conduct a comprehensive empirical study that examines both model performance and energy efficiency of custom data augmentations and automated data augmentation selection strategies when combined with a lightweight object detector. The methods are evaluated in three different benchmark datasets in terms of their performance and energy consumption, and the Efficiency Factor is employed to gain insights into their effectiveness considering both performance and efficiency. Consequently, it is shown that in many cases, the performance gains of data augmentation strategies are overshadowed by their increased energy usage, necessitating the development of more energy efficient data augmentation strategies to address data scarcity.
Updated: 2024-08-20 15:29:56
Categories: cs.CV,cs.AI,cs.LG,cs.PF
Conformalized Interval Arithmetic with Symmetric Calibration
Uncertainty quantification is essential in decision-making, especially when joint distributions of random variables are involved. While conformal prediction provides distribution-free prediction sets with valid coverage guarantees, it traditionally focuses on single predictions. This paper introduces novel conformal prediction methods for estimating the sum or average of unknown labels over specific index sets. We extend conformal prediction intervals for a single target to prediction intervals for the sum of multiple targets. Under permutation-invariance assumptions, we prove the validity of our proposed method. We also apply our algorithms to class-average estimation and path-cost prediction tasks, and we show that our method outperforms existing conformalized approaches as well as non-conformal approaches.
Updated: 2024-08-20 15:27:18
Categories: cs.LG,stat.ML
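The starting point the abstract describes can be sketched as standard split conformal prediction for one target, combined with naive interval arithmetic for a sum. This is a baseline sketch, not the paper's symmetric-calibration method, which is designed to improve on exactly this kind of interval:

```python
import math

def split_conformal_radius(residuals, alpha):
    """Split-conformal half-width: the ceil((n+1)(1-alpha))-th smallest
    absolute residual on a held-out calibration set."""
    r = sorted(abs(e) for e in residuals)
    k = math.ceil((len(r) + 1) * (1 - alpha))
    return r[min(k, len(r)) - 1]

def sum_interval(preds, radius):
    """Naive interval arithmetic for a sum of m targets: the m
    single-target intervals are added end-to-end, so the half-width
    grows linearly in m (the baseline a joint method can tighten)."""
    m = len(preds)
    total = sum(preds)
    return (total - m * radius, total + m * radius)

calib = [0.2, -0.5, 0.1, 0.4, -0.3, 0.6, -0.1, 0.25, 0.35, -0.45]
q = split_conformal_radius(calib, alpha=0.2)
lo, hi = sum_interval([1.0, 2.0, 3.0], q)
```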
SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement
Currently, most low-light image enhancement methods only consider information from a single view, neglecting the correlation between cross-view information. Therefore, the enhancement results produced by these methods are often unsatisfactory. In this context, there have been efforts to develop methods specifically for low-light stereo image enhancement. These methods take into account the cross-view disparities and enable interaction between the left and right views, leading to improved performance. However, these methods still do not fully exploit the interaction between left and right view information. To address this issue, we propose a model called Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement (SDI-Net). The backbone structure of SDI-Net is two encoder-decoder pairs, which are used to learn the mapping function from low-light images to normal-light images. Among the encoders and the decoders, we design a module named Cross-View Sufficient Interaction Module (CSIM), aiming to fully exploit the correlations between the binocular views via the attention mechanism. The quantitative and visual results on public datasets validate the superiority of our method over other related methods. Ablation studies also demonstrate the effectiveness of the key elements in our model.
Updated: 2024-08-20 15:17:11
Categories: cs.CV,cs.AI,eess.IV
The Evolution of Reinforcement Learning in Quantitative Finance
Reinforcement Learning (RL) has experienced significant advancement over the past decade, prompting a growing interest in applications within finance. This survey critically evaluates 167 publications, exploring diverse RL applications and frameworks in finance. Financial markets, marked by their complexity, multi-agent nature, information asymmetry, and inherent randomness, serve as an intriguing test-bed for RL. Traditional finance offers certain solutions, and RL advances these with a more dynamic approach, incorporating machine learning methods, including transfer learning, meta-learning, and multi-agent solutions. This survey dissects key RL components through the lens of Quantitative Finance. We uncover emerging themes, propose areas for future research, and critique the strengths and weaknesses of existing methods.
Updated: 2024-08-20 15:15:10
Categories: cs.AI,cs.CE,cs.LG,I.2.6; I.2.1
Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form Text
The emergence of Large Language Models (LLMs) as chat assistants capable of generating human-like conversations has amplified the need for robust evaluation methods, particularly for open-ended tasks. Conventional metrics like BLEU and ROUGE, while useful, are increasingly inadequate for capturing the subtle semantics and contextual richness of such generative outputs. We propose a reference-guided verdict method that automates the evaluation process by leveraging multiple LLMs-as-judges. Through experiments on three open-ended question-answering tasks, we demonstrate that combining multiple LLMs-as-judges significantly improves the reliability and accuracy of evaluations, particularly in complex tasks where a single model might struggle. Our findings reveal a strong correlation with human evaluations, establishing our method as a viable and effective alternative to traditional metrics and human judgments, particularly in the context of LLM-based chat assistants where the complexity and diversity of responses challenge existing benchmarks.
Updated: 2024-08-20 15:12:08
Categories: cs.CL,cs.AI,68T50, 68T07, 68T20,I.2.0; I.2.7; I.2.2
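The multi-judge aggregation idea can be sketched as a majority vote over reference-guided verdicts. The `strict`, `lenient`, and `optimist` judges below are toy stand-ins for actual LLM calls prompted with the reference answer; the paper's protocol is more elaborate:

```python
def multi_judge_verdict(answer, reference, judges):
    """Aggregate binary correct/incorrect verdicts from several judges
    by majority vote (a sketch; each real judge would be an LLM call
    given both the candidate answer and the reference)."""
    votes = [judge(answer, reference) for judge in judges]
    return sum(votes) > len(votes) / 2, votes

# stand-in judges with different leniency, mimicking judge disagreement
strict = lambda a, r: a.strip().lower() == r.strip().lower()
lenient = lambda a, r: r.strip().lower() in a.lower()
optimist = lambda a, r: True

ok, votes = multi_judge_verdict("Paris, France", "paris", [strict, lenient, optimist])
```

The point of pooling judges is that idiosyncratic failures of a single model (here, the overly strict exact-match judge) get outvoted.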
MTFinEval: A Multi-domain Chinese Financial Benchmark with Eurypalynous questions
With the emergence of more and more economy-specific LLMs, how to measure whether they can be safely deployed in production becomes a problem. Previous research has primarily focused on evaluating the performance of LLMs within specific application scenarios. However, these benchmarks cannot reflect theoretical depth and generalization ability, and outdated datasets are increasingly unsuitable for problems in real scenarios. In this paper, we compile a new benchmark, MTFinEval, focusing on LLMs' basic knowledge of economics, which can always serve as a basis for judgment. To examine theoretical knowledge as exclusively as possible, MTFinEval is built from foundational questions in university textbooks and exam papers for economics and management majors. Aware that the overall performance of LLMs does not depend solely on one subdiscipline of economics, MTFinEval comprises 360 questions refined from six major disciplines of economics, reflecting capabilities more comprehensively. Experimental results show that all LLMs perform poorly on MTFinEval, which proves that our benchmark built on basic knowledge is very successful. Our research not only offers guidance for selecting the appropriate LLM for specific use cases, but also proposes improving the rigor and reliability of LLMs from the fundamentals.
Updated: 2024-08-20 15:04:38
Categories: cs.AI
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
Updated: 2024-08-20 15:04:37
Categories: cs.LG,cs.AI,cs.NE
Conformal e-prediction
This paper discusses a counterpart of conformal prediction for e-values, "conformal e-prediction". Conformal e-prediction is conceptually simpler and was developed in the 1990s as a precursor of conformal prediction. When conformal prediction emerged as a result of replacing e-values by p-values, it seemed to have important advantages over conformal e-prediction without obvious disadvantages. This paper re-examines the relations between conformal prediction and conformal e-prediction systematically from a modern perspective. Conformal e-prediction has advantages of its own, such as the ease of designing conditional conformal e-predictors and the guaranteed validity of cross-conformal e-predictors (whereas for cross-conformal predictors validity is only an empirical fact and can be broken with excessive randomization). Even where conformal prediction has clear advantages, conformal e-prediction can often emulate those advantages, more or less successfully. Conformal e-prediction can also serve as a basis for testing; the resulting "conformal e-testing" looks very different from, but inherits some strengths of, conformal testing.
Updated: 2024-08-20 14:50:38
Categories: cs.LG,stat.ML,68T05 (Primary) 68Q32, 62G10, 62L10 (Secondary)
The impact of labeling automotive AI as "trustworthy" or "reliable" on user evaluation and technology acceptance
This study explores whether labeling AI as "trustworthy" or "reliable" influences user perceptions and acceptance of automotive AI technologies. Using a one-way between-subjects design, the research involved 478 online participants who were presented with guidelines for either trustworthy or reliable AI. Participants then evaluated three vignette scenarios and completed a modified version of the Technology Acceptance Model, which included variables such as perceived ease of use, human-like trust, and overall attitude. Although labeling AI as "trustworthy" did not significantly influence judgments on specific scenarios, it increased perceived ease of use and human-like trust, particularly benevolence. This suggests a positive impact on usability and an anthropomorphic effect on user perceptions. The study provides insights into how specific labels can influence attitudes toward AI technology.
Updated: 2024-08-20 14:48:24
Categories: cs.HC,cs.AI,cs.ET,K.4.1; H.5.2; H.4.2; J.7; J.4
Towards Efficient Formal Verification of Spiking Neural Network
Recently, AI research has primarily focused on large language models (LLMs), and increasing accuracy often involves scaling up and consuming more power. The power consumption of AI has become a significant societal issue; in this context, spiking neural networks (SNNs) offer a promising solution. SNNs operate in an event-driven manner, like the human brain, and compress information temporally. These characteristics allow SNNs to significantly reduce power consumption compared to perceptron-based artificial neural networks (ANNs), highlighting them as a next-generation neural network technology. However, societal concerns regarding AI go beyond power consumption, with the reliability of AI models being a global issue. For instance, adversarial attacks on AI models are a well-studied problem in the context of traditional neural networks. Despite their importance, the stability and property verification of SNNs remains in the early stages of research. Most SNN verification methods are time-consuming and barely scalable, making practical applications challenging. In this paper, we introduce temporal encoding to achieve practical performance in verifying the adversarial robustness of SNNs. We conduct a theoretical analysis of this approach and demonstrate its success in verifying SNNs at previously unmanageable scales. Our contribution advances SNN verification to a practical level, facilitating the safer application of SNNs.
Updated: 2024-08-20 14:43:33
Categories: cs.AI,cs.ET,cs.NE
Learning material synthesis-process-structure-property relationship by data fusion: Bayesian Coregionalization N-Dimensional Piecewise Function Learning
Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis-process-structure-property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis-process-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources to learn synthesis-process-structure-property relationships. SAGE outputs a probabilistic posterior for the relationships including the most likely relationships given the data.
Updated: 2024-08-20 14:42:50
Categories: cs.LG,cond-mat.mtrl-sci
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
While many have shown how Large Language Models (LLMs) can be applied to a diverse set of tasks, the critical issues of data contamination and memorization are often glossed over. In this work, we address this concern for tabular data. Specifically, we introduce a variety of different techniques to assess whether a language model has seen a tabular dataset during training. This investigation reveals that LLMs have memorized many popular tabular datasets verbatim. We then compare the few-shot learning performance of LLMs on datasets that were seen during training to the performance on datasets released after training. We find that LLMs perform better on datasets seen during training, indicating that memorization leads to overfitting. At the same time, LLMs show non-trivial performance on novel datasets and are surprisingly robust to data transformations. We then investigate the in-context statistical learning abilities of LLMs. While LLMs are significantly better than random at solving statistical classification problems, the sample efficiency of few-shot learning lags behind traditional statistical learning algorithms, especially as the dimension of the problem increases. This suggests that much of the observed few-shot performance on novel real-world datasets is due to the LLM's world knowledge. Overall, our results highlight the importance of testing whether an LLM has seen an evaluation dataset during pre-training. We release the https://github.com/interpretml/LLM-Tabular-Memorization-Checker Python package to test LLMs for memorization of tabular datasets.
Updated: 2024-08-20 14:35:03
Categories: cs.LG,cs.AI,cs.CL
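One memorization probe in the spirit of the tests described above is row completion: show the model the first rows of a table verbatim and check whether it reproduces the next row exactly. The `complete` callable below is a hypothetical stub standing in for an LLM API call; the linked package implements the actual battery of tests:

```python
def row_completion_test(rows, complete, n_context=3):
    """Return True if the model completes the table's next row verbatim,
    a sign the dataset was memorized during training. `complete` maps a
    prompt string to the model's continuation (stand-in for an LLM call)."""
    context = "\n".join(rows[:n_context]) + "\n"
    prediction = complete(context).splitlines()[0].strip()
    return prediction == rows[n_context].strip()

# stub "model" that has memorized the table verbatim
table = ["5.1,3.5,1.4,0.2", "4.9,3.0,1.4,0.2",
         "4.7,3.2,1.3,0.2", "4.6,3.1,1.5,0.2"]
memorized = {"\n".join(table[:3]) + "\n": table[3] + "\n..."}
hit = row_completion_test(table, lambda ctx: memorized[ctx])
```

A positive result on an evaluation dataset would suggest that few-shot scores on it reflect recall rather than generalization, which is the contamination concern the abstract raises.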
Analytical and Empirical Study of Herding Effects in Recommendation Systems
Online rating systems are often used in numerous web or mobile applications, e.g., Amazon and TripAdvisor, to assess the ground-truth quality of products. Due to herding effects, the aggregation of historical ratings (or historical collective opinion) can significantly influence subsequent ratings, leading to misleading and erroneous assessments. We study how to manage product ratings via rating aggregation rules and shortlisted representative reviews, for the purpose of correcting the assessment error. We first develop a mathematical model to characterize important factors of herding effects in product ratings. We then identify sufficient conditions (via stochastic approximation theory) under which the historical collective opinion converges to the ground-truth collective opinion of the whole user population. These conditions identify a class of rating aggregation rules and review selection mechanisms that can reveal the ground-truth product quality. We also quantify the speed of convergence (via martingale theory), which reflects the efficiency of rating aggregation rules and review selection mechanisms. We prove that herding effects slow down the speed of convergence, while an accurate review selection mechanism can speed it up. We also study the speed of convergence numerically and reveal trade-offs in selecting rating aggregation rules and review selection mechanisms. To show the utility of our framework, we design a maximum likelihood algorithm to infer model parameters from ratings, and conduct experiments on rating datasets from Amazon and TripAdvisor. We show that proper recency-aware rating aggregation rules can improve the speed of convergence on Amazon and TripAdvisor by 41% and 62%, respectively.
Updated: 2024-08-20 14:29:23
Categories: cs.AI
DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection
In current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis and Adaptive Discriminator (DAAD) approach for fake news detection. For knowledge-based methods, we introduce the Monte Carlo Tree Search (MCTS) algorithm to leverage the self-reflective capabilities of large language models (LLMs) for prompt optimization, providing richer, domain-specific details and guidance to the LLMs, while enabling more flexible integration of LLM comment on news content. For semantic-based methods, we define four typical deceit patterns: emotional exaggeration, logical inconsistency, image manipulation, and semantic inconsistency, to reveal the mechanisms behind fake news creation. To detect these patterns, we carefully design four discriminators and expand them in depth and breadth, using the soft-routing mechanism to explore optimal detection models. Experimental results on three real-world datasets demonstrate the superiority of our approach. The code will be available at: https://github.com/SuXinqi/DAAD.
Updated: 2024-08-20 14:13:54
Categories: cs.AI,cs.CV
Exploiting Defenses against GAN-Based Feature Inference Attacks in Federated Learning
Federated learning (FL) is a decentralized model training framework that aims to merge isolated data islands while maintaining data privacy. However, recent studies have revealed that Generative Adversarial Network (GAN) based attacks can be employed in FL to learn the distribution of private datasets and reconstruct recognizable images. In this paper, we exploit defenses against GAN-based attacks in FL and propose a framework, Anti-GAN, to prevent attackers from learning the real distribution of the victim's data. The core idea of Anti-GAN is to manipulate the visual features of private training images so that they remain indistinguishable to human eyes even when restored by attackers. Specifically, Anti-GAN projects the private dataset onto a GAN's generator and combines the generated fake images with the actual images to create the training dataset, which is then used for federated model training. The experimental results demonstrate that Anti-GAN is effective in preventing attackers from learning the distribution of private images while causing minimal harm to the accuracy of the federated model.
Updated: 2024-08-20 14:11:18
Categories: cs.CR,cs.LG
Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that use factorized gradient descent. Under certain statistical model assumptions, it is known that nuclear norm minimization recovers the ground truth as soon as the number of samples scales linearly with the number of degrees of freedom of the ground-truth. In contrast, while non-convex approaches are computationally less expensive, existing recovery guarantees assume that the number of samples scales at least quadratically with the rank $r$ of the ground-truth matrix. In this paper, we close this gap by showing that the non-convex approaches can be as efficient as nuclear norm minimization in terms of sample complexity. Namely, we consider the problem of reconstructing a positive semidefinite matrix from a few Gaussian measurements. We show that factorized gradient descent with spectral initialization converges to the ground truth with a linear rate as soon as the number of samples scales with $ \Omega (rd\kappa^2)$, where $d$ is the dimension, and $\kappa$ is the condition number of the ground truth matrix. This improves the previous rank-dependence from quadratic to linear. Our proof relies on a probabilistic decoupling argument, where we show that the gradient descent iterates are only weakly dependent on the individual entries of the measurement matrices. We expect that our proof technique is of independent interest for other non-convex problems.
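The non-convex pipeline the abstract describes, spectral initialization followed by factorized gradient descent on Gaussian measurements, can be sketched in a few lines. The dimensions, step size, and iteration count below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 10, 2, 1000                      # dimension, rank, #measurements

# Ground-truth positive semidefinite rank-r matrix M* = U* U*^T
U_star = rng.standard_normal((d, r))
M_star = U_star @ U_star.T

# Gaussian linear measurements y_i = <A_i, M*>
A = rng.standard_normal((m, d, d))
y = np.einsum('kij,ij->k', A, M_star)

# Spectral initialization: top-r eigenpairs of the symmetrized backprojection
S = np.einsum('k,kij->ij', y, A) / m
S = (S + S.T) / 2
w, V = np.linalg.eigh(S)                   # eigenvalues in ascending order
U = V[:, -r:] * np.sqrt(np.maximum(w[-r:], 0.0))

# Factorized gradient descent on f(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2
eta = 0.2 / np.linalg.norm(M_star, 2)      # in practice, estimated from w[-1]
for _ in range(500):
    res = np.einsum('kij,ij->k', A, U @ U.T) - y
    G = np.einsum('k,kij->ij', res, A) / m
    U -= eta * (G + G.T) @ U

rel_err = np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star)
```

The number of samples here (m = 1000 for d = 10, r = 2) is deliberately generous; the paper's contribution is precisely that the required scaling is only linear in the rank.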
Updated: 2024-08-20 14:09:28
Categories: stat.ML,cs.IT,cs.LG,math.IT,math.OC,math.PR,math.ST,stat.TH
More Options for Prelabor Rupture of Membranes, A Bayesian Analysis
An obstetric goal for a laboring mother is to achieve a vaginal delivery as it reduces the risks inherent in major abdominal surgery (i.e., a Cesarean section). Various medical interventions may be used by a physician to increase the likelihood of this occurring while minimizing maternal and fetal morbidity. However, patients with prelabor rupture of membranes (PROM) have only two commonly used options for cervical ripening, Pitocin and misoprostol. Little research exists on the benefits/risks for these two key drugs for PROM patients. A major limitation with most induction-of-labor related research is the inability to account for differences in Bishop scores that are commonly used in obstetrical practice to determine the next induction agent offered to the patient. This creates a confounding factor, which biases the results, but has not been realized in the literature. In this work, we use a Bayesian model of the relationships between the relevant factors, informed by expert physicians, to separate the confounding variable from its actual impact. In doing so, we provide strong evidence that pitocin and buccal misoprostol are equally effective and safe; thus, physicians have more choice in clinical care than previously realized. This is particularly important for developing countries where neither medication may be readily available, and prior guidelines may create an artificial barrier to needed medication.
Updated: 2024-08-20 14:05:25
Categories: stat.AP,cs.LG
Radio U-Net: a convolutional neural network to detect diffuse radio sources in galaxy clusters and beyond
The forthcoming generation of radio telescope arrays promises significant advancements in sensitivity and resolution, enabling the identification and characterization of many new faint and diffuse radio sources. Conventional manual cataloging methodologies are anticipated to be insufficient to exploit the capabilities of new radio surveys. Radio interferometric images of diffuse sources present a challenge for image segmentation tasks due to noise, artifacts, and embedded radio sources. In response to these challenges, we introduce Radio U-Net, a fully convolutional neural network based on the U-Net architecture. Radio U-Net is designed to detect faint and extended sources in radio surveys, such as radio halos, relics, and cosmic web filaments. Radio U-Net was trained on synthetic radio observations built upon cosmological simulations and then tested on a sample of galaxy clusters, where the detection of cluster diffuse radio sources relied on customized data reduction and visual inspection of LOFAR Two Metre Sky Survey (LoTSS) data. 83% of the clusters exhibiting diffuse radio emission were accurately identified, and the segmentation successfully recovered the morphology of the sources even in low-quality images. In a test sample comprising 246 galaxy clusters, we achieved a 73% accuracy rate in distinguishing between clusters with and without diffuse radio emission. Our results establish the applicability of Radio U-Net to extensive radio survey datasets, probing its efficiency on cutting-edge high-performance computing systems. This approach represents an advancement in optimizing the exploitation of forthcoming large radio surveys for scientific exploration.
Updated: 2024-08-20 14:03:21
Categories: astro-ph.IM,astro-ph.CO,cs.AI,cs.CV,cs.LG
Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities
Motivated by distributed selection problems, we formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures stochastic arrival of requests to each arm, as well as the policy of allocating requests to players. The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile (an arm pulling profile prescribes the number of players at each arm) without communicating to each other. We first design a greedy algorithm, which locates one of the optimal arm pulling profiles with a polynomial computational complexity. We also design an iterative distributed algorithm for players to commit to an optimal arm pulling profile with a constant number of rounds in expectation. We apply the explore then commit (ETC) framework to address the online setting when model parameters are unknown. We design an exploration strategy for players to estimate the optimal arm pulling profile. Since such estimates can be different across different players, it is challenging for players to commit. We then design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds. We conduct experiments to validate our algorithm.
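As a hedged sketch of the greedy idea (the paper's algorithm and reward model are more general), the snippet below adds players one at a time to the arm with the largest marginal expected reward; when marginal rewards are non-increasing in the number of co-located players, as with shared stochastic request capacities, this greedy construction yields an optimal profile.

```python
import heapq

def greedy_profile(marginal, n_players, n_arms):
    """Greedily build an arm-pulling profile.

    marginal(k, j) is the expected marginal reward of adding the j-th
    player to arm k, assumed non-increasing in j (diminishing returns
    from sharing an arm's stochastic request capacity)."""
    profile = [0] * n_arms
    # min-heap over negated gains acts as a max-heap over gains
    heap = [(-marginal(k, 1), k) for k in range(n_arms)]
    heapq.heapify(heap)
    for _ in range(n_players):
        neg_gain, k = heapq.heappop(heap)
        profile[k] += 1
        heapq.heappush(heap, (-marginal(k, profile[k] + 1), k))
    return profile

# Hypothetical example: arm k absorbs about mu[k] requests, so the j-th
# co-located player's marginal gain decays as max(mu[k] - (j - 1), 0).
mu = [3.0, 1.0, 0.5]
marg = lambda k, j: max(mu[k] - (j - 1), 0.0)
profile = greedy_profile(marg, n_players=4, n_arms=3)
```

Each of the n_players iterations costs O(log n_arms), giving the polynomial complexity the abstract mentions for locating an optimal profile.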
Updated: 2024-08-20 13:57:00
Categories: cs.AI
SUBER: An RL Environment with Simulated Human Behavior for Recommender Systems
Reinforcement learning (RL) has gained popularity in the realm of recommender systems due to its ability to optimize long-term rewards and guide users in discovering relevant content. However, the successful implementation of RL in recommender systems is challenging because of several factors, including the limited availability of online data for training on-policy methods. This scarcity requires expensive human interaction for online model training. Furthermore, the development of effective evaluation frameworks that accurately reflect the quality of models remains a fundamental challenge in recommender systems. To address these challenges, we propose a comprehensive framework for synthetic environments that simulate human behavior by harnessing the capabilities of large language models (LLMs). We complement our framework with in-depth ablation studies and demonstrate its effectiveness with experiments on movie and book recommendations. Using LLMs as synthetic users, this work introduces a modular and novel framework to train RL-based recommender systems. The software, including the RL environment, is publicly available on GitHub.
Updated: 2024-08-20 13:56:21
Categories: cs.IR,cs.LG
Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning
Reward shaping is effective in addressing the sparse-reward challenge in reinforcement learning by providing immediate feedback through auxiliary informative rewards. Based on the reward shaping strategy, we propose a novel multi-task reinforcement learning framework, that integrates a centralized reward agent (CRA) and multiple distributed policy agents. The CRA functions as a knowledge pool, which aims to distill knowledge from various tasks and distribute it to individual policy agents to improve learning efficiency. Specifically, the shaped rewards serve as a straightforward metric to encode knowledge. This framework not only enhances knowledge sharing across established tasks but also adapts to new tasks by transferring valuable reward signals. We validate the proposed method on both discrete and continuous domains, demonstrating its robustness in multi-task sparse-reward settings and its effective transferability to unseen tasks.
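Potential-based shaping is one standard way to hand out auxiliary rewards without changing the optimal policy; the snippet below is a minimal sketch of that mechanism. The paper's centralized reward agent learns and distills its shaped rewards across tasks rather than using a fixed hand-written potential, so treat phi here as a hypothetical stand-in.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
    This additive term provably leaves the optimal policy unchanged,
    so a centralized reward agent could distribute phi (its distilled
    knowledge) to policy agents without biasing them."""
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical 1-D chain task with the goal at state 10: the potential
# is the negative distance to the goal, so moving toward the goal
# yields a positive shaped reward and moving away yields a negative one,
# turning a sparse task into one with immediate feedback.
phi = lambda s: -abs(10 - s)
toward = shaped_reward(0.0, s=4, s_next=5, phi=phi)
away = shaped_reward(0.0, s=5, s_next=4, phi=phi)
```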
Updated: 2024-08-20 13:49:26
Categories: cs.LG,cs.AI
MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling
In an era of frequent extreme weather and global warming, obtaining precise, fine-grained near-surface weather forecasts is increasingly essential for human activities. Downscaling (DS), a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions from global-scale forecast results. Previous downscaling methods, inspired by CNN and Transformer-based super-resolution models, lacked tailored designs for meteorology and encountered structural limitations. Notably, they failed to efficiently integrate topography, a crucial prior in the downscaling process. In this paper, we address these limitations by introducing the selective state space model into meteorological field downscaling and propose a novel model called MambaDS. This model enhances the utilization of multivariable correlations and topography information, which are unique challenges in the downscaling process, while retaining the advantages of Mamba in long-range dependency modeling and linear computational complexity. Through extensive experiments in both mainland China and the continental United States (CONUS), we validated that our proposed MambaDS achieves state-of-the-art results in three different types of meteorological field downscaling settings. We will release the code subsequently.
Updated: 2024-08-20 13:45:49
Categories: physics.ao-ph,cs.AI,cs.CV
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Currently, Audio Language Models (ALMs) are rapidly advancing due to developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based audio have become increasingly critical. This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio. Specifically, we collect 12 types of the latest ALM-based deepfake audio and utilize the latest CMs to evaluate them. Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving 0% equal error rate under most ALM test conditions, which exceeded our expectations. This indicates promising directions for future research in ALM-based deepfake audio detection.
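The 0% figure refers to the equal error rate (EER), the operating point where the false-acceptance and false-rejection rates of a binary detector coincide. A minimal way to compute it from detector scores (a simple threshold sweep, not any particular toolkit's implementation):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER of a binary detector. scores: higher = more likely fake
    (label 1). Sweeps thresholds and returns the error rate where the
    false-acceptance and false-rejection rates cross."""
    best = (2.0, None)                          # (|FAR - FRR|, EER)
    for t in np.unique(scores):
        pred = scores >= t
        far = np.mean(pred[labels == 0])        # false acceptance rate
        frr = np.mean(~pred[labels == 1])       # false rejection rate
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

# Perfectly separated scores give 0% EER, as in the paper's finding
# that codec-trained CMs reach 0% EER under most ALM test conditions.
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 0])
eer = equal_error_rate(scores, labels)
```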
Updated: 2024-08-20 13:45:34
Categories: cs.SD,cs.AI,eess.AS
CELLM: An Efficient Communication in Large Language Models Training for Federated Learning
Federated Learning (FL) is a recent model training paradigm in which client devices collaboratively train a model without ever aggregating their data. Crucially, this scheme offers users potential privacy and security benefits by only ever communicating updates to the model weights to a central server, as opposed to traditional machine learning (ML) training, which directly communicates and aggregates data. However, FL training suffers from statistical heterogeneity as clients may have differing local data distributions. Large language models (LLMs) offer a potential solution to this issue of heterogeneity given that they have consistently been shown to be able to learn on vast amounts of noisy data. While LLMs are a promising development for resolving the persistent issue of non-I.I.D. clients in federated settings, they exacerbate two other bottlenecks in FL: limited local computing and expensive communication. This thesis aims to develop efficient training methods for LLMs in FL. To this end, we employ two critical techniques to enable efficient training. First, we use low-rank adaptation (LoRA) to reduce the computational load of local model training. Second, we communicate sparse updates throughout training to significantly cut down on communication costs. Taken together, our method reduces communication costs by up to 10x over vanilla LoRA and up to 5x over more complex sparse LoRA baselines while achieving greater utility. We emphasize the importance of carefully applying sparsity and picking effective rank and sparsity configurations for federated LLM training.
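The two techniques can be sketched independently of any FL framework: a LoRA delta shrinks what each client trains, and top-k sparsification shrinks what it transmits. The shapes, rank, and sparsity level below are illustrative assumptions, not the thesis's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4

# LoRA: freeze W and train only the low-rank delta B @ A, cutting each
# client's trainable parameters from d_out*d_in to (d_out + d_in)*rank.
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))            # standard LoRA init: B = 0
W_eff = W + B @ A                      # effective weight in the forward pass

def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of an update, zero the rest;
    sending only these values (plus indices) stands in for the sparse
    communication scheme described above."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(update.shape)

grad_B = rng.standard_normal(B.shape)          # stand-in local gradient
msg = top_k_sparsify(grad_B, k=B.size // 10)   # ~90% fewer values sent
```

The savings compound: clients already communicate only the small LoRA factors, and sparsifying those factors' updates cuts the payload by another order of magnitude.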
Updated: 2024-08-20 13:42:25
Categories: cs.LG,cs.AI
Benchmarking Large Language Models for Math Reasoning Tasks
The use of Large Language Models (LLMs) in mathematical reasoning has become a cornerstone of related research, demonstrating the intelligence of these models and enabling potential practical applications through their advanced performance, such as in educational settings. Despite the variety of datasets and in-context learning algorithms designed to improve the ability of LLMs to automate mathematical problem solving, the lack of comprehensive benchmarking across different datasets makes it complicated to select an appropriate model for specific tasks. In this project, we present a benchmark that fairly compares seven state-of-the-art in-context learning algorithms for mathematical problem solving across five widely used mathematical datasets on four powerful foundation models. Furthermore, we explore the trade-off between efficiency and performance, highlighting the practical applications of LLMs for mathematical reasoning. Our results indicate that larger foundation models like GPT-4o and LLaMA 3-70B can solve mathematical reasoning independently from the concrete prompting strategy, while for smaller models the in-context learning approach significantly influences the performance. Moreover, the optimal prompt depends on the chosen foundation model. We open-source our benchmark code to support the integration of additional models in future research.
Updated: 2024-08-20 13:34:17
Categories: cs.CL,cs.LG
Multilevel CNNs for Parametric PDEs based on Adaptive Finite Elements
A neural network architecture is presented that exploits the multilevel properties of high-dimensional parameter-dependent partial differential equations, enabling an efficient approximation of parameter-to-solution maps, rivaling best-in-class methods such as low-rank tensor regression in terms of accuracy and complexity. The neural network is trained with data on adaptively refined finite element meshes, thus reducing data complexity significantly. Error control is achieved by using a reliable finite element a posteriori error estimator, which is also provided as input to the neural network. The proposed U-Net architecture with CNN layers mimics a classical finite element multigrid algorithm. It can be shown that the CNN efficiently approximates all operations required by the solver, including the evaluation of the residual-based error estimator. In the CNN, a culling mask set-up according to the local corrections due to refinement on each mesh level reduces the overall complexity, allowing the network optimization with localized fine-scale finite element data. A complete convergence and complexity analysis is carried out for the adaptive multilevel scheme, which differs in several aspects from previous non-adaptive multilevel CNN. Moreover, numerical experiments with common benchmark problems from Uncertainty Quantification illustrate the practical performance of the architecture.
Updated: 2024-08-20 13:32:11
Categories: cs.LG,cs.NA,math.NA
Conditional Brownian Bridge Diffusion Model for VHR SAR to Optical Image Translation
Synthetic Aperture Radar (SAR) imaging technology provides the unique advantage of being able to collect data regardless of weather conditions and time. However, SAR images exhibit complex backscatter patterns and speckle noise, which necessitate expertise for interpretation. Research on translating SAR images into optical-like representations has been conducted to aid the interpretation of SAR data. Nevertheless, existing studies have predominantly utilized low-resolution satellite imagery datasets and have largely been based on Generative Adversarial Network (GAN) which are known for their training instability and low fidelity. To overcome these limitations of low-resolution data usage and GAN-based approaches, this paper introduces a conditional image-to-image translation approach based on Brownian Bridge Diffusion Model (BBDM). We conducted comprehensive experiments on the MSAW dataset, a paired SAR and optical images collection of 0.5m Very-High-Resolution (VHR). The experimental results indicate that our method surpasses both the Conditional Diffusion Models (CDMs) and the GAN-based models in diverse perceptual quality metrics.
Updated: 2024-08-20 13:30:11
Categories: eess.IV,cs.AI,cs.CV
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data
Synthetic data is increasingly being used to address the lack of labeled images in uncommon domains for deep learning tasks. A prominent example is 2D pose estimation of animals, particularly wild species like zebras, for which collecting real-world data is complex and impractical. However, many approaches still require real images, consistency and style constraints, sophisticated animal models, and/or powerful pre-trained networks to bridge the syn-to-real gap. Moreover, they often assume that the animal can be reliably detected in images or videos, a hypothesis that often does not hold, e.g. in wildlife scenarios or aerial images. To solve this, we use synthetic data generated with a 3D photorealistic simulator to obtain the first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any of the aforementioned bridging strategies. Unlike previous works, we extensively train and benchmark our detection and 2D pose estimation models on multiple real-world and synthetic datasets using both pre-trained and non-pre-trained backbones. These experiments show how the models trained from scratch and only with synthetic data can consistently generalize to real-world images of zebras in both tasks. Moreover, we show it is possible to easily generalize those same models to 2D pose estimation of horses with a minimal amount of real-world images to account for the domain transfer. Code, results, and trained models, as well as the synthetic training and validation data, including 104K manually labeled frames, are provided as open source at https://zebrapose.is.tue.mpg.de/
Updated: 2024-08-20 13:28:37
Categories: cs.CV,cs.AI,cs.RO
ELASTIC: Efficient Linear Attention for Sequential Interest Compression
State-of-the-art sequential recommendation models heavily rely on the transformer's attention mechanism. However, the quadratic computational and memory complexity of self-attention has limited its scalability for modeling users' long-range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, requiring only linear time complexity and decoupling model capacity from computational cost. Specifically, ELASTIC introduces fixed-length interest experts with a linear dispatcher attention mechanism, which compresses long-term behaviour sequences into a significantly more compact representation, reducing GPU memory usage by up to 90% with a 2.7x inference speed-up. The proposed linear dispatcher attention mechanism significantly reduces the quadratic complexity and makes the model feasible for adequately modeling extremely long sequences. Moreover, in order to retain the capacity for modeling various user interests, ELASTIC initializes a vast learnable interest memory bank and sparsely retrieves compressed user interests from the memory with a negligible computational overhead. The proposed interest memory retrieval technique significantly expands the cardinality of the available interest space while keeping the same computational cost, thereby striking a trade-off between recommendation accuracy and efficiency. To validate the effectiveness of our proposed ELASTIC, we conduct extensive experiments on various public datasets and compare it with several strong sequential recommenders. Experimental results demonstrate that ELASTIC consistently outperforms baselines by a significant margin and also highlight the computational efficiency of ELASTIC when modeling long sequences. We will make our implementation code publicly available.
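One simple way to realize a "fixed-length interest experts with dispatcher attention" idea is cross-attention from m learnable queries over the L-token sequence, which costs O(Lm) rather than O(L^2). The sketch below (learned projections omitted, random stand-ins for trained parameters) is an assumption-laden simplification, not ELASTIC itself.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_interests(seq, experts):
    """Cross-attend m 'interest expert' queries over an L-token
    behaviour sequence: the score matrix is (m, L) instead of (L, L),
    so cost grows linearly in the sequence length L."""
    scores = experts @ seq.T / np.sqrt(seq.shape[1])   # (m, L)
    return softmax(scores, axis=-1) @ seq              # (m, d)

rng = np.random.default_rng(0)
L, m, d = 1000, 16, 32                 # 1000-step history -> 16 slots
seq = rng.standard_normal((L, d))      # token embeddings of the sequence
experts = rng.standard_normal((m, d))  # stand-ins for learned queries
compressed = compress_interests(seq, experts)
```

Because m is fixed, any downstream (self-)attention over the compressed representation is independent of L, which is where the memory and latency savings come from.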
Updated: 2024-08-20 13:24:50
Categories: cs.AI,cs.IR
From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis
Recent advances in self-supervised learning enabled novel medical AI models, known as foundation models (FMs), that offer great potential for characterizing health from diverse biomedical data. Continuous glucose monitoring (CGM) provides rich, temporal data on glycemic patterns, but its full potential for predicting broader health outcomes remains underutilized. Here, we present GluFormer, a generative foundation model on biomedical temporal data based on a transformer architecture, and trained on over 10 million CGM measurements from 10,812 non-diabetic individuals. We tokenized the CGM training data and trained GluFormer using next token prediction in a generative, autoregressive manner. We demonstrate that GluFormer generalizes effectively to 15 different external datasets, including 4936 individuals across 5 different geographical regions, 6 different CGM devices, and several metabolic disorders, including normoglycemic, prediabetic, and diabetic populations, as well as those with gestational diabetes and obesity. GluFormer produces embeddings which outperform traditional CGM analysis tools, and achieves high Pearson correlations in predicting clinical parameters such as HbA1c, liver-related parameters, blood lipids, and sleep-related indices. Notably, GluFormer can also predict onset of future health outcomes even 4 years in advance. We also show that CGM embeddings from pre-intervention periods in Randomized Clinical Trials (RCTs) outperform other methods in predicting primary and secondary outcomes. When integrating dietary data into GluFormer, we show that the enhanced model can accurately generate CGM data based only on dietary intake data, simulate outcomes of dietary interventions, and predict individual responses to specific foods. Overall, we show that GluFormer accurately predicts health outcomes which generalize across different populations and metabolic conditions.
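Tokenizing CGM readings for next-token prediction can be as simple as uniform binning over a plausible glucose range; the vocabulary size and range below are illustrative assumptions, since the abstract does not specify GluFormer's actual tokenizer.

```python
import numpy as np

def tokenize_cgm(glucose_mgdl, n_tokens=256, lo=40, hi=400):
    """Discretize CGM readings (mg/dL) into a fixed vocabulary so a
    transformer can be trained autoregressively on the token stream.
    Uniform bins over [lo, hi]; out-of-range values are clipped."""
    clipped = np.clip(glucose_mgdl, lo, hi)
    return np.minimum(
        ((clipped - lo) / (hi - lo) * n_tokens).astype(int), n_tokens - 1
    )

# Hypothetical readings: hypoglycemic, normal, elevated, clipped high.
readings = np.array([38.0, 95.0, 180.0, 420.0])
tokens = tokenize_cgm(readings)
```

Once discretized, the training objective is ordinary next-token cross-entropy over these IDs, exactly as in language modeling.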
Updated: 2024-08-20 13:19:06
Categories: q-bio.QM,cs.AI,cs.LG
Exploiting Large Language Models Capabilities for Question Answer-Driven Knowledge Graph Completion Across Static and Temporal Domains
Knowledge graph completion (KGC) aims to identify missing triples in a knowledge graph (KG). This is typically achieved through tasks such as link prediction and instance completion. However, these methods often focus on either static knowledge graphs (SKGs) or temporal knowledge graphs (TKGs), addressing only within-scope triples. This paper introduces a new generative completion framework called Generative Subgraph-based KGC (GS-KGC). GS-KGC employs a question-answering format to directly generate target entities, addressing the challenge of questions having multiple possible answers. We propose a strategy that extracts subgraphs centered on entities and relationships within the KG, from which negative samples and neighborhood information are separately obtained to address the one-to-many problem. Our method generates negative samples using known facts to facilitate the discovery of new information. Furthermore, we collect and refine neighborhood path data of known entities, providing contextual information to enhance reasoning in large language models (LLMs). Our experiments evaluated the proposed method on four SKGs and two TKGs, achieving state-of-the-art Hits@1 metrics on five datasets. Analysis of the results shows that GS-KGC can discover new triples within existing KGs and generate new facts beyond the closed KG, effectively bridging the gap between closed-world and open-world KGC.
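A minimal sketch of the question-answering formulation with subgraph context and known-fact negatives might look as follows (the prompt format and entity names are invented; GS-KGC's actual prompts may differ):

```python
# Illustrative sketch: assemble a QA-style KGC prompt from an entity-centred
# subgraph, using known answers as negatives so the LLM proposes new targets.
def build_kgc_prompt(query, neighbourhood, known_answers):
    head, relation = query
    lines = [f"Question: {head} --{relation}--> ?"]
    if neighbourhood:
        lines.append("Context paths:")
        lines += [f"  {h} --{r}--> {t}" for h, r, t in neighbourhood]
    if known_answers:
        # known facts become negative samples for the one-to-many problem
        lines.append("Known answers (do not repeat): " + ", ".join(known_answers))
    lines.append("Answer:")
    return "\n".join(lines)

prompt = build_kgc_prompt(
    ("Marie_Curie", "award_received"),
    [("Marie_Curie", "field", "Physics"),
     ("Marie_Curie", "spouse", "Pierre_Curie")],
    ["Nobel_Prize_in_Physics"],
)
```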
Updated: 2024-08-20 13:13:41
Categories: cs.CL,cs.AI
Learning Randomized Algorithms with Transformers
Randomization is a powerful tool that endows algorithms with remarkable properties. For instance, randomized algorithms excel in adversarial settings, often surpassing the worst-case performance of deterministic algorithms with large margins. Furthermore, their success probability can be amplified by simple strategies such as repetition and majority voting. In this paper, we enhance deep neural networks, in particular transformer models, with randomization. We demonstrate for the first time that randomized algorithms can be instilled in transformers through learning, in a purely data- and objective-driven manner. First, we analyze known adversarial objectives for which randomized algorithms offer a distinct advantage over deterministic ones. We then show that common optimization techniques, such as gradient descent or evolutionary strategies, can effectively learn transformer parameters that make use of the randomness provided to the model. To illustrate the broad applicability of randomization in empowering neural networks, we study three conceptual tasks: associative recall, graph coloring, and agents that explore grid worlds. In addition to demonstrating increased robustness against oblivious adversaries through learned randomization, our experiments reveal remarkable performance improvements due to the inherently random nature of the neural networks' computation and predictions.
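The amplification-by-repetition property mentioned above can be checked directly; this is a standard computation, not code from the paper:

```python
# Probability that a majority vote over n independent runs succeeds,
# given each run succeeds independently with probability p.
from math import comb

def majority_success(p, n):
    m = n // 2 + 1  # votes needed for a strict majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

single = 0.7
boosted = majority_success(single, 15)  # repetition + majority voting
```

With 15 repetitions, a per-run success rate of 0.7 is amplified to roughly 0.95, which is why randomized algorithms pair so naturally with simple voting strategies.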
Updated: 2024-08-20 13:13:36
Categories: cs.LG
Deep Learning-based Classification of Dementia using Image Representation of Subcortical Signals
Dementia is a neurological syndrome marked by cognitive decline. Alzheimer's disease (AD) and Frontotemporal dementia (FTD) are the common forms of dementia, each with distinct progression patterns. EEG, a non-invasive tool for recording brain activity, has shown potential in distinguishing AD from FTD and mild cognitive impairment (MCI). Previous studies have utilized various EEG features, such as subband power and connectivity patterns to differentiate these conditions. However, artifacts in EEG signals can obscure crucial information, necessitating advanced signal processing techniques. This study aims to develop a deep learning-based classification system for dementia by analyzing scout time-series signals from deep brain regions, specifically the hippocampus, amygdala, and thalamus. The study utilizes scout time series extracted via the standardized low-resolution brain electromagnetic tomography (sLORETA) technique. The time series is converted to image representations using continuous wavelet transform (CWT) and fed as input to deep learning models. Two high-density EEG datasets are utilized to check for the efficacy of the proposed method: the online BrainLat dataset (comprising AD, FTD, and healthy controls (HC)) and the in-house IITD-AIIA dataset (including subjects with AD, MCI, and HC). Different classification strategies and classifier combinations have been utilized for the accurate mapping of classes on both datasets. The best results were achieved by using a product of probabilities from classifiers for left and right subcortical regions in conjunction with the DenseNet model architecture. It yields accuracies of 94.17$\%$ and 77.72$\%$ on the BrainLat and IITD-AIIA datasets, respectively. This highlights the potential of this approach for early and accurate differentiation of neurodegenerative disorders.
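A toy version of the signal-to-image step (a continuous wavelet transform producing one row per scale) can be sketched as follows; the real pipeline uses sLORETA-extracted scout series and a proper CWT implementation, so the wavelet shape and scales here are illustrative only:

```python
# Illustrative sketch: a crude real-valued Morlet CWT turning a 1-D signal
# into a 2-D scalogram (rows = scales, columns = time) for a CNN input.
import math

def morlet(t, scale, w0=5.0):
    """Real part of a Morlet-like wavelet sampled at offset t for a scale."""
    x = t / scale
    return math.exp(-0.5 * x * x) * math.cos(w0 * x)

def cwt_image(signal, scales, width=32):
    half = width // 2
    image = []
    for s in scales:
        row = []
        for center in range(len(signal)):
            acc = 0.0
            for k in range(-half, half + 1):
                idx = center + k
                if 0 <= idx < len(signal):
                    acc += signal[idx] * morlet(k, s)
            row.append(acc / math.sqrt(s))  # scale normalisation
        image.append(row)
    return image

sig = [math.sin(2 * math.pi * 0.05 * t) for t in range(128)]
img = cwt_image(sig, scales=[2, 4, 8, 16])
```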
Updated: 2024-08-20 13:11:43
Categories: eess.SP,cs.LG
Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?
In this study, we investigate whether non-English-centric LLMs, despite their strong performance, `think' in their respective dominant language: more precisely, `think' refers to how the representations of intermediate layers, when un-embedded into the vocabulary space, exhibit higher probabilities for certain dominant languages during generation. We term such languages as internal $\textbf{latent languages}$. We examine the latent language of three typical categories of models for Japanese processing: Llama2, an English-centric model; Swallow, an English-centric model with continued pre-training in Japanese; and LLM-jp, a model pre-trained on balanced English and Japanese corpora. Our empirical findings reveal that, unlike Llama2 which relies exclusively on English as the internal latent language, Japanese-specific Swallow and LLM-jp employ both Japanese and English, exhibiting dual internal latent languages. For any given target language, the model preferentially activates the latent language most closely related to it. In addition, we explore how intermediate layers respond to questions involving cultural conflicts between latent internal and target output languages. We further explore how the language identity shifts across layers while keeping consistent semantic meaning reflected in the intermediate layer representations. This study deepens the understanding of non-English-centric large language models, highlighting the intricate dynamics of language representation within their intermediate layers.
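The "un-embedding" probe described above can be sketched on toy numbers (the vocabulary, hidden size, and un-embedding matrix are invented; real models apply the output embedding to intermediate-layer representations of an actual transformer):

```python
# Illustrative sketch: un-embed a hidden state into vocabulary space and
# measure how much probability mass falls on tokens of each language.
import math

# Toy vocabulary: each token tagged with a language.
vocab = [("the", "en"), ("cat", "en"), ("猫", "ja"), ("です", "ja")]
# Hypothetical un-embedding matrix: one row per token, 4-dim hidden space.
unembed = [[1.0, 0.2, 0.0, 0.0],
           [0.8, 0.0, 0.1, 0.0],
           [0.0, 0.0, 1.0, 0.7],
           [0.0, 0.1, 0.6, 1.0]]

def latent_language_mass(hidden):
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # stable softmax
    z = sum(exps)
    mass = {}
    for (tok, lang), e in zip(vocab, exps):
        mass[lang] = mass.get(lang, 0.0) + e / z
    return mass

mass = latent_language_mass([0.9, 0.1, 0.0, 0.0])  # English-leaning state
```

Tracking this per-language mass layer by layer is what reveals whether a model "thinks" in English, Japanese, or both.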
Updated: 2024-08-20 13:05:41
Categories: cs.CL,cs.AI
DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Existing work on pitch and timbre disentanglement has been mostly focused on single-instrument music audio, excluding the cases where multiple instruments are presented. To fill the gap, we propose DisMix, a generative framework in which the pitch and timbre representations act as modular building blocks for constructing the melody and instrument of a source, and the collection of which forms a set of per-instrument latent representations underlying the observed mixture. By manipulating the representations, our model samples mixtures with novel combinations of pitch and timbre of the constituent instruments. We can jointly learn the disentangled pitch-timbre representations and a latent diffusion transformer that reconstructs the mixture conditioned on the set of source-level representations. We evaluate the model using both a simple dataset of isolated chords and a realistic four-part chorales in the style of J.S. Bach, identify the key components for the success of disentanglement, and demonstrate the application of mixture transformation based on source-level attribute manipulation.
Updated: 2024-08-20 12:56:49
Categories: cs.SD,cs.AI,cs.LG,eess.AS
Projectivity revisited
The behaviour of statistical relational representations across differently sized domains has become a focal area of research from both a modelling and a complexity viewpoint. Recently, projectivity of a family of distributions emerged as a key property, ensuring that marginal probabilities are independent of the domain size. However, the formalisation used currently assumes that the domain is characterised only by its size. This contribution extends the notion of projectivity from families of distributions indexed by domain size to functors taking extensional data from a database. This makes projectivity available for the large range of applications taking structured input. We transfer key known results on projective families of distributions to the new setting. This includes a characterisation of projective fragments in different statistical relational formalisms as well as a general representation theorem for projective families of distributions. Furthermore, we prove a correspondence between projectivity and distributions on countably infinite domains, which we use to unify and generalise earlier work on statistical relational representations in infinite domains. Finally, we use the extended notion of projectivity to define a further strengthening, which we call $\sigma$-projectivity, and which allows the use of the same representation in different modes while retaining projectivity.
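For readers unfamiliar with the property, the classical size-indexed notion that this work generalises can be stated as follows (a standard formulation, paraphrased rather than quoted from the paper; `\restriction` is from amssymb):

```latex
% A family (P_n)_{n \in \mathbb{N}}, where P_n is a distribution over
% relational structures with domain {1, ..., n}, is projective if
% marginalisation is consistent with restriction:
\[
  P_n \restriction_{\{1,\dots,k\}} \;=\; P_k
  \qquad \text{for all } k \le n ,
\]
% i.e. the marginal probability of any event concerning k named domain
% elements does not depend on the size n of the ambient domain.
```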
Updated: 2024-08-20 12:55:45
Categories: cs.AI,math.PR,math.ST,stat.TH,60B20, 62E20,I.2.3; I.2.4; I.2.6
Inverse Deep Learning Ray Tracing for Heliostat Surface Prediction
Concentrating Solar Power (CSP) plants play a crucial role in the global transition towards sustainable energy. A key factor in ensuring the safe and efficient operation of CSP plants is the distribution of concentrated flux density on the receiver. However, the non-ideal flux density generated by individual heliostats can undermine the safety and efficiency of the power plant. The flux density from each heliostat is influenced by its precise surface profile, which includes factors such as canting and mirror errors. Accurately measuring these surface profiles for a large number of heliostats in operation is a formidable challenge. Consequently, control systems often rely on the assumption of ideal surface conditions, which compromises both safety and operational efficiency. In this study, we introduce inverse Deep Learning Ray Tracing (iDLR), an innovative method designed to predict heliostat surfaces based solely on target images obtained during heliostat calibration. Our simulation-based investigation demonstrates that sufficient information regarding the heliostat surface is retained in the flux density distribution of a single heliostat, enabling deep learning models to accurately predict the underlying surface with deflectometry-like precision for the majority of heliostats. Additionally, we assess the limitations of this method, particularly in relation to surface accuracy and resultant flux density predictions. Furthermore, we present a new comprehensive heliostat model using Non-Uniform Rational B-Splines (NURBS) that has the potential to become the new state of the art for heliostat surface parameterization. Our findings reveal that iDLR has significant potential to enhance CSP plant operations, potentially increasing the overall efficiency and energy output of the power plants.
Updated: 2024-08-20 12:51:35
Categories: cs.LG,cs.AI
Probabilities of the Third Type: Statistical Relational Learning and Reasoning with Relative Frequencies
Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. An exception are the recently introduced lifted Bayesian networks for conditional probability logic, which express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks, a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with lifted Bayesian networks for conditional probability logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. Furthermore, we show that in parametric families of FLBN, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model.
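A one-line example of a continuous dependency on a relative frequency, in the spirit of the mosquito example above (the logistic form and coefficients are invented for illustration):

```python
# Illustrative sketch: the chance a bite transmits illness as a continuous
# (logistic) function of the carrier proportion, not a discrete threshold.
import math

def infection_prob(n_carriers, n_total, a=6.0, b=3.0):
    freq = n_carriers / n_total          # relative frequency in the domain
    return 1.0 / (1.0 + math.exp(-(a * freq - b)))

small = infection_prob(10, 100)          # domain of 100 mosquitoes
large = infection_prob(1000, 10000)      # domain of 10,000, same proportion
```

Because the probability depends only on the proportion, the same query gets the same answer in a domain of 100 or 10,000 elements; this is the kind of domain-size stability that the paper's asymptotic representation makes precise for learning across differently sized domains.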
Updated: 2024-08-20 12:50:18
Categories: cs.AI,cs.LG,cs.LO,I.2.4; I.2.6
Universal Novelty Detection Through Adaptive Contrastive Learning
Novelty detection is a critical task for deploying machine learning models in the open world. A crucial property of novelty detection methods is universality, which can be interpreted as generalization across various distributions of training or test data. More precisely, for novelty detection, distribution shifts may occur in the training set or the test set. Shifts in the training set refer to cases where we train a novelty detector on a new dataset and expect strong transferability. Conversely, distribution shifts in the test set indicate the methods' performance when the trained model encounters a shifted test sample. We experimentally show that existing methods falter in maintaining universality, which stems from their rigid inductive biases. Motivated by this, we aim for more generalized techniques that have more adaptable inductive biases. In this context, we leverage the fact that contrastive learning provides an efficient framework to easily switch and adapt to new inductive biases through the proper choice of augmentations in forming the negative pairs. We propose a novel probabilistic auto-negative pair generation method, AutoAugOOD, along with contrastive learning, to yield a universal novelty detection method. Our experiments demonstrate the superiority of our method under different distribution shifts in various image benchmark datasets. Notably, our method exhibits universality through its adaptability to different novelty detection setups, including one-class, unlabeled multi-class, and labeled multi-class settings. Code: https://github.com/mojtaba-nafez/UNODE
Updated: 2024-08-20 12:46:23
Categories: cs.LG
Honeyquest: Rapidly Measuring the Enticingness of Cyber Deception Techniques with Code-based Questionnaires
Fooling adversaries with traps such as honeytokens can slow down cyber attacks and create strong indicators of compromise. Unfortunately, cyber deception techniques are often poorly specified. Also, realistically measuring their effectiveness requires a well-exposed software system together with a production-ready implementation of these techniques. This makes rapid prototyping challenging. Our work translates 13 previously researched and 12 self-defined techniques into a high-level, machine-readable specification. Our open-source tool, Honeyquest, allows researchers to quickly evaluate the enticingness of deception techniques without implementing them. We test the enticingness of 25 cyber deception techniques and 19 true security risks in an experiment with 47 humans. We successfully replicate the goals of previous work with many consistent findings, but without a time-consuming implementation of these techniques on real computer systems. We provide valuable insights for the design of enticing deception and also show that the presence of cyber deception can significantly reduce the risk that adversaries will find a true security risk by about 22% on average.
Updated: 2024-08-20 12:45:41
Categories: cs.CR,cs.CY
Abstract Weighted Based Gradual Semantics in Argumentation Theory
Weighted gradual semantics provide an acceptability degree to each argument representing the strength of the argument, computed based on factors including background evidence for the argument, and taking into account interactions between this argument and others. We introduce four important problems linking gradual semantics and acceptability degrees. First, we reexamine the inverse problem, seeking to identify the argument weights of the argumentation framework which lead to a specific final acceptability degree. Second, we ask whether the function mapping between argument weights and acceptability degrees is injective or a homeomorphism onto its image. Third, we ask whether argument weights can be found when preferences, rather than acceptability degrees for arguments, are considered. Fourth, we consider the topology of the space of valid acceptability degrees, asking whether "gaps" exist in this space. While different gradual semantics have been proposed in the literature, in this paper, we identify a large family of weighted gradual semantics, called abstract weighted based gradual semantics. These generalise many of the existing semantics while maintaining desirable properties such as convergence to a unique fixed point. We also show that a sub-family of the weighted gradual semantics, called abstract weighted $(L^p,\lambda,\mu)$-based gradual semantics, which includes well-known semantics, solves all four of the aforementioned problems.
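As a concrete instance of a weighted gradual semantics computed by fixed-point iteration, here is the weighted h-categorizer, a well-known member of this family (the argumentation framework below is a toy example, and this sketch is not code from the paper):

```python
# Weighted h-categorizer: deg(a) = w(a) / (1 + sum of attacker degrees),
# iterated from the base weights until it settles at its fixed point.
def h_categorizer(weights, attacks, iters=200):
    deg = dict(weights)
    for _ in range(iters):
        deg = {a: weights[a] / (1.0 + sum(deg[b] for b in attacks.get(a, ())))
               for a in weights}
    return deg

weights = {"a": 1.0, "b": 0.8, "c": 0.6}
attacks = {"a": ["b"], "b": ["c"]}   # b attacks a, c attacks b
deg = h_categorizer(weights, attacks)
```

Unattacked arguments keep their base weight, while attacked arguments lose strength in proportion to their attackers' degrees; the inverse problem asks for the `weights` that would produce a prescribed `deg`.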
Updated: 2024-08-20 12:44:00
Categories: cs.AI
Cloud Kitchen: Using Planning-based Composite AI to Optimize Food Delivery Processes
The global food delivery market provides many opportunities for AI-based services that can improve the efficiency of feeding the world. This paper presents the Cloud Kitchen platform as a decision-making tool for restaurants with food delivery and a simulator to evaluate the impact of the decisions. The platform contains a Technology-Specific Bridge (TSB) that provides an interface for communicating with restaurants or the simulator. TSB uses a planning domain model to represent decisions embedded in the Unified Planning Framework (UPF). Decision-making, which concerns allocating customers' orders to vehicles and deciding in which order the customers will be served (for each vehicle), is done via a Vehicle Routing Problem with Time Windows (VRPTW), an efficient tool for this problem. We show that decisions made by our platform can improve customer satisfaction by reducing the number of delayed deliveries using a real-world historical dataset.
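The VRPTW decision step can be sketched with a greedy heuristic (real solvers, including the one behind the platform, use far stronger methods; the coordinates, windows, and nearest-feasible rule here are illustrative only):

```python
# Illustrative sketch: greedily build vehicle routes, serving at each step
# the earliest-reachable customer whose time window can still be met.
def greedy_vrptw(depot, customers, speed=1.0):
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    unserved = dict(customers)          # name -> (x, y, earliest, latest)
    routes = []
    while unserved:
        pos, t, route = depot, 0.0, []
        while True:
            feasible = []
            for name, (x, y, early, late) in unserved.items():
                arrive = max(t + dist(pos, (x, y)) / speed, early)
                if arrive <= late:      # time-window check
                    feasible.append((arrive, name, (x, y)))
            if not feasible:
                break
            arrive, name, pos = min(feasible)
            t = arrive
            route.append(name)
            del unserved[name]
        if not route:                   # remaining windows are unreachable
            break
        routes.append(route)
    return routes

customers = {"c1": (1, 0, 0.0, 5.0), "c2": (2, 0, 0.0, 5.0), "c3": (0, 3, 8.0, 9.0)}
routes = greedy_vrptw((0, 0), customers)
```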
Updated: 2024-08-20 12:38:36
Categories: cs.AI,cs.LO
SoftTiger: A Clinical Foundation Model for Healthcare Workflows
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative and unstructured nature of clinical notes is a major obstacle for healthcare intelligentization. We address a critical problem of structuring clinical notes into clinical data, according to international interoperability standards. We collect and annotate data for three subtasks, namely, international patient summary, clinical impression, and medical encounter. We then performed supervised fine-tuning of a state-of-the-art LLM using public and credentialed clinical data. The training is orchestrated in a way that the target model can first support basic clinical tasks such as abbreviation expansion and temporal information extraction, and then learn to perform more complex downstream clinical tasks. Moreover, we address several modeling challenges in the healthcare context, e.g., an extra long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, is comparable to Gemini-pro, and has a mild gap from GPT-4. We believe that LLMs may become a stepping stone towards healthcare digitalization and democratization. Therefore, we publicly release SoftTiger models at scales of 13 billion and 70 billion parameters, as well as datasets and code for our innovative scalable evaluation, hopefully, making a significant contribution to the healthcare industry.
Updated: 2024-08-20 12:37:02
Categories: cs.CL,cs.AI
Understanding the Skills Gap between Higher Education and Industry in the UK in Artificial Intelligence Sector
As Artificial Intelligence (AI) changes how businesses work, there is a growing need for people who can work in this sector. This paper investigates how well universities in the United Kingdom that offer courses in AI prepare students for jobs in the real world. To gain insight into the differences between university curricula and industry demands, we review the contents of taught courses and job advertisement portals. By using custom data scraping tools to gather information from job advertisements and university curricula, together with frequency and Naive Bayes classifier analysis, this study shows exactly which skills industry is looking for. In this study we identified 12 skill categories that were used for mapping. The study showed that university curricula in the AI domain are well balanced in most technical skills, including Programming and Machine Learning subjects, but have gaps in the Data Science and Maths and Statistics skill categories.
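A minimal version of the frequency/Naive Bayes classification step might look as follows (the training phrases and the three of the twelve skill categories shown are invented stand-ins for the study's scraped data):

```python
# Illustrative sketch: multinomial Naive Bayes with Laplace smoothing,
# classifying job-ad phrases into skill categories by word frequency.
from collections import Counter
import math

TRAIN = [
    ("python pandas scikit-learn", "Programming"),
    ("java software development", "Programming"),
    ("regression hypothesis testing statistics", "Maths and Statistics"),
    ("probability linear algebra", "Maths and Statistics"),
    ("data pipelines etl visualisation", "Data Science"),
    ("big data analytics dashboards", "Data Science"),
]

def train_nb(samples):
    word_counts, class_counts, vocab = {}, Counter(), set()
    for text, cls in samples:
        class_counts[cls] += 1
        wc = word_counts.setdefault(cls, Counter())
        for w in text.split():
            wc[w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def classify(text, model):
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for cls, n in class_counts.items():
        lp = math.log(n / total)                     # class prior
        denom = sum(word_counts[cls].values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[cls][w] + 1) / denom)  # smoothed
        if lp > best_lp:
            best, best_lp = cls, lp
    return best

model = train_nb(TRAIN)
label = classify("python development", model)
```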
Updated: 2024-08-20 12:28:58
Categories: cs.AI
LightMDETR: A Lightweight Approach for Low-Cost Open-Vocabulary Object Detection Training
Object detection in computer vision traditionally involves identifying objects in images. By integrating textual descriptions, we enhance this process, providing better context and accuracy. The MDETR model significantly advances this by combining image and text data for more versatile object detection and classification. However, MDETR's complexity and high computational demands hinder its practical use. In this paper, we introduce Lightweight MDETR (LightMDETR), an optimized MDETR variant designed for improved computational efficiency while maintaining robust multimodal capabilities. Our approach involves freezing the MDETR backbone and training a sole component, the Deep Fusion Encoder (DFE), to represent image and text modalities. A learnable context vector enables the DFE to switch between these modalities. Evaluation on datasets like RefCOCO, RefCOCO+, and RefCOCOg demonstrates that LightMDETR achieves superior precision and accuracy.
Updated: 2024-08-20 12:27:53
Categories: cs.CV,cs.LG
Planning Domain Model Acquisition from State Traces without Action Parameters
Previous STRIPS domain model acquisition approaches that learn from state traces start with the names and parameters of the actions to be learned. Therefore their only task is to deduce the preconditions and effects of the given actions. In this work, we explore learning in situations when the parameters of learned actions are not provided. We define two levels of trace quality based on which information is provided and present an algorithm for each. In one level (L1), the states in the traces are labeled with action names, so we can deduce the number and names of the actions, but we still need to work out the number and types of parameters. In the other level (L2), the states are additionally labeled with objects that constitute the parameters of the corresponding grounded actions. Here we still need to deduce the types of the parameters in the learned actions. We experimentally evaluate the proposed algorithms and compare them with the state-of-the-art learning tool FAMA on a large collection of IPC benchmarks. The evaluation shows that our new algorithms are faster, can handle larger inputs and provide better results in terms of learning action models more similar to reference models.
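For the L2 setting described above, deducing each action's arity and parameter types from object labels can be sketched as follows (the trace format and type assignments are invented for illustration; the paper's algorithms also learn preconditions and effects):

```python
# Illustrative sketch: infer per-action parameter signatures (arity and
# types) from L2-style labels: action name plus the grounded objects.
def infer_signatures(trace, obj_types):
    sigs = {}
    for name, objects in trace:
        types = tuple(obj_types[o] for o in objects)
        if name not in sigs:
            sigs[name] = types
        elif sigs[name] != types:
            raise ValueError(f"inconsistent signature for action {name}")
    return sigs

obj_types = {"r1": "robot", "a": "room", "b": "room", "box": "item"}
trace = [("move", ("r1", "a", "b")),
         ("pick", ("r1", "box", "b")),
         ("move", ("r1", "b", "a"))]
sigs = infer_signatures(trace, obj_types)
```

Repeated occurrences of the same action name cross-check the inferred signature; in the L1 setting, where object labels are absent, even this arity information has to be worked out indirectly.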
Updated: 2024-08-20 12:24:00
Categories: cs.AI
Just a Hint: Point-Supervised Camouflaged Object Detection
Camouflaged Object Detection (COD) demands models to expeditiously and accurately distinguish objects which conceal themselves seamlessly in the environment. Owing to the subtle differences and ambiguous boundaries, COD is not only a remarkably challenging task for models but also for human annotators, requiring huge efforts to provide pixel-wise annotations. To alleviate the heavy annotation burden, we propose to fulfill this task with the help of only one point supervision. Specifically, by swiftly clicking on each object, we first adaptively expand the original point-based annotation to a reasonable hint area. Then, to avoid partial localization around discriminative parts, we propose an attention regulator to scatter model attention to the whole object through partially masking labeled regions. Moreover, to solve the unstable feature representation of camouflaged objects under only point-based annotation, we perform unsupervised contrastive learning based on differently augmented image pairs (e.g. changing color or doing translation). On three mainstream COD benchmarks, experimental results show that our model outperforms several weakly-supervised methods by a large margin across various metrics.
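The adaptive expansion of a single click into a hint area can be sketched as follows (the intensity-tolerance growth rule is an invented stand-in for the paper's adaptive expansion):

```python
# Illustrative sketch: grow a square hint region around a clicked pixel
# while the surrounding intensities stay close to the clicked value.
def expand_hint(image, y, x, tol=20, max_r=10):
    h, w = len(image), len(image[0])
    seed = image[y][x]
    r = 0
    while r < max_r:
        nr = r + 1
        y0, y1 = max(0, y - nr), min(h - 1, y + nr)
        x0, x1 = max(0, x - nr), min(w - 1, x + nr)
        # pixels on the new ring (Chebyshev distance == nr)
        ring = [image[i][j] for i in range(y0, y1 + 1)
                for j in range(x0, x1 + 1)
                if max(abs(i - y), abs(j - x)) == nr]
        if any(abs(p - seed) > tol for p in ring):
            break                      # ring leaves the object: stop growing
        r = nr
    return (max(0, y - r), max(0, x - r), min(h - 1, y + r), min(w - 1, x + r))

# 8x8 toy image with a bright 5x5 object on a dark background
image = [[100 if 1 <= i <= 5 and 1 <= j <= 5 else 0 for j in range(8)]
         for i in range(8)]
box = expand_hint(image, 3, 3)
```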
Updated: 2024-08-20 12:17:25
Subjects: cs.CV,cs.AI
DropKAN: Regularizing KANs by masking post-activations
We propose DropKAN (Dropout Kolmogorov-Arnold Networks), a regularization method that prevents co-adaptation of activation function weights in Kolmogorov-Arnold Networks (KANs). DropKAN functions by embedding the drop mask directly within the KAN layer, randomly masking the outputs of some activations within the KANs' computation graph. We show that this simple procedure, which requires minimal coding effort, has a regularizing effect and consistently leads to better generalization of KANs. We analyze the application of standard Dropout to KANs and demonstrate that Dropout applied to KANs' neurons can lead to unpredictable behavior in the feedforward pass. We carry out an empirical study with real-world machine learning datasets to validate our findings. Our results suggest that DropKAN is consistently a better alternative to standard Dropout for KANs and improves their generalization performance. Our implementation of DropKAN is available at: \url{https://github.com/Ghaith81/dropkan}.
Updated: 2024-08-20 12:16:13
Subjects: cs.LG,cs.AI
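The core mechanism — masking post-activation edge outputs inside the layer, with inverted-dropout scaling, rather than dropping whole neurons — can be sketched on a toy KAN-style layer. This is an illustrative simplification, not the released implementation:

```python
import random

def dropkan_layer(x, edge_weights, edge_activations, drop_rate=0.2,
                  training=True, rng=None):
    """Toy KAN-style layer: every (input i, output j) edge applies its own
    1-D activation. DropKAN masks these *post-activation* edge outputs
    (scaled by 1/(1-p) when kept) instead of dropping whole neurons."""
    rng = rng or random.Random(0)
    n_out = len(edge_weights[0])
    out = [0.0] * n_out
    for i, xi in enumerate(x):
        for j in range(n_out):
            a = edge_activations[i][j](xi) * edge_weights[i][j]
            if training and drop_rate > 0.0:
                if rng.random() < drop_rate:
                    a = 0.0                      # masked post-activation
                else:
                    a = a / (1.0 - drop_rate)    # inverted-dropout scaling
            out[j] += a
    return out
```

Because the mask is applied to the summands feeding each output node, every node still receives a (rescaled) partial sum in the forward pass — the unpredictable behavior the paper reports for neuron-level Dropout on KANs does not arise in this form.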
SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS
While recent zero-shot multispeaker text-to-speech (TTS) models achieve impressive results, they typically rely on extensive transcribed speech datasets from numerous speakers and intricate training pipelines. Meanwhile, self-supervised learning (SSL) speech features have emerged as effective intermediate representations for TTS. It was also observed that SSL features from different speakers that are linearly close share phonetic information while maintaining individual speaker identity, which enables straightforward and robust voice cloning. In this study, we introduce SSL-TTS, a lightweight and efficient zero-shot TTS framework trained on transcribed speech from a single speaker. SSL-TTS leverages SSL features and retrieval methods for simple and robust zero-shot multi-speaker synthesis. Objective and subjective evaluations show that our approach achieves performance comparable to state-of-the-art models that require significantly larger training datasets. The low training data requirements mean that SSL-TTS is well suited for the development of multi-speaker TTS systems for low-resource domains and languages. We also introduce an interpolation parameter which enables fine control over the output speech by blending voices. Demo samples are available at https://idiap.github.io/ssl-tts
Updated: 2024-08-20 12:09:58
Subjects: eess.AS,cs.AI,cs.LG,cs.SD
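The retrieval step and the interpolation parameter can be illustrated with a toy kNN conversion over SSL feature frames. A sketch under simplifying assumptions (plain lists stand in for real SSL embeddings; this is not the authors' pipeline):

```python
def knn_voice_convert(source_frames, target_frames, k=2, lam=1.0):
    """Replace each source SSL frame by the mean of its k nearest frames
    from the target speaker, then linearly blend with the source frame:
    lam=1.0 gives the full target voice, lam=0.0 leaves the source as is."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    converted = []
    for frame in source_frames:
        nearest = sorted(target_frames, key=lambda t: sq_dist(frame, t))[:k]
        mean = [sum(vals) / k for vals in zip(*nearest)]
        converted.append([lam * m + (1.0 - lam) * s for m, s in zip(mean, frame)])
    return converted
```

The linear blend works because, as the abstract notes, linearly close SSL features share phonetic content while carrying speaker identity; intermediate `lam` values therefore yield blended voices.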
An Open Source Python Library for Anonymizing Sensitive Data
Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied on the given dataset, including the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for integration and continuous development, as well as the use of workflows to test code coverage based on unit and functional tests.
Updated: 2024-08-20 12:01:57
Subjects: cs.CR,cs.DB,cs.SE
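The anonymity notions involved (quasi-identifiers, generalization hierarchies, a required anonymity level k) can be illustrated with a minimal k-anonymity check and one generalization step. This sketch is not the library's API; all names are hypothetical:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """A table is k-anonymous w.r.t. a set of quasi-identifiers when every
    combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width=10):
    """One generalization-hierarchy step: coarsen a numeric age into a
    fixed-width bucket such as "20-29"."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}
```

An anonymization method of the kind the library provides would repeat such generalization (and, past an allowed level, suppression) until the check passes for the requested k.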
$p$SVM: Soft-margin SVMs with $p$-norm Hinge Loss
Support Vector Machines (SVMs) based on hinge loss have been extensively discussed and applied to various binary classification tasks. These SVMs achieve a balance between margin maximization and the minimization of slack due to outliers. Although many efforts have been dedicated to enhancing the performance of SVMs with hinge loss, studies on $p$SVMs, soft-margin SVMs with $p$-norm hinge loss, remain relatively scarce. In this paper, we explore the properties, performance, and training algorithms of $p$SVMs. We first derive the generalization bound of $p$SVMs, then formulate the dual optimization problem, comparing it with the traditional approach. Furthermore, we discuss a generalized version of the Sequential Minimal Optimization (SMO) algorithm, $p$SMO, to train our $p$SVM model. Comparative experiments on various datasets, including binary and multi-class classification tasks, demonstrate the effectiveness and advantages of our $p$SVM model and the $p$SMO method. Code is available at https://github.com/CoderBak/pSVM.
Updated: 2024-08-20 12:00:00
Subjects: cs.LG
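The $p$-norm hinge loss replaces the usual linear hinge penalty with its $p$-th power. A small sketch of the primal objective under this loss (illustrative only, not the paper's training code):

```python
def p_hinge(margin, p=2.0):
    """p-norm hinge loss on the functional margin y * f(x)."""
    return max(0.0, 1.0 - margin) ** p

def psvm_primal(w, b, X, y, C=1.0, p=2.0):
    """Soft-margin primal objective with p-norm hinge slack:
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))^p."""
    reg = 0.5 * sum(wi * wi for wi in w)
    slack = sum(p_hinge(yi * (sum(wi * xi for wi, xi in zip(w, x)) + b), p)
                for x, yi in zip(X, y))
    return reg + C * slack
```

For p=1 this reduces to the standard soft-margin SVM; larger p penalizes large margin violations more strongly, which is the trade-off the paper's generalization bound and $p$SMO training algorithm address.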
Human-inspired Explanations for Vision Transformers and Convolutional Neural Networks
We introduce Foveation-based Explanations (FovEx), a novel human-inspired visual explainability (XAI) method for Deep Neural Networks. Our method achieves state-of-the-art performance on both transformer (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility. Furthermore, we show the alignment between the explanation map produced by FovEx and human gaze patterns (+14\% in NSS compared to RISE, +203\% in NSS compared to gradCAM), enhancing our confidence in FovEx's ability to close the interpretation gap between humans and machines.
Updated: 2024-08-20 11:57:06
Subjects: cs.CV,cs.LG
Limited Communications Distributed Optimization via Deep Unfolded Distributed ADMM
Distributed optimization is a fundamental framework for collaborative inference and decision making in decentralized multi-agent systems. The operation is modeled as the joint minimization of a shared objective which typically depends on observations gathered locally by each agent. Distributed optimization algorithms, such as the common D-ADMM, tackle this task by iteratively combining local computations and message exchanges. One of the main challenges associated with distributed optimization, and particularly with D-ADMM, is that it requires a large number of communications, i.e., messages exchanged between the agents, to reach consensus. This can make D-ADMM costly in power, latency, and channel resources. In this work we propose unfolded D-ADMM, which follows the emerging deep unfolding methodology to enable D-ADMM to operate reliably with a predefined and small number of messages exchanged by each agent. Unfolded D-ADMM fully preserves the operation of D-ADMM, while leveraging data to tune the hyperparameters of each iteration of the algorithm. These hyperparameters can either be agent-specific, aiming at achieving the best performance within a fixed number of iterations over a given network, or shared among the agents, allowing to learn to distributedly optimize over different networks. For both settings, our unfolded D-ADMM operates with limited communications, while preserving the interpretability and flexibility of the original D-ADMM algorithm. We specialize unfolded D-ADMM for two representative settings: a distributed estimation task, considering a sparse recovery setup, and a distributed learning scenario, where multiple agents collaborate in learning a machine learning model. Our numerical results demonstrate that the proposed approach dramatically reduces the number of communications utilized by D-ADMM, without compromising on its performance.
Updated: 2024-08-20 11:56:03
Subjects: math.OC,cs.LG,eess.SP
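The unfolding idea — fix the number of message rounds in advance and learn one hyperparameter per round from data — is easiest to see on plain distributed averaging rather than full D-ADMM. A hedged sketch with hand-set step sizes standing in for learned ones:

```python
def unfolded_consensus(local_vals, neighbors, step_sizes):
    """Run a fixed number of consensus iterations; each iteration has its
    own step size, i.e., one unfolded 'layer' per message-exchange round."""
    x = list(local_vals)
    for mu in step_sizes:
        x = [xi + mu * (sum(x[j] for j in neighbors[i]) / len(neighbors[i]) - xi)
             for i, xi in enumerate(x)]
    return x
```

On a complete graph of three agents, the step size 2/3 reaches exact consensus in a single round — a small illustration of why per-iteration hyperparameters tuned to a given network can drastically cut the number of messages, which is exactly what unfolded D-ADMM learns from data.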
SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
Most Camouflaged Object Detection (COD) methods heavily rely on mask annotations, which are time-consuming and labor-intensive to acquire. Existing weakly-supervised COD approaches exhibit significantly inferior performance compared to fully-supervised methods and struggle to simultaneously support all the existing types of camouflaged object labels, including scribbles, bounding boxes, and points. Even for the Segment Anything Model (SAM), handling weakly-supervised COD remains problematic: it typically encounters challenges of prompt compatibility with scribble labels, extreme responses, semantically erroneous responses, and unstable feature representations, producing unsatisfactory results in camouflaged scenes. To mitigate these issues, we propose a unified COD framework in this paper, termed SAM-COD, which is capable of supporting arbitrary weakly-supervised labels. Our SAM-COD employs a prompt adapter to handle scribbles as prompts based on SAM. Meanwhile, we introduce response filter and semantic matcher modules to improve the quality of the masks obtained by SAM under COD prompts. To alleviate the negative impacts of inaccurate mask predictions, a new strategy of prompt-adaptive knowledge distillation is utilized to ensure a reliable feature representation. To validate the effectiveness of our approach, we have conducted extensive empirical experiments on three mainstream COD benchmarks. The results demonstrate the superiority of our method against state-of-the-art weakly-supervised and even fully-supervised methods.
Updated: 2024-08-20 11:49:27
Subjects: cs.CV,cs.AI
Side-Channel Analysis of OpenVINO-based Neural Network Models
Embedded devices with neural network accelerators offer great versatility for their users, reducing the need to use cloud-based services. At the same time, they introduce new security challenges in the area of hardware attacks, the most prominent being side-channel analysis (SCA). It was shown that SCA can recover model parameters with a high accuracy, posing a threat to entities that wish to keep their models confidential. In this paper, we explore the susceptibility of quantized models implemented in OpenVINO, an embedded framework for deploying neural networks on embedded and Edge devices. We show that it is possible to recover model parameters with high precision, allowing the recovered model to perform very close to the original one. Our experiments on GoogleNet v1 show only a 1% difference in the Top 1 and a 0.64% difference in the Top 5 accuracies.
Updated: 2024-08-20 11:48:43
Subjects: cs.CR,cs.AI
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power consumption, especially when the hardware processor needs to support and execute different neural networks. In this study, we introduce NeuralMatrix, which elastically transforms the computations of entire DNNs into linear matrix operations. This transformation allows seamless execution of various DNN models all with matrix operations and paves the way for running versatile DNN models with a single General Matrix Multiplication (GEMM) accelerator. Extensive experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models, achieving 2.17-38.72 times computation efficiency (i.e., throughput per power) compared to CPUs, GPUs, and SoC platforms. This level of efficiency is usually only attainable with an accelerator designed for a specific neural network.
Updated: 2024-08-20 11:45:34
Subjects: cs.LG,cs.AI,cs.AR
Generating Synthetic Fair Syntax-agnostic Data by Learning and Distilling Fair Representation
Data fairness is a crucial topic due to the recent wide usage of AI-powered applications. Most real-world data is filled with human or machine biases, and when such data is used to train AI models, there is a chance that the model will reflect the bias in the training data. Existing bias-mitigating generative methods based on GANs and diffusion models require in-processing fairness objectives, and they fail to consider computational overhead when choosing computationally heavy architectures, which may lead to high computational demands, instability, and poor optimization performance. To mitigate this issue, in this work, we present a fair data generation technique based on knowledge distillation, where we use a small architecture to distill the fair representation in the latent space. The idea of fair latent space distillation enables more flexible and stable training of Fair Generative Models (FGMs). We first learn a syntax-agnostic (for any data type) fair representation of the data, followed by distillation in the latent space into a smaller model. After distillation, we use the distilled fair latent space to generate high-fidelity fair synthetic data. While distilling, we employ a quality loss (for fair distillation) and a utility loss (for data utility) to ensure that the fairness and data utility characteristics remain in the distilled latent space. Our approach shows a 5%, 5%, and 10% rise in performance in fairness, synthetic sample quality, and data utility, respectively, over the state-of-the-art fair generative model.
Updated: 2024-08-20 11:37:52
Subjects: cs.LG,cs.AI
Security Assessment of Hierarchical Federated Deep Learning
Hierarchical federated learning (HFL) is a promising distributed deep learning model training paradigm, but it has crucial security concerns arising from adversarial attacks. This research investigates and assesses the security of HFL using a novel methodology by focusing on its resilience against adversarial attacks inference-time and training-time. Through a series of extensive experiments across diverse datasets and attack scenarios, we uncover that HFL demonstrates robustness against untargeted training-time attacks due to its hierarchical structure. However, targeted attacks, particularly backdoor attacks, exploit this architecture, especially when malicious clients are positioned in the overlapping coverage areas of edge servers. Consequently, HFL shows a dual nature in its resilience, showcasing its capability to recover from attacks thanks to its hierarchical aggregation that strengthens its suitability for adversarial training, thereby reinforcing its resistance against inference-time attacks. These insights underscore the necessity for balanced security strategies in HFL systems, leveraging their inherent strengths while effectively mitigating vulnerabilities.
Updated: 2024-08-20 11:34:23
Subjects: cs.LG,cs.AI,cs.CR
Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning
Large language models (LLMs) have unlocked a plethora of powerful applications at the network edge, such as intelligent personal assistants. Data privacy and security concerns have prompted a shift towards edge-based fine-tuning of personal LLMs, away from cloud reliance. However, this raises issues of computational intensity and resource scarcity, hindering training efficiency and feasibility. While current studies investigate parameter-efficient fine-tuning (PEFT) techniques to mitigate resource constraints, our analysis indicates that these techniques are not sufficiently resource-efficient for edge devices. To tackle these challenges, we propose Pluto and Charon (PAC), a time and memory efficient collaborative edge AI framework for personal LLMs fine-tuning. PAC breaks the resource wall of personal LLMs fine-tuning with a sophisticated algorithm-system co-design. (1) Algorithmically, PAC implements a personal LLMs fine-tuning technique that is efficient in terms of parameters, time, and memory. It utilizes Parallel Adapters to circumvent the need for a full backward pass through the LLM backbone. Additionally, an activation cache mechanism further streamlines the process by removing the need for repeated forward passes across multiple epochs. (2) Systematically, PAC leverages edge devices in close proximity, pooling them as a collective resource for in-situ personal LLMs fine-tuning, utilizing a hybrid data and pipeline parallelism to orchestrate distributed training. The use of the activation cache eliminates the need for a forward pass through the LLM backbone, enabling exclusive fine-tuning of the Parallel Adapters using data parallelism. Extensive evaluation based on prototype implementation demonstrates that PAC remarkably outperforms state-of-the-art approaches, achieving up to 8.64x end-to-end speedup and up to 88.16% reduction in memory footprint.
Updated: 2024-08-20 11:30:12
Subjects: cs.DC,cs.AI,cs.LG,cs.NI
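The activation-cache idea can be sketched independently of any LLM machinery: memoize the frozen backbone's outputs per sample so that later epochs run only the adapter. Names and API here are hypothetical, not PAC's implementation:

```python
class ActivationCache:
    """Memoize the frozen backbone's activations per training sample so
    that, after the first epoch, only the lightweight adapter is run."""
    def __init__(self, backbone):
        self.backbone = backbone      # expensive, frozen forward function
        self.store = {}
        self.backbone_calls = 0

    def activations(self, sample_id, inputs):
        if sample_id not in self.store:
            self.backbone_calls += 1
            self.store[sample_id] = self.backbone(inputs)
        return self.store[sample_id]

def train_adapter(cache, adapter_step, dataset, epochs):
    """Each epoch touches every sample, but the backbone runs at most once
    per sample; adapter_step consumes the cached features."""
    for _ in range(epochs):
        for sample_id, inputs in dataset:
            adapter_step(cache.activations(sample_id, inputs))
```

In a real system the cached tensors would dominate memory rather than compute, which is why PAC pairs the cache with pooled edge devices and parallel adapters trained via data parallelism.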
Non-autoregressive Generative Models for Reranking Recommendation
Contemporary recommendation systems are designed to meet users' needs by delivering tailored lists of items that align with their specific demands or interests. In a multi-stage recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items. The key challenge of reranking lies in the exploration of optimal sequences within the combinatorial space of permutations. Recent research proposes a generator-evaluator learning paradigm, where the generator generates multiple feasible sequences and the evaluator picks out the best sequence based on the estimated listwise score. The generator is of vital importance, and generative models are well-suited for the generator function. Current generative models employ an autoregressive strategy for sequence generation. However, deploying autoregressive models in real-time industrial systems is challenging. To address these issues, we propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. To tackle challenges such as sparse training samples and dynamic candidates, we introduce a matching model. Considering the diverse nature of user feedback, we employ a sequence-level unlikelihood training objective to differentiate feasible sequences from unfeasible ones. Additionally, to overcome the lack of dependency modeling in non-autoregressive models regarding target items, we introduce contrastive decoding to capture correlations among these items. Extensive offline experiments validate the superior performance of NAR4Rec over state-of-the-art reranking methods. Online A/B tests reveal that NAR4Rec significantly enhances the user experience. Furthermore, NAR4Rec has been fully deployed in a popular video app Kuaishou with over 300 million daily active users.
Updated: 2024-08-20 11:29:37
Subjects: cs.IR,cs.AI
Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective
Graphs are a natural representation for systems based on relations between connected entities. Combinatorial optimization problems, which arise when considering an objective function related to a process of interest on discrete structures, are often challenging due to the rapid growth of the solution space. The trial-and-error paradigm of Reinforcement Learning has recently emerged as a promising alternative to traditional methods, such as exact algorithms and (meta)heuristics, for discovering better decision-making strategies in a variety of disciplines including chemistry, computer science, and statistics. Despite the fact that they arose in markedly different fields, these techniques share significant commonalities. Therefore, we set out to synthesize this work in a unifying perspective that we term Graph Reinforcement Learning, interpreting it as a constructive decision-making method for graph problems. After covering the relevant technical background, we review works along the dividing line of whether the goal is to optimize graph structure given a process of interest, or to optimize the outcome of the process itself under fixed graph structure. Finally, we discuss the common challenges facing the field and open research questions. In contrast with other surveys, the present work focuses on non-canonical graph problems for which performant algorithms are typically not known and Reinforcement Learning is able to provide efficient and effective solutions.
Updated: 2024-08-20 11:21:32
Subjects: cs.LG,cs.AI
What can Large Language Models Capture about Code Functional Equivalence?
Code-LLMs, LLMs pre-trained on large code corpora, have shown great progress in learning rich representations of the structure and syntax of code, successfully using it to generate or classify code fragments. At the same time, understanding if they are able to do so because they capture code semantics, and how well, is still an open question. In this paper, we tackle this problem by introducing SeqCoBench, a benchmark for systematically assessing how Code-LLMs can capture code functional equivalence. SeqCoBench contains over 20 code transformations that either preserve or alter the semantics of Python programs. We conduct extensive evaluations in different settings, including zero-shot and parameter-efficient finetuning methods on state-of-the-art (Code-)LLMs to see if they can discern semantically equivalent or different pairs of programs in SeqCoBench. We find that the performance gap between these LLMs and classical match-based retrieval scores is minimal, with both approaches showing a concerning lack of depth in understanding code semantics.
Updated: 2024-08-20 11:19:06
Subjects: cs.SE,cs.AI,cs.CL,cs.LG
Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Systems
When a network slice spans multiple domains, each domain must uphold the End-to-End (E2E) Service Level Agreement (SLA) associated with the slice. This requires decomposing the E2E SLA into partial SLAs for each domain. In a two-level network slicing management system with an E2E orchestrator and local controllers, we propose an online learning-decomposition framework that dynamically updates risk models using recent feedback. This approach utilizes online gradient descent and FIFO memory buffers to enhance stability and robustness. Our empirical study shows the proposed framework outperforms state-of-the-art static methods, offering more accurate and resilient SLA decomposition under varying conditions and sparse data.
Updated: 2024-08-20 11:17:56
Subjects: cs.NI,cs.AI,cs.LG
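A minimal sketch of the learning-decomposition loop described above, with online gradient steps over a FIFO feedback buffer. The update rule and all names are illustrative assumptions, not the paper's exact risk model:

```python
from collections import deque

class OnlineSLADecomposer:
    """Split an E2E delay budget across domains in proportion to per-domain
    weights, updated online from recent feedback kept in a FIFO buffer."""
    def __init__(self, n_domains, lr=0.05, buffer_size=100):
        self.w = [1.0] * n_domains               # per-domain risk weights
        self.lr = lr
        self.buffer = deque(maxlen=buffer_size)  # FIFO feedback memory

    def decompose(self, e2e_budget):
        total = sum(self.w)
        return [e2e_budget * wi / total for wi in self.w]

    def feedback(self, observed_delays):
        """Gradient-style step: domains that are slower than average over
        the buffered feedback get a larger share of the budget."""
        self.buffer.append(observed_delays)
        n = len(observed_delays)
        avg = [sum(s[i] for s in self.buffer) / len(self.buffer) for i in range(n)]
        mean = sum(avg) / n
        for i in range(n):
            self.w[i] = max(self.w[i] + self.lr * (avg[i] - mean), 1e-3)
```

The bounded `deque` is what lets the decomposition track an evolving system: stale observations age out, so the partial SLAs adapt to recent conditions even under sparse data.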
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection
Phishing attacks are a major threat to online security, exploiting user vulnerabilities to steal sensitive information. Various methods have been developed to counteract phishing, each with varying levels of accuracy, but they also encounter notable limitations. In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowledge bases with Multimodal Large Language Models (MLLMs). This combination leads to broader brand coverage, which enhances brand recognition and recall. Furthermore, we propose a multimodal information retrieval framework designed to extract the top k relevant items from offline knowledge bases, utilizing all available information from a webpage, including logos, HTML, and URLs. Our empirical results, based on three real-world datasets, demonstrate that the proposed framework significantly enhances detection accuracy and reduces both false positives and false negatives, while maintaining model efficiency. Additionally, PhishAgent shows strong resilience against various types of adversarial attacks.
Updated: 2024-08-20 11:14:21
Subjects: cs.CR
Tailoring Graph Neural Network-based Flow-guided Localization to Individual Bloodstreams and Activities
Flow-guided localization using in-body nanodevices in the bloodstream is expected to be beneficial for early disease detection, continuous monitoring of biological conditions, and targeted treatment. The nanodevices face size and power constraints that produce erroneous raw data for localization purposes. On-body anchors receive this data, and use it to derive the locations of diagnostic events of interest. Different Machine Learning (ML) approaches have been recently proposed for this task, yet they are currently restricted to a reference bloodstream of a resting patient. As such, they are unable to deal with the physical diversity of patients' bloodstreams and cannot provide continuous monitoring due to changes in individual patient's activities. Toward addressing these issues for the current State-of-the-Art (SotA) flow-guided localization approach based on Graph Neural Networks (GNNs), we propose a pipeline for GNN adaptation based on individual physiological indicators including height, weight, and heart rate. Our results indicate that the proposed adaptions are beneficial in reconciling the individual differences between bloodstreams and activities.
Updated: 2024-08-20 11:12:23
标题: 将基于图神经网络的流引导定位定制到个体血流和活动
摘要: 使用体内纳米器件在血液中进行流动引导定位预计将有助于早期疾病检测、持续监测生物条件和靶向治疗。纳米器件面临尺寸和功耗限制,会产生定位目的的错误原始数据。外置锚点接收这些数据,并用它来推导感兴趣的诊断事件的位置。最近已经提出了不同的机器学习(ML)方法来完成这项任务,但目前仅限于静息患者的参考血流。因此,它们无法处理患者血流的物理多样性,并且无法提供持续监测,因为患者个体活动的变化。为了解决当前基于图神经网络(GNNs)的最新流动引导定位方法的这些问题,我们提出了一个基于个体生理指标(包括身高、体重和心率)的GNN适应性流程。我们的结果表明,所提出的适应性有助于调和血流和活动之间的个体差异。
更新时间: 2024-08-20 11:12:23
领域: cs.LG,cs.AI,cs.ET,cs.NI
Persistent Ballistic Entanglement Spreading with Optimal Control in Quantum Spin Chains
Entanglement propagation provides a key route to understanding quantum many-body dynamics in and out of equilibrium. The entanglement entropy (EE) usually approaches a sub-saturated value known as the Page value $\tilde{S}_{P} =\tilde{S} - dS$ (with $\tilde{S}$ the maximum of EE and $dS$ the Page correction) in, e.g., random unitary evolutions. The ballistic spreading of EE usually appears at early times, and the growth deviates from linearity well before the Page value is reached. In this work, we uncover that the magnetic field that maximizes the EE robustly induces persistent ballistic spreading of entanglement in quantum spin chains. The linear growth of EE is demonstrated to persist until the maximum $\tilde{S}$ (along with a flat entanglement spectrum) is reached. The robustness of the ballistic spreading and the enhancement of EE under such optimal control are demonstrated, in particular by perturbing the initial state with random pure states (RPS's). These are argued to result from the endomorphism of the time evolution under such entanglement-enhancing optimal control for the RPS's.
Updated: 2024-08-20 11:09:16
标题: 量子自旋链中具有最优控制的持续性弹道式纠缠扩展
摘要: 纠缠传播为理解平衡与非平衡的量子多体动力学提供了一条关键途径。纠缠熵(EE)通常趋近于一个称为Page值$\tilde{S}_{P}=\tilde{S} - dS$(其中$\tilde{S}$是EE的最大值,$dS$是Page修正)的亚饱和值,例如在随机幺正演化中。早期出现的EE弹道式扩展通常在达到Page值之前就已偏离线性增长。在这项工作中,我们发现最大化EE的磁场能在量子自旋链中稳健地诱导持续的弹道式纠缠扩展。EE的线性增长被证明会持续到达到最大值$\tilde{S}$(以及平坦的纠缠谱)为止。特别是通过用随机纯态(RPS's)扰动初始态,展示了这种最优控制下弹道式扩展的稳健性和EE的增强。这些被认为是此种纠缠增强最优控制下时间演化对RPS's的自同态所导致的结果。
更新时间: 2024-08-20 11:09:16
领域: quant-ph,cond-mat.str-el,cs.LG
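For context, the Page correction $dS$ quoted in the abstract has a well-known closed form for a random pure state on an $m \times n$ bipartition; this is Page's standard asymptotic result, included here only to make $dS$ concrete, since the abstract itself does not spell it out:

```latex
\langle S_{m,n} \rangle \;\approx\; \ln m \;-\; \frac{m}{2n}, \qquad 1 \ll m \le n,
```

so that, identifying $\tilde{S} = \ln m$, the Page correction is $dS \approx m/(2n)$; for a chain cut into equal halves ($m = n$) this gives $dS \approx 1/2$ nat.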
LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics
Large Language Models (LLMs) such as GPT-4 have demonstrated their ability to understand natural language and generate complex code snippets. This paper introduces a novel Large Language Model Evolutionary Algorithm (LLaMEA) framework, leveraging GPT models for the automated generation and refinement of algorithms. Given a set of criteria and a task definition (the search space), LLaMEA iteratively generates, mutates and selects algorithms based on performance metrics and feedback from runtime evaluations. This framework offers a unique approach to generating optimized algorithms without requiring extensive prior expertise. We show how this framework can be used to generate novel black-box metaheuristic optimization algorithms automatically. LLaMEA generates multiple algorithms that outperform state-of-the-art optimization algorithms (Covariance Matrix Adaptation Evolution Strategy and Differential Evolution) on the five-dimensional black-box optimization benchmark (BBOB). The algorithms also show competitive performance on the 10- and 20-dimensional instances of the test functions, although they have not seen such instances during the automated generation process. The results demonstrate the feasibility of the framework and identify future directions for automated generation and optimization of algorithms via LLMs.
Updated: 2024-08-20 11:06:09
标题: LLaMEA:一个用于自动生成元启发式算法的大型语言模型进化算法
摘要: 大型语言模型(LLMs)如GPT-4已经证明了它们能够理解自然语言并生成复杂的代码片段。本文介绍了一种新颖的大型语言模型进化算法(LLaMEA)框架,利用GPT模型自动生成和优化算法。给定一组标准和任务定义(搜索空间),LLaMEA迭代地生成、变异和选择算法,基于性能指标和运行时评估的反馈。这个框架提供了一种独特的方法来生成优化算法,而不需要广泛的先前专业知识。我们展示了这个框架如何可以被用来自动生成新颖的黑盒元启发式优化算法。LLaMEA生成了多种算法,这些算法在五维黑盒优化基准测试(BBOB)上优于最先进的优化算法(协方差矩阵自适应进化策略和差分进化)。这些算法在测试函数的10维和20维实例上也表现出竞争性能,尽管它们在自动生成过程中没有见过这样的实例。结果表明了该框架的可行性,并确定了通过LLMs实现算法自动生成和优化的未来方向。
更新时间: 2024-08-20 11:06:09
领域: cs.NE,cs.AI
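The generate/mutate/select loop that LLaMEA describes can be sketched as a simple (1+1)-style evolutionary loop in which the GPT call is stubbed out. The stub, the toy fitness function, and the loop constants below are illustrative assumptions, not the paper's actual setup (where a candidate is generated code evaluated on BBOB):

```python
import random

def llm_propose(parent, rng):
    """Stand-in for a GPT call that mutates a candidate algorithm.
    Here a 'candidate' is just a parameter vector; a real system
    would mutate source code and run it on the benchmark."""
    return [g + rng.gauss(0.0, 0.1) for g in parent]

def fitness(candidate):
    """Toy objective: negative squared distance to the optimum at 0."""
    return -sum(g * g for g in candidate)

def llamea_loop(dim=5, generations=50, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(dim)]
    best_fit = fitness(best)
    for _ in range(generations):
        child = llm_propose(best, rng)   # generate + mutate via the "LLM"
        child_fit = fitness(child)       # feedback from runtime evaluation
        if child_fit > best_fit:         # selection on performance metrics
            best, best_fit = child, best_fit = child, child_fit
    return best_fit

print(llamea_loop())
```

Because selection only accepts improvements, the best fitness is monotone in the number of generations, which is the core of the iterative refinement the abstract describes.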
Towards Efficient Large Language Models for Scientific Text: A Review
Large language models (LLMs) have ushered in a new era for processing complex information in various fields, including science. The increasing amount of scientific literature allows these models to acquire and understand scientific knowledge effectively, thus improving their performance in a wide range of tasks. This power comes at a cost: LLMs require extremely expensive computational resources, vast amounts of data, and long training times. Therefore, in recent years, researchers have proposed various methodologies to make scientific LLMs more affordable. The best-known approaches fall along two directions: reducing the size of the models or enhancing the quality of the data. To date, a comprehensive review of these two families of methods has not yet been undertaken. In this paper, we (I) summarize the current advances in turning the emerging abilities of LLMs into more accessible AI solutions for science, and (II) investigate the challenges and opportunities of developing affordable solutions for scientific domains using LLMs.
Updated: 2024-08-20 10:57:34
标题: 朝向高效的科学文本大语言模型:综述
摘要: 大型语言模型(LLMs)已经引领了处理各个领域中复杂信息的新时代,包括科学领域。不断增加的科学文献数量使得这些模型能够有效地获取和理解科学知识,从而提高它们在各种任务中的性能。由于LLMs的强大能力,它们需要极其昂贵的计算资源、大量的数据和训练时间。因此,近年来,研究人员提出了各种方法来使科学LLMs更具经济性。最知名的方法可以归结为两个方向,即专注于模型的大小或提高数据质量。迄今为止,对这两类方法的综合评估尚未进行。在本文中,我们(I)总结了LLMs不断增强的能力,将其转化为更易于访问的科学AI解决方案,并(II)探讨了利用LLMs开发经济实惠的科学领域解决方案所面临的挑战和机遇。
更新时间: 2024-08-20 10:57:34
领域: cs.CL,cs.AI
Quantum Artificial Intelligence: A Brief Survey
Quantum Artificial Intelligence (QAI) is the intersection of quantum computing and AI, a technological synergy with expected significant benefits for both. In this paper, we provide a brief overview of what has been achieved in QAI so far and point to some open questions for future research. In particular, we summarize some major key findings on the feasibility and the potential of using quantum computing for solving computationally hard problems in various subfields of AI, and vice versa, the leveraging of AI methods for building and operating quantum computing devices.
Updated: 2024-08-20 10:55:17
标题: 量子人工智能:简要调查
摘要: 量子人工智能(QAI)是量子计算和人工智能的交叉领域,这种技术协同有望为两者带来显著的益处。本文简要概述了迄今为止在QAI领域取得的成就,并指出了一些未来研究的开放问题。特别是,我们总结了一些关于利用量子计算解决人工智能各个子领域中的计算难题的可行性和潜力的主要关键发现,以及反过来,利用人工智能方法来构建和操作量子计算设备的方法。
更新时间: 2024-08-20 10:55:17
领域: quant-ph,cs.AI
MEGen: Generative Backdoor in Large Language Models via Model Editing
Large language models (LLMs) have demonstrated remarkable capabilities. Their powerful generative abilities enable flexible responses based on various queries or instructions. Emerging as widely adopted generalists for diverse tasks, LLMs are still vulnerable to backdoors. This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects. In our approach, we first leverage a language model to insert a trigger selected on fixed metrics into the input, then design a pipeline of model editing to directly embed a backdoor into an LLM. By adjusting a small set of local parameters with a mini-batch of samples, MEGen significantly enhances time efficiency and achieves high robustness. Experimental results indicate that our backdoor attack strategy achieves a high attack success rate on poison data while maintaining the model's performance on clean data. Notably, the backdoored model, when triggered, can freely output pre-set dangerous information while successfully completing downstream tasks. This suggests that future LLM applications could be guided to deliver certain dangerous information, thus altering the LLM's generative style. We believe this approach provides insights for future LLM applications and the execution of backdoor attacks on conversational AI systems.
Updated: 2024-08-20 10:44:29
标题: MEGen:通过模型编辑在大型语言模型中生成后门
摘要: 大型语言模型(LLMs)展示了显著的能力。它们强大的生成能力使其能够基于各种查询或指令提供灵活的响应。作为广泛采用的多功能主义者,LLMs仍然容易受到后门攻击。本文提出了一种基于编辑的生成后门攻击,名为MEGen,旨在为NLP任务创建一个定制的后门,副作用最小。在我们的方法中,我们首先利用语言模型将一个根据固定指标选择的触发器插入输入,然后设计一个模型编辑的流水线,直接将一个后门嵌入LLM中。通过调整一小组局部参数与一个小批量样本,MEGen显著提高了时间效率并实现了高鲁棒性。实验结果表明,我们的后门攻击策略在毒数据上取得了高攻击成功率,同时保持了模型在干净数据上的性能。值得注意的是,当触发时,后门模型可以自由输出预设的危险信息,同时成功完成下游任务。这表明未来LLM应用可以被引导来传递特定危险信息,从而改变LLM的生成风格。我们相信这种方法为未来LLM应用和对话AI系统的后门攻击执行提供了见解。
更新时间: 2024-08-20 10:44:29
领域: cs.CL,cs.AI
Towards Foundation Models for the Industrial Forecasting of Chemical Kinetics
Scientific Machine Learning is transforming traditional engineering industries by enhancing the efficiency of existing technologies and accelerating innovation, particularly in modeling chemical reactions. Despite recent advancements, solving stiff chemically reacting problems within computational fluid dynamics remains a significant challenge. In this study, we propose a novel approach utilizing a multi-layer-perceptron mixer architecture (MLP-Mixer) to model the time-series of stiff chemical kinetics. We evaluate this method using the ROBER system, a benchmark model in chemical kinetics, to compare its performance with traditional numerical techniques. This study provides insight into the industrial utility of the recently developed MLP-Mixer architecture for modeling chemical kinetics and provides motivation for such neural architectures to be used as a base for time-series foundation models.
Updated: 2024-08-20 10:43:09
标题: 朝着化学动力学工业预测的基础模型
摘要: 科学机器学习正在通过提高现有技术的效率和加速创新,改变传统工程行业,特别是在建模化学反应方面。尽管近年来取得了进展,但在计算流体动力学中解决刚性化学反应问题仍然是一个重要问题。在这项研究中,我们提出了一种新颖的方法,利用多层感知器混合器架构(MLP-Mixer)来建模刚性化学动力学的时间序列。我们使用ROBER系统进行评估,这是化学动力学中的一个基准模型,以比较其性能与传统数值技术的差异。这项研究为最近开发的MLP-Mixer架构在建模化学动力学方面的工业实用性提供了见解,并为将此类神经架构用作时间序列基础模型的动机提供了支持。
更新时间: 2024-08-20 10:43:09
领域: cs.LG,cs.AI
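The ROBER system used as the benchmark above is a three-species kinetics problem whose rate constants span roughly nine orders of magnitude, which is what makes it stiff. A minimal right-hand-side evaluation with the standard rate constants (nothing here is specific to the paper's surrogate) looks like:

```python
def rober_rhs(y, k1=0.04, k2=3.0e7, k3=1.0e4):
    """Time derivatives of the ROBER system. The widely separated
    rate constants k1, k2, k3 are the source of stiffness."""
    y1, y2, y3 = y
    dy1 = -k1 * y1 + k3 * y2 * y3
    dy2 = k1 * y1 - k3 * y2 * y3 - k2 * y2 * y2
    dy3 = k2 * y2 * y2
    return (dy1, dy2, dy3)

# Mass is conserved: the three derivatives always sum to zero.
d = rober_rhs((1.0, 0.0, 0.0))
print(d, sum(d))
# -> (-0.04, 0.04, 0.0) 0.0
```

Explicit time steppers need impractically small steps on this system, which is why it is the canonical test for stiff solvers and, here, for learned surrogates.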
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation
Referring Image Segmentation (RIS) - the problem of identifying objects in images through natural language sentences - is a challenging task currently mostly solved through supervised learning. However, while collecting referred annotation masks is a time-consuming process, the few existing weakly-supervised and zero-shot approaches fall significantly short in performance compared to fully-supervised learning ones. To bridge the performance gap without mask annotations, we propose a novel weakly-supervised framework that tackles RIS by decomposing it into three steps: obtaining instance masks for the object mentioned in the referencing instruction (segment), using zero-shot learning to select a potentially correct mask for the given instruction (select), and bootstrapping a model which allows for fixing the mistakes of zero-shot selection (correct). In our experiments, using only the first two steps (zero-shot segment and select) outperforms other zero-shot baselines by as much as 16.5%, while our full method improves upon this much stronger baseline and sets the new state-of-the-art for weakly-supervised RIS, reducing the gap between the weakly-supervised and fully-supervised methods in some cases from around 33% to as little as 7%. Code is available at https://github.com/fgirbal/segment-select-correct.
Updated: 2024-08-20 10:35:24
标题: 分割、选择、修正:一种弱监督引用分割的框架
摘要: 引用图像分割(RIS)-通过自然语言句子识别图像中的对象的问题-是一个具有挑战性的任务,目前主要通过监督学习来解决。然而,尽管收集引用注释蒙版是一个耗时的过程,但目前存在的几种弱监督和零样本方法在性能上明显不及完全监督学习。为了弥补性能差距,我们提出了一种新颖的弱监督框架,通过将RIS分解为三个步骤来解决问题:获取引用指令中提到的对象的实例蒙版(分割),使用零样本学习为给定指令选择可能的正确蒙版(选择),并引导一个模型,允许修正零样本选择的错误(校正)。在我们的实验中,仅使用前两个步骤(零样本分割和选择)的表现比其他零样本基线高出多达16.5%,而我们的完整方法改进了这一更强的基线,并为弱监督RIS设定了新的最先进水平,将弱监督和完全监督方法之间的差距在某些情况下从约33%减少到仅7%。代码可在https://github.com/fgirbal/segment-select-correct上找到。
更新时间: 2024-08-20 10:35:24
领域: cs.CV,cs.LG
Predicting Short Term Energy Demand in Smart Grid: A Deep Learning Approach for Integrating Renewable Energy Sources in Line with SDGs 7, 9, and 13
Integrating renewable energy sources into the power grid is becoming increasingly important as the world moves towards a more sustainable energy future in line with SDG 7. However, the intermittent nature of renewable energy sources can make it challenging to manage the power grid and ensure a stable supply of electricity, which is crucial for achieving SDG 9. In this paper, we propose a deep learning model for predicting energy demand in a smart power grid, which can improve the integration of renewable energy sources by providing accurate predictions of energy demand. Our approach aligns with SDG 13 on climate action, enabling more efficient management of renewable energy resources. We use long short-term memory networks, well-suited for time series data, to capture complex patterns and dependencies in energy demand data. The proposed approach is evaluated using four historical short-term energy demand datasets from different energy distribution companies, including American Electric Power, Commonwealth Edison, Dayton Power and Light, and Pennsylvania-New Jersey-Maryland Interconnection. The proposed model is compared with three other state-of-the-art forecasting algorithms: Facebook Prophet, Support Vector Regression, and Random Forest Regression. The experimental results show that the proposed REDf model can accurately predict energy demand with a mean absolute error of 1.4%, indicating its potential to enhance the stability and efficiency of the power grid and contribute to achieving SDGs 7, 9, and 13. The proposed model also has the potential to manage the integration of renewable energy sources effectively.
Updated: 2024-08-20 10:34:28
标题: 预测智能电网中的短期能源需求:一种深度学习方法,以符合可再生能源与SDG 7、9和13的整合
摘要: 将可再生能源整合到电网中变得越来越重要,因为世界正朝着与SDG 7一致的更可持续能源未来迈进。然而,可再生能源的间歇性特性可能使电网管理和确保稳定的电力供应变得具有挑战性,这对于实现SDG 9至关重要。本文提出了一种用于预测智能电网中能源需求的深度学习模型,通过提供准确的能源需求预测,可以改善可再生能源的整合。我们的方法符合SDG 13关于气候行动的要求,可以实现对可再生能源资源的更高效管理。我们使用适用于时间序列数据的长短期记忆网络,来捕捉能源需求数据中的复杂模式和依赖关系。我们的方法使用了来自不同能源分配公司的四个历史短期能源需求数据集进行评估,包括美国电力公司(American Electric Power)、联邦爱迪生公司(Commonwealth Edison)、代顿电力照明公司(Dayton Power and Light)以及宾夕法尼亚-新泽西-马里兰互联电网(PJM)。我们将所提出的模型与其他三种最先进的预测算法进行了比较:Facebook Prophet、支持向量回归和随机森林回归。实验结果显示,所提出的REDf模型可以准确预测能源需求,平均绝对误差为1.4%,表明其具有提高电网稳定性和效率,并有助于实现SDGs 7、9和13的潜力。所提出的模型还具有有效管理可再生能源整合的潜力。
更新时间: 2024-08-20 10:34:28
领域: cs.LG,cs.AI
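The windowing that turns a demand series into supervised training pairs for an LSTM, together with the mean-absolute-percentage metric the abstract reports, can be sketched as follows. The toy series and the naive last-value forecaster below merely stand in for hourly utility load and the trained network:

```python
def make_windows(series, lookback=3):
    """Split a series into (input window, next value) training pairs."""
    return [(series[i - lookback:i], series[i])
            for i in range(lookback, len(series))]

def mape(actual, predicted):
    """Mean absolute percentage error, the metric the abstract reports."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

demand = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]
pairs = make_windows(demand, lookback=3)
# Naive baseline: predict the last value of each window.
preds = [window[-1] for window, _ in pairs]
truth = [target for _, target in pairs]
print(len(pairs), round(mape(truth, preds), 2))
# -> 4 2.56
```

An LSTM would consume the same (window, target) pairs; only the forecaster changes.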
Accelerated training of deep learning surrogate models for surface displacement and flow, with application to MCMC-based history matching of CO2 storage operations
Deep learning surrogate modeling shows great promise for subsurface flow applications, but the training demands can be substantial. Here we introduce a new surrogate modeling framework to predict CO2 saturation, pressure and surface displacement for use in the history matching of carbon storage operations. Rather than train using a large number of expensive coupled flow-geomechanics simulation runs, training here involves a large number of inexpensive flow-only simulations combined with a much smaller number of coupled runs. The flow-only runs use an effective rock compressibility, which is shown to provide accurate predictions for saturation and pressure for our system. A recurrent residual U-Net architecture is applied for the saturation and pressure surrogate models, while a new residual U-Net model is introduced to predict surface displacement. The surface displacement surrogate accepts, as inputs, geomodel quantities along with saturation and pressure surrogate predictions. Median relative error for a diverse test set is less than 4% for all variables. The surrogate models are incorporated into a hierarchical Markov chain Monte Carlo history matching workflow. Surrogate error is included using a new treatment involving the full model error covariance matrix. A high degree of prior uncertainty, with geomodels characterized by uncertain geological scenario parameters (metaparameters) and associated realizations, is considered. History matching results for a synthetic true model are generated using in-situ monitoring-well data only, surface displacement data only, and both data types. The enhanced uncertainty reduction achieved with both data types is quantified. Posterior saturation and surface displacement fields are shown to correspond well with the true solution.
Updated: 2024-08-20 10:31:52
标题: 加速训练深度学习替代模型,用于表面位移和流动,以及应用于基于MCMC的CO2存储操作历史匹配
摘要: 深度学习代理建模在地下流应用中显示出巨大潜力,但培训需求可能很大。在这里,我们介绍了一个新的代理建模框架,用于预测二氧化碳饱和度、压力和地表位移,以用于碳储存操作的历史匹配。与使用大量昂贵的耦合流-地质力学模拟运行进行训练不同,这里的训练涉及大量廉价的仅流动模拟,以及少量耦合运行。仅流动运行使用有效的岩石压缩性,据显示可为我们的系统提供饱和度和压力的准确预测。对于饱和度和压力代理模型,采用了一种循环残差U-Net架构,同时引入了一个新的残差U-Net模型来预测地表位移。地表位移代理接受地质模型数量以及饱和度和压力代理预测作为输入。对于各种测试集,中位相对误差对所有变量均低于4%。代理模型被整合到分层马尔可夫链蒙特卡洛历史匹配工作流程中。代理误差使用一种涉及完整模型误差协方差矩阵的新处理方法。考虑了极高的先验不确定性,其中地质场景参数(元参数)和相关实现使地质模型具有不确定性。使用仅在现场监测井数据、仅地表位移数据以及两种数据类型的历史匹配结果生成了一个合成真实模型。通过两种数据类型实现的增强不确定性减少得到量化。后验饱和度和地表位移场显示出与真实解的良好对应。
更新时间: 2024-08-20 10:31:52
领域: cs.LG
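The abstract's inclusion of surrogate error through a model error covariance can be illustrated, in simplified diagonal form, by a Gaussian log-likelihood whose per-observation variance adds the surrogate-error variance to the measurement-error variance. The numbers below are illustrative, and the full-matrix treatment in the paper is reduced here to independent variances:

```python
from math import log, pi

def gaussian_loglik(obs, sim, data_var, surrogate_var):
    """Diagonal-covariance Gaussian log-likelihood in which the total
    error variance per observation is measurement variance plus
    surrogate-model error variance."""
    ll = 0.0
    for d, m, vd, vs in zip(obs, sim, data_var, surrogate_var):
        v = vd + vs                       # combined error variance
        ll += -0.5 * (log(2.0 * pi * v) + (d - m) ** 2 / v)
    return ll

obs = [1.0, 2.0]
sim = [1.1, 1.9]
# Ignoring surrogate error (second call adds it) changes the posterior weight.
print(gaussian_loglik(obs, sim, [0.01, 0.01], [0.0, 0.0]))
print(gaussian_loglik(obs, sim, [0.01, 0.01], [0.04, 0.04]))
```

Within an MCMC history-matching loop, this likelihood is what the surrogate predictions feed; inflating the variance with surrogate error prevents overconfident posteriors.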
Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology
Generating physician letters is a time-consuming task in daily clinical practice. This study investigates local fine-tuning of large language models (LLMs), specifically LLaMA models, for physician letter generation in a privacy-preserving manner within the field of radiation oncology. Our findings demonstrate that base LLaMA models, without fine-tuning, are inadequate for effectively generating physician letters. The QLoRA algorithm provides an efficient method for local intra-institutional fine-tuning of LLMs with limited computational resources (i.e., a single 48 GB GPU workstation within the hospital). The fine-tuned LLM successfully learns radiation oncology-specific information and generates physician letters in an institution-specific style. ROUGE scores of the generated summary reports highlight the superiority of the 8B LLaMA-3 model over the 13B LLaMA-2 model. Further multidimensional physician evaluations of 10 cases reveal that, although the fine-tuned LLaMA-3 model has limited capacity to generate content beyond the provided input data, it successfully generates salutations, diagnoses and treatment histories, recommendations for further treatment, and planned schedules. Overall, clinical benefit was rated highly by the clinical experts (average score of 3.44 on a 4-point scale). With careful physician review and correction, automated LLM-based physician letter generation has significant practical value.
Updated: 2024-08-20 10:31:36
标题: 调整本地LLaMA-3大型语言模型,用于放射肿瘤学自动生成隐私保护医师信件
摘要: 生成医生信件是日常临床实践中一项耗时的任务。本研究调查了大型语言模型(LLMs)的本地微调,特别是LLaMA模型,在放射肿瘤学领域以隐私保护的方式生成医生信件。我们的研究结果表明,未经微调的基本LLaMA模型无法有效地生成医生信件。QLoRA算法提供了一种在医院内部使用有限计算资源(即单个48 GB GPU工作站)进行LLMs本地机构内微调的高效方法。经过微调的LLM成功学习了放射肿瘤学特定信息,并以机构特定风格生成医生信件。生成的摘要报告的ROUGE分数突显了8B LLaMA-3模型优于13B LLaMA-2模型。进一步对10例病例进行的多维医师评估显示,虽然经过微调的LLaMA-3模型在提供的输入数据之外生成内容的能力有限,但它成功生成了问候语、诊断和治疗历史、进一步治疗建议和计划日程。总体而言,临床专家高度评价了临床效益(在4分制上的平均得分为3.44分)。经过仔细的医生审查和更正,基于LLM的自动医生信件生成具有重要的实际价值。
更新时间: 2024-08-20 10:31:36
领域: cs.AI
Offline Model-Based Reinforcement Learning with Anti-Exploration
Model-based reinforcement learning (MBRL) algorithms learn a dynamics model from collected data and apply it to generate synthetic trajectories to enable faster learning. This is an especially promising paradigm in offline reinforcement learning (RL) where data may be limited in quantity, in addition to being deficient in coverage and quality. Practical approaches to offline MBRL usually rely on ensembles of dynamics models to prevent exploitation of any individual model and to extract uncertainty estimates that penalize values in states far from the dataset support. Uncertainty estimates from ensembles can vary greatly in scale, making it challenging to generalize hyperparameters well across even similar tasks. In this paper, we present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL to the model-based space. We develop model-free and model-based variants of MoMo and show how the model-free version can be extended to detect and deal with out-of-distribution (OOD) states using explicit uncertainty estimation without the need for large ensembles. MoMo performs offline MBRL using an anti-exploration bonus to counteract value overestimation in combination with a policy constraint, as well as a truncation function to terminate synthetic rollouts that are excessively OOD. Experimentally, we find that both model-free and model-based MoMo perform well, and the latter outperforms prior model-based and model-free baselines on the majority of D4RL datasets tested.
Updated: 2024-08-20 10:29:21
标题: 具有反探索的离线基于模型的强化学习
摘要: 基于模型的强化学习(MBRL)算法从收集到的数据中学习动态模型,并将其应用于生成合成轨迹,以加快学习速度。在离线强化学习(RL)中,数据量可能有限,覆盖范围和质量也可能不足,因此这种方法尤其具有潜力。实际的离线MBRL方法通常依赖于动态模型的集合,以防止对任何单个模型的过度利用,并提取不确定性估计,惩罚远离数据集支持的状态值。集合中的不确定性估计可能在规模上有很大差异,使得难以在甚至相似的任务之间很好地推广超参数。在本文中,我们提出了基于莫尔斯(Morse)模型的离线RL(MoMo),将离线无模型RL中发现的反探索范式扩展到基于模型的空间。我们开发了MoMo的无模型和基于模型的变体,并展示了无模型版本如何通过明确的不确定性估计来检测和处理分布外(OOD)状态,而无需大量集合。MoMo使用反探索奖励进行离线MBRL,以抵消值的过度估计,结合策略约束以及终止过度OOD的合成轨迹的截断函数。实验结果表明,无模型和基于模型的MoMo表现良好,后者在大多数测试的D4RL数据集上胜过之前的基于模型和无模型基线。
更新时间: 2024-08-20 10:29:21
领域: cs.LG,cs.AI
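The combination of an anti-exploration bonus with truncation of excessively OOD synthetic rollouts can be sketched as follows. The nearest-neighbor uncertainty stub and the toy one-dimensional dynamics are assumptions for illustration, not MoMo's actual Morse-network estimator:

```python
def ood_score(state, dataset_states):
    """Stand-in for an explicit uncertainty estimate: distance from the
    state to the nearest state in the offline dataset."""
    return min(abs(state - s) for s in dataset_states)

def rollout(start, model_step, dataset_states, horizon=10,
            threshold=0.5, penalty=1.0):
    """Generate a synthetic trajectory, subtracting an anti-exploration
    penalty from the reward and truncating once the state is too OOD."""
    state, trajectory = start, []
    for _ in range(horizon):
        state, reward = model_step(state)
        u = ood_score(state, dataset_states)
        trajectory.append((state, reward - penalty * u))  # penalized reward
        if u > threshold:                                 # truncation function
            break
    return trajectory

# Toy dynamics drifting away from the dataset support around 0.
traj = rollout(0.0, lambda s: (s + 0.2, 1.0), dataset_states=[0.0, 0.1, 0.2])
print(len(traj), traj[-1])
```

The penalty counteracts value overestimation inside the support, while the truncation stops the model from being queried far outside it.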
Investigating Context Effects in Similarity Judgements in Large Language Models
Large Language Models (LLMs) have revolutionised the capability of AI models in comprehending and generating natural language text. They are increasingly being used to empower and deploy agents in real-world scenarios, which make decisions and take actions based on their understanding of the context. Therefore researchers, policy makers and enterprises alike are working towards ensuring that the decisions made by these agents align with human values and user expectations. That being said, human values and decisions are not always straightforward to measure and are subject to different cognitive biases. There is a vast section of literature in Behavioural Science which studies biases in human judgements. In this work we report an ongoing investigation on alignment of LLMs with human judgements affected by order bias. Specifically, we focus on a famous human study which showed evidence of order effects in similarity judgements, and replicate it with various popular LLMs. We report the different settings where LLMs exhibit human-like order effect bias and discuss the implications of these findings to inform the design and development of LLM based applications.
Updated: 2024-08-20 10:26:02
标题: 使用大型语言模型调查相似性判断中的背景效应
摘要: 大型语言模型(LLMs)已经彻底改变了人工智能模型理解和生成自然语言文本的能力。它们越来越被用于赋能和部署在现实世界场景中的代理人,这些代理人根据对上下文的理解做出决策并采取行动。因此,研究人员、决策者和企业都在努力确保这些代理人做出的决策符合人类价值观和用户期望。尽管如此,人类价值观和决策并不总是容易衡量,而且受不同认知偏见的影响。行为科学领域有大量文献研究人类判断中的偏见。在这项工作中,我们报告了一项正在进行的研究,探讨LLMs与受到顺序偏见影响的人类判断的一致性。具体来说,我们关注一项著名的人类研究,该研究显示了相似性判断中的顺序效应的证据,并使用各种流行的LLMs进行复制实验。我们报告了LLMs展现出人类式顺序效应偏见的不同设置,并讨论这些发现对LLM基础应用的设计和开发的影响。
更新时间: 2024-08-20 10:26:02
领域: cs.AI
Coarse-to-Fine Detection of Multiple Seams for Robotic Welding
Efficiently detecting target weld seams while ensuring sub-millimeter accuracy has always been an important challenge in autonomous welding, which has significant application in industrial practice. Previous works mostly focused on recognizing and localizing welding seams one by one, leading to inferior efficiency in modeling the workpiece. This paper proposes a novel framework capable of multiple weld seams extraction using both RGB images and 3D point clouds. The RGB image is used to obtain the region of interest by approximately localizing the weld seams, and the point cloud is used to achieve the fine-edge extraction of the weld seams within the region of interest using region growth. Our method is further accelerated by using a pre-trained deep learning model to ensure both efficiency and generalization ability. The performance of the proposed method has been comprehensively tested on various workpieces featuring both linear and curved weld seams and in physical experiment systems. The results showcase considerable potential for real-world industrial applications, emphasizing the method's efficiency and effectiveness. Videos of the real-world experiments can be found at https://youtu.be/pq162HSP2D4.
Updated: 2024-08-20 10:24:59
标题: 粗到精检测机器人焊接中的多条焊缝
摘要: 在自主焊接中,高效地检测目标焊缝并确保亚毫米精度一直是一个重要挑战,这在工业实践中具有重要应用。先前的研究大多集中在逐个识别和定位焊缝,导致了模拟工件的效率较低。本文提出了一个新颖的框架,能够使用RGB图像和3D点云来提取多个焊缝。RGB图像用于通过大致定位焊缝来获取感兴趣区域,点云则用于利用区域生长在感兴趣区域内实现焊缝的精细边缘提取。我们的方法进一步通过使用预先训练的深度学习模型来加速,以确保效率和泛化能力。提出的方法在具有直线和曲线焊缝的各种工件上进行了全面测试,并在物理实验系统中展示了可观的真实世界工业应用潜力,强调了该方法的效率和有效性。真实世界实验的视频可以在https://youtu.be/pq162HSP2D4 中找到。
更新时间: 2024-08-20 10:24:59
领域: cs.CV,cs.AI
Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking
Finding previously debunked narratives involves identifying claims that have already undergone fact-checking. The issue intensifies when similar false claims persist in multiple languages, despite the availability of debunks for several months in another language. Hence, automatically finding debunks (or fact-checks) in multiple languages is crucial to make the best use of scarce fact-checkers' resources. Mainly due to the lack of readily available data, this is an understudied problem, particularly when considering the cross-lingual scenario, i.e. the retrieval of debunks in a language different from the language of the online post being checked. This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by: (i) creating Multilingual Misinformation Tweets (MMTweets): a dataset that stands out, featuring cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts; (ii) conducting an extensive experiment to benchmark state-of-the-art cross-lingual retrieval models and introducing multistage retrieval methods tailored for the task; and (iii) comprehensively evaluating retrieval models for their cross-lingual and cross-dataset transfer capabilities within MMTweets, and conducting a retrieval latency analysis. We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models. Nonetheless, the study provides valuable insights for creating MMTweets datasets and optimising debunked narrative retrieval models to empower fact-checking endeavours. The dataset and annotation codebook are publicly available at https://doi.org/10.5281/zenodo.10637161.
Updated: 2024-08-20 10:24:50
标题: 用MMTweets突破语言障碍:推进跨语言辟谣叙述检索以进行事实核查
摘要: 发现先前被揭穿的叙述涉及识别已经经过事实核查的声明。当类似的虚假声明在多种语言中持续存在,尽管另一种语言中已经有数月的揭穿可用时,问题变得更加严重。因此,在多种语言中自动找到揭穿(或事实核查)对于充分利用有限的事实核查资源至关重要。主要是由于缺乏现成的数据,这是一个研究不足的问题,特别是在考虑跨语言场景时,即在与正在检查的在线帖子的语言不同的语言中检索揭穿。本研究引入了跨语言揭穿叙述检索,并通过以下方式解决了这一研究空白:(i) 创建了多语种误解推文(MMTweets):一个突出的数据集,具有跨语言对、图像、人工注释和细粒度标签,使其相对于同类资源更为全面;(ii) 进行广泛实验,对最先进的跨语言检索模型进行基准测试,并引入为任务量身定制的多阶段检索方法;(iii) 在MMTweets中全面评估检索模型的跨语言和跨数据集转移能力,并进行检索延迟分析。我们发现,MMTweets对于跨语言揭穿叙述检索提出了挑战,突出了检索模型改进的领域。尽管如此,这项研究为创建MMTweets数据集和优化揭穿叙述检索模型提供了宝贵的见解,以加强事实核查工作。数据集和注释代码手册可在以下网址公开获取:https://doi.org/10.5281/zenodo.10637161。
更新时间: 2024-08-20 10:24:50
领域: cs.CL,cs.CY,cs.IR,cs.LG,cs.SI
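A multistage retrieval pipeline of the kind benchmarked above, a cheap first stage that shortlists candidates followed by a more expensive reranker, can be sketched as follows. The token-overlap filter and the Jaccard "reranker" are stand-ins for the paper's lexical and cross-lingual neural models:

```python
def stage1_filter(query_tokens, debunks, keep=2):
    """Cheap first stage: rank debunks by token overlap, keep a shortlist."""
    scored = sorted(debunks,
                    key=lambda d: len(set(query_tokens) & set(d["tokens"])),
                    reverse=True)
    return scored[:keep]

def stage2_rerank(query_tokens, shortlist):
    """Expensive second stage (stub): Jaccard similarity stands in for a
    cross-lingual neural reranker scoring the shortlist only."""
    def jaccard(d):
        q, t = set(query_tokens), set(d["tokens"])
        return len(q & t) / len(q | t)
    return max(shortlist, key=jaccard)

debunks = [
    {"id": "fc1", "tokens": ["vaccine", "chip", "false"]},
    {"id": "fc2", "tokens": ["vaccine", "dna", "false", "claim"]},
    {"id": "fc3", "tokens": ["flood", "photo", "old"]},
]
query = ["vaccine", "chip", "claim"]
best = stage2_rerank(query, stage1_filter(query, debunks))
print(best["id"])
# -> fc1
```

The point of the two stages is latency: the reranker only ever sees the shortlist, which is what the paper's retrieval latency analysis measures.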
Variable Assignment Invariant Neural Networks for Learning Logic Programs
Learning from interpretation transition (LFIT) is a framework for learning rules from observed state transitions. LFIT has been implemented in purely symbolic algorithms, but they are unable to deal with noise or generalize to unobserved transitions. Rule extraction based neural network methods suffer from overfitting, while more general implementation that categorize rules suffer from combinatorial explosion. In this paper, we introduce a technique to leverage variable permutation invariance inherent in symbolic domains. Our technique ensures that the permutation and the naming of the variables would not affect the results. We demonstrate the effectiveness and the scalability of this method with various experiments. Our code is publicly available at https://github.com/phuayj/delta-lfit-2
Updated: 2024-08-20 10:23:35
标题: 可变赋值不变神经网络用于学习逻辑程序
摘要: 学习解释转换(LFIT)是一个从观察到的状态转换中学习规则的框架。LFIT已经在纯符号算法中实现,但它们无法处理噪音或推广到未观察到的转换。基于神经网络的规则提取方法容易过拟合,而更一般的实现方法对规则进行分类会导致组合爆炸。在本文中,我们介绍了一种利用符号域中固有的变量置换不变性的技术。我们的技术确保置换和变量命名不会影响结果。我们通过各种实验展示了这种方法的有效性和可扩展性。我们的代码可以在https://github.com/phuayj/delta-lfit-2上公开获取。
更新时间: 2024-08-20 10:23:35
领域: cs.LG,cs.AI
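The variable permutation invariance the paper leverages can be illustrated by brute-force canonicalization: apply every variable permutation to a set of state transitions and keep the lexicographically smallest encoding, so that any renaming of the variables maps to the same representation. This is a naive sketch of the invariance property itself, not the paper's neural architecture:

```python
from itertools import permutations

def canonical(transitions, n_vars):
    """Canonical form of a set of (state, next_state) transitions that is
    identical under any permutation of the variable indices."""
    def relabel(perm):
        return tuple(sorted(
            (tuple(s[p] for p in perm), tuple(t[p] for p in perm))
            for s, t in transitions))
    return min(relabel(p) for p in permutations(range(n_vars)))

# Two encodings of the same dynamics with variables 0 and 1 swapped:
# "var 1 copies var 0" vs. "var 0 copies var 1".
t1 = [((0, 0), (0, 0)), ((0, 1), (0, 0)), ((1, 0), (1, 1)), ((1, 1), (1, 1))]
t2 = [((0, 0), (0, 0)), ((0, 1), (1, 1)), ((1, 0), (0, 0)), ((1, 1), (1, 1))]
print(canonical(t1, 2) == canonical(t2, 2))
# -> True
```

Enumerating permutations is factorial in the number of variables, which is exactly why a learned invariant representation is preferable at scale.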
An Information-Theoretic Approach to Generalization Theory
We investigate the in-distribution generalization of machine learning algorithms. We depart from traditional complexity-based approaches by analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data. We consider two categories of generalization guarantees: 1) Guarantees in expectation: These bounds measure performance in the average case. Here, the dependence between the algorithm and the data is often captured by information measures. While these measures offer an intuitive interpretation, they overlook the geometry of the algorithm's hypothesis class. Here, we introduce bounds using the Wasserstein distance to incorporate geometry, and a structured, systematic method to derive bounds capturing the dependence between the algorithm and an individual datum, and between the algorithm and subsets of the training data. 2) PAC-Bayesian guarantees: These bounds measure the performance level with high probability. Here, the dependence between the algorithm and the data is often measured by the relative entropy. We establish connections between the Seeger--Langford and Catoni's bounds, revealing that the former is optimized by the Gibbs posterior. We introduce novel, tighter bounds for various types of loss functions. To achieve this, we introduce a new technique to optimize parameters in probabilistic statements. To study the limitations of these approaches, we present a counter-example where most of the information-theoretic bounds fail while traditional approaches do not. Finally, we explore the relationship between privacy and generalization. We show that algorithms with a bounded maximal leakage generalize. For discrete data, we derive new bounds for differentially private algorithms that guarantee generalization even with a constant privacy parameter, which is in contrast to previous bounds in the literature.
Updated: 2024-08-20 10:08:21
标题: 一种信息论方法来研究泛化理论
摘要: 我们研究了机器学习算法的分布内泛化。我们通过分析量化学习算法与训练数据之间依赖关系的信息论界限,离开了传统的基于复杂性的方法。我们考虑了两类泛化保证: 1)期望中的保证:这些界限衡量平均情况下的性能。在这里,算法与数据之间的依赖关系通常由信息度量来捕捉。虽然这些度量提供直观解释,但它们忽视了算法的假设类的几何结构。在这里,我们引入了使用Wasserstein距离来包含几何结构的界限,并引入了一种结构化、系统化的方法来推导捕捉算法与单个数据以及训练数据子集之间依赖关系的界限。 2)PAC-Bayesian保证:这些界限以高概率衡量性能水平。在这里,算法与数据之间的依赖关系通常通过相对熵来衡量。我们建立了Seeger-Langford和Catoni的界限之间的联系,揭示了前者由Gibbs后验最优化。我们为各种类型的损失函数引入了新颖、更紧的界限。为了实现这一点,我们引入了一种新技术来优化概率语句中的参数。 为了研究这些方法的局限性,我们提出了一个反例,在这个反例中大多数信息论界限失败,而传统方法则不会。最后,我们探讨了隐私与泛化之间的关系。我们发现具有有界最大泄漏的算法具有泛化性。对于离散数据,我们推导了针对差分隐私算法的新界限,即使具有恒定的隐私参数,也能保证泛化性,这与文献中以前的界限形成对比。
更新时间: 2024-08-20 10:08:21
领域: stat.ML,cs.LG
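A representative member of the "guarantees in expectation" family discussed above is the standard mutual-information bound of Xu and Raginsky, stated here only for context: for a $\sigma$-sub-Gaussian loss and $n$ i.i.d. training samples,

```latex
\bigl|\,\mathbb{E}\left[\operatorname{gen}(W, S)\right]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)},
```

where $W$ is the learned hypothesis, $S$ the training set, and $I(W;S)$ their mutual information. The Wasserstein-distance bounds mentioned in the abstract replace $I(W;S)$ with a transport cost, which is what makes them sensitive to the geometry of the hypothesis class.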
AnyGraph: Graph Foundation Model in the Wild
The growing ubiquity of relational data structured as graphs has underscored the need for graph learning models with exceptional generalization capabilities. However, current approaches often struggle to effectively extract generalizable insights, frequently requiring extensive fine-tuning and limiting their versatility. Graph foundation models offer a transformative solution, with the potential to learn robust, generalizable representations from graph data. This enables more effective and adaptable applications across a wide spectrum of tasks and domains. In this work, we investigate a unified graph model, AnyGraph, designed to handle key challenges: i) Structure Heterogeneity. Addressing distribution shift in graph structural information; ii) Feature Heterogeneity. Handling diverse feature representation spaces across graph datasets; iii) Fast Adaptation. Efficiently adapting the model to new graph domains; iv) Scaling Law Emergence. Enabling the model to exhibit scaling law behavior, where its performance scales favorably with the amount of data and parameter sizes. To tackle these critical challenges, we build the AnyGraph upon a Graph Mixture-of-Experts (MoE) architecture. This approach empowers the model to effectively manage both the in-domain and cross-domain distribution shift concerning structure-level and feature-level heterogeneity. Furthermore, a lightweight graph expert routing mechanism is proposed to facilitate AnyGraph's fast adaptability to new data and domains. Our extensive experiments on 38 diverse graph datasets have demonstrated the strong zero-shot learning performance of AnyGraph across diverse graph domains with significant distribution shift. Furthermore, we have validated the model's fast adaptation ability and scaling law emergence, showcasing its versatility.
Updated: 2024-08-20 09:57:13
Categories: cs.LG,cs.AI
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches
Since the introduction of GPT-2 1.5B in 2019, large language models (LLMs) have transitioned from specialized models to versatile foundation models. LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment. Traditional fine-tuning techniques with first-order optimizers require substantial GPU memory that exceeds mainstream hardware capability, motivating the investigation of memory-efficient methods. Model compression techniques can reduce energy consumption, operational costs, and environmental impact, supporting sustainable artificial intelligence advancements. Additionally, large-scale foundation models have expanded to create images, audio, videos, and multi-modal content, further emphasizing the need for efficient deployment. We are therefore motivated to present a comprehensive overview of the prevalent memory-efficient fine-tuning methods over the network edge. We also review the state-of-the-art literature on model compression to provide a vision for deploying LLMs over the network edge.
Updated: 2024-08-20 09:42:17
Categories: cs.AI
Rejection in Abstract Argumentation: Harder Than Acceptance?
Abstract argumentation is a popular toolkit for modeling, evaluating, and comparing arguments. Relationships between arguments are specified in argumentation frameworks (AFs), and conditions are placed on sets (extensions) of arguments that allow AFs to be evaluated. For more expressiveness, AFs are augmented with \emph{acceptance conditions} on directly interacting arguments or a constraint on the admissible sets of arguments, resulting in dialectic frameworks or constrained argumentation frameworks. In this paper, we consider flexible conditions for \emph{rejecting} an argument from an extension, which we call rejection conditions (RCs). On the technical level, we associate each argument with a specific logic program. We analyze the resulting complexity, including the structural parameter treewidth. Rejection AFs are highly expressive, giving rise to natural problems on higher levels of the polynomial hierarchy.
Updated: 2024-08-20 09:37:04
Categories: cs.AI,cs.CC,cs.LO
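To make the baseline semantics concrete before rejection conditions enter the picture, here is a brute-force sketch of conflict-free, admissible, and stable extensions on a toy AF; the three-argument chain is an invented illustration, not an example from the paper.

```python
from itertools import combinations

def extensions(args, attacks):
    """Enumerate conflict-free, admissible, and stable sets of a small AF."""
    attacks = set(attacks)

    def conflict_free(s):
        return not any((a, b) in attacks for a in s for b in s)

    def defends(s, a):
        # every attacker of `a` is counter-attacked by some member of s
        return all(any((c, b) in attacks for c in s)
                   for b in args if (b, a) in attacks)

    subsets = [frozenset(c) for r in range(len(args) + 1)
               for c in combinations(args, r)]
    cf = [s for s in subsets if conflict_free(s)]
    adm = [s for s in cf if all(defends(s, a) for a in s)]
    stable = [s for s in cf
              if all(any((a, b) in attacks for a in s)
                     for b in args if b not in s)]
    return cf, adm, stable

# Toy chain: A attacks B, B attacks C.
cf, adm, stable = extensions(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(sorted(stable[0]))  # ['A', 'C'] -- the unique stable extension
```

The rejection conditions studied in the paper would add per-argument logic programs on top of this basic machinery, which is what pushes the decision problems higher in the polynomial hierarchy.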
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
LLMs have achieved success in many fields but are still troubled by problematic content in the training corpora. LLM unlearning aims to reduce their influence and avoid undesirable behaviours. However, existing unlearning methods remain vulnerable to adversarial queries, and the unlearned knowledge resurfaces after manually designed attack queries. As part of a red-team effort to proactively assess the vulnerabilities of unlearned models, we design Dynamic Unlearning Attack (DUA), a dynamic and automated framework to attack these models and evaluate their robustness. It optimizes adversarial suffixes to reintroduce the unlearned knowledge in various scenarios. We find that unlearned knowledge can be recovered in $55.2\%$ of the questions, even without revealing the unlearned model's parameters. In response to this vulnerability, we propose Latent Adversarial Unlearning (LAU), a universal framework that effectively enhances the robustness of the unlearning process. It formulates unlearning as a min-max optimization problem and resolves it through two stages: an attack stage, where perturbation vectors are trained and added to the latent space of LLMs to recover the unlearned knowledge, and a defense stage, where the previously trained perturbation vectors are used to enhance the unlearned model's robustness. With our LAU framework, we obtain two robust unlearning methods, AdvGA and AdvNPO. We conduct extensive experiments across multiple unlearning benchmarks and various models, demonstrating that they improve unlearning effectiveness by over $53.5\%$, cause less than an $11.6\%$ reduction in neighboring knowledge, and have almost no impact on the models' general capabilities.
Updated: 2024-08-20 09:36:04
Categories: cs.CL,cs.AI,cs.CR,cs.LG
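The min-max formulation can be illustrated with a deliberately tiny numpy sketch: a linear scorer stands in for the LLM, the attack stage computes the worst-case latent perturbation in closed form, and the defense stage descends on the worst-case objective. The model and all names here are illustrative assumptions; this is not the paper's AdvGA or AdvNPO.

```python
import numpy as np

def worst_case_delta(w, eps):
    # Attack stage: for a linear scorer s(h) = w @ h, the norm-eps perturbation
    # that maximally recovers the forgotten knowledge is eps * w / ||w||.
    return eps * w / (np.linalg.norm(w) + 1e-12)

def latent_adversarial_unlearn(w, h_forget, eps=0.5, lr=0.05, steps=100):
    # Defense stage: minimise the worst-case score,
    #   min_w max_{||d|| <= eps} w @ (h_forget + d) = min_w (w @ h_forget + eps * ||w||)
    for _ in range(steps):
        grad = h_forget + eps * w / (np.linalg.norm(w) + 1e-12)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w0 = rng.normal(size=8)   # stand-in model parameters
h = rng.normal(size=8)    # latent of a "forget" sample

before = w0 @ (h + worst_case_delta(w0, 0.5))
w1 = latent_adversarial_unlearn(w0, h)
after = w1 @ (h + worst_case_delta(w1, 0.5))
assert after < before  # worst-case recovery score drops after the defense stage
```

In the actual framework the perturbations live in an LLM's hidden states and are trained by gradient ascent rather than computed in closed form, but the inner-max/outer-min structure is the same.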
HMoE: Heterogeneous Mixture of Experts for Language Modeling
Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter utilization. In this study, we propose a novel Heterogeneous Mixture of Experts (HMoE), where experts differ in size and thus possess diverse capacities. This heterogeneity allows for more specialized experts to handle varying token complexities more effectively. To address the imbalance in expert activation, we propose a novel training objective that encourages the frequent activation of smaller experts, enhancing computational efficiency and parameter utilization. Extensive experiments demonstrate that HMoE achieves lower loss with fewer activated parameters and outperforms conventional homogeneous MoE models on various pre-training evaluation benchmarks. Codes will be released upon acceptance.
Updated: 2024-08-20 09:35:24
Categories: cs.CL,cs.LG
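A toy sketch of the core mechanism: top-1 routing over experts of different hidden widths, plus an auxiliary cost that a training objective could use to favor smaller experts. The sizes, the router, and the penalty form are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, widths = 16, [8, 32, 128]          # heterogeneous expert capacities

experts = [{"w1": rng.normal(scale=0.1, size=(d_model, h)),
            "w2": rng.normal(scale=0.1, size=(h, d_model))} for h in widths]
router = rng.normal(scale=0.1, size=(d_model, len(experts)))

def hmoe_forward(x):
    logits = x @ router                      # (batch, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    top1 = probs.argmax(-1)                  # top-1 routing per token
    out = np.empty_like(x)
    for i, e in enumerate(experts):
        sel = top1 == i
        h = np.maximum(x[sel] @ e["w1"], 0)  # ReLU expert of width widths[i]
        out[sel] = h @ e["w2"]
    # auxiliary cost: larger experts are more expensive to activate, so a
    # training objective can penalise this term to favor smaller experts
    aux = (probs * np.asarray(widths) / max(widths)).sum(-1).mean()
    return out, aux

x = rng.normal(size=(4, d_model))
y, aux = hmoe_forward(x)
assert y.shape == x.shape and 0 < aux <= 1
```

Each token is still processed by exactly one expert, but the compute spent varies with the routed expert's width, which is what makes the activation-imbalance objective meaningful.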
Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering
Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions. However, existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off. To address these issues, we propose a novel framework, the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), comprising five key modules: Decomposer, Definer, Retriever, Filter, and Summarizer. We introduce a new hierarchical retrieval strategy that incorporates both sparse retrieval at the document level and dense retrieval at the chunk level, effectively integrating their strengths. Additionally, we propose a single-candidate retrieval method to mitigate the limitations of multi-candidate retrieval. We also construct two new corpora, Indexed Wikicorpus and Profile Wikicorpus, to address the issues of outdated and insufficient knowledge. Our experimental results on four datasets demonstrate that HiRAG outperforms state-of-the-art models across most metrics, and our Indexed Wikicorpus is effective. The code for HiRAG is available at https://github.com/2282588541a/HiRAG
Updated: 2024-08-20 09:29:31
Categories: cs.CL,cs.AI,cs.IR
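The two-level retrieval idea can be sketched in a few lines: a sparse term-overlap score selects documents, then a dense (here, bag-of-words cosine) score selects chunks within them. The tiny corpus and both scoring functions are stand-ins for BM25 and learned embeddings, not HiRAG's actual Retriever.

```python
import math
from collections import Counter

docs = {  # hypothetical corpus: document id -> list of chunks
    "d1": ["Paris is the capital of France.", "France is in Europe."],
    "d2": ["The Nile is a river in Africa.", "Cairo lies on the Nile."],
}

def sparse_score(query, doc_chunks):
    # document level: term-overlap count (a BM25 stand-in)
    q = set(query.lower().split())
    text = " ".join(doc_chunks).lower().split()
    return sum(1 for t in text if t in q)

def dense_score(query, chunk):
    # chunk level: cosine over bag-of-words counts (an embedding stand-in)
    a, b = Counter(query.lower().split()), Counter(chunk.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb or 1.0)

def hierarchical_retrieve(query, k_docs=1):
    # stage 1: cheap sparse ranking over whole documents
    top_docs = sorted(docs, key=lambda d: sparse_score(query, docs[d]),
                      reverse=True)[:k_docs]
    # stage 2: finer dense ranking over the surviving chunks
    chunks = [c for d in top_docs for c in docs[d]]
    return max(chunks, key=lambda c: dense_score(query, c))

print(hierarchical_retrieve("capital of France"))  # Paris is the capital of France.
```

The benefit is the same as in the paper's strategy: the sparse pass keeps the dense pass cheap by shrinking the candidate set before any fine-grained scoring happens.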
Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning
Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification, faced by existing methods. We then introduce our method, called \textit{Representation Norm Amplification} (RNA), which solves this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that generates a noticeable discrepancy in the representation norm between ID and OOD data, while not perturbing the feature learning for ID classification. Our experiments show that RNA achieves superior performance in both OOD detection and classification compared to the state-of-the-art methods, by 1.70\% and 9.46\% in FPR95 and 2.43\% and 6.87\% in classification accuracy on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.
Updated: 2024-08-20 09:27:07
Categories: cs.LG
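The detection side of the idea reduces to a one-line score on representations. The following numpy sketch uses synthetic features whose norms are separated by construction; the amplification training that produces this separation is the paper's contribution and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# pretend RNA-style training produced large-norm ID features and
# small-norm OOD features (synthetic stand-ins for a trained encoder)
id_feats  = rng.normal(size=(100, 64)) * 5.0
ood_feats = rng.normal(size=(100, 64)) * 1.0

def rna_ood_score(feats):
    # score each sample by its representation norm; low norm => likely OOD
    return np.linalg.norm(feats, axis=1)

tau = 20.0  # in practice the threshold is chosen on held-out data
id_flagged  = (rna_ood_score(id_feats)  < tau).mean()
ood_flagged = (rna_ood_score(ood_feats) < tau).mean()
assert ood_flagged > id_flagged  # norms separate ID from OOD
```

Because the score is a new axis orthogonal to the class logits, thresholding it does not interfere with the ID classifier, which is the decoupling the abstract describes.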
World Models Increase Autonomy in Reinforcement Learning
Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB) RL methods in such setting, showing that a straightforward adaptation of MBRL can outperform all the prior state-of-the-art methods while requiring less supervision. We then identify limitations inherent to this direct extension and propose a solution called model-based reset-free (MoReFree) agent, which further enhances the performance. MoReFree adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks by prioritizing task-relevant states. It exhibits superior data-efficiency across various reset-free tasks without access to environmental reward or demonstrations while significantly outperforming privileged baselines that require supervision. Our findings suggest model-based methods hold significant promise for reducing human effort in RL. Website: https://sites.google.com/view/morefree
Updated: 2024-08-20 09:23:34
Categories: cs.AI
Neural Exploratory Landscape Analysis
Recent research in Meta-Black-Box Optimization (MetaBBO) has shown that meta-trained neural networks can effectively guide the design of black-box optimizers, significantly reducing the need for expert tuning and delivering robust performance across complex problem distributions. Despite their success, a paradox remains: MetaBBO methods still rely on human-crafted Exploratory Landscape Analysis features to inform the meta-level agent about the low-level optimization progress. To address this gap, this paper proposes Neural Exploratory Landscape Analysis (NeurELA), a novel framework that dynamically profiles landscape features through a two-stage, attention-based neural network, executed in an entirely end-to-end fashion. NeurELA is pre-trained over a variety of MetaBBO algorithms using a multi-task neuroevolution strategy. Extensive experiments show that NeurELA achieves consistently superior performance when integrated into different and even unseen MetaBBO tasks, and can be efficiently fine-tuned for a further performance boost. This advancement marks a pivotal step toward making MetaBBO algorithms more autonomous and broadly applicable.
Updated: 2024-08-20 09:17:11
Categories: cs.LG,cs.NE
Tensor tree learns hidden relational structures in data to construct generative models
Based on the tensor tree network with the Born machine framework, we propose a general method for constructing a generative model by expressing the target distribution function as the quantum wave function amplitude represented by a tensor tree. The key idea is dynamically optimizing the tree structure that minimizes the bond mutual information. The proposed method offers enhanced performance and uncovers hidden relational structures in the target data. We illustrate potential practical applications with four examples: (i) random patterns, (ii) QMNIST hand-written digits, (iii) Bayesian networks, and (iv) the stock price fluctuation pattern in S&P500. In (i) and (ii), strongly correlated variables were concentrated near the center of the network; in (iii), the causality pattern was identified; and, in (iv), a structure corresponding to the eleven sectors emerged.
Updated: 2024-08-20 09:11:38
Categories: cs.LG,cond-mat.stat-mech,cs.AI,quant-ph
End-to-end learned Lossy Dynamic Point Cloud Attribute Compression
Recent advancements in point cloud compression have primarily emphasized geometry compression, while comparatively fewer efforts have been dedicated to attribute compression. This study introduces an end-to-end learned dynamic lossy attribute coding approach, utilizing an efficient high-dimensional convolution to capture extensive inter-point dependencies. This enables the efficient projection of attribute features into latent variables. Subsequently, we employ a context model that leverages the previous latent space in conjunction with an auto-regressive context model for encoding the latent tensor into a bitstream. Evaluation of our method on widely utilized point cloud datasets from MPEG and Microsoft demonstrates its superior performance compared to the core attribute compression module, the Region-Adaptive Hierarchical Transform method from MPEG Geometry Point Cloud Compression, with a 38.1% Bjontegaard Delta-rate saving on average while ensuring low-complexity encoding/decoding.
Updated: 2024-08-20 09:06:59
Categories: eess.IV,cs.LG
Federated Clustering: An Unsupervised Cluster-Wise Training for Decentralized Data Distributions
Federated Learning (FL) is a pivotal approach in decentralized machine learning, especially when data privacy is crucial and direct data sharing is impractical. While FL is typically associated with supervised learning, its potential in unsupervised scenarios is underexplored. This paper introduces a novel unsupervised federated learning methodology designed to identify the complete set of categories (global K) across multiple clients within label-free, non-uniform data distributions, a process known as Federated Clustering. Our approach, Federated Cluster-Wise Refinement (FedCRef), involves clients that collaboratively train models on clusters with similar data distributions. Initially, clients with diverse local data distributions (local K) train models on their clusters to generate compressed data representations. These local models are then shared across the network, enabling clients to compare them through reconstruction error analysis, leading to the formation of federated groups. In these groups, clients collaboratively train a shared model representing each data distribution, while continuously refining their local clusters to enhance data association accuracy. This iterative process allows our system to identify all potential data distributions across the network and develop robust representation models for each. To validate our approach, we compare it with traditional centralized methods, establishing a performance baseline and showcasing the advantages of our distributed solution. We also conduct experiments on the EMNIST and KMNIST datasets, demonstrating FedCRef's ability to refine and align cluster models with actual data distributions, significantly improving data representation precision in unsupervised federated settings.
Updated: 2024-08-20 09:05:44
Categories: cs.LG
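A toy numpy sketch of the grouping signal only: each client fits a local PCA "model", and two clients join a federated group when the exchanged model reconstructs the other's data with low error. The data, threshold, and one-component PCA are invented for illustration and greatly simplify FedCRef.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_model(data, k=1):
    # client-side "compression model": mean plus top-k principal directions
    centered = data - data.mean(0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return data.mean(0), vt[:k]

def recon_error(model, data):
    # reconstruction error of `data` under another client's model
    mean, comps = model
    c = data - mean
    return np.mean((c - c @ comps.T @ comps) ** 2)

# two underlying distributions: clients 0 and 1 share one; client 2 differs
line = rng.normal(size=(200, 1)) @ np.array([[1.0, 2.0]])
clients = [line + rng.normal(scale=0.05, size=line.shape),
           line[::-1] + rng.normal(scale=0.05, size=line.shape),
           rng.normal(size=(200, 2)) * 3.0]
models = [local_model(d) for d in clients]

# group clients whose exchanged models yield low cross reconstruction error
threshold = 0.1
same_group = lambda i, j: recon_error(models[i], clients[j]) < threshold
assert same_group(0, 1) and not same_group(0, 2)
```

In the full method this comparison seeds federated groups that then train a shared model per distribution; here it only shows why reconstruction error is a usable similarity signal without sharing raw data.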
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
Large vision-language models (LVLMs), such as the LLaVA series, are ignorant of up-to-date knowledge because they cannot be updated frequently due to the large amount of resources required, and therefore fail in many cases. For example, an LVLM released in January 2024 would not know the singer of the theme song for the new Detective Conan movie, which was not released until April 2024. To solve this problem, a promising solution motivated by retrieval-augmented generation (RAG) is to provide LVLMs with up-to-date knowledge via internet search during inference, i.e., internet-augmented generation (IAG), which is already integrated in some closed-source commercial LVLMs such as GPT-4V. However, the specific mechanics underpinning them remain a mystery. In this paper, we propose a plug-and-play framework for augmenting existing LVLMs in handling visual question answering (VQA) about up-to-date knowledge, dubbed SearchLVLMs. A hierarchical filtering model is trained to effectively and efficiently find the most helpful content from the websites returned by a search engine to prompt LVLMs with up-to-date knowledge. To train the model and evaluate our framework's performance, we propose a pipeline to automatically generate news-related VQA samples to construct a dataset, dubbed UDK-VQA. A multi-model voting mechanism is introduced to label the usefulness of websites/content for VQA samples to construct the training set. Experimental results demonstrate the effectiveness of our framework, outperforming GPT-4V by about 25% in accuracy.
Updated: 2024-08-20 09:04:25
Categories: cs.CV,cs.AI
Seamless Integration: Sampling Strategies in Federated Learning Systems
Federated Learning (FL) represents a paradigm shift in the field of machine learning, offering an approach for a decentralized training of models across a multitude of devices while maintaining the privacy of local data. However, the dynamic nature of FL systems, characterized by the ongoing incorporation of new clients with potentially diverse data distributions and computational capabilities, poses a significant challenge to the stability and efficiency of these distributed learning networks. The seamless integration of new clients is imperative to sustain and enhance the performance and robustness of FL systems. This paper looks into the complexities of integrating new clients into existing FL systems and explores how data heterogeneity and varying data distribution (not independent and identically distributed) among them can affect model training, system efficiency, scalability and stability. Despite these challenges, the integration of new clients into FL systems presents opportunities to enhance data diversity, improve learning performance, and leverage distributed computational power. In contrast to other fields of application such as the distributed optimization of word predictions on Gboard (where federated learning once originated), there are usually only a few clients in the production environment, which is why information from each new client becomes all the more valuable. This paper outlines strategies for effective client selection strategies and solutions for ensuring system scalability and stability. Using the example of images from optical quality inspection, it offers insights into practical approaches. In conclusion, this paper proposes that addressing the challenges presented by new client integration is crucial to the advancement and efficiency of distributed learning networks, thus paving the way for the adoption of Federated Learning in production environments.
Updated: 2024-08-20 09:04:25
Categories: cs.LG
ETGuard: Malicious Encrypted Traffic Detection in Blockchain-based Power Grid Systems
The escalating prevalence of encryption protocols has led to a concomitant surge in the number of malicious attacks that hide in encrypted traffic. Power grid systems, as fundamental infrastructure, are becoming prime targets for such attacks. Conventional methods for detecting malicious encrypted packets typically use a static pre-trained model. We observe that these methods are not well-suited for blockchain-based power grid systems. More critically, they fall short in dynamic environments where new types of encrypted attacks continuously emerge. Motivated by this, in this paper we try to tackle these challenges from two aspects: (1) We present a novel framework that is able to automatically detect malicious encrypted traffic in blockchain-based power grid systems and incrementally learn from new malicious traffic. (2) We mathematically derive incremental learning losses to resist the forgetting of old attack patterns while ensuring the model is capable of handling new encrypted attack patterns. Empirically, our method achieves state-of-the-art performance on three different benchmark datasets. We also constructed the first malicious encrypted traffic dataset for blockchain-based power grid scenario. Our code and dataset are available at https://github.com/PPPmzt/ETGuard, hoping to inspire future research.
Updated: 2024-08-20 08:53:42
Categories: cs.CR,cs.AI
Atlas-Based Interpretable Age Prediction In Whole-Body MR Images
Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting potential discrepancies between chronological and biological age. To improve understanding of age-related changes in various body parts, we investigate the ageing of the human body on a large scale by using whole-body 3D images. We utilise the Grad-CAM method to determine the body areas most predictive of a person's age. In order to expand our analysis beyond individual subjects, we employ registration techniques to generate population-wide importance maps that show the most predictive areas in the body for a whole cohort of subjects. We show that the investigation of the full 3D volume of the whole body and the population-wide analysis can give important insights into which body parts play the most important roles in predicting a person's age. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance. Finally, we investigate differences between subjects that show accelerated and decelerated ageing.
Updated: 2024-08-20 08:52:17
Categories: eess.IV,cs.CV,cs.LG
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
Most recent 3D instance segmentation methods are open vocabulary, offering a greater flexibility than closed-vocabulary methods. Yet, they are limited to reasoning within a specific set of concepts, i.e., the vocabulary, prompted by the user at test time. In essence, these models cannot reason in an open-ended fashion, i.e., answering ``List the objects in the scene.'' We introduce the first method to address 3D instance segmentation in a setting that is void of any vocabulary prior, namely a vocabulary-free setting. We leverage a large vision-language assistant and an open-vocabulary 2D instance segmenter to discover and ground semantic categories on the posed images. To form 3D instance masks, we first partition the input point cloud into dense superpoints, which are then merged into 3D instance masks. We propose a novel superpoint merging strategy via spectral clustering, accounting for both mask coherence and semantic coherence that are estimated from the 2D object instance masks. We evaluate our method using ScanNet200 and Replica, outperforming existing methods in both vocabulary-free and open-vocabulary settings. Code will be made available.
Updated: 2024-08-20 08:46:54
Categories: cs.CV,cs.AI
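The merging step can be sketched with a toy spectral cut: superpoint affinities combine a synthetic mask-coherence and semantic-coherence term, and the sign of the Fiedler vector of the normalized Laplacian splits six superpoints into two instances. All quantities are invented; the paper's strategy estimates these terms from real 2D instance masks.

```python
import numpy as np

labels_true = np.array([0, 0, 0, 1, 1, 1])        # ground-truth objects (synthetic)
same = labels_true[:, None] == labels_true[None, :]
mask_coh = np.where(same, 0.9, 0.1)               # 2D-mask co-occurrence (synthetic)
sem_coh  = np.where(same, 0.8, 0.2)               # semantic agreement (synthetic)
A = mask_coh * sem_coh                            # combined superpoint affinity
np.fill_diagonal(A, 0.0)

# normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}
d = A.sum(1)
L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                           # second-smallest eigenvector
merged = (fiedler > 0).astype(int)                # two-way spectral cut

# the cut recovers the two instances (up to label swap)
assert (merged == labels_true).all() or (merged == 1 - labels_true).all()
```

With more superpoints one would keep several eigenvectors and run k-means on them, but the two-object case already shows how joint mask/semantic affinities drive the merge.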
DPM: Clustering Sensitive Data through Separation
Clustering is an important tool for data exploration where the goal is to subdivide a data set into disjoint clusters that fit well into the underlying data structure. When dealing with sensitive data, privacy-preserving algorithms aim to approximate the non-private baseline while minimising the leakage of sensitive information. State-of-the-art privacy-preserving clustering algorithms tend to output clusters that are good in terms of the standard metrics, inertia, silhouette score, and clustering accuracy, however, the clustering result strongly deviates from the non-private KMeans baseline. In this work, we present a privacy-preserving clustering algorithm called DPM that recursively separates a data set into clusters based on a geometrical clustering approach. In addition, DPM estimates most of the data-dependent hyper-parameters in a privacy-preserving way. We prove that DPM preserves Differential Privacy and analyse the utility guarantees of DPM. Finally, we conduct an extensive empirical evaluation for synthetic and real-life data sets. We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm without requiring the number of classes.
Updated: 2024-08-20 08:46:40
Domain: cs.CR,cs.LG
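The Laplace-mechanism building block behind such privacy-preserving estimates can be sketched in a few lines. This is a minimal illustration of a differentially private centroid estimate, not DPM's actual recursive separation procedure; the clipping bound and sensitivity below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dp_centroid(points, epsilon, bound):
    """Return an epsilon-DP estimate of the mean of `points`.

    Coordinates are clipped to [-bound, bound], so replacing one of the
    n points changes each coordinate of the mean by at most 2*bound/n,
    which sets the Laplace noise scale.
    """
    points = np.clip(points, -bound, bound)
    n, d = points.shape
    sensitivity = 2.0 * bound / n
    noise = np.random.laplace(scale=sensitivity / epsilon, size=d)
    return points.mean(axis=0) + noise

rng = np.random.default_rng(0)
cluster = rng.normal(loc=1.0, scale=0.1, size=(1000, 2))
center = dp_centroid(cluster, epsilon=1.0, bound=5.0)
```

With 1,000 points and epsilon = 1, the noise scale is 0.01 per coordinate, so the private centroid stays close to the true mean while protecting any single point.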
MambaLoc: Efficient Camera Localisation via State Space Model
Location information is pivotal for the automation and intelligence of terminal devices and edge-cloud IoT systems, such as autonomous vehicles and augmented reality. However, achieving reliable positioning across diverse IoT applications remains challenging due to significant training costs and the necessity of densely collected data. To tackle these issues, we have innovatively applied the selective state space (SSM) model to visual localization, introducing a new model named MambaLoc. The proposed model demonstrates exceptional training efficiency by capitalizing on the SSM model's strengths in efficient feature extraction, rapid computation, and memory optimization, and it further ensures robustness in sparse data environments due to its parameter sparsity. Additionally, we propose the Global Information Selector (GIS), which leverages selective SSM to implicitly achieve the efficient global feature extraction capabilities of Non-local Neural Networks. This design leverages the computational efficiency of the SSM model alongside the Non-local Neural Networks' capacity to capture long-range dependencies with minimal layers. Consequently, the GIS enables effective global information capture while significantly accelerating convergence. Our extensive experimental validation using public indoor and outdoor datasets first demonstrates our model's effectiveness, followed by evidence of its versatility with various existing localization models. Our code and models are publicly available to support further research and development in this area.
Updated: 2024-08-20 08:44:42
Domain: cs.CV,cs.AI
Inferring Underwater Topography with FINN
Spatiotemporal partial differential equations (PDEs) find extensive application across various scientific and engineering fields. While numerous models have emerged from both physics and machine learning (ML) communities, there is a growing trend towards integrating these approaches to develop hybrid architectures known as physics-aware machine learning models. Among these, the finite volume neural network (FINN) has emerged as a recent addition. FINN has proven to be particularly efficient in uncovering latent structures in data. In this study, we explore the capabilities of FINN in tackling the shallow-water equations, which simulates wave dynamics in coastal regions. Specifically, we investigate FINN's efficacy to reconstruct underwater topography based on these particular wave equations. Our findings reveal that FINN exhibits a remarkable capacity to infer topography solely from wave dynamics, distinguishing itself from both conventional ML and physics-aware ML models. Our results underscore the potential of FINN in advancing our understanding of spatiotemporal phenomena and enhancing parametrization capabilities in related domains.
Updated: 2024-08-20 08:42:00
Domain: cs.LG,cs.AI,physics.ao-ph,physics.comp-ph,physics.flu-dyn
Fishers Harvest Parallel Unlearning in Inherited Model Networks
Unlearning in various learning frameworks remains challenging, with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning framework which enables fully parallel unlearning among models exhibiting inheritance. A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance using a Directed Acyclic Graph (DAG). Central to our framework is the new Fisher Inheritance Unlearning (FIUn) algorithm, which utilizes the Fisher Information Matrix (FIM) from initial unlearning models to pinpoint impacted parameters in inherited models. By employing the FIM, the FIUn method breaks the sequential dependencies among the models, facilitating simultaneous unlearning and reducing computational overhead. We further design a mechanism to merge disparate FIMs into a single matrix, synchronizing updates across inherited models. Experiments confirm the effectiveness of our unlearning framework. For single-class tasks, it achieves complete unlearning with 0\% accuracy for unlearned labels while maintaining 94.53\% accuracy for retained labels on average. For multi-class tasks, the accuracy is 1.07\% for unlearned labels and 84.77\% for retained labels on average. Our framework accelerates unlearning by 99\% compared to alternative methods.
Updated: 2024-08-20 08:41:58
Domain: cs.LG,stat.ML
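The idea of using Fisher information to pinpoint impacted parameters can be sketched with a diagonal FIM estimate (mean squared per-sample gradient). This is a toy illustration only: the selection rule and shapes below are hypothetical, and the paper additionally merges FIMs across the inheritance DAG, which is omitted here.

```python
import numpy as np

def diagonal_fim(per_sample_grads):
    """Diagonal FIM estimate: mean of squared per-sample gradients."""
    return np.mean(per_sample_grads ** 2, axis=0)

def impacted_mask(fim_forget, fim_retain, ratio=2.0):
    """Mark parameters whose Fisher information on the forget set
    dominates that on the retain set (hypothetical selection rule)."""
    return fim_forget > ratio * (fim_retain + 1e-12)

rng = np.random.default_rng(1)
# Simulated per-sample gradients: forget-set gradients carry much more
# signal on these 10 parameters than retain-set gradients.
grads_forget = rng.normal(0.0, 1.0, size=(64, 10))
grads_retain = rng.normal(0.0, 0.1, size=(64, 10))
mask = impacted_mask(diagonal_fim(grads_forget), diagonal_fim(grads_retain))
```

Parameters flagged by `mask` would then be updated in all inherited models simultaneously, which is what removes the sequential dependency among models.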
Smart Contract Coordinated Privacy Preserving Crowd-Sensing Campaigns
Crowd-sensing has emerged as a powerful data retrieval model, enabling diverse applications by leveraging active user participation. However, data availability and privacy concerns pose significant challenges. Traditional methods like data encryption and anonymization, while essential, may not fully address these issues. For instance, in sparsely populated areas, anonymized data can still be traced back to individual users. Additionally, the volume of data generated by users can reveal their identities. To develop credible crowd-sensing systems, data must be anonymized, aggregated and separated into uniformly sized chunks. Furthermore, decentralizing the data management process, rather than relying on a single server, can enhance security and trust. This paper proposes a system utilizing smart contracts and blockchain technologies to manage crowd-sensing campaigns. The smart contract handles user subscriptions, data encryption, and decentralized storage, creating a secure data marketplace. Incentive policies within the smart contract encourage user participation and data diversity. Simulation results confirm the system's viability, highlighting the importance of user participation for data credibility and the impact of geographical data scarcity on rewards. This approach aims to balance data origin and reduce cheating risks.
Updated: 2024-08-20 08:41:57
Domain: cs.CR,cs.NI
Privacy-preserving Universal Adversarial Defense for Black-box Models
Deep neural networks (DNNs) are increasingly used in critical applications such as identity authentication and autonomous driving, where robustness against adversarial attacks is crucial. These attacks can exploit minor perturbations to cause significant prediction errors, making it essential to enhance the resilience of DNNs. Traditional defense methods often rely on access to detailed model information, which raises privacy concerns, as model owners may be reluctant to share such data. In contrast, existing black-box defense methods fail to offer a universal defense against various types of adversarial attacks. To address these challenges, we introduce DUCD, a universal black-box defense method that does not require access to the target model's parameters or architecture. Our approach involves distilling the target model by querying it with data, creating a white-box surrogate while preserving data privacy. We further enhance this surrogate model using a certified defense based on randomized smoothing and optimized noise selection, enabling robust defense against a broad range of adversarial attacks. Comparative evaluations between the certified defenses of the surrogate and target models demonstrate the effectiveness of our approach. Experiments on multiple image classification datasets show that DUCD not only outperforms existing black-box defenses but also matches the accuracy of white-box defenses, all while enhancing data privacy and reducing the success rate of membership inference attacks.
Updated: 2024-08-20 08:40:39
Domain: cs.LG,cs.AI,cs.CR,I.2.10
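The certified-defense component that DUCD builds on, randomized smoothing, amounts to a majority vote over Gaussian-perturbed copies of the input. Below is a minimal sketch with a hypothetical thresholding base classifier; the paper's surrogate distillation and optimized noise selection are omitted.

```python
import numpy as np

def smoothed_predict(classify, x, sigma=0.25, n=200, seed=0):
    """Randomized-smoothing prediction: classify n Gaussian-perturbed
    copies of x and return the majority class."""
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, sigma, size=(n,) + x.shape)
    votes = np.bincount([classify(z) for z in noisy], minlength=2)
    return int(votes.argmax())

# Hypothetical base classifier: thresholds the mean pixel value.
base = lambda z: int(z.mean() > 0.5)
x = np.full((8, 8), 0.7)          # toy "image" far from the boundary
pred = smoothed_predict(base, x)
```

Because the input sits well inside the decision region, almost all perturbed copies vote for class 1, and the smoothed prediction is stable under small adversarial shifts of `x`.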
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs
The veracity of a factoid is largely independent of the language it is written in. However, language models are inconsistent in their ability to answer the same factual question across languages. This raises questions about how LLMs represent a given fact across languages. We explore multilingual factual knowledge through two aspects: the model's ability to answer a query consistently across languages, and the ability to ''store'' answers in a shared representation for several languages. We propose a methodology to measure the extent of representation sharing across languages by repurposing knowledge editing methods. We examine LLMs with various multilingual configurations using a new multilingual dataset. We reveal that high consistency does not necessarily imply shared representation, particularly for languages with different scripts. Moreover, we find that script similarity is a dominant factor in representation sharing. Finally, we observe that if LLMs could fully share knowledge across languages, their accuracy in their best-performing language could increase by up to 150\% on average. These findings highlight the need for improved multilingual knowledge representation in LLMs and suggest a path for the development of more robust and consistent multilingual LLMs.
Updated: 2024-08-20 08:38:30
Domain: cs.CL,cs.AI
Snuffy: Efficient Whole Slide Image Classifier
Whole Slide Image (WSI) classification with multiple instance learning (MIL) in digital pathology faces significant computational challenges. Current methods mostly rely on extensive self-supervised learning (SSL) for satisfactory performance, requiring long training periods and considerable computational resources. At the same time, forgoing pre-training altogether hurts performance due to the domain shift from natural images to WSIs. We introduce the Snuffy architecture, a novel MIL-pooling method based on sparse transformers that mitigates performance loss with limited pre-training and enables continual few-shot pre-training as a competitive option. Our sparsity pattern is tailored for pathology and is theoretically proven to be a universal approximator with, to date, the tightest probabilistic sharp bound on the number of layers for sparse transformers. We demonstrate Snuffy's effectiveness on the CAMELYON16 and TCGA Lung cancer datasets, achieving superior WSI and patch-level accuracies. The code is available at https://github.com/jafarinia/snuffy.
Updated: 2024-08-20 08:36:59
Domain: cs.CV,cs.AI,cs.LG,cs.NE,eess.IV
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation
Involving collaborative information in Large Language Models (LLMs) is a promising technique for adapting LLMs for recommendation. Existing methods achieve this by concatenating collaborative features with text tokens into a unified sequence input and then fine-tuning to align these features with LLM's input space. Although effective, in this work, we identify two limitations when adapting LLMs to recommendation tasks, which hinder the integration of general knowledge and collaborative information, resulting in sub-optimal recommendation performance. (1) Fine-tuning LLM with recommendation data can undermine its inherent world knowledge and fundamental competencies, which are crucial for interpreting and inferring recommendation text. (2) Incorporating collaborative features into textual prompts disrupts the semantics of the original prompts, preventing LLM from generating appropriate outputs. In this paper, we propose a new paradigm, CoRA (an acronym for Collaborative LoRA), with a collaborative weights generator. Rather than input space alignment, this method aligns collaborative information with LLM's parameter space, representing them as incremental weights to update LLM's output. This way, LLM perceives collaborative information without altering its general knowledge and text inference capabilities. Specifically, we employ a collaborative filtering model to extract user and item embeddings, converting them into collaborative weights with low-rank properties through the collaborative weights generator. We then merge the collaborative weights into LLM's weights, enabling LLM to perceive the collaborative signals and generate personalized recommendations without fine-tuning or extra collaborative tokens in prompts. Extensive experiments confirm that CoRA effectively integrates collaborative information into LLM, enhancing recommendation performance.
Updated: 2024-08-20 08:36:59
Domain: cs.IR,cs.LG
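The parameter-space alignment idea can be sketched as follows: map collaborative user/item embeddings to a low-rank weight increment and merge it into a frozen layer's weight matrix, rather than appending collaborative tokens to the prompt. The shapes and the rank-1 "generator" below are hypothetical stand-ins for the paper's collaborative weights generator.

```python
import numpy as np

def collaborative_delta(user_emb, item_emb, proj_a, proj_b):
    """Generate a low-rank weight increment delta_W = b a^T (rank 1)
    from the collaborative signal of a (user, item) pair."""
    z = np.concatenate([user_emb, item_emb])
    a = proj_a @ z        # (d_in,)  input-side factor
    b = proj_b @ z        # (d_out,) output-side factor
    return np.outer(b, a)

rng = np.random.default_rng(3)
d_emb, d_in, d_out = 8, 16, 16
user, item = rng.normal(size=d_emb), rng.normal(size=d_emb)
proj_a = rng.normal(size=(d_in, 2 * d_emb))
proj_b = rng.normal(size=(d_out, 2 * d_emb))
W = rng.normal(size=(d_out, d_in))      # frozen layer weight
delta = collaborative_delta(user, item, proj_a, proj_b)
W_merged = W + 0.01 * delta             # model "perceives" the signal
```

Because the increment is low-rank and additive, the frozen weights (and hence the model's general knowledge and text inference) are left intact while the output is steered per user-item pair.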
Strong hallucinations from negation and how to fix them
Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses \textit{strong hallucinations} and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as \textit{an operation over an LM's latent representations that constrains how they may evolve}. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
Updated: 2024-08-20 08:36:26
Domain: cs.CL,cs.AI,I.2.7
Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation
Instruction tuning provides a paradigm used in large-scale language models to align an LLM with human preference. The paradigm comprises supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), and is also used in downstream scenarios to adapt LLMs to specific corpora and applications. Compared to SFT, many efforts have focused on RLHF, with several algorithms proposed, such as PPO, DPO, IPO, KTO, and MinorDPO, while most efforts for SFT have focused on how to collect, filter, and mix high-quality data. In this article, drawing insight from DPO and MinorDPO, we propose a training metric for SFT that measures the discrepancy between the optimized model and the original model, and a loss function, MinorSFT, that can increase training effectiveness and reduce the discrepancy between the optimized LLM and the original LLM.
Updated: 2024-08-20 08:32:44
Domain: cs.AI,cs.CL
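The abstract does not spell out the discrepancy metric or the loss. As a generic illustration (not the paper's formulation), one can penalize the SFT cross-entropy with the KL divergence of the tuned model from the original model, which simultaneously measures and discourages deviation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def deviation_penalized_sft_loss(tuned_logits, ref_logits, targets, lam=0.1):
    """Cross-entropy on the targets plus a KL penalty measuring how far
    the tuned model has drifted from the original (a hypothetical
    stand-in for MinorSFT's loss)."""
    p_tuned, p_ref = softmax(tuned_logits), softmax(ref_logits)
    ce = -np.log(p_tuned[np.arange(len(targets)), targets]).mean()
    kl = (p_ref * np.log(p_ref / p_tuned)).sum(axis=-1).mean()
    return ce + lam * kl

rng = np.random.default_rng(4)
ref_logits = rng.normal(size=(8, 5))                    # original model
tuned_logits = ref_logits + 0.1 * rng.normal(size=(8, 5))  # after SFT step
targets = rng.integers(0, 5, size=8)
loss = deviation_penalized_sft_loss(tuned_logits, ref_logits, targets)
```

The KL term is zero when the tuned model matches the original, so `lam` trades off fitting the fine-tuning data against preserving the base model's behavior.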
A Review of Human-Object Interaction Detection
Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify the specific interactions between them. The success of this task is influenced by several key factors, including the accurate localization of human and object instances, as well as the correct classification of object categories and interaction relationships. This paper systematically summarizes and discusses the recent work in image-based HOI detection. First, the mainstream datasets involved in HOI relationship detection are introduced. Furthermore, starting with two-stage methods and end-to-end one-stage detection approaches, this paper comprehensively discusses the current developments in image-based HOI detection, analyzing the strengths and weaknesses of these two methods. Additionally, the advancements of zero-shot learning, weakly supervised learning, and the application of large-scale language models in HOI detection are discussed. Finally, the current challenges in HOI detection are outlined, and potential research directions and future trends are explored.
Updated: 2024-08-20 08:32:39
Domain: cs.CV,cs.AI
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
In this paper, we propose a new method, Strategist, that utilizes LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection, which can then be used to learn high-level strategic skills, such as how to evaluate states, that guide low-level execution. We showcase how our method can be used in both action planning and dialogue generation in the context of games, achieving good performance on both tasks. Specifically, we demonstrate that our method can help train agents that perform better than both traditional reinforcement learning-based approaches and other LLM-based skill learning approaches in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.
Updated: 2024-08-20 08:22:04
Domain: cs.AI,cs.CL
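The Monte Carlo tree search component of such self-play simulation typically selects moves with the standard UCT rule, which can be sketched directly. The child statistics here are toy values, not from the paper.

```python
import math

def uct_select(children, c=1.4):
    """Pick the child maximizing mean value plus an exploration bonus
    (the UCT selection step of Monte Carlo tree search)."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")        # always try unvisited moves first
        exploit = ch["value"] / ch["visits"]
        explore = c * math.sqrt(math.log(total) / ch["visits"])
        return exploit + explore
    return max(children, key=score)

children = [
    {"move": "a", "visits": 10, "value": 6.0},
    {"move": "b", "visits": 3, "value": 2.5},
    {"move": "c", "visits": 0, "value": 0.0},
]
best = uct_select(children)
```

An LLM-learned value function would replace the raw `value` statistics here, which is where the high-level "how to evaluate states" skill plugs into the low-level search.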
Industry Perception of Security Challenges with Identity Access Management Solutions
Identity Access Management (IAM) is an area posing significant challenges, particularly in the context of remote connectivity and distributed or cloud-based systems. A wide range of technical solutions have been proposed by prior research, but integrating these solutions into the commercial sector involves steps that significantly hamper their acceptance. The study aims to outline the current perception of, and security issues associated with, IAM solutions from the perspective of the beneficiaries. The analysis relies on a series of interviews with 45 cyber security professionals from different organisations all over the world. As the results show, cloud IAM solutions and on-premises IAM solutions are affected by different issues. The main challenges for cloud-based IAM solutions were default configurations, poor management of non-human identities such as service accounts, poor certificate management, poor API configuration, and limited log analysis. In contrast, the challenges for on-premises solutions were multi-factor authentication, insecure default configurations, lack of the skillsets required to manage an IAM solution securely, poor password policies, unpatched vulnerabilities, and compromise of single sign-on leading to compromise of multiple entities. The study also determined that, regardless of the evolving functionality of cloud-based IAM solutions, 41% of respondents believe that on-premises solutions are more secure than cloud-based ones. As pointed out by the respondents, cloud IAM may potentially expose organisations to a wider range of vulnerabilities due to the complexity of the underlying solutions, challenges with managing permissions, and compliance with dynamic IAM policies.
Updated: 2024-08-20 08:19:58
Domain: cs.CR
Interactive Counterfactual Generation for Univariate Time Series
We propose an interactive methodology for generating counterfactual explanations for univariate time series data in classification tasks by leveraging 2D projections and decision boundary maps to tackle interpretability challenges. Our approach aims to enhance the transparency and understanding of deep learning models' decision processes. The application simplifies the time series data analysis by enabling users to interactively manipulate projected data points, providing intuitive insights through inverse projection techniques. By abstracting user interactions with the projected data points rather than the raw time series data, our method facilitates an intuitive generation of counterfactual explanations. This approach allows for a more straightforward exploration of univariate time series data, enabling users to manipulate data points to comprehend potential outcomes of hypothetical scenarios. We validate this method using the ECG5000 benchmark dataset, demonstrating significant improvements in interpretability and user understanding of time series classification. The results indicate a promising direction for enhancing explainable AI, with potential applications in various domains requiring transparent and interpretable deep learning models. Future work will explore the scalability of this method to multivariate time series data and its integration with other interpretability techniques.
Updated: 2024-08-20 08:19:55
Domain: cs.LG,cs.HC
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Large language models (LLMs) have grown significantly in scale, leading to a critical need for efficient model pruning techniques. Existing post-training pruning techniques primarily focus on measuring weight importance on converged dense models to determine salient weights to retain. However, they often overlook the changes in weight importance during the pruning process, which can lead to performance degradation in the pruned models. To address this issue, we present LLM-Barber (Block-Aware Rebuilder for Sparsity Mask in One-Shot), a novel one-shot pruning framework that rebuilds the sparsity mask of pruned models without any retraining or weight reconstruction. LLM-Barber incorporates block-aware error optimization across Self-Attention and MLP blocks, ensuring global performance optimization. Inspired by the recent discovery of prominent outliers in LLMs, LLM-Barber introduces an innovative pruning metric that identifies weight importance using weights multiplied by gradients. Our experiments show that LLM-Barber can efficiently prune models like LLaMA and OPT families with 7B to 13B parameters on a single A100 GPU in just 30 minutes, achieving state-of-the-art results in both perplexity and zero-shot performance across various language benchmarks. Code is available at https://github.com/YupengSu/LLM-Barber.
Updated: 2024-08-20 08:13:52
Domain: cs.LG,cs.AI,cs.CL
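The pruning metric named in the abstract, importance as weight times gradient, can be sketched in a few lines. This illustrates only the metric; the framework's block-aware, one-shot rebuilding of the sparsity mask across Self-Attention and MLP blocks is omitted, and the toy shapes are illustrative.

```python
import numpy as np

def prune_mask(weights, grads, sparsity=0.5):
    """Boolean mask (True = keep) dropping the `sparsity` fraction of
    weights with the smallest |weight * gradient| scores."""
    score = np.abs(weights * grads)
    k = int(score.size * sparsity)                 # number to drop
    threshold = np.partition(score.ravel(), k)[k]  # smallest kept score
    return score >= threshold

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 8))
G = rng.normal(size=(4, 8))        # gradients w.r.t. the weights
mask = prune_mask(W, G, sparsity=0.5)
W_pruned = W * mask
```

Scoring by |w * g| rather than |w| alone keeps small weights whose gradients indicate they still matter, which is the point of measuring importance during (not before) pruning.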
Accelerometer-Based Multivariate Time-Series Dataset for Calf Behavior Classification
Getting new insights on pre-weaned calf behavioral adaptation to routine challenges (transport, group relocation, etc.) and diseases (respiratory diseases, diarrhea, etc.) is a promising way to improve calf welfare in dairy farms. A classic approach to automatically monitoring behavior is to equip animals with accelerometers attached to neck collars and to develop machine learning models from accelerometer time-series. However, to be used for model development, data must be equipped with labels. Obtaining these labels requires annotating behaviors from direct observation or videos, a time-consuming and labor-intensive process. To address this challenge, we propose the ActBeCalf (Accelerometer Time-Series for Calf Behaviour classification) dataset: 30 pre-weaned dairy calves (Holstein Friesian and Jersey) were equipped with a 3D-accelerometer sensor attached to a neck-collar from one week after birth for 13 weeks. The calves were simultaneously filmed with a camera in each pen. At the end of the trial, behaviors were manually annotated from the videos using the Behavioral Observation Research Interactive Software (BORIS) by 3 observers using an ethogram with 23 behaviors. ActBeCalf contains 27.4 hours of accelerometer data aligned adequately with calf behaviors. The dataset includes the main behaviors, like lying, standing, walking, and running, and less prominent behaviors, such as sniffing, social interaction, and grooming. Finally, ActBeCalf was used for behavior classification with machine learning models: (i) two classes of behaviors [active and inactive; model 1] and (ii) four classes of behaviors [running, lying, drinking milk, and an 'other' class; model 2] to demonstrate its reliability. We obtained a balanced accuracy of 92% [model 1] and 84% [model 2]. ActBeCalf is a comprehensive and ready-to-use dataset for classifying pre-weaned calf behaviour from the acceleration time series.
Updated: 2024-08-20 08:11:54
Domain: eess.SP,cs.LG
Finding the DeepDream for Time Series: Activation Maximization for Univariate Time Series
Understanding how models process and interpret time series data remains a significant challenge in deep learning to enable applicability in safety-critical areas such as healthcare. In this paper, we introduce Sequence Dreaming, a technique that adapts Activation Maximization to analyze sequential information, aiming to enhance the interpretability of neural networks operating on univariate time series. By leveraging this method, we visualize the temporal dynamics and patterns most influential in model decision-making processes. To counteract the generation of unrealistic or excessively noisy sequences, we enhance Sequence Dreaming with a range of regularization techniques, including exponential smoothing. This approach ensures the production of sequences that more accurately reflect the critical features identified by the neural network. Our approach is tested on a time series classification dataset encompassing applications in predictive maintenance. The results show that our proposed Sequence Dreaming approach demonstrates targeted activation maximization for different use cases so that either centered class or border activation maximization can be generated. The results underscore the versatility of Sequence Dreaming in uncovering salient temporal features learned by neural networks, thereby advancing model transparency and trustworthiness in decision-critical domains.
Updated: 2024-08-20 08:09:44
领域: cs.LG,cs.AI
HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning
Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) at the scale of billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under set financial constraints, i.e., total budget and cost-per-click (CPC), etc. Different from existing works mainly focusing on single-channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called "HiBid", consisting of a high-level planner equipped with an auxiliary loss for non-competitive budget allocation, and a data-augmentation-enhanced low-level executor for adaptive bidding in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfaction ratio, and return-on-investment (ROI). HiBid has also been deployed on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.
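The CPC-guided action selection the abstract mentions can be illustrated with a minimal sketch: among candidate bidding actions, discard those whose implied cost-per-click violates the constraint, then pick the best remaining one. The dict fields, numbers, and greedy rule here are illustrative assumptions, not HiBid's actual mechanism.

```python
def cpc_guided_select(actions, cpc_limit):
    """Toy CPC-guided action selection: drop candidate bids whose implied
    cost-per-click breaks the constraint, then maximize expected clicks."""
    feasible = [a for a in actions if a["cost"] / a["clicks"] <= cpc_limit]
    return max(feasible, key=lambda a: a["clicks"]) if feasible else None

candidates = [
    {"cost": 10.0, "clicks": 4},   # CPC 2.50 -> infeasible at limit 2.0
    {"cost": 9.0, "clicks": 5},    # CPC 1.80 -> feasible
    {"cost": 30.0, "clicks": 10},  # CPC 3.00 -> infeasible
]
best = cpc_guided_select(candidates, cpc_limit=2.0)
```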
Updated: 2024-08-20 08:09:26
Domains: cs.LG,cs.GT
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large language models (LLMs). One common way to evaluate the model response is to rank the candidate answers based on the log probability of the first token prediction. An alternative way is to examine the text output. Prior work has shown that first token probabilities lack robustness to changes in MCQ phrasing, and that first token probabilities do not match text answers for instruction-tuned models. Therefore, in this paper, we investigate the robustness of text answers. We show that the text answers are more robust to question perturbations than the first token probabilities, when the first token answers mismatch the text answers. The difference in robustness increases as the mismatch rate becomes greater. As the mismatch reaches over 50\%, the text answer is more robust to option order changes than the debiased first token probabilities using state-of-the-art debiasing methods such as PriDe. Our findings provide further evidence for the benefits of text answer evaluation over first token probability evaluation.
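The two evaluation protocols being compared can be made concrete with a small sketch; the log-probability values and the answer-extraction regex are illustrative assumptions, not the paper's implementation.

```python
import re

def first_token_answer(first_token_logprobs):
    """Rank options by the log probability assigned to the first generated token."""
    return max(first_token_logprobs, key=first_token_logprobs.get)

def text_answer(generated_text, options="ABCD"):
    """Read the chosen option out of the generated text instead."""
    m = re.search(r"\b([%s])\b" % options, generated_text)
    return m.group(1) if m else None

# A mismatch case of the kind the paper studies: the first token favours "A"
# while the written-out answer names "B" (the numbers are made up).
logprobs = {"A": -0.4, "B": -1.1, "C": -2.3, "D": -3.0}
mismatch = first_token_answer(logprobs) != text_answer("The answer is (B).")
```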
Updated: 2024-08-20 08:07:49
Domains: cs.CL,cs.AI
WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification
For the visible-infrared person re-identification (VI-ReID) task, one of the primary challenges lies in the significant cross-modality discrepancy. Existing methods struggle to conduct modality-invariant information mining. They often focus solely on mining singular dimensions like spatial or channel, and overlook the extraction of specific-modality multi-dimension information. To fully mine modality-invariant information across a wide range, we introduce the Wide-Ranging Information Mining Network (WRIM-Net), which mainly comprises a Multi-dimension Interactive Information Mining (MIIM) module and an Auxiliary-Information-based Contrastive Learning (AICL) approach. Empowered by the proposed Global Region Interaction (GRI), MIIM comprehensively mines non-local spatial and channel information through intra-dimension interaction. Moreover, thanks to its low-computational-complexity design, separate MIIM modules can be positioned in shallow layers, enabling the network to better mine specific-modality multi-dimension information. AICL, by introducing the novel Cross-Modality Key-Instance Contrastive (CMKIC) loss, effectively guides the network in extracting modality-invariant information. We conduct extensive experiments not only on the well-known SYSU-MM01 and RegDB datasets but also on the latest large-scale cross-modality LLCM dataset. The results demonstrate WRIM-Net's superiority over state-of-the-art methods.
Updated: 2024-08-20 08:06:16
Domains: cs.CV,cs.AI
Joint Selective State Space Model and Detrending for Robust Time Series Anomaly Detection
Deep learning-based sequence models are extensively employed in Time Series Anomaly Detection (TSAD) tasks due to their effective sequential modeling capabilities. However, TSAD is limited by two key challenges: (i) the ability to model long-range dependencies and (ii) the generalization issue in the presence of non-stationary data. To tackle these challenges, we propose an anomaly detector that leverages the selective state space model, known for its proficiency in capturing long-term dependencies across various domains. Additionally, a multi-stage detrending mechanism is introduced to mitigate the prominent trend component in non-stationary data and thereby address the generalization issue. Extensive experiments conducted on real-world public datasets demonstrate that the proposed methods surpass all 12 compared baseline methods.
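A minimal stand-in for the detrending side of the method can be sketched as follows. The abstract does not specify the exact multi-stage design, so the coarse-to-fine moving-average stages below are an assumption for illustration only.

```python
def moving_average_detrend(series, window):
    """Subtract a centered moving-average trend estimate (edges are clipped)."""
    half = window // 2
    residual = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend = sum(series[lo:hi]) / (hi - lo)
        residual.append(series[i] - trend)
    return residual

def multi_stage_detrend(series, windows=(16, 4)):
    """Apply detrending in stages, coarse to fine; the staging is hypothetical."""
    residual = series
    for w in windows:
        residual = moving_average_detrend(residual, w)
    return residual

series = [0.5 * i for i in range(32)]   # a pure linear trend
residual = multi_stage_detrend(series)
```

On a pure trend the residual collapses toward zero, which is the point: downstream models then see (approximately) stationary input.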
Updated: 2024-08-20 08:00:02
Domains: cs.LG,cs.AI
Novel Change Detection Framework in Remote Sensing Imagery Using Diffusion Models and Structural Similarity Index (SSIM)
Change detection is a crucial task in remote sensing, enabling the monitoring of environmental changes, urban growth, and disaster impact. Conventional change detection techniques, such as image differencing and ratioing, often struggle with noise and fail to capture complex variations in imagery. Recent advancements in machine learning, particularly generative models like diffusion models, offer new opportunities for enhancing change detection accuracy. In this paper, we propose a novel change detection framework that combines the strengths of Stable Diffusion models with the Structural Similarity Index (SSIM) to create robust and interpretable change maps. Our approach, named Diffusion Based Change Detector, is evaluated on both synthetic and real-world remote sensing datasets and compared with state-of-the-art methods. The results demonstrate that our method significantly outperforms traditional differencing techniques and recent deep learning-based methods, particularly in scenarios with complex changes and noise.
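SSIM itself is standard and can be sketched directly; comparing patches of the original image against a (here hypothetical) diffusion-based reconstruction and thresholding the scores yields one entry of a change map. The global (unwindowed) formula and the 0.6 threshold below are simplifications of the usual windowed SSIM.

```python
def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global SSIM between two equal-size grayscale patches with values in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

flat = [0.2] * 16     # a 4x4 patch, flattened: scene unchanged
bright = [0.8] * 16   # the same patch after a change
same_score = ssim(flat, flat)
changed_score = ssim(flat, bright)
is_change = changed_score < 0.6   # thresholding gives one change-map entry
```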
Updated: 2024-08-20 07:54:08
Domains: cs.CV,cs.AI,eess.IV
OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model
Air-ground robots (AGRs) are widely used in surveillance and disaster response due to their exceptional mobility and versatility (i.e., flying and driving). Current AGR navigation systems perform well in static occlusion-prone environments (e.g., indoors) by using 3D semantic occupancy networks to predict occlusions for complete local mapping and then computing a Euclidean Signed Distance Field (ESDF) for path planning. However, these systems face challenges in dynamic, severely occluded scenes (e.g., crowds) due to the low prediction accuracy of perception networks and the high computation overhead of path planners. In this paper, we propose OMEGA, which contains OccMamba with an Efficient AGR-Planner to address the above-mentioned problems. OccMamba adopts a novel architecture that separates semantic and occupancy prediction into independent branches, incorporating two mamba blocks within these branches. These blocks efficiently extract semantic and geometric features in 3D environments with linear complexity, ensuring that the network can learn long-distance dependencies to improve prediction accuracy. Semantic and geometric features are combined within the Bird's Eye View (BEV) space to minimise computational overhead during feature fusion. The resulting semantic occupancy map is then seamlessly integrated into the local map, providing occlusion awareness of the dynamic environment. Our AGR-Planner utilizes this local map and employs kinodynamic A* search and gradient-based trajectory optimization to guarantee planning is ESDF-free and energy-efficient. Extensive experiments demonstrate that OccMamba outperforms the state-of-the-art 3D semantic occupancy network with 25.0% mIoU. End-to-end navigation experiments in dynamic scenes verify OMEGA's efficiency, achieving a 96% average planning success rate. Code and video are available at https://jmwang0117.github.io/OMEGA/.
Updated: 2024-08-20 07:50:29
Domains: cs.RO,cs.AI,cs.CV
Generalizable Facial Expression Recognition
SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set. Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model, which might be infeasible in real-world deployment. In this paper, we aim to improve the zero-shot generalization ability of FER methods on different unseen test sets using only one train set. Inspired by how humans first detect faces and then select expression features, we propose a novel FER pipeline to extract expression-related features from any given face images. Our method is based on the generalizable face features extracted by large models like CLIP. However, it is non-trivial to adapt the general features of CLIP for specific tasks like FER. To preserve the generalization ability of CLIP and the high precision of the FER model, we design a novel approach that learns sigmoid masks based on the fixed CLIP face features to extract expression features. To further improve the generalization ability on unseen test sets, we separate the channels of the learned masked features according to the expression classes to directly generate logits, avoiding the FC layer to reduce overfitting. We also introduce a channel-diverse loss to encourage the learned masks to be separated. Extensive experiments on five different FER datasets verify that our method outperforms SOTA FER methods by large margins. Code is available at https://github.com/zyh-uaiaaaa/Generalizable-FER.
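The mask-then-split idea can be sketched in a few lines. Everything here is a toy stand-in: an 8-dimensional "CLIP feature", hand-set mask parameters in place of learned ones, and two expression classes.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def masked_logits(face_feat, mask_params, num_classes):
    """Apply a sigmoid mask to frozen face features, then split the masked
    channels evenly across classes and sum each group into a logit,
    avoiding an FC layer (the overfitting-reduction idea in the abstract)."""
    masked = [f * sigmoid(m) for f, m in zip(face_feat, mask_params)]
    per_class = len(masked) // num_classes
    return [sum(masked[c * per_class:(c + 1) * per_class])
            for c in range(num_classes)]

feat = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.1, 0.3]      # toy "CLIP" face feature
mask = [4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0]  # hand-set, not learned
logits = masked_logits(feat, mask, num_classes=2)
```

Because each logit is a plain sum over its own channel group, the only trainable parameters are the mask values, which is what keeps the frozen CLIP features (and their generalization) intact.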
Updated: 2024-08-20 07:48:45
Domains: cs.CV,cs.AI
Challenging the Human-in-the-loop in Algorithmic Decision-making
We discuss the role of humans in algorithmic decision-making (ADM) for socially relevant problems from a technical and philosophical perspective. In particular, we illustrate tensions arising from diverse expectations, values, and constraints by and on the humans involved. To this end, we assume that a strategic decision-maker (SDM) introduces ADM to optimize strategic and societal goals while the algorithms' recommended actions are overseen by a practical decision-maker (PDM) - a specific human-in-the-loop - who makes the final decisions. While the PDM is typically assumed to be a corrective, it can counteract the realization of the SDM's desired goals and societal values, not least because of a misalignment of these values and unmet information needs of the PDM. This has significant implications for the distribution of power between the stakeholders in ADM, their constraints, and information needs. In particular, we emphasize the overseeing PDM's role as a potential political and ethical decision maker, who is expected to balance strategic, value-driven objectives with on-the-ground individual decisions and constraints. We demonstrate empirically, on a machine learning benchmark dataset, the significant impact an overseeing PDM's decisions can have even if the PDM is constrained to performing only a limited amount of actions differing from the algorithms' recommendations. To ensure that the SDM's intended values are realized, the PDM needs to be provided with appropriate information conveyed through tailored explanations and its role must be characterized clearly. Our findings emphasize the need for an in-depth discussion of the role and power of the PDM and challenge the often-taken view that just including a human-in-the-loop in ADM ensures the 'correct' and 'ethical' functioning of the system.
Updated: 2024-08-20 07:48:13
Domains: cs.LG
Recent Advances in End-to-End Simultaneous Speech Translation
Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles. Secondly, satisfying real-time requirements presents inherent difficulties due to the need for immediate translation output. Thirdly, striking a balance between translation quality and latency constraints remains a critical challenge. Finally, the scarcity of annotated data adds another layer of complexity to the task. Through our exploration of these challenges and the proposed solutions, we aim to provide valuable insights into the current landscape of SimulST research and suggest promising directions for future exploration.
Updated: 2024-08-20 07:47:49
Domains: cs.SD,cs.AI,cs.CL,eess.AS
On the Approximability of Stationary Processes using the ARMA Model
We identify certain gaps in the literature on the approximability of stationary random variables using the Autoregressive Moving Average (ARMA) model. To quantify approximability, we propose that an ARMA model be viewed as an approximation of a stationary random variable. We map these stationary random variables to Hardy space functions, and formulate a new function approximation problem that corresponds to random variable approximation, and thus to ARMA. Based on this Hardy space formulation we identify a class of stationary processes where approximation guarantees are feasible. We also identify an idealized stationary random process for which we conjecture that a good ARMA approximation is not possible. Next, we provide a constructive proof that Padé approximations do not always correspond to the best ARMA approximation. Finally, we note that the spectral methods adopted in this paper can be seen as a generalization of unit root methods for stationary processes even when an ARMA model is not defined.
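For reference, the standard ARMA(p, q) recursion and its rational transfer function, which underlie the Hardy-space framing (sign conventions for the $\theta_j$ vary across texts):

```latex
% ARMA(p,q) recursion for a stationary process X_t driven by white noise e_t:
\[
X_t = \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}.
\]
% The associated transfer function is the rational function
\[
H(z) = \frac{\Theta(z)}{\Phi(z)}
     = \frac{1 + \theta_1 z + \cdots + \theta_q z^q}{1 - \phi_1 z - \cdots - \phi_p z^p},
\]
% which, for a stable model, is analytic on the unit disk -- the Hardy-space
% view under which random-variable approximation becomes function approximation.
```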
Updated: 2024-08-20 07:42:42
Domains: cs.LG,math.PR,stat.ME,60G10,G.3
PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis
We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field. Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis. Extensive evaluations of published and baseline models reveal limitations like mode or posterior collapse, and underscore the importance of rank metrics that assess the ordering of perturbations alongside traditional measures like RMSE. Our findings show that simple models can outperform more complex approaches. This benchmarking exercise sets new standards for model evaluation, supports robust model development, and advances the potential of these models to use high-throughput and high-content genetic and chemical screens for disease target discovery.
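A rank metric of the kind the abstract advocates can be as simple as Spearman's rank correlation between true and predicted perturbation effects. The version below omits tie handling, and whether PerturBench uses this exact statistic is not stated in the abstract.

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation (assumes no ties): scores whether a model
    orders perturbations correctly, complementing value metrics like RMSE."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

true_effects = [0.1, 0.9, 0.4, 0.7]
good_model = [0.2, 0.8, 0.5, 0.6]   # same ordering, different values
bad_model = [0.9, 0.1, 0.7, 0.3]    # ordering reversed
rho_good = spearman_rho(true_effects, good_model)
rho_bad = spearman_rho(true_effects, bad_model)
```

Note that a model can have excellent RMSE yet poor rank correlation, which is exactly why the benchmark reports ordering-sensitive metrics alongside error metrics.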
Updated: 2024-08-20 07:40:20
Domains: cs.LG,q-bio.GN,stat.ML
Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory
Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information. Although techniques such as Affective Alignment can mitigate some negative impacts of these biases, existing prompt-based attack methods can still extract these biases from the model's weights. Moreover, these biases frequently appear subtly when LLMs are prompted to perform identical tasks across different demographic groups, thereby camouflaging their presence. To address this issue, we have formally defined the implicit bias problem and developed an innovative framework for bias removal based on Bayesian theory, Bayesian-Theory based Bias Removal (BTBR). BTBR employs likelihood ratio screening to pinpoint data entries within publicly accessible biased datasets that represent biases inadvertently incorporated during the LLM training phase. It then automatically constructs relevant knowledge triples and expunges bias information from LLMs using model editing techniques. Through extensive experimentation, we have confirmed the presence of the implicit bias problem in LLMs and demonstrated the effectiveness of our BTBR approach.
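The likelihood-ratio screening step can be illustrated on a toy unigram model. Real BTBR scores entries under an LLM, so the probability tables, threshold, and example sentences below are purely illustrative assumptions.

```python
import math

def log_likelihood(unigram_probs, sentence):
    """Toy sentence log-likelihood under a unigram distribution."""
    return sum(math.log(unigram_probs.get(tok, 1e-9)) for tok in sentence.split())

def screen_biased_entries(entries, p_model, p_reference, threshold=0.0):
    """Flag entries whose likelihood under the model is disproportionately
    high relative to a reference distribution (a likelihood-ratio screen)."""
    return [e for e in entries
            if log_likelihood(p_model, e) - log_likelihood(p_reference, e) > threshold]

entries = ["nurse she", "nurse he"]
p_model = {"nurse": 0.5, "she": 0.4, "he": 0.1}        # skewed toy "LLM"
p_reference = {"nurse": 0.5, "she": 0.25, "he": 0.25}  # unskewed reference
flagged = screen_biased_entries(entries, p_model, p_reference)
```

Entries the screen flags are the ones BTBR would then turn into knowledge triples and remove via model editing.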
Updated: 2024-08-20 07:40:12
Domains: cs.CL,cs.AI
Multilingual Non-Factoid Question Answering with Silver Answers
Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It utilizes interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 370K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on the manual annotations of 790 QA-pairs from MuNfQuAD (golden set), we observe that 98\% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines. The APS model attained an accuracy of 80\% and 72\%, as well as a macro F1 of 72\% and 66\%, on the MuNfQuAD testset and the golden set, respectively. Furthermore, the APS model generalizes effectively to certain languages within the golden set, even after being fine-tuned on silver labels.
Updated: 2024-08-20 07:37:06
Domains: cs.CL,cs.AI,cs.IR,cs.LG
MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds. To effectively exploit complementary information, the motion branches of the proposed model combines motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with motion features and provide effective guidance for the motion branches. We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments, and our proposed model outperforms existing state-of-the-art models on the SemanticKITTI benchmark.
Updated: 2024-08-20 07:30:00
Domains: cs.CV,cs.AI
Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling
Scheduling in multi-channel wireless communication system presents formidable challenges in effectively allocating resources. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems with an objective of maximizing long-term discounted total rewards while respecting resource constraints. We have also generalized to applications beyond multi-channel wireless. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes. We have derived the policy gradient theorem for index learning. Our main contribution is the introduction of a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of heterogeneous resources. We demonstrate the utility of DIP by evaluating its performance for three different MR-RMB problems. Our simulation results show that DIP indeed learns the partial indexes efficiently.
Updated: 2024-08-20 07:20:10
Domains: cs.LG
A Circuit Approach to Constructing Blockchains on Blockchains
Since the creation of Bitcoin 15 years ago, there has been an explosion in the number of permissionless blockchains. Each of these blockchains provides an open ledger that anyone can read from and write to. In this multi-chain world, an important question emerges: how can we build a more secure overlay blockchain by reading from and writing to a given set of blockchains? Drawing an analogy with switching circuits, we approach the problem by defining two basic compositional operations between blockchains, serial and triangular compositions, and use these operations as building blocks to construct general overlay blockchains. Under the partially synchronous setting, we have the following results: 1) the serial composition, between two blockchains, yields an overlay blockchain that is safe if at least one of the two underlay blockchains is safe and that is live if both of them are live; 2) the triangular composition between three blockchains, akin to parallel composition of switching circuits, yields an overlay blockchain that is safe if all underlay blockchains are safe and that is live if over half of them are live; 3) repeated composition of these two basic operations can yield all possible tradeoffs of safety and liveness for an overlay blockchain built on arbitrary number of underlay chains. The results are also extended to the synchronous setting.
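The safety/liveness rules stated for the two basic operations translate directly into code, which makes it easy to check properties of repeated compositions. The dict representation of a chain's properties is an assumption for illustration.

```python
def serial(a, b):
    """Serial composition: safe if either underlay chain is safe,
    live only if both are live."""
    return {"safe": a["safe"] or b["safe"], "live": a["live"] and b["live"]}

def triangular(a, b, c):
    """Triangular composition: safe if all three underlays are safe,
    live if more than half (at least two of three) are live."""
    chains = (a, b, c)
    return {"safe": all(ch["safe"] for ch in chains),
            "live": sum(1 for ch in chains if ch["live"]) >= 2}

weak = {"safe": False, "live": True}
strong = {"safe": True, "live": True}
down = {"safe": True, "live": False}
overlay = serial(weak, strong)
tri = triangular(strong, down, weak)
```

Feeding the output of one operation back into another, e.g. `serial(tri, strong)`, models the repeated composition the paper uses to trade off safety and liveness over arbitrarily many underlay chains.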
Updated: 2024-08-20 07:18:25
Domains: cs.CR
DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset
The escalating quality of video generated by advanced video generation methods results in new security challenges, while there have been few relevant research efforts: 1) There is no open-source dataset for generated video detection, 2) No generated video detection method has been proposed so far. To this end, we propose an open-source dataset and a detection method for generated video for the first time. First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions, as well as various generation models with different architectures and generation methods, including the most popular commercial models like OpenAI's Sora and Google's Veo. Second, we found via probing experiments that spatial artifact-based detectors lack generalizability. Hence, we propose a simple yet effective detection model based on frame consistency (DeCoF), which focuses on temporal artifacts by eliminating the impact of spatial artifacts during feature learning. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos generated by unseen video generation models and confirm its powerful generalizability across several commercially proprietary models. Our code and dataset will be released at https://github.com/wuwuwuyue/DeCoF.
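The underlying cue, temporal consistency between adjacent frames, can be sketched as follows. DeCoF learns features that suppress spatial artifacts first, whereas this toy version simply differences raw per-frame feature vectors, so it only illustrates the consistency signal, not the method.

```python
def frame_consistency_score(frames):
    """Mean feature distance between consecutive frames; the premise is that
    generated videos exhibit larger temporal jumps than real footage."""
    diffs = [sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)
             for f1, f2 in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)

# Toy 4-dim "frame features" over 6 frames: smooth motion vs. flicker.
real_like = [[0.1 * t + 0.01 * i for i in range(4)] for t in range(6)]
flickery = [[(0.5 if t % 2 else 0.0) + 0.01 * i for i in range(4)] for t in range(6)]
```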
Updated: 2024-08-20 07:17:31
Domains: cs.CV,cs.AI
A Correlation-induced Finite Difference Estimator
Finite difference (FD) approximation is a classic approach to stochastic gradient estimation when only noisy function realizations are available. In this paper, we first provide a sample-driven method via the bootstrap technique to estimate the optimal perturbation, and then propose an efficient FD estimator based on correlated samples at the estimated optimal perturbation. Furthermore, theoretical analyses of both the perturbation estimator and the FD estimator reveal that, surprisingly, the correlation enables the proposed FD estimator to achieve a reduction in variance and, in some cases, a decrease in bias compared to the traditional optimal FD estimator. Numerical results confirm the efficiency of our estimators and align well with the theory presented, especially in scenarios with small sample sizes. Finally, we apply the estimator to solve derivative-free optimization (DFO) problems, and numerical studies show that DFO problems with 100 dimensions can be effectively solved.
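The variance-reduction mechanism can be seen in a minimal sketch where the two evaluations of a central difference share one noise draw, i.e. perfectly correlated noise. This is an idealization: the paper treats general correlation and also estimates the optimal perturbation via the bootstrap, which is fixed by hand here.

```python
import random

def fd_gradient(f, x, h, sigma, correlated, n=1000, seed=0):
    """Central finite differences from noisy evaluations f(x +/- h) + e.
    With correlated=True the two evaluations share one noise draw
    (common random numbers), so the noise cancels in the difference."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        e_plus = rng.gauss(0.0, sigma)
        e_minus = e_plus if correlated else rng.gauss(0.0, sigma)
        estimates.append(((f(x + h) + e_plus) - (f(x - h) + e_minus)) / (2 * h))
    mean = sum(estimates) / n
    var = sum((g - mean) ** 2 for g in estimates) / n
    return mean, var

f = lambda t: t * t   # d/dt t^2 at t = 1 is 2
mean_corr, var_corr = fd_gradient(f, 1.0, 0.1, sigma=0.5, correlated=True)
mean_ind, var_ind = fd_gradient(f, 1.0, 0.1, sigma=0.5, correlated=False)
```

With independent noise the estimator variance scales like sigma^2 / (2h^2); with common noise the stochastic terms cancel entirely in this idealized setting, leaving only the deterministic FD bias.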
Updated: 2024-08-20 07:17:25
Domains: stat.ME,cs.LG,cs.NA,math.NA,math.OC,90,I.6.3
Breast tumor classification based on self-supervised contrastive learning from ultrasound videos
Background: Breast ultrasound is prominently used in diagnosing breast tumors. At present, many automatic systems based on deep learning have been developed to help radiologists in diagnosis. However, training such systems remains challenging because they are usually data-hungry and demand large amounts of labeled data, which require professional knowledge and are expensive to obtain. Methods: We adopted a triplet network and a self-supervised contrastive learning technique to learn representations from unlabeled breast ultrasound video clips. We further designed a new hard triplet loss to learn representations that particularly discriminate positive and negative image pairs that are hard to recognize. We also constructed a pretraining dataset from breast ultrasound videos (1,360 videos from 200 patients), which includes an anchor sample dataset with 11,805 images, a positive sample dataset with 188,880 images, and a negative sample dataset dynamically generated from video clips. Further, we constructed a finetuning dataset, including 400 images from 66 patients. We transferred the pretrained network to a downstream benign/malignant classification task and compared the performance with other state-of-the-art models, including three models pretrained on ImageNet and a previous contrastive learning model retrained on our datasets. Results and conclusion: Experiments revealed that our model achieved an area under the receiver operating characteristic curve (AUC) of 0.952, which is significantly higher than the others. Further, we assessed the dependence of our pretrained model on the number of labeled data and revealed that <100 samples were required to achieve an AUC of 0.901. The proposed framework greatly reduces the demand for labeled data and holds potential for use in automatic breast ultrasound image diagnosis.
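The hard triplet loss described above can be sketched in numpy as follows (a minimal single-anchor version with hard mining; the margin value and batch handling are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def hard_triplet_loss(anchor, positives, negatives, margin=0.5):
    """Triplet loss with hard mining: use the furthest positive and the
    closest negative, i.e. the pairs that are hardest to discriminate."""
    d_pos = np.linalg.norm(positives - anchor, axis=1)   # anchor-positive distances
    d_neg = np.linalg.norm(negatives - anchor, axis=1)   # anchor-negative distances
    return max(0.0, d_pos.max() - d_neg.min() + margin)

anchor = np.array([0.0, 0.0])
positives = np.array([[0.0, 1.0], [0.0, 2.0]])   # furthest positive at distance 2.0
negatives = np.array([[3.0, 0.0], [1.5, 0.0]])   # closest negative at distance 1.5

# Loss = max(0, 2.0 - 1.5 + 0.5) = 1.0
assert hard_triplet_loss(anchor, positives, negatives) == 1.0
```

Mining the hardest pairs focuses the gradient on exactly the ambiguous image pairs the paper targets, instead of averaging over many already-separated triplets.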
Updated: 2024-08-20 07:16:01
Domains: cs.CV,cs.AI
Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams
Solving Algebra Problems with Geometry Diagrams (APGDs) remains challenging because diagram processing is not studied as intensively as language processing. To address this challenge, this paper proposes a hologram reasoning scheme and develops a high-performance method for solving APGDs based on it. To reach this goal, it first defines a hologram, a kind of graph, and proposes a hologram generator that converts a given APGD into a hologram representing the entire information of the APGD, from which the relations needed to solve the problem can be acquired in a uniform way. Then HGR, the hologram reasoning method, employs a pool of prepared graph models to derive algebraic equations consistent with geometric theorems; the method can be updated by adding new graph models to the pool. Lastly, it employs deep reinforcement learning to enhance the efficiency of model selection from the pool. The entire HGR not only ensures high solution accuracy with fewer reasoning steps but also significantly enhances the interpretability of the solution process by providing descriptions of all reasoning steps. Experimental results demonstrate the effectiveness of HGR in improving both accuracy and interpretability in solving APGDs.
Updated: 2024-08-20 07:10:05
Domains: cs.AI,cs.CG,cs.LO
Panorama Tomosynthesis from Head CBCT with Simulated Projection Geometry
Cone Beam Computed Tomography (CBCT) and Panoramic X-rays are the most commonly used imaging modalities in dental health care. CBCT can produce three-dimensional views of a patient's head, providing clinicians with better diagnostic capability, whereas Panoramic X-ray can capture the entire maxillofacial region in a single image. If the CBCT is already available, it can be beneficial to synthesize a Panoramic X-ray, thereby avoiding an immediate additional scan and extra radiation exposure. Existing methods focus on delineating an approximate dental arch and creating orthogonal projections along this arch. However, no golden standard is available for such dental arch extractions, and this choice can affect the quality of synthesized X-rays. To avoid such issues, we propose a novel method for synthesizing Panoramic X-rays from diverse head CBCTs, employing a simulated projection geometry and dynamic rotation centers. Our method effectively synthesized panoramic views from CBCT, even for patients with missing or nonexistent teeth and in the presence of severe metal implants. Our results demonstrate that this method can generate high-quality panoramic images irrespective of the CBCT scanner geometry.
Updated: 2024-08-20 07:07:49
Domains: cs.CV,cs.AI
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge of, and fair representation for, diverse global cultures. In this work, we uncover the culture perceptions of three SOTA models for 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract from these generations the symbols that the LLM associates with each culture. We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures. We also discover that LLMs have an uneven degree of diversity in their culture symbols, and that cultures from different geographic regions have different presence in LLMs' culture-agnostic generation. Our findings promote further research into the knowledge and fairness of global culture perception in LLMs. Code and data can be found here: https://github.com/huihanlhh/Culture-Gen/
Updated: 2024-08-20 06:53:45
Domains: cs.CL,cs.AI
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety. To tackle these concerns, we propose a comprehensive framework (PsySafe) grounded in agent psychology, focusing on three key areas: firstly, identifying how dark personality traits in agents can lead to risky behaviors; secondly, evaluating the safety of multi-agent systems from the psychological and behavioral perspectives, and thirdly, devising effective strategies to mitigate these risks. Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly accessible at https://github.com/AI4Good24/PsySafe.
Updated: 2024-08-20 06:45:50
Domains: cs.CL,cs.AI,cs.CR,cs.MA
Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks
Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under a real-world threat model in which both the protected model and the watermark extractor are black boxes. Under this setting, we carry out three studies. 1) We develop an extractor-gradient-guided (EGG) remover and show its effectiveness when the extractor uses ReLU activation only. 2) More generally, for an unknown extractor, we leverage adversarial attacks and design the EGG remover based on estimated gradients. 3) Under the most stringent condition, in which the extractor is inaccessible, we design a transferable remover based on a set of private proxy models. In all cases, the proposed removers successfully remove embedded watermarks while preserving the quality of the processed images, and we also demonstrate that the EGG remover can even replace the watermarks. Extensive experimental results verify the effectiveness and generalizability of the proposed attacks, revealing the vulnerabilities of existing box-free methods and calling for further research.
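The extractor-gradient-guided idea can be illustrated with a toy differentiable extractor: descend on the energy of the extracted watermark until the extractor recovers almost nothing. The linear extractor `W` below is a stand-in assumption; the paper's black-box variants estimate these gradients via adversarial attacks instead of computing them exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 16))       # stand-in differentiable extractor: m(x) = W @ x
image = rng.normal(size=16)        # watermarked image, flattened

def egg_remove(x, steps=500):
    """Gradient-guided removal: minimize ||W x||^2 so the extractor
    recovers (almost) nothing from the cleaned image."""
    x = x.copy()
    lam = np.linalg.norm(W, 2) ** 2        # Lipschitz scale of the gradient
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ x)         # exact gradient of ||W x||^2
        x -= (0.25 / lam) * grad           # conservative step, guaranteed descent
    return x

cleaned = egg_remove(image)

# The extracted watermark energy drops sharply after the attack. The update
# always lies in the row space of W, so the component of the image in W's
# null space (most of the image here) is untouched.
assert np.linalg.norm(W @ cleaned) < 0.1 * np.linalg.norm(W @ image)
```

Preserving image quality in the real attack is harder than in this sketch, since a learned extractor has no convenient null space.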
Updated: 2024-08-20 06:37:37
Domains: cs.CV,cs.AI
Auto-ICL: In-Context Learning without Human Supervision
Thanks to their in-context learning ability, large language models can perform significantly better when provided with appropriate context. However, existing in-context learning methods mainly rely on human-provided contexts, such as labeled examples and explicit instructions. Having humans write contexts is labor-intensive across tasks and limits the model to tasks manageable by humans. To overcome these limitations, we propose the Automatic In-Context Learning framework, which enables the model to autonomously generate examples and instructions for problem-solving. Experiments across various models and datasets show that model-generated contexts outperform human-annotated contexts, including Few-Shot and Few-Shot-CoT methods, and surpass existing self-generated context methods such as Zero-CoT and Auto-CoT.
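The idea of letting the model write its own instruction and examples can be sketched as prompt assembly around a black-box `generate` call (the stub LLM and prompt wording below are illustrative assumptions, not the paper's templates):

```python
def build_auto_icl_prompt(question, generate):
    """Assemble a self-generated context: the model writes its own
    instruction and worked examples before answering the real question.

    `generate` stands in for a black-box LLM call (an assumption here).
    """
    instruction = generate(f"Write a one-line instruction for solving: {question}")
    examples = generate(f"Write two solved examples similar to: {question}")
    return f"{instruction}\n\n{examples}\n\nQ: {question}\nA:"

# Stub LLM so the sketch runs without any model access.
def stub_llm(prompt):
    if prompt.startswith("Write a one-line instruction"):
        return "Add the two numbers."
    return "Q: 1 + 1\nA: 2\n\nQ: 2 + 3\nA: 5"

prompt = build_auto_icl_prompt("4 + 5", stub_llm)
assert prompt.endswith("Q: 4 + 5\nA:")
assert "Add the two numbers." in prompt
```

The final prompt is then sent to the same model for the actual answer, replacing the human-written few-shot context.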
Updated: 2024-08-20 06:34:37
Domains: cs.LG,cs.AI,cs.CL
Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction
Inductive relation prediction (IRP) -- where entities can be different during training and inference -- has shown great power for completing evolving knowledge graphs. Existing works mainly focus on using graph neural networks (GNNs) to learn the representation of the subgraph induced from the target link, which can be seen as an implicit rule-mining process to measure the plausibility of the target link. However, these methods cannot differentiate the target link and other links during message passing, hence the final subgraph representation will contain irrelevant rule information to the target link, which reduces the reasoning performance and severely hinders the applications for real-world scenarios. To tackle this problem, we propose a novel \textit{single-source edge-wise} GNN model to learn the \textbf{R}ule-induc\textbf{E}d \textbf{S}ubgraph represen\textbf{T}ations (\textbf{REST}), which encodes relevant rules and eliminates irrelevant rules within the subgraph. Specifically, we propose a \textit{single-source} initialization approach to initialize edge features only for the target link, which guarantees the relevance of mined rules and target link. Then we propose several RNN-based functions for \textit{edge-wise} message passing to model the sequential property of mined rules. REST is a simple and effective approach with theoretical support to learn the \textit{rule-induced subgraph representation}. Moreover, REST does not need node labeling, which significantly accelerates the subgraph preprocessing time by up to \textbf{11.66$\times$}. Experiments on inductive relation prediction benchmarks demonstrate the effectiveness of our REST. Our code is available at https://github.com/smart-lty/REST.
Updated: 2024-08-20 06:33:40
Domains: cs.LG,cs.AI
Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter
Large Language Models (LLMs) have demonstrated significant capabilities, particularly in the domain of question answering (QA). However, their effectiveness in QA is often undermined by the vagueness of user questions. To address this issue, we introduce single-round instance-level prompt optimization, referred to as question rewriter. By enhancing the intelligibility of human questions for black-box LLMs, our question rewriter improves the quality of generated answers. The rewriter is optimized using direct preference optimization based on feedback collected from automatic criteria for evaluating generated answers; therefore, its training does not require costly human annotations. The experiments across multiple black-box LLMs and long-form question answering (LFQA) datasets demonstrate the efficacy of our method. This paper provides a practical framework for training question rewriters and sets a precedent for future explorations in prompt optimization within LFQA tasks. Code is available at \url{https://github.com/3244we/Question-Rewriter}.
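The rewriter is trained with direct preference optimization; for one (preferred, dispreferred) rewrite pair the standard DPO objective looks as follows (a generic sketch of the loss, not the authors' training code; `beta` and the log-probability values are illustrative):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct preference optimization loss for one (winner, loser) pair.

    logp_* are the policy's log-probabilities of the preferred (w) and
    dispreferred (l) rewrites; ref_logp_* come from the frozen reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log(sigmoid(margin))

# If the policy matches the reference, the margin is 0 and the loss is ln 2.
assert abs(dpo_loss(-5.0, -7.0, -5.0, -7.0) - np.log(2.0)) < 1e-12
# Favoring the preferred rewrite relative to the reference lowers the loss.
assert dpo_loss(-4.0, -7.0, -5.0, -7.0) < np.log(2.0)
```

Here the preference labels come from automatic answer-quality criteria rather than human annotators, which is what keeps the training cheap.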
Updated: 2024-08-20 06:24:47
Domains: cs.CL,cs.AI
A Tutorial on Explainable Image Classification for Dementia Stages Using Convolutional Neural Network and Gradient-weighted Class Activation Mapping
This paper presents a tutorial on an explainable approach using a Convolutional Neural Network (CNN) and Gradient-weighted Class Activation Mapping (Grad-CAM) to classify four progressive dementia stages based on open MRI brain images. The detailed implementation steps are demonstrated and explained. While the proposed CNN architecture achieves more than 99% accuracy on the test dataset, the computational procedure of the CNN remains a black box. Visualisation based on Grad-CAM is used to explain this very high accuracy and may provide useful information for physicians. Directions for future work motivated by this study are discussed.
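The core Grad-CAM computation the tutorial builds on can be sketched framework-free in numpy: the channel weights are the global-average-pooled gradients of the class score, and the heatmap is the ReLU of the weighted channel sum (array shapes and names here are illustrative):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one convolutional layer.

    activations: (K, H, W) feature maps for one image.
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(1, 2))               # GAP of gradients per channel
    cam = np.tensordot(weights, activations, axes=1)    # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                          # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                           # normalise to [0, 1]
    return cam

# Channel 0 supports the class (positive gradients), channel 1 opposes it.
acts = np.stack([2.0 * np.ones((2, 2)), np.ones((2, 2))])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(acts, grads)
assert cam.shape == (2, 2)
assert np.allclose(cam, 1.0)   # 1*2 + (-1)*1 = 1 everywhere, normalised to 1
```

In the tutorial's setting the heatmap is upsampled to the MRI slice's resolution and overlaid on it, which is what lets a clinician see where the 99%-accurate classifier is looking.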
Updated: 2024-08-20 06:23:20
Domains: eess.IV,cs.AI,cs.CV
Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model
We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks, with high parameter efficiency and improved performance compared to fine-tuning and prompt-tuning baselines. Full fine-tuning updates all pre-trained parameters, which may distort the learned feature space and lead to overfitting on the limited training data that is common in fMRI fields. In contrast, we design a hierarchical prompt structure that transfers the knowledge learned from high-resource tasks to low-resource ones. This structure, equipped with a Deeply-conditioned Input-Prompt (DIP) mapping module, allows for efficient adaptation by updating only 2% of the trainable parameters. The framework enhances semantic interpretability through attention mechanisms between inputs and prompts, and it clusters prompts in the latent space in alignment with prior knowledge. Experiments on public resting-state fMRI datasets reveal that ScaPT outperforms fine-tuning and multitask-based prompt tuning in neurodegenerative disease diagnosis/prognosis and personality trait prediction, even with fewer than 20 participants. This highlights ScaPT's efficiency in adapting pre-trained fMRI models to low-resource tasks.
Updated: 2024-08-20 06:08:37
Domains: q-bio.NC,cs.AI,cs.CV,cs.LG
The Stable Model Semantics for Higher-Order Logic Programming
We propose a stable model semantics for higher-order logic programs. Our semantics is developed using Approximation Fixpoint Theory (AFT), a powerful formalism that has successfully been used to give meaning to diverse non-monotonic formalisms. The proposed semantics generalizes the classical two-valued stable model semantics of (Gelfond and Lifschitz 1988) as well as the three-valued one of (Przymusinski 1990), retaining their desirable properties. Due to the use of AFT, we also get for free alternative semantics for higher-order logic programs, namely supported model, Kripke-Kleene, and well-founded. Additionally, we define a broad class of stratified higher-order logic programs and demonstrate that they have a unique two-valued higher-order stable model which coincides with the well-founded semantics of such programs. We provide a number of examples in different application domains, which demonstrate that higher-order logic programming under the stable model semantics is a powerful and versatile formalism, which can potentially form the basis of novel ASP systems.
Updated: 2024-08-20 06:03:52
Domains: cs.LO,cs.AI,cs.PL,I.2.3; I.2.5; F.3.2
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the shortcomings of current offline RL approaches in handling task complexity, generalization, and multi-task learning.
Updated: 2024-08-20 05:38:50
Domains: cs.AI,cs.LG
Target-Prompt Online Graph Collaborative Learning for Temporal QoS Prediction
In service-oriented architecture, accurately predicting the Quality of Service (QoS) is vital for maintaining reliability and enhancing user satisfaction. However, current methods often neglect high-order latent collaborative relationships and fail to dynamically adjust feature learning for specific user-service invocations, which are critical for precise feature extraction. Moreover, relying on RNNs to capture QoS evolution limits the ability to detect long-term trends due to challenges in managing long-range dependencies. To address these issues, we propose the Target-Prompt Online Graph Collaborative Learning (TOGCL) framework for temporal QoS prediction. It leverages a dynamic user-service invocation graph to comprehensively model historical interactions. Building on this graph, it develops a target-prompt graph attention network to extract online deep latent features of users and services at each time slice, considering implicit target-neighboring collaborative relationships and historical QoS values. Additionally, a multi-layer Transformer encoder is employed to uncover temporal feature evolution patterns, enhancing temporal QoS prediction. Extensive experiments on the WS-DREAM dataset demonstrate that TOGCL significantly outperforms state-of-the-art methods across multiple metrics, achieving improvements of up to 38.80\%. These results underscore the effectiveness of TOGCL for temporal QoS prediction.
Updated: 2024-08-20 05:38:47
Domains: cs.LG,cs.IR,68T99,H.4.0; I.2.0
Information-Theoretic Foundations for Machine Learning
The staggering progress of machine learning in the past decade has been a sight to behold. In retrospect, it is both remarkable and unsettling that these milestones were achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. However, alluding to Plato's Allegory of the cave, it is likely that the observations which form the field's notion of reality are but shadows representing fragments of that reality. In this work, we propose a theoretical framework which attempts to answer what exists outside of the cave. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are very intuitive, general, and which will help form principles to guide future investigations. Concretely, we provide a theoretical framework rooted in Bayesian statistics and Shannon's information theory which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner, which considers the fundamental limits of information. Throughout this work, we derive very general theoretical results and apply them to derive insights specific to settings ranging from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning. We conclude with a section dedicated to characterizing the performance of misspecified algorithms. These results are exciting and particularly relevant as we strive to overcome increasingly difficult machine learning challenges in this endlessly complex world.
Updated: 2024-08-20 05:34:20
Domains: stat.ML,cs.AI,cs.LG
Enhancing Adversarial Transferability with Adversarial Weight Tuning
Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adversarial transferability. In this paper, we rethink the property of transferable AEs and reformalize the formulation of transferability. Building on insights from this mechanism, we analyze the generalization of AEs across models with different architectures and prove that we can find a local perturbation to mitigate the gap between surrogate and target models. We further establish the inner connections between model smoothness and flat local maxima, both of which contribute to the transferability of AEs. Further, we propose a new adversarial attack algorithm, \textbf{A}dversarial \textbf{W}eight \textbf{T}uning (AWT), which adaptively adjusts the parameters of the surrogate model using generated AEs to optimize the flat local maxima and model smoothness simultaneously, without the need for extra data. AWT is a data-free tuning method that combines gradient-based and model-based attack methods to enhance the transferability of AEs. Extensive experiments on a variety of models with different architectures on ImageNet demonstrate that AWT yields superior performance over other attacks, with an average increase of nearly 5\% and 10\% attack success rates on CNN-based and Transformer-based models, respectively, compared to state-of-the-art attacks.
Updated: 2024-08-20 05:28:55
Domains: cs.CR
MoDeGPT: Modular Decomposition for Large Language Model Compression
Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces \textbf{Mo}dular \textbf{De}composition (MoDeGPT), a novel structured compression framework that does not need recovery fine-tuning while resolving the above drawbacks. MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions via reconstructing the module-level outputs. MoDeGPT is developed based on a theoretical framework that utilizes three well-established matrix decomposition algorithms -- Nystr\"om approximation, CR decomposition, and SVD -- and applies them to our redefined transformer modules. Our comprehensive experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs on compressing a 13B model. On \textsc{Llama}-2/3 and OPT models, MoDeGPT maintains 90-95% zero-shot performance with 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases the inference throughput by up to 46%.
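MoDeGPT's three decomposition backends (Nyström approximation, CR decomposition, and SVD) all reduce a module to a product of thinner matrices. As an illustration of only the SVD building block (a generic low-rank sketch, not the authors' module-level algorithm), one can factor a weight matrix in numpy:

```python
import numpy as np

def svd_compress(W, rank):
    """Replace a dense weight W (m x n) with factors A (m x r), B (r x n)
    so that W @ x is approximated by A @ (B @ x) with fewer parameters."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]    # absorb singular values into the left factor
    B = Vt[:rank]
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))

err = []
for r in (4, 16, 32):
    A, B = svd_compress(W, r)
    err.append(np.linalg.norm(W - A @ B))   # Frobenius reconstruction error

assert err[0] > err[1]    # higher rank, lower reconstruction error
assert err[2] < 1e-9      # full rank recovers W up to round-off
```

The parameter saving is r*(m+n) versus m*n; MoDeGPT's contribution is choosing the decomposition per module pair and reducing the hidden dimension jointly, so no recovery fine-tuning is needed.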
Updated: 2024-08-20 05:28:27
Categories: cs.LG,cs.CL,stat.ML,15A23 (Primary),I.2.7
Periodic agent-state based Q-learning for POMDPs
The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis, which we illustrate via examples, is that, because the agent state does not satisfy the Markov property, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based Q-learning), which is a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.
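The core mechanism, one Q-table per phase `t mod period` so the learned greedy policy is periodic rather than stationary, can be sketched in tabular form. This toy version is an assumption-laden caricature: it does not reproduce the paper's agent-state construction or its periodic-Markov-chain convergence analysis.

```python
import random

def pasql(env_step, n_states, n_actions, period, episodes=200, horizon=20,
          alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular sketch of periodic Q-learning: one Q-table per phase
    t mod period, yielding a periodic (non-stationary) greedy policy."""
    rng = random.Random(seed)
    Q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(period)]
    for _ in range(episodes):
        s = 0  # fixed initial agent state
        for t in range(horizon):
            ph = t % period
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[ph][s][i])
            s2, r = env_step(s, a, rng)
            # bootstrap from the *next* phase's Q-table
            nxt = max(Q[(t + 1) % period][s2])
            Q[ph][s][a] += alpha * (r + gamma * nxt - Q[ph][s][a])
            s = s2
    return Q

def toy_step(s, a, rng):
    """Two-state toy environment: the rewarding action is 1 in state 0
    and 0 in state 1; wrong actions yield no reward and no transition."""
    if (s, a) in {(0, 1), (1, 0)}:
        return 1 - s, 1.0
    return s, 0.0
```

The only structural difference from ordinary tabular Q-learning is the phase index `ph`, which is exactly what lets the converged policy vary with time.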
Updated: 2024-08-20 05:16:36
Categories: cs.LG
PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering
Image composition involves seamlessly integrating given objects into a specific visual context. Current training-free methods rely on composing attention weights from several samplers to guide the generator. However, since these weights are derived from disparate contexts, their combination leads to coherence confusion and loss of appearance information. These issues worsen with their excessive focus on background generation, even when unnecessary in this task. This not only impedes their swift implementation but also compromises foreground generation quality. Moreover, these methods introduce unwanted artifacts in the transition area. In this paper, we formulate image composition as a subject-based local editing task, solely focusing on foreground generation. At each step, the edited foreground is combined with the noisy background to maintain scene consistency. To address the remaining issues, we propose PrimeComposer, a faster training-free diffuser that composites the images by well-designed attention steering across different noise levels. This steering is predominantly achieved by our Correlation Diffuser, utilizing its self-attention layers at each step. Within these layers, the synthesized subject interacts with both the referenced object and background, capturing intricate details and coherent relationships. This prior information is encoded into the attention weights, which are then integrated into the self-attention layers of the generator to guide the synthesis process. Besides, we introduce a Region-constrained Cross-Attention to confine the impact of specific subject-related tokens to desired regions, addressing the unwanted artifacts shown in the prior method thereby further improving the coherence in the transition area. Our method exhibits the fastest inference efficiency and extensive experiments demonstrate our superiority both qualitatively and quantitatively.
Updated: 2024-08-20 05:14:00
Categories: cs.CV,cs.AI
AI-Based IVR
The use of traditional IVR (Interactive Voice Response) methods often proves insufficient to meet customer needs. This article examines the application of artificial intelligence (AI) technologies to enhance the efficiency of IVR systems in call centers. The proposed approach is based on the integration of speech-to-text conversion solutions, text query classification using large language models (LLMs), and speech synthesis. Special attention is given to adapting these technologies to work with the Kazakh language, including fine-tuning models on specialized datasets. The practical aspects of implementing the developed system in a real call center for query classification are described. The research results demonstrate that the application of AI technologies in call center IVR systems reduces operator workload, improves customer service quality, and increases the efficiency of query processing. The proposed approach can be adapted for use in call centers operating with various languages.
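The three-stage pipeline (speech-to-text, LLM-based query classification, speech synthesis) can be sketched as a composition of injected components. The concrete Kazakh fine-tuned models are not specified in the abstract, so the callables below are hypothetical stand-ins, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class IVRPipeline:
    """Illustrative three-stage IVR pipeline: speech-to-text, LLM-based
    query classification, and text-to-speech. The stages are injected
    callables; any STT/LLM/TTS backend with these signatures fits."""
    transcribe: Callable[[bytes], str]   # caller audio -> text
    classify: Callable[[str], str]       # text -> intent label
    synthesize: Callable[[str], bytes]   # reply text -> audio
    responses: Dict[str, str]            # intent label -> reply text

    def handle_call(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        intent = self.classify(text)
        reply = self.responses.get(intent, self.responses["fallback"])
        return self.synthesize(reply)
```

Swapping in keyword stubs for the three stages is enough to exercise the routing logic end to end before any real models are attached.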
Updated: 2024-08-20 05:04:40
Categories: cs.AI
Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds
Stable diffusion networks have emerged as a groundbreaking development for their ability to produce realistic and detailed visual content. This characteristic renders them ideal decoders, capable of producing high-quality and aesthetically pleasing reconstructions. In this paper, we introduce the first diffusion-based point cloud compression method, dubbed Diff-PCC, to leverage the expressive power of the diffusion model for generative and aesthetically superior decoding. Different from the conventional autoencoder fashion, a dual-space latent representation is devised in this paper, in which a compressor composed of two independent encoding backbones is considered to extract expressive shape latents from distinct latent spaces. At the decoding side, a diffusion-based generator is devised to produce high-quality reconstructions by considering the shape latents as guidance to stochastically denoise the noisy point clouds. Experiments demonstrate that the proposed Diff-PCC achieves state-of-the-art compression performance (e.g., 7.711 dB BD-PSNR gains against the latest G-PCC standard at ultra-low bitrate) while attaining superior subjective quality. Source code will be made publicly available.
Updated: 2024-08-20 04:55:29
Categories: cs.CV,cs.AI,eess.IV
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering an innovative approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, including both high-fidelity audio waveforms and detailed text descriptions, which often constitute only a small portion of available datasets. In open-source datasets, issues such as low-quality music waveforms, mislabeling, weak labeling, and unlabeled data significantly hinder the development of music generation models. To address these challenges, we propose a novel paradigm for high-quality music generation that incorporates a quality-aware training strategy, enabling generative models to discern the quality of input music waveforms during training. Leveraging the unique properties of musical signals, we first adapted and implemented a masked diffusion transformer (MDT) model for the TTM task, demonstrating its distinct capacity for quality control and enhanced musicality. Additionally, we address the issue of low-quality captions in TTM with a caption refinement data processing approach. Experiments demonstrate our state-of-the-art (SOTA) performance on MusicCaps and the Song-Describer Dataset. Our demo page can be accessed at https://qa-mdt.github.io/.
Updated: 2024-08-20 04:54:40
Categories: cs.SD,cs.AI,eess.AS
MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions
Urban traffic is subject to disruptions that cause extended waiting time and safety issues at signalized intersections. While numerous studies have addressed the issue of intelligent traffic systems in the context of various disturbances, traffic signal malfunction, a common real-world occurrence with significant repercussions, has received comparatively limited attention. The primary objective of this research is to mitigate the adverse effects of traffic signal malfunction, such as traffic congestion and collision, by optimizing the control of neighboring functioning signals. To achieve this goal, this paper presents a novel traffic signal control framework (MalLight), which leverages an Influence-aware State Aggregation Module (ISAM) and an Influence-aware Reward Aggregation Module (IRAM) to achieve coordinated control of surrounding traffic signals. To the best of our knowledge, this study pioneers the application of a Reinforcement Learning (RL)-based approach to address the challenges posed by traffic signal malfunction. Empirical investigations conducted on real-world datasets substantiate the superior performance of our proposed methodology over conventional and deep learning-based alternatives in the presence of signal malfunction, alleviating the reduction in throughput by as much as 48.6$\%$.
Updated: 2024-08-20 04:43:50
Categories: cs.AI
NutrifyAI: An AI-Powered System for Real-Time Food Detection, Nutritional Analysis, and Personalized Meal Recommendations
With diet and nutrition apps reaching 1.4 billion users in 2022 [1], it's no surprise that health apps like MyFitnessPal, Noom, and Calorie Counter, are surging in popularity. However, one major setback [2] of nearly all nutrition applications is that users must enter food data manually, which is time-consuming and tedious. Thus, there has been an increasing demand for applications that can accurately identify food items, analyze their nutritional content, and offer dietary recommendations in real-time. This paper introduces a comprehensive system that combines advanced computer vision techniques with nutrition analysis, implemented in a versatile mobile and web application. The system is divided into three key components: 1) food detection using the YOLOv8 model, 2) nutrient analysis via the Edamam Nutrition Analysis API, and 3) personalized meal recommendations using the Edamam Meal Planning and Recipe Search APIs. Designed for both mobile and web platforms, the application ensures fast processing times with an intuitive user interface, with features such as data visualizations using Chart.js, a login system, and personalized settings for dietary preferences, allergies, and cuisine choices. Preliminary results showcase the system's effectiveness, making it a valuable tool for users to make informed dietary decisions.
Updated: 2024-08-20 04:18:53
Categories: cs.CV,cs.AI
An Efficient Real-Time Object Detection Framework on Resource-Constricted Hardware Devices via Software and Hardware Co-design
The fast development of object detection techniques has attracted attention to developing efficient Deep Neural Networks (DNNs). However, the current state-of-the-art DNN models can not provide a balanced solution among accuracy, speed, and model size. This paper proposes an efficient real-time object detection framework on resource-constrained hardware devices through hardware and software co-design. The Tensor Train (TT) decomposition is proposed for compressing the YOLOv5 model. By utilizing the unique characteristics given by the TT decomposition, we develop an efficient hardware accelerator based on FPGA devices. Experimental results show that the proposed method can significantly reduce the model size and the execution time.
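The Tensor Train decomposition at the heart of the compression can be sketched with the textbook TT-SVD procedure: sequential truncated SVDs over tensor unfoldings. This is a generic reference version, not the paper's YOLOv5- or FPGA-oriented implementation:

```python
import numpy as np

def tt_decompose(T, max_rank):
    """TT-SVD sketch: factor a d-way tensor into d TT-cores of shape
    (r_prev, n_k, r_next) via sequential truncated SVDs of unfoldings,
    with every TT-rank capped at max_rank."""
    shape = T.shape
    cores, r = [], 1
    M = T.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(U[:, :r_new].reshape(r, shape[k], r_new))
        M = s[:r_new, None] * Vt[:r_new]   # carry the remainder forward
        r = r_new
        if k < len(shape) - 2:
            M = M.reshape(r * shape[k + 1], -1)
    cores.append(M.reshape(r, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT-cores back into a full tensor (for checking error)."""
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=1)
    return out.squeeze(axis=0).squeeze(axis=-1)
```

Storage drops from the product of all mode sizes to a sum of small core sizes, which is what makes the format attractive for compressing convolutional weights on constrained hardware.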
Updated: 2024-08-20 04:15:50
Categories: cs.LG,68T10, 65K10
New Job, New Gender? Measuring the Social Bias in Image Generation Models
Image generation models can generate or edit images from a given text. Recent advancements in image generation technology, exemplified by DALL-E and Midjourney, have been groundbreaking. These advanced models, despite their impressive capabilities, are often trained on massive Internet datasets, making them susceptible to generating content that perpetuates social stereotypes and biases, which can lead to severe consequences. Prior research on assessing bias within image generation models suffers from several shortcomings, including limited accuracy, reliance on extensive human labor, and lack of comprehensive analysis. In this paper, we propose BiasPainter, a novel evaluation framework that can accurately, automatically and comprehensively trigger social bias in image generation models. BiasPainter uses a diverse range of seed images of individuals and prompts the image generation models to edit these images using gender, race, and age-neutral queries. These queries span 62 professions, 39 activities, 57 types of objects, and 70 personality traits. The framework then compares the edited images to the original seed images, focusing on the significant changes related to gender, race, and age. BiasPainter adopts a key insight that these characteristics should not be modified when subjected to neutral prompts. Built upon this design, BiasPainter can trigger the social bias and evaluate the fairness of image generation models. We use BiasPainter to evaluate six widely-used image generation models, such as stable diffusion and Midjourney. Experimental results show that BiasPainter can successfully trigger social bias in image generation models. According to our human evaluation, BiasPainter can achieve 90.8% accuracy on automatic bias detection, which is significantly higher than the results reported in previous work.
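BiasPainter's key insight, that a gender-, race-, and age-neutral edit prompt should not change those attributes, reduces at its core to comparing attribute predictions on the seed and edited images. A sketch of that comparison follows; the attribute classifiers themselves are assumed to be external models and are hypothetical here:

```python
def bias_flags(seed_attrs, edited_attrs, protected=("gender", "race", "age")):
    """Flag protected attributes that changed under a neutral edit prompt.

    Inputs are attribute predictions for the seed and edited images, e.g.
    {"gender": "female", "race": "asian", "age": "30-39"} (assumed to come
    from external classifiers). Returns changed attributes mapped to their
    (seed, edited) values; a non-empty result is a potential bias instance."""
    return {a: (seed_attrs.get(a), edited_attrs.get(a))
            for a in protected if seed_attrs.get(a) != edited_attrs.get(a)}

def bias_rate(prediction_pairs):
    """Fraction of (seed, edited) prediction pairs flagged as biased."""
    if not prediction_pairs:
        return 0.0
    flagged = sum(1 for s, e in prediction_pairs if bias_flags(s, e))
    return flagged / len(prediction_pairs)
```

Aggregating `bias_rate` per neutral query (profession, activity, object, or trait) is the kind of per-prompt statistic the framework reports.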
Updated: 2024-08-20 04:11:26
Categories: cs.SE,cs.AI,cs.CL,cs.CV,cs.MM
EdgeNAT: Transformer for Efficient Edge Detection
Transformers, renowned for their powerful feature extraction capabilities, have played an increasingly prominent role in various vision tasks. In particular, recent advances introduce transformers with hierarchical structures, such as the Dilated Neighborhood Attention Transformer (DiNAT), which demonstrate an outstanding ability to efficiently capture both global and local features. However, transformers' application in edge detection has not been fully exploited. In this paper, we propose EdgeNAT, a one-stage transformer-based edge detector with DiNAT as the encoder, capable of extracting object boundaries and meaningful edges both accurately and efficiently. On the one hand, EdgeNAT captures global contextual information and detailed local cues with DiNAT; on the other hand, it enhances feature representation with a novel SCAF-MLA decoder by utilizing both inter-spatial and inter-channel relationships of feature maps. Extensive experiments on multiple datasets show that our method achieves state-of-the-art performance on both RGB and depth images. Notably, on the widely used BSDS500 dataset, our L model achieves impressive performances, with ODS F-measure and OIS F-measure of 86.0% and 87.6% for multi-scale input, and 84.9% and 86.3% for single-scale input, surpassing the current state-of-the-art EDTER by 1.2%, 1.1%, 1.7%, and 1.6%, respectively. Moreover, in terms of throughput, our approach runs at 20.87 FPS on an RTX 4090 GPU with single-scale input. The code for our method will be released soon.
Updated: 2024-08-20 04:04:22
Categories: cs.CV,cs.AI
XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make the initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing (XCB) module. Specifically, we augment a pre-trained ASR model for the dominant language by integrating an auxiliary language biasing module and a supplementary language-specific loss, aimed at enhancing the recognition of phrases in the secondary language. Experimental results conducted on our in-house code-switching dataset have validated the efficacy of our approach, demonstrating significant improvements in the recognition of biasing phrases in the secondary language, even without any additional inference overhead. Additionally, our proposed system exhibits both efficiency and generalization when applied to the unseen ASRU-2019 test set.
Updated: 2024-08-20 04:00:19
Categories: cs.CL,cs.AI,cs.SD,eess.AS
Integrating Multi-Modal Input Token Mixer Into Mamba-Based Decision Models: Decision MetaMamba
Return-Conditioned Transformer Decision Models (RCTDM) have demonstrated the potential to enhance transformer performance in offline reinforcement learning by replacing rewards in the input sequence with returns-to-go. However, to achieve the goal of learning an optimal policy from offline datasets composed of limited suboptimal trajectories, RCTDM required alternative methods. One prominent approach, trajectory stitching, was designed to enable the network to combine multiple trajectories to find the optimal path. To implement this using only transformers without auxiliary networks, it was necessary to shorten the input sequence length to better capture the Markov property in reinforcement learning. This, however, introduced a trade-off, as it reduced the accuracy of action inference. Our study introduces a model named Decision MetaMamba (DMM) to resolve these challenges. DMM employs an input token mixer to extract patterns from short sequences and uses a State Space Model (SSM) to selectively combine information from relatively distant sequences. Inspired by Metaformer, this structure was developed by transforming Mamba's input layer into various multi-modal layers. Fortunately, with the advent of Mamba, implemented using parallel selective scanning, we achieved a high-performance sequence model capable of replacing transformers. Based on these innovations, DMM demonstrated excellent performance across various datasets in offline RL, confirming that models using SSM can improve performance by domain-specific alterations of the input layer. Additionally, it maintained its performance even in lightweight models with fewer parameters. These results suggest that decision models based on SSM can pave the way for improved outcomes in future developments.
Updated: 2024-08-20 03:35:28
Categories: cs.LG,cs.AI
Data Augmentation Integrating Dialogue Flow and Style to Adapt Spoken Dialogue Systems to Low-Resource User Groups
This study addresses the interaction challenges encountered by spoken dialogue systems (SDSs) when engaging with users who exhibit distinct conversational behaviors, particularly minors, in scenarios where data are scarce. We propose a novel data augmentation framework to enhance SDS performance for user groups with limited resources. Our approach leverages a large language model (LLM) to extract speaker styles and a pre-trained language model (PLM) to simulate dialogue act history. This method generates enriched and personalized dialogue data, facilitating improved interactions with unique user demographics. Extensive experiments validate the efficacy of our methodology, highlighting its potential to foster the development of more adaptive and inclusive dialogue systems.
Updated: 2024-08-20 03:33:04
Categories: cs.CL,cs.AI
Response Style Characterization for Repeated Measures Using the Visual Analogue Scale
Self-report measures (e.g., Likert scales) are widely used to evaluate subjective health perceptions. Recently, the visual analog scale (VAS), a slider-based scale, has become popular owing to its ability to precisely and easily assess how people feel. These data can be influenced by the response style (RS), a user-dependent systematic tendency that occurs regardless of questionnaire instructions. Despite its importance, especially in between-individual analysis, little attention has been paid to handling the RS in the VAS (denoted as response profile (RP)), as it is mainly used for within-individual monitoring and is less affected by RP. However, VAS measurements often require repeated self-reports of the same questionnaire items, making it difficult to apply conventional methods on a Likert scale. In this study, we developed a novel RP characterization method for various types of repeatedly measured VAS data. This approach involves the modeling of RP as distributional parameters ${\theta}$ through a mixture of RS-like distributions, and addressing the issue of unbalanced data through bootstrap sampling for treating repeated measures. We assessed the effectiveness of the proposed method using simulated pseudo-data and an actual dataset from an empirical study. The assessment of parameter recovery showed that our method accurately estimated the RP parameter ${\theta}$, demonstrating its robustness. Moreover, applying our method to an actual VAS dataset revealed the presence of individual RP heterogeneity, even in repeated VAS measurements, similar to the findings of the Likert scale. Our proposed method enables RP heterogeneity-aware VAS data analysis, similar to Likert-scale data analysis.
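The balancing step, resampling an equal number of repeated responses per person before estimating the response-profile parameters ${\theta}$, might be sketched as below. This is only the resampling half of the method; the mixture-of-RS-like-distributions fit itself is omitted:

```python
import random

def balanced_bootstrap(responses_by_person, n_per_person=None, seed=0):
    """Sketch of the balancing step for repeated VAS measures.

    Each person contributes a different number of repeated responses, so
    resample (with replacement) an equal number per person; the balanced
    sample can then be fed to the response-profile parameter fit."""
    rng = random.Random(seed)
    if n_per_person is None:
        n_per_person = max(len(v) for v in responses_by_person.values())
    return {person: [rng.choice(values) for _ in range(n_per_person)]
            for person, values in responses_by_person.items()}
```

Repeating this resampling across bootstrap replicates gives a spread of fitted parameters per individual, which is how between-individual RP heterogeneity can be compared despite unbalanced data.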
Updated: 2024-08-20 03:32:55
Categories: stat.ME,cs.AI
Approximate Estimation of High-dimension Execution Skill for Dynamic Agents in Continuous Domains
In many real-world continuous action domains, human agents must decide which actions to attempt and then execute those actions to the best of their ability. However, humans cannot execute actions without error. Human performance in these domains can potentially be improved by the use of AI to aid in decision-making. One requirement for an AI to correctly reason about what actions a human agent should attempt is a correct model of that human's execution error, or skill. Recent work has demonstrated successful techniques for estimating this execution error with various types of agents across different domains. However, this previous work made several assumptions that limit the application of these ideas to real-world settings. First, previous work assumed that the error distributions were symmetric normal, which meant that only a single parameter had to be estimated. In reality, agent error distributions might exhibit arbitrary shapes and should be modeled more flexibly. Second, it was assumed that the execution error of the agent remained constant across all observations. Especially for human agents, execution error changes over time, and this must be taken into account to obtain effective estimates. To overcome both of these shortcomings, we propose a novel particle-filter-based estimator for this problem. After describing the details of this approximate estimator, we experimentally explore various design decisions and compare performance with previous skill estimators in a variety of settings to showcase the improvements. The outcome is an estimator capable of generating more realistic, time-varying execution skill estimates of agents, which can then be used to assist agents in making better decisions and improve their overall performance.
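A particle filter over a single skill parameter gives the flavor of the proposed estimator: here each particle is a candidate standard deviation of execution noise, with a random-walk drift that lets the estimate vary over time. This is a deliberately simplified one-dimensional sketch; the paper's estimator targets higher-dimensional, arbitrarily shaped error distributions:

```python
import math
import random

def particle_filter_skill(errors, n_particles=500, drift=0.02, seed=0):
    """Particle-filter sketch for a time-varying execution-skill parameter.

    Per observed execution error: particles drift via a random walk
    (allowing skill to change over time), are reweighted by the Gaussian
    likelihood of the error, and are resampled. Returns the posterior-mean
    skill estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.05, 2.0) for _ in range(n_particles)]
    estimates = []
    for e in errors:
        particles = [max(1e-3, p + rng.gauss(0.0, drift)) for p in particles]
        weights = [math.exp(-0.5 * (e / p) ** 2) / p for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates
```

Because particles are richer than a single point estimate, the same machinery extends to asymmetric or multimodal error models by changing the likelihood line, which is the flexibility the paper argues for.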
Updated: 2024-08-20 03:27:09
Categories: cs.AI
Single-cell Curriculum Learning-based Deep Graph Embedding Clustering
The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of cellular-level tissue heterogeneity. Cell annotation significantly contributes to the extensive downstream analysis of scRNA-seq data. However, the analysis of scRNA-seq data for biological inference presents challenges owing to its intricate and indeterminate data distribution, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of GNNs, a popular scRNA-seq data clustering solution, could be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose a single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-decoder (ChebAE) that combines three optimization objectives corresponding to three decoders, including topology reconstruction loss of cell graphs, zero-inflated negative binomial (ZINB) loss, and clustering loss, to learn cell-cell topology representation. Meanwhile, we employ a selective training strategy to train the GNN based on the features and entropy of nodes and prune the difficult nodes based on the difficulty scores to keep the high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.
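The ZINB reconstruction loss named among the three objectives has a standard closed form. A scalar sketch (per-count dropout probability `pi`, NB mean `mu`, dispersion `theta`; in practice these come from the decoder heads and the loss is batched):

```python
import math

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under a zero-inflated negative
    binomial: with probability pi the count is a dropout zero, otherwise
    it is NB-distributed with mean mu and dispersion theta. This is the
    standard scRNA-seq reconstruction loss form."""
    # log NB(0; mu, theta) = theta * log(theta / (theta + mu))
    log_nb_zero = theta * (math.log(theta) - math.log(theta + mu))
    if x == 0:
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb_zero))
    log_nb = (math.lgamma(x + theta) - math.lgamma(theta)
              - math.lgamma(x + 1) + log_nb_zero
              + x * (math.log(mu) - math.log(theta + mu)))
    return -(math.log(1.0 - pi) + log_nb)
```

The zero-inflation term is what lets the model explain the high frequency of dropout zeros without inflating the NB variance for nonzero counts.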
Updated: 2024-08-20 03:20:13
Categories: cs.LG,cs.AI,q-bio.GN
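The ZINB reconstruction objective named above is standard enough to sketch. The following is a minimal illustration of the zero-inflated negative binomial negative log-likelihood; parameter names and the exact parameterization are illustrative, not scCLG's actual implementation:

```python
import math

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of a zero-inflated negative binomial.

    x: observed count; mu: NB mean; theta: dispersion; pi: dropout probability.
    (Names are illustrative, not scCLG's actual interface.)
    """
    # log NB(x; mu, theta) in the mean/dispersion parameterization
    log_nb = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    if x == 0:
        # a zero can come from dropout (pi) or from the NB itself
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return -(math.log(1.0 - pi) + log_nb)
```

A nonzero dropout probability makes observed zeros cheaper, which is exactly why the ZINB loss suits scRNA-seq data with frequent dropout events.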
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation
Designing preference elicitation (PE) methodologies that can quickly ascertain a user's top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) enable fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance the exploration and exploitation of user preferences towards an arbitrary item set. In contrast, traditional Bayesian optimization PE methods define theoretically optimal PE strategies, but cannot generate arbitrary NL queries or reason over content in NL item descriptions -- requiring users to express preferences via ratings or comparisons of unfamiliar items. To overcome the limitations of both approaches, we formulate NL-PE in a Bayesian Optimization (BO) framework that seeks to actively elicit NL feedback to identify the best recommendation. Key challenges in generalizing BO to deal with natural language feedback include determining: (a) how to leverage LLMs to model the likelihood of NL preference feedback as a function of item utilities, and (b) how to design an acquisition function for NL BO that can elicit preferences in the infinite space of language. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses: 1) Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain Bayesian preference beliefs, and 2) BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to steer LLM query generation. We numerically evaluate our methods in controlled simulations, finding that after 10 turns of dialogue, PEBOL can achieve an MRR@10 of up to 0.27 compared to the best monolithic LLM baseline's MRR@10 of 0.17, despite relying on earlier and smaller LLMs.
Updated: 2024-08-20 03:15:07
Categories: cs.AI,cs.CL
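The Thompson Sampling loop that PEBOL builds on can be illustrated with a toy simulation. The Beta beliefs over item utilities and the simulated binary feedback below are stand-ins for the paper's NLI-based belief updates over natural-language utterances:

```python
import numpy as np

# Toy Bayesian preference elicitation: Beta beliefs per item,
# Thompson Sampling picks which item to ask about, and binary
# feedback updates the posterior.
rng = np.random.default_rng(0)
true_utility = np.array([0.9, 0.3, 0.2])   # hidden user preferences (toy)
alpha = np.ones(3)                          # Beta posterior parameters
beta = np.ones(3)

for _ in range(200):
    sampled = rng.beta(alpha, beta)         # Thompson Sampling draw
    item = int(np.argmax(sampled))          # query the apparently best item
    liked = rng.random() < true_utility[item]  # simulated user feedback
    alpha[item] += liked
    beta[item] += 1 - liked

queries = alpha + beta - 2                  # how often each item was probed
```

Over the dialogue, sampling concentrates the queries on the highest-utility item, which is the exploration-exploitation balance the abstract argues monolithic LLM prompting lacks.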
HYDEN: Hyperbolic Density Representations for Medical Images and Reports
In light of the inherent entailment relations between images and text, hyperbolic point-vector embeddings, which leverage the hierarchical modeling advantages of hyperbolic space, have been utilized for visual semantic representation learning. However, point-vector embedding approaches fail to address the issue of semantic uncertainty, where an image may have multiple interpretations and a text may refer to different images, a phenomenon particularly prevalent in the medical domain. Therefore, we propose \textbf{HYDEN}, a novel hyperbolic density embedding based image-text representation learning approach tailored for specific medical domain data. This method integrates text-aware local features alongside global features from images, mapping image-text features to density features in hyperbolic space using hyperbolic pseudo-Gaussian distributions. An encapsulation loss function is employed to model the partial order relations between image-text density distributions. Experimental results demonstrate the interpretability of our approach and its superior performance compared to baseline methods across various zero-shot tasks and different datasets.
Updated: 2024-08-20 03:13:41
Categories: cs.AI,cs.CV,cs.LG,eess.IV
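A basic building block of any Poincare-ball approach is the hyperbolic geodesic distance; a minimal sketch is below. HYDEN itself works with density embeddings and an encapsulation loss on top of such a geometry, not with plain point distances:

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincare ball (||x|| < 1)."""
    su = sum(x * x for x in u)              # squared norm of u
    sv = sum(x * x for x in v)              # squared norm of v
    sd = sum((a - b) ** 2 for a, b in zip(u, v))
    # distance blows up as either point approaches the unit boundary
    return math.acosh(1.0 + 2.0 * sd / ((1.0 - su) * (1.0 - sv)))
```

Distances grow rapidly near the boundary, which is what lets hyperbolic space embed hierarchies (general concepts near the origin, specific ones near the boundary) with low distortion.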
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning
Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods only focus on the task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performances. Additionally, these methods rely heavily on frequent interactions with LLMs to obtain feedback for guiding the optimization process, incurring substantial redundant interaction costs. In this paper, we introduce Query-dependent Prompt Optimization (QPO), which leverages multi-loop offline reinforcement learning to iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries, thus significantly improving the prompting effect on the large target LLM. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks, thereby circumventing the expenses of online interactions. Furthermore, we continuously augment the offline dataset with the generated prompts in each loop, as the prompts from the fine-tuned model are supposed to outperform the source prompts in the original dataset. These iterative loops bootstrap the model towards generating optimal prompts. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
Updated: 2024-08-20 03:06:48
Categories: cs.AI
Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers
Assessing the forensic value of hand images involves the use of unique features and patterns present in an individual's hand. The human hand has distinct characteristics, such as the pattern of veins, fingerprints, and the geometry of the hand itself. This paper investigates the use of vision transformers (ViTs) for classification of hand images. We use explainability tools to explore the internal representations of ViTs and assess their impact on the model outputs. Utilizing the internal understanding of ViTs, we introduce distillation methods that allow a student model to adaptively extract knowledge from a teacher model while learning on data of a different domain to prevent catastrophic forgetting. Two publicly available hand image datasets are used to conduct a series of experiments to evaluate performance of the ViTs and our proposed adaptive distillation methods. The experimental results demonstrate that ViT models significantly outperform traditional machine learning methods and the internal states of ViTs are useful for explaining the model outputs in the classification task. By averting catastrophic forgetting, our distillation methods achieve excellent performance on data from both source and target domains, particularly when these two domains exhibit significant dissimilarity. The proposed approaches therefore can be developed and implemented effectively for real-world applications such as access control, identity verification, and authentication systems.
Updated: 2024-08-20 03:03:56
Categories: cs.CV,cs.AI,cs.LG
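The soft-target term that knowledge distillation methods (including adaptive variants) typically build on can be sketched as follows. This is the standard temperature-scaled KL loss, not the paper's adaptive scheme:

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), the usual soft-target
    distillation term, scaled by T^2 to keep gradient magnitudes comparable."""
    p = softmax([z / T for z in teacher_logits])
    q = softmax([z / T for z in student_logits])
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

Adaptive schemes typically modulate how strongly this term is weighted per sample, which is one way to extract teacher knowledge while limiting catastrophic forgetting on a new domain.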
PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities
Financial time series modeling is crucial for understanding and predicting market behaviors but faces challenges such as non-linearity, non-stationarity, and high noise levels. Traditional models struggle to capture complex patterns due to these issues, compounded by limitations in computational resources and model capacity. Inspired by the success of large language models in NLP, we introduce $\textbf{PLUTUS}$, a $\textbf{P}$re-trained $\textbf{L}$arge $\textbf{U}$nified $\textbf{T}$ransformer-based model that $\textbf{U}$nveils regularities in financial time $\textbf{S}$eries. PLUTUS uses an invertible embedding module with contrastive learning and autoencoder techniques to create an approximate one-to-one mapping between raw data and patch embeddings. TimeFormer, an attention based architecture, forms the core of PLUTUS, effectively modeling high-noise time series. We incorporate novel attention mechanisms to capture features across both variable and temporal dimensions. PLUTUS is pre-trained on an unprecedented dataset of 100 billion observations, designed to thrive in noisy financial environments. To our knowledge, PLUTUS is the first open-source, large-scale, pre-trained financial time series model with over one billion parameters. It achieves state-of-the-art performance in various tasks, demonstrating strong transferability and establishing a robust foundational model for finance. Our research provides technical guidance for pre-training financial time series data, setting a new standard in the field.
Updated: 2024-08-20 02:59:16
Categories: cs.AI,cs.LG
ARAP: Demystifying Anti Runtime Analysis Code in Android Apps
With the continuous growth in the usage of Android apps, ensuring their security has become critically important. An increasing number of malicious apps adopt anti-analysis techniques to evade security measures. Although some research has started to consider anti-runtime analysis (ARA), it is unfortunate that they have not systematically examined ARA techniques. Furthermore, the rapid evolution of ARA technology exacerbates the issue, leading to increasingly inaccurate analysis results. To effectively analyze Android apps, understanding their adopted ARA techniques is necessary. However, no systematic investigation has been conducted thus far. In this paper, we conduct the first systematic study of the ARA implementations in a wide range of 117,171 Android apps (including both malicious and benign ones) collected between 2016 and 2023. Additionally, we propose a specific investigation tool named ARAP to assist this study by leveraging both static and dynamic analysis. According to the evaluation results, ARAP not only effectively identifies the ARA implementations in Android apps but also reveals many important findings. For instance, almost all apps have implemented at least one category of ARA technology (99.6% for benign apps and 97.0% for malicious apps).
Updated: 2024-08-20 02:50:56
Categories: cs.CR,cs.SE
Asymptotic Classification Error for Heavy-Tailed Renewal Processes
Despite the widespread occurrence of classification problems and the increasing collection of point process data across many disciplines, study of error probability for point process classification only emerged very recently. Here, we consider classification of renewal processes. We obtain asymptotic expressions for the Bhattacharyya bound on misclassification error probabilities for heavy-tailed renewal processes.
Updated: 2024-08-20 02:47:24
Categories: stat.ML,cs.LG,math.ST,stat.TH
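The Bhattacharyya bound referenced above is simple to compute in the discrete case; a minimal sketch (the paper's asymptotic expressions concern the point-process, heavy-tailed setting, which this toy does not capture):

```python
import math

def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i) for two discrete distributions.
    Equals 1 when p == q and 0 when their supports are disjoint."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def bhattacharyya_error_bound(p, q, prior1=0.5, prior2=0.5):
    """Upper bound on the Bayes misclassification probability:
    P_e <= sqrt(prior1 * prior2) * BC(p, q)."""
    return math.sqrt(prior1 * prior2) * bhattacharyya_coefficient(p, q)
```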
ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming
Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating custom filters for visual information, e.g., 'find NUMBER on BUS', leveraging three end-user programming approaches: block programming, natural language, and programming by example. To implement ProgramAlly, we designed a representation of visual filtering tasks based on scenarios encountered by blind people, and integrated a set of on-device and cloud models for generating and running these programs. In user studies with 12 blind adults, we found that participants preferred different programming modalities depending on the task, and envisioned using visual access programs to address unique accessibility challenges that are otherwise difficult with existing applications. Through ProgramAlly, we present an exploration of how blind end-users can create visual access programs to customize and control their experiences.
Updated: 2024-08-20 02:45:31
Categories: cs.HC,cs.AI,cs.PL
QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention
Generative LLMs have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through ICL. However, as tasks become increasingly complex, the context length required by ICL is also getting longer, and two significant issues arise: (i) the excessively long context leads to high costs and inference delays; (ii) the substantial amount of task-irrelevant information introduced by long contexts exacerbates the "lost in the middle" problem. Recently, compressing prompts by removing tokens according to some metric obtained from a causal language model, such as llama-7b, has emerged as an effective approach to mitigate these issues. However, the metrics used by prior methods, such as self-information or PPL, do not fully align with the objective of distinguishing the most important tokens when conditioning on the query. In this work, we introduce information bottleneck theory to carefully examine the properties required of such a metric. Inspired by this, we use cross-attention in an encoder-decoder architecture as a new metric. Our simple method leads to significantly better performance in smaller models with lower latency. We evaluate our method on four datasets: DROP, CoQA, SQuAD, and Quoref. The experimental results show that, while maintaining the same performance, our compression rate can improve by nearly 25% over the previous SOTA. Remarkably, in experiments where 25% of the tokens are removed, our model's EM score for answers sometimes even exceeds that of the control group using uncompressed text as context.
Updated: 2024-08-20 02:44:45
Categories: cs.CL,cs.AI
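The cross-attention scoring idea can be illustrated as follows. Averaging attention over decoder steps and keeping the top fraction of source tokens is an assumption for illustration; the paper's exact aggregation may differ:

```python
import numpy as np

def prune_by_cross_attention(attn, tokens, keep_ratio=0.75):
    """Keep the source tokens that receive the most decoder-to-encoder attention.

    attn: (num_decoder_steps, num_source_tokens) attention weights.
    Scores context tokens by cross-attention instead of self-information/PPL.
    """
    importance = attn.mean(axis=0)                  # average over decoder steps
    k = max(1, int(round(keep_ratio * len(tokens))))
    keep = np.sort(np.argsort(importance)[-k:])     # top-k, in original order
    return [tokens[i] for i in keep]
```

Because the scores are conditioned on the query via the decoder, tokens irrelevant to the question receive low importance even if they are individually surprising, which is the misalignment the abstract attributes to self-information and PPL metrics.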
How Well Do Large Language Models Serve as End-to-End Secure Code Producers?
The rapid advancement of large language models (LLMs) such as GPT-4 has revolutionized the landscape of software engineering, positioning these models at the core of modern development practices. As we anticipate these models to evolve into the primary and trustworthy tools used in software development, ensuring the security of the code they produce becomes paramount. How well can LLMs serve as end-to-end secure code producers? This paper presents a systematic investigation into LLMs' inherent potential to generate code with fewer vulnerabilities. Specifically, we studied GPT-3.5 and GPT-4's capability to identify and repair vulnerabilities in the code generated by four popular LLMs including themselves (GPT-3.5, GPT-4, Code Llama, and CodeGeeX2). By manually or automatically reviewing 4,900 pieces of code, our study reveals that: (1) large language models lack awareness of scenario-relevant security risks, which leads to the generation of over 75% vulnerable code on the SecurityEval benchmark; (2) LLMs such as GPT-3.5 and GPT-4 are unable to precisely identify vulnerabilities in the code they generated; (3) GPT-3.5 and GPT-4 can achieve 33.2%~59.6% success rates in repairing the insecure code produced by the 4 LLMs, but they both perform poorly when repairing self-produced code, indicating self-repair "blind spots". To address the limitation of a single round of repair, we developed a lightweight tool that prompts LLMs to construct safer source code through an iterative repair procedure based on the insights gained from our study. Experiments show that, assisted by semantic analysis engines, our tool significantly improves the success rates of repair to 65.9%~85.5%.
Updated: 2024-08-20 02:42:29
Categories: cs.SE,cs.AI,D.2
How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization
We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Lojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems.
Updated: 2024-08-20 02:37:32
Categories: cs.LG,cs.CR,math.OC
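The per-example clip-and-noise Gaussian mechanism that differentially private first-order methods rely on can be sketched as follows. This is the generic building block, not the paper's warm-start framework:

```python
import numpy as np

def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to L2 norm <= clip_norm, average,
    and add Gaussian noise calibrated to the clipping bound."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # scale down only gradients whose norm exceeds the bound
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    n = len(clipped)
    mean = np.mean(clipped, axis=0)
    # sensitivity of the clipped mean is clip_norm / n
    noise = rng.normal(0.0, noise_multiplier * clip_norm / n, size=mean.shape)
    return mean + noise
```

Clipping bounds each example's influence (the sensitivity), which is what lets the added Gaussian noise yield a formal differential privacy guarantee.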
Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge
Large Language Models (LLMs) have revolutionized the landscape of machine learning, yet current benchmarks often fall short in capturing the diverse behavior of these models in real-world applications. A benchmark's usefulness is determined by its ability to clearly differentiate between models of varying capabilities (separability) and to closely align with human preferences. Existing frameworks like Alpaca-Eval 2.0 LC [dubois2024lengthcontrolledalpacaevalsimpleway] and Arena-Hard v0.1 [li2024crowdsourced] are limited by their focus on general-purpose queries and lack of diversity across domains such as law, medicine, and multilingual contexts. In this paper, we address these limitations by introducing a novel data pipeline that curates diverse, domain-specific evaluation sets tailored for LLM-as-a-Judge frameworks. Our approach leverages a combination of manual curation, semi-supervised learning to generate clusters, and stratified sampling to ensure balanced representation across a wide range of domains and languages. The resulting evaluation set, which includes 1573 samples across 14 categories, demonstrates high separability (84%) across ten top-ranked models, 84% agreement with Chatbot Arena, and a Spearman correlation of 0.915. The agreement values are 9% better than Arena-Hard and 20% better than AlpacaEval 2.0 LC, while the Spearman coefficient is 0.7 higher than that of the next best benchmark, showcasing a significant improvement in the usefulness of the benchmark. We further provide an open-source evaluation tool that enables fine-grained analysis of model performance across user-defined categories, offering valuable insights for practitioners. This work contributes to the ongoing effort to enhance the transparency, diversity, and effectiveness of LLM evaluation methodologies.
Updated: 2024-08-20 02:32:58
Categories: cs.LG,cs.AI
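The Spearman correlation used above to compare benchmark rankings against Chatbot Arena can be computed from scratch for tie-free lists (Pearson correlation applied to ranks):

```python
def spearman(xs, ys):
    """Spearman rank correlation for tie-free lists: Pearson on ranks."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Working on ranks makes the statistic sensitive only to the ordering of models, which is what matters when comparing leaderboards.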
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception
Personalized chatbot-based teaching assistants can be crucial in addressing increasing classroom sizes, especially where direct teacher presence is limited. Large language models (LLMs) offer a promising avenue, with increasing research exploring their educational utility. However, the challenge lies not only in establishing the efficacy of LLMs but also in discerning the nuances of interaction between learners and these models, which impact learners' engagement and results. We conducted a formative study in an undergraduate computer science classroom (N=145) and a controlled experiment on Prolific (N=356) to explore the impact of four pedagogically informed guidance strategies on the learners' performance, confidence and trust in LLMs. Direct LLM answers marginally improved performance, while refining student solutions fostered trust. Structured guidance reduced random queries as well as instances of students copy-pasting assignment questions to the LLM. Our work highlights the role that teachers can play in shaping LLM-supported learning environments.
Updated: 2024-08-20 02:31:25
Categories: cs.HC,cs.AI
Clustering by Mining Density Distributions and Splitting Manifold Structure
Spectral clustering requires the time-consuming decomposition of the Laplacian matrix of the similarity graph, thus limiting its applicability to large datasets. To improve the efficiency of spectral clustering, a top-down approach was recently proposed, which first divides the data into several micro-clusters (granular-balls), then splits these micro-clusters when they are not "compact", and finally uses these micro-clusters as nodes to construct a similarity graph for more efficient spectral clustering. However, this top-down approach is challenging to adapt to unevenly distributed or structurally complex data. This is because constructing micro-clusters as a rough ball struggles to capture the shape and structure of data in a local range, and the simplistic splitting rule that solely targets "compactness" is susceptible to noise and variations in data density and leads to micro-clusters with varying shapes, making it challenging to accurately measure the similarity between them. To resolve these issues, this paper first proposes to start from local structures to obtain micro-clusters, such that the complex structural information inside local neighborhoods is well captured by them. Moreover, by noting that Euclidean distance is more suitable for convex sets, this paper further proposes a data splitting rule that couples local density and data manifold structures, so that the similarities of the obtained micro-clusters can be easily characterized. A novel similarity measure between micro-clusters is then proposed for the final spectral clustering. A series of experiments based on synthetic and real-world datasets demonstrate that the proposed method has better adaptability to structurally complex data than granular-ball based methods.
Updated: 2024-08-20 02:22:59
Categories: cs.LG
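The Laplacian eigendecomposition step whose cost motivates these micro-cluster accelerations can be sketched for a two-way split (sign of the Fiedler vector); the choice of RBF similarity and unnormalized Laplacian here is a simplification:

```python
import numpy as np

def fiedler_split(points, gamma=1.0):
    """Two-way spectral split: RBF similarity graph, unnormalized Laplacian,
    and the sign of the second-smallest eigenvector (Fiedler vector)."""
    x = np.asarray(points, dtype=float).reshape(len(points), -1)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    w = np.exp(-gamma * d2)                               # RBF similarity graph
    lap = np.diag(w.sum(axis=1)) - w                      # L = D - W
    _, vecs = np.linalg.eigh(lap)                         # eigenvalues ascending
    return vecs[:, 1] > 0                                 # sign pattern = partition
```

The eigendecomposition is O(n^3) in the number of graph nodes, which is exactly why replacing raw points with a much smaller set of micro-cluster nodes speeds things up.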
Is the Lecture Engaging for Learning? Lecture Voice Sentiment Analysis for Knowledge Graph-Supported Intelligent Lecturing Assistant (ILA) System
This paper introduces an intelligent lecturing assistant (ILA) system that utilizes a knowledge graph to represent course content and optimal pedagogical strategies. The system is designed to support instructors in enhancing student learning through real-time analysis of voice, content, and teaching methods. As an initial investigation, we present a case study on lecture voice sentiment analysis, in which we developed a training set comprising over 3,000 one-minute lecture voice clips. Each clip was manually labeled as either engaging or non-engaging. Utilizing this dataset, we constructed and evaluated several classification models based on a variety of features extracted from the voice clips. The results demonstrate promising performance, achieving an F1-score of 90% for boring lectures on an independent set of over 800 test voice clips. This case study lays the groundwork for the development of a more sophisticated model that will integrate content analysis and pedagogical practices. Our ultimate goal is to aid instructors in teaching more engagingly and effectively by leveraging modern artificial intelligence techniques.
Updated: 2024-08-20 02:22:27
Categories: cs.AI,cs.HC
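The F1-score reported above for the engaging/non-engaging classifier is the harmonic mean of precision and recall; a minimal reference implementation for binary labels:

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall (positive class = 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```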
Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing
Identifying significant references within the complex interrelations of a citation knowledge graph is challenging, which encompasses connections through citations, authorship, keywords, and other relational attributes. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles utilizing advanced data mining techniques. In the KDD CUP OAG-Challenge PST track, we design a recommendation-based framework tailored for the PST task. This framework employs the Neural Collaborative Filtering (NCF) model to generate final predictions. To process the textual attributes of the papers and extract input features for the model, we utilize SciBERT, a pre-trained language model. According to the experimental results, our method achieved a score of 0.37814 on the Mean Average Precision (MAP) metric, outperforming baseline models and ranking 11th among all participating teams. The source code is publicly available at https://github.com/MyLove-XAB/KDDCupFinal.
Updated: 2024-08-20 02:16:41
Categories: cs.IR,cs.LG
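The Mean Average Precision (MAP) metric on which the 0.37814 score is reported can be computed as follows for ranked lists with binary relevance:

```python
def average_precision(relevance):
    """AP for one ranked list of binary relevance judgments (1 = relevant)."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / i       # precision at each relevant rank
    return total / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """MAP: the mean of per-query average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```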
A Comparison of Large Language Model and Human Performance on Random Number Generation Tasks
Random Number Generation Tasks (RNGTs) are used in psychology for examining how humans generate sequences devoid of predictable patterns. By adapting an existing human RNGT for an LLM-compatible environment, this preliminary study tests whether ChatGPT-3.5, a large language model (LLM) trained on human-generated text, exhibits human-like cognitive biases when generating random number sequences. Initial findings indicate that ChatGPT-3.5 more effectively avoids repetitive and sequential patterns compared to humans, with notably lower repeat frequencies and adjacent number frequencies. Continued research into different models, parameters, and prompting methodologies will deepen our understanding of how LLMs can more closely mimic human random generation behaviors, while also broadening their applications in cognitive and behavioral science research.
Updated: 2024-08-20 02:05:46
Categories: cs.AI,cs.CL,q-bio.NC
Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm
Sign Language Translation (SLT) is a core task in the field of AI-assisted disability. Unlike traditional SLT based on visible light videos, which is easily affected by factors such as lighting, rapid hand movements, and privacy breaches, this paper proposes the use of high-definition Event streams for SLT, effectively mitigating the aforementioned issues. This is primarily because Event streams have a high dynamic range and dense temporal signals, which can withstand low illumination and motion blur well. Additionally, due to their sparsity in space, they effectively protect the privacy of the target person. More specifically, we propose a new high-resolution Event stream sign language dataset, termed Event-CSL, which effectively fills the data gap in this area of research. It contains 14,827 videos, 14,821 glosses, and 2,544 Chinese words in the text vocabulary. These samples are collected in a variety of indoor and outdoor scenes, encompassing multiple angles, light intensities, and camera movements. We have benchmarked existing mainstream SLT works to enable fair comparison for future efforts. Based on this dataset and several other large-scale datasets, we propose a novel baseline method that fully leverages the Mamba model's ability to integrate temporal information of CNN features, resulting in improved sign language translation outcomes. Both the benchmark dataset and source code will be released on https://github.com/Event-AHU/OpenESL
Updated: 2024-08-20 02:01:30
Categories: cs.CV,cs.AI,cs.CL,cs.NE
MambaEVT: Event Stream based Visual Object Tracking using State Space Model
Event camera-based visual tracking has drawn more and more attention in recent years due to the unique imaging principle and advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting their performance bottlenecks, due to the utilization of vision Transformer and the static template for target object localization. In this paper, we propose a novel Mamba-based visual tracking framework that adopts the state space model with linear complexity as a backbone network. The search regions and target template are fed into the vision Mamba network for simultaneous feature extraction and interaction. The output tokens of search regions will be fed into the tracking head for target localization. More importantly, we consider introducing a dynamic template update strategy into the tracking framework using the Memory Mamba network. By considering the diversity of samples in the target template library and making appropriate adjustments to the template memory module, a more effective dynamic template can be integrated. The effective combination of dynamic and static templates allows our Mamba-based tracking algorithm to achieve a good balance between accuracy and computational cost on multiple large-scale datasets, including EventVOT, VisEvent, and FE240hz. The source code will be released on https://github.com/Event-AHU/MambaEVT
Updated: 2024-08-20 02:01:17
Categories: cs.CV,cs.AI
Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks
Adversarial attacks, particularly the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), pose significant threats to the robustness of deep learning models in image classification. This paper explores and refines defense mechanisms against these attacks to enhance the resilience of neural networks. We employ a combination of adversarial training and innovative preprocessing techniques, aiming to mitigate the impact of adversarial perturbations. Our methodology involves modifying input data before classification and investigating different model architectures and training strategies. Through rigorous evaluation on benchmark datasets, we demonstrate the effectiveness of our approach in defending against FGSM and PGD attacks. Our results show substantial improvements in model robustness compared to baseline methods, highlighting the potential of our defense strategies in real-world applications. This study contributes to the ongoing efforts to develop secure and reliable machine learning systems, offering practical insights and paving the way for future research in adversarial defense. By bridging theoretical advancements and practical implementation, we aim to enhance the trustworthiness of AI applications in safety-critical domains.
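A minimal sketch of the FGSM attack these defenses target: perturb the input by epsilon in the sign of the loss gradient. The tiny logistic "classifier" and its closed-form input gradient are illustrative stand-ins for a deep network; PGD iterates this step with a projection back into the epsilon-ball.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_x(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the *input* x for a logistic model."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """One-shot FGSM: move each coordinate by eps in the gradient-sign direction."""
    return x + eps * np.sign(loss_grad_wrt_x(x, y, w, b))

w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([0.3, 0.8, -0.2])
y = 1.0  # true label

x_adv = fgsm(x, y, w, b, eps=0.1)
p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)  # attack lowers the true-class probability
```

Adversarial training, the first defense named above, simply mixes such `x_adv` examples into the training batches.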
Updated: 2024-08-20 02:00:02
Categories: cs.CR,cs.CV
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.
Updated: 2024-08-20 01:59:44
Categories: cs.CL,cs.AI
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting
The self-attention mechanism in Transformer architecture, invariant to sequence order, necessitates positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences, particularly when employing longer lookback windows. To address this, we introduce an innovative approach that combines Pyramid RNN embeddings (PRE) for univariate time series with the Transformer's capability to model multivariate dependencies. PRE, utilizing pyramidal one-dimensional convolutional layers, constructs multiscale convolutional features that preserve temporal order. Additionally, RNNs, layered atop these features, learn multiscale time series representations sensitive to sequence order. This integration into Transformer models with attention mechanisms results in significant performance enhancements. We present the PRformer, a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets. This performance highlights the effectiveness of our approach in leveraging longer lookback windows and underscores the critical role of robust temporal representations in maximizing Transformer's potential for prediction tasks. Code is available at this repository: https://github.com/usualheart/PRformer
Updated: 2024-08-20 01:56:07
Categories: cs.LG
Evaluation Framework for AI-driven Molecular Design of Multi-target Drugs: Brain Diseases as a Case Study
The widespread application of Artificial Intelligence (AI) techniques has significantly influenced the development of new therapeutic agents. These computational methods can be used to design and predict the properties of generated molecules. Multi-target Drug Discovery (MTDD) is an emerging paradigm for discovering drugs against complex disorders that do not respond well to more traditional target-specific treatments, such as central nervous system, immune system, and cardiovascular diseases. Still, there is yet to be an established benchmark suite for assessing the effectiveness of AI tools for designing multi-target compounds. Standardized benchmarks allow for comparing existing techniques and promote rapid research progress. Hence, this work proposes an evaluation framework for molecule generation techniques in MTDD scenarios, considering brain diseases as a case study. Our methodology involves using large language models to select the appropriate molecular targets, gathering and preprocessing the bioassay datasets, training quantitative structure-activity relationship models to predict target modulation, and assessing other essential drug-likeness properties for implementing the benchmarks. Additionally, this work will assess the performance of four deep generative models and evolutionary algorithms over our benchmark suite. In our findings, both evolutionary algorithms and generative models can achieve competitive results across the proposed benchmarks.
Updated: 2024-08-20 01:42:16
Categories: cs.NE,cs.AI
An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporates heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel duration, etc. In this paper, we propose a one-stage end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present Deep Double Scalable Network (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks justify that the proposed approach significantly outperforms competitive baselines in optimizing matching efficiency and user experience tasks. In addition, we evaluate the deployment outline and discuss the gains and experiences obtained during the deployment tests from the view of large-scale engineering implementation.
Updated: 2024-08-20 01:30:53
Categories: cs.LG,cs.AI
Effective Bilevel Optimization via Minimax Reformulation
Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.
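The gradient descent and ascent (GDA) scheme the paper applies to its minimax reformulation can be illustrated on a toy strongly-convex-strongly-concave objective f(x, y) = x^2 - y^2 + 2xy, whose saddle point is (0, 0). The objective, step size, and single-stage loop are illustrative; the paper's reformulated bilevel objective and multi-stage schedule are more involved.

```python
def grad_x(x, y):  # df/dx for f(x, y) = x^2 - y^2 + 2xy
    return 2 * x + 2 * y

def grad_y(x, y):  # df/dy
    return -2 * y + 2 * x

def gda(x, y, lr=0.05, steps=2000):
    """Alternating GDA: descend on the min player x, ascend on the max player y."""
    for _ in range(steps):
        x -= lr * grad_x(x, y)
        y += lr * grad_y(x, y)
    return x, y

x_star, y_star = gda(1.0, -1.0)  # converges to the saddle point (0, 0)
```

The appeal over the nested bilevel formulation is visible even here: each iteration costs two gradient evaluations, with no inner optimization loop per outer step.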
Updated: 2024-08-20 01:27:21
Categories: cs.LG,math.OC
Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias
The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or $\sqrt{d}$. In this paper, we argue that, despite this poor scaling of the $W_2$ error for the full set of variables, the behavior for a small number of variables can be significantly better: a number of iterations proportional to $K$, up to logarithmic terms in $d$, often suffices for the algorithm to converge to within a desired $W_2$ error for all $K$-marginals. We refer to this effect as delocalization of bias. We show that the delocalization effect does not hold universally and prove its validity for Gaussian distributions and strongly log-concave distributions with certain sparse interactions. Our analysis relies on a novel $W_{2,\ell^\infty}$ metric to measure convergence. A key technical challenge we address is the lack of a one-step contraction property in this metric. Finally, we use asymptotic arguments to explore potential generalizations of the delocalization effect beyond the Gaussian and sparse interactions setting.
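The unadjusted Langevin algorithm analyzed above, sketched for a standard Gaussian target in d dimensions: x_{k+1} = x_k - h grad U(x_k) + sqrt(2h) xi_k with U(x) = ||x||^2 / 2. The step size and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ula(grad_U, x0, h, n_steps):
    """Unadjusted Langevin: Euler-Maruyama discretization of Langevin diffusion."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x - h * grad_U(x) + np.sqrt(2 * h) * rng.normal(size=x.shape)
    return x

d = 10
# For N(0, I), U(x) = ||x||^2 / 2, so grad U(x) = x.
samples = np.stack([
    ula(lambda x: x, np.zeros(d), h=0.05, n_steps=300)
    for _ in range(1000)
])
```

Each marginal of the iterate is close to N(0, 1) up to the O(h) discretization bias; the delocalization result above says this per-marginal bias can stay small even when the full d-dimensional W_2 error scales poorly.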
Updated: 2024-08-20 01:24:54
Categories: stat.ML,cs.LG,math.PR,stat.CO
Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks
Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various strategies including gradient normalization (GN) and gradient centralization (GC). Nevertheless, to the best of our knowledge, no one has considered to capture the optimal gradient descent trajectory, by adaptively controlling gradient descent direction. To address this concern, this paper is the first attempt to study a new optimization technique for deep neural networks, using the sum normalization of a gradient vector as coefficients, to dynamically regularize gradients and thus to effectively control optimization direction. The proposed technique is hence named as the adaptive gradient regularization (AGR). It can be viewed as an adaptive gradient clipping method. The theoretical analysis reveals that the AGR can effectively smooth the loss landscape, and hence can significantly improve the training efficiency and model generalization performance. We note that AGR can greatly improve the training efficiency of vanilla optimizers, including Adan and AdamW, by adding only three lines of code. The final experiments conducted on image generation, image classification, and language representation, demonstrate that the AGR method can not only improve the training efficiency but also enhance the model generalization performance.
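One hedged reading of the "sum normalization as coefficients" idea, as a few-line gradient transform. The exact damping form below (g * (1 - |g| / sum|g|)) is our assumption inferred from the abstract, not the authors' verified formula.

```python
import numpy as np

def agr(grad, eps=1e-12):
    """Assumed AGR-style transform: damp each coordinate by its share of the
    sum-normalized absolute gradient, so dominant coordinates shrink most."""
    coeff = np.abs(grad) / (np.abs(grad).sum() + eps)  # coefficients sum to ~1
    return grad * (1.0 - coeff)

g = np.array([4.0, -1.0, 1.0, 0.0])
g_reg = agr(g)  # magnitudes shrink, signs are preserved
```

Under this reading, AGR behaves like per-coordinate adaptive clipping, consistent with the abstract's description of it as "an adaptive gradient clipping method."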
Updated: 2024-08-20 01:21:38
Categories: cs.LG
LeCov: Multi-level Testing Criteria for Large Language Models
Large Language Models (LLMs) are widely used in many different domains, but because of their limited interpretability, there are questions about how trustworthy they are in various perspectives, e.g., truthfulness and toxicity. Recent research has started developing testing methods for LLMs, aiming to uncover untrustworthy issues, i.e., defects, before deployment. However, systematic and formalized testing criteria are lacking, which hinders a comprehensive assessment of the extent and adequacy of testing exploration. To mitigate this threat, we propose a set of multi-level testing criteria, LeCov, for LLMs. The criteria consider three crucial LLM internal components, i.e., the attention mechanism, feed-forward neurons, and uncertainty, and contain nine types of testing criteria in total. We apply the criteria in two scenarios: test prioritization and coverage-guided testing. The experiment evaluation, on three models and four datasets, demonstrates the usefulness and effectiveness of LeCov.
Updated: 2024-08-20 01:17:54
Categories: cs.SE,cs.AI,cs.CL,cs.CR,cs.LG
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without the need for retraining on task-specific or otherwise general data; however, these approaches often lead to an unavoidable reduction in performance. In this paper, we propose SDS, a Sparse-Dense-Sparse pruning framework to enhance the performance of the pruned PLMs from a weight distribution optimization perspective. We outline the pruning process in three steps. Initially, we prune less critical connections in the model using conventional one-shot pruning methods. Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization. Finally, we perform a second pruning round, yielding a superior pruned model compared to the initial pruning. Experimental results demonstrate that SDS outperforms the state-of-the-art pruning techniques SparseGPT and Wanda under an identical sparsity configuration. For instance, SDS reduces perplexity by 9.13 on Raw-Wikitext2 and improves accuracy by an average of 2.05% across multiple zero-shot benchmarks for OPT-125M with 2:4 sparsity.
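The 2:4 sparsity pattern cited in the results can be sketched as one-shot magnitude pruning: in every group of 4 consecutive weights, keep the 2 with the largest magnitude. This illustrates only the sparse endpoints of the sparse-dense-sparse pipeline; the dense reconstruction stage with sparse regularization is omitted.

```python
import numpy as np

def prune_2_4(w):
    """Apply a 2:4 magnitude-pruning mask along the last axis of w."""
    groups = w.reshape(-1, 4)
    # Indices of the 2 smallest-magnitude weights in each group of 4.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.0, 1.1]])
w_sparse = prune_2_4(w)  # exactly 2 nonzeros survive in each group of 4
```

SDS's claim is that running this kind of one-shot step, then densifying toward a pruning-friendly weight distribution, then pruning again beats a single application.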
Updated: 2024-08-20 01:05:45
Categories: cs.CL,cs.LG
Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection
Large visual-language models (LVLMs) exhibit exceptional performance in visual-language reasoning across diverse cross-modal benchmarks. Despite these advances, recent research indicates that Large Language Models (LLMs), like GPT-3.5-turbo, underachieve compared to well-trained smaller models, such as BERT, in Fake News Detection (FND), prompting inquiries into LVLMs' efficacy in FND tasks. Although performance could improve through fine-tuning LVLMs, the substantial parameters and requisite pre-trained weights render it a resource-heavy endeavor for FND applications. This paper initially assesses the FND capabilities of two notable LVLMs, CogVLM and GPT4V, in comparison to a smaller yet adeptly trained CLIP model in a zero-shot context. The findings demonstrate that LVLMs can attain performance competitive with that of the smaller model. Next, we integrate standard in-context learning (ICL) with LVLMs, noting improvements in FND performance, though limited in scope and consistency. To address this, we introduce the In-context Multimodal Fake News Detection (IMFND) framework, enriching in-context examples and test inputs with predictions and corresponding probabilities from a well-trained smaller model. This strategic integration directs the LVLMs' focus towards news segments associated with higher probabilities, thereby improving their analytical accuracy. The experimental results suggest that the IMFND framework significantly boosts the FND efficiency of LVLMs, achieving enhanced accuracy over the standard ICL approach across three publicly available FND datasets.
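The prompt-enrichment step of IMFND can be sketched as plain string construction: each in-context example and the test input carry the smaller model's prediction and probability. The template wording below is our illustrative assumption, not the paper's exact format.

```python
def imfnd_prompt(examples, test_text, test_pred, test_prob):
    """Build an in-context prompt where every item is annotated with an
    auxiliary (smaller) model's prediction and probability."""
    lines = []
    for text, pred, prob, label in examples:
        lines.append(
            f"News: {text}\nAuxiliary model: {pred} (p={prob:.2f})\nAnswer: {label}"
        )
    # The test input gets the same annotation but an open answer slot.
    lines.append(
        f"News: {test_text}\nAuxiliary model: {test_pred} (p={test_prob:.2f})\nAnswer:"
    )
    return "\n\n".join(lines)

demo = [("Aliens land in Paris.", "fake", 0.91, "fake")]
prompt = imfnd_prompt(demo, "Central bank raises rates.", "real", 0.78)
```

The LVLM then completes the final "Answer:" slot, with the auxiliary probabilities steering its attention toward the higher-confidence reading.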
Updated: 2024-08-20 00:57:55
Categories: cs.CL,cs.AI,cs.LG
Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks
The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from literature in terms of ML model performance, data processing overhead, and energy consumption.
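The server-side aggregation step of the conventional FedL architecture described above is a weighted average of the sampled workers' local models, here weighted by local dataset size (FedAvg-style). The device-sampling GCN and D2D-offloading optimization the paper adds on top are not modeled.

```python
import numpy as np

def fedavg(local_models, n_samples):
    """Aggregate local parameter vectors, weighted by local data counts."""
    weights = np.asarray(n_samples, dtype=float)
    weights /= weights.sum()
    return np.average(np.stack(local_models), axis=0, weights=weights)

# Three sampled workers with different local models and dataset sizes.
locals_ = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
counts = [10, 30, 60]
global_model = fedavg(locals_, counts)
```

D2D offloading changes the effective `counts` seen by the sampled nodes: data moved from an unsampled device to a sampled neighbor raises that neighbor's weight in the aggregate.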
Updated: 2024-08-20 00:45:40
Categories: cs.NI,cs.DC,cs.LG
Learning Multimodal Latent Space with EBM Prior and MCMC Inference
Multimodal generative models are crucial for various applications. We propose an approach that combines an expressive energy-based model (EBM) prior with Markov Chain Monte Carlo (MCMC) inference in the latent space for multimodal generation. The EBM prior acts as an informative guide, while MCMC inference, specifically through short-run Langevin dynamics, brings the posterior distribution closer to its true form. This method not only provides an expressive prior to better capture the complexity of multimodality but also improves the learning of shared latent variables for more coherent generation across modalities. Our proposed method is supported by empirical experiments, underscoring the effectiveness of our EBM prior with MCMC inference in enhancing cross-modal and joint generative tasks in multimodal contexts.
Updated: 2024-08-20 00:33:45
Categories: cs.LG,cs.CV
PromptBench: A Unified Library for Evaluation of Large Language Models
The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: https://github.com/microsoft/promptbench and will be continuously supported.
Updated: 2024-08-20 00:28:45
Categories: cs.AI,cs.CL,cs.LG
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded accuracy on real speech. To address this issue, we propose applying an adversarial training method to prevent the KWS model from learning TTS-specific features when trained on large amounts of TTS data. Experimental results demonstrate that KWS model accuracy on real speech data can be improved by up to 12% when adversarial loss is used in addition to the original KWS loss. Surprisingly, we also observed that the adversarial setup improves accuracy by up to 8%, even when trained solely on TTS and real negative speech data, without any real positive examples.
Updated: 2024-08-20 00:16:12
Categories: cs.SD,cs.LG,eess.AS
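The loss combination the KWS abstract describes can be sketched as follows. The standard way to realize "adversarial loss in addition to the original KWS loss" is a domain classifier (real vs. TTS) trained through a gradient-reversal layer; the function names and the weight `lam` here are illustrative assumptions, not the paper's exact formulation.

```python
# Combined objective: keyword-spotting loss plus a weighted adversarial
# domain loss. In training, the domain term's gradient is *reversed*
# into the shared encoder, so learned features become uninformative
# about real-vs-TTS origin.
import math

def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for predicted probability p and label y."""
    eps = 1e-9
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(kws_prob, kws_label, domain_prob, domain_label, lam=0.5):
    """KWS loss plus lam-weighted adversarial domain loss."""
    l_kws = cross_entropy(kws_prob, kws_label)
    l_domain = cross_entropy(domain_prob, domain_label)
    return l_kws + lam * l_domain
```

With `lam = 0` this reduces to the original KWS objective, which gives a clean ablation baseline.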
Fredholm Integral Equations Neural Operator (FIE-NO) for Data-Driven Boundary Value Problems
In this paper, we present a novel Fredholm Integral Equation Neural Operator (FIE-NO) method, an integration of Random Fourier Features and Fredholm Integral Equations (FIE) into the deep learning framework, tailored for solving data-driven Boundary Value Problems (BVPs) with irregular boundaries. Unlike traditional computational approaches, which struggle with the computational intensity and complexity of such problems, our method offers a robust, efficient, and accurate solution mechanism built on a physics-inspired design of the learning structure. We demonstrate that the proposed physics-guided operator learning method (FIE-NO) achieves superior performance in addressing BVPs. Notably, after being trained on only one boundary condition, our approach generalizes across multiple scenarios, including those with unknown equation forms and intricate boundary shapes. Experimental validation demonstrates that the FIE-NO method performs well in simulated examples, including the Darcy flow equation and typical partial differential equations such as the Laplace and Helmholtz equations. The proposed method exhibits robust performance across different boundary conditions. Experimental results indicate that FIE-NO achieves higher accuracy and stability compared to other methods when addressing complex boundary value problems with varying numbers of interior points.
Updated: 2024-08-20 00:15:27
Categories: cs.LG
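The Random Fourier Features building block named in the FIE-NO abstract has a standard, compact form: z_j(x) = sqrt(2/D) cos(w_j · x + b_j) with Gaussian-sampled w_j, which approximates a Gaussian kernel. Below is a pure-stdlib sketch; the dimensions and bandwidth are toy values, and this is the generic RFF map, not the paper's full operator.

```python
# Random Fourier feature map: approximate a Gaussian kernel by mapping
# x to D cosine features with random frequencies w and phases b.
import math
import random

def rff_map(x, D=8, sigma=1.0, seed=0):
    """Map vector x to D random Fourier features."""
    rng = random.Random(seed)
    d = len(x)
    feats = []
    for _ in range(D):
        w = [rng.gauss(0.0, 1.0 / sigma) for _ in range(d)]  # frequencies
        b = rng.uniform(0.0, 2.0 * math.pi)                  # phase shift
        proj = sum(wi * xi for wi, xi in zip(w, x))
        feats.append(math.sqrt(2.0 / D) * math.cos(proj + b))
    return feats
```

The inner product of two such feature vectors then estimates the kernel value k(x, y), which is what lets a learned integral kernel be evaluated cheaply.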
FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication
Multimodal federated learning (FL) aims to enrich model training in FL settings where devices collect measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the aforementioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
Updated: 2024-08-20 00:05:20
Categories: cs.LG,cs.DC,cs.NI
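The selection criterion in the FedMFS abstract, weighing per-modality impact against model size, can be sketched as a simple budgeted ranking. The impact-per-byte greedy rule, the field names, and the numbers below are illustrative assumptions; the paper's impact scores come from Shapley value analysis.

```python
# Per-device modality selection: rank modalities by estimated impact
# per byte of upload cost, then upload the best ones that fit the
# communication budget.

def select_modalities(modalities, budget_bytes):
    """modalities: list of (name, impact, size_bytes).
    Greedily pick by impact-per-byte until the budget is exhausted."""
    ranked = sorted(modalities, key=lambda m: m[1] / m[2], reverse=True)
    chosen, used = [], 0
    for name, impact, size in ranked:
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
    return chosen

mods = [("imu", 0.40, 1_000), ("audio", 0.35, 4_000), ("video", 0.20, 20_000)]
print(select_modalities(mods, budget_bytes=5_000))  # ['imu', 'audio']
```

Shrinking `budget_bytes` trades accuracy for communication, which is exactly the performance/overhead dial the method exposes.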
Transfer Operator Learning with Fusion Frame
The challenge of applying knowledge learned in one domain to solve problems in another related but distinct domain, known as transfer learning, is fundamental in operator learning models that solve Partial Differential Equations (PDEs). Current models often struggle with generalization across different tasks and datasets, limiting their applicability in diverse scientific and engineering disciplines. This work presents a novel framework that enhances the transfer learning capabilities of operator learning models for solving PDEs through the integration of fusion frame theory with the Proper Orthogonal Decomposition (POD)-enhanced Deep Operator Network (DeepONet). We introduce an innovative architecture that combines fusion frames with POD-DeepONet, demonstrating superior performance across various PDEs in our experimental analysis. Our framework addresses the critical challenge of transfer learning in operator learning models, paving the way for adaptable and efficient solutions across a wide range of scientific and engineering applications.
Updated: 2024-08-20 00:03:23
Categories: cs.LG
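The fusion-frame machinery behind this architecture has a small worked instance. A fusion frame recovers a signal from weighted orthogonal projections onto a family of subspaces via x = S⁻¹ Σᵢ vᵢ² Pᵢ(x); in the toy case below the subspaces are the coordinate axes of R² with unit weights, so the frame operator S is the identity and the weighted projections sum back to x exactly. This is a generic illustration of the theory, not the paper's network.

```python
# Fusion-frame reconstruction in a trivial case: project onto each
# subspace (here, lines spanned by unit vectors), weight, and sum.

def project(x, u):
    """Orthogonal projection of x onto the line spanned by unit vector u."""
    coeff = sum(xi * ui for xi, ui in zip(x, u))
    return [coeff * ui for ui in u]

def fusion_frame_reconstruct(x, subspaces_with_weights):
    """Sum of weight-squared projections; exact when the frame operator is I."""
    out = [0.0] * len(x)
    for u, v in subspaces_with_weights:
        p = project(x, u)
        out = [o + (v ** 2) * pi for o, pi in zip(out, p)]
    return out

frames = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]
print(fusion_frame_reconstruct([3.0, -2.0], frames))  # [3.0, -2.0]
```

Overlapping, redundant subspaces (where S is no longer the identity) are what give fusion frames their robustness, and the architecture exploits that redundancy across tasks.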