    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

Center-fixing of tropical cyclones using uncertainty-aware deep learning applied to high-temporal-resolution geostationary satellite imagery

Determining the location of a tropical cyclone's (TC) surface circulation center -- "center-fixing" -- is a critical first step in the TC-forecasting process, affecting current and future estimates of track, intensity, and structure. Despite a recent increase in the number of automated center-fixing methods, only one such method (ARCHER-2) is operational, and its best performance is achieved when using microwave or scatterometer data, which are not available at every forecast cycle. We develop a deep-learning algorithm called GeoCenter; it relies only on geostationary infrared (IR) satellite imagery, which is available for all TC basins at high frequency (10-15 min) and low latency (< 10 min) during both day and night. GeoCenter ingests an animation (time series) of IR images, including 10 channels at lag times up to 3 hours. The animation is centered at a "first guess" location, offset from the true TC-center location by 48 km on average and sometimes > 100 km; GeoCenter is tasked with correcting this offset. On an independent testing dataset, GeoCenter achieves a mean/median/RMS (root mean square) error of 26.9/23.3/32.0 km for all systems, 25.7/22.3/30.5 km for tropical systems, and 15.7/13.6/18.6 km for category-2--5 hurricanes. These values are similar to ARCHER-2 errors when microwave or scatterometer data are available, and better than ARCHER-2 errors when only IR data are available. GeoCenter also performs skillful uncertainty quantification (UQ), producing a well-calibrated ensemble of 200 TC-center locations. Furthermore, all predictors used by GeoCenter are available in real time, which would make GeoCenter easy to implement operationally every 10-15 min.
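
To make the error metrics above concrete, here is a minimal sketch of how mean/median/RMS center-fix errors might be scored against true centers using great-circle (haversine) distance; the function names and scoring details are ours, not GeoCenter's.

```python
import math
import statistics

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def center_fix_errors(predicted, truth):
    """Mean/median/RMS distance (km) between predicted and true TC centers."""
    d = [haversine_km(plat, plon, tlat, tlon)
         for (plat, plon), (tlat, tlon) in zip(predicted, truth)]
    mean = sum(d) / len(d)
    median = statistics.median(d)
    rms = math.sqrt(sum(e * e for e in d) / len(d))
    return mean, median, rms
```

An ensemble prediction such as GeoCenter's 200-member output could be reduced to a single fix (e.g. the ensemble median position) before being scored this way.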

Updated: 2024-09-24 23:39:56

Subjects: physics.ao-ph,cs.AI

Download: http://arxiv.org/abs/2409.16507v1

Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases

While large language models (LLMs) have shown promise for medical question answering, there is limited work focused on tropical and infectious disease-specific exploration. We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it to include demographic and semantic clinical and consumer augmentations that yield 11,000+ prompts. We evaluate LLM performance on these prompts, comparing generalist and medical LLMs, as well as LLM outcomes to human experts. We demonstrate, through systematic experimentation, the benefit of contextual information such as demographics, location, gender, and risk factors for optimal LLM responses. Finally, we develop a prototype of TRINDs-LM, a research tool that provides a playground for navigating how context impacts LLM outputs for health.
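
A sketch of the kind of demographic/contextual augmentation described above, expanding one base question into many context-conditioned prompts; the template wording and field values are hypothetical, not the actual TRINDs prompts.

```python
from itertools import product

# Hypothetical clinical vignette template; {age}/{gender}/{location}/{risk}
# are the contextual slots being varied.
BASE_PROMPT = ("A {age}-year-old {gender} in {location} presents with fever, "
               "joint pain, and a rash. Risk factors: {risk}. "
               "What is the most likely tropical or infectious disease?")

def augment_prompts(ages, genders, locations, risks):
    """Expand one vignette into the Cartesian product of context values."""
    return [BASE_PROMPT.format(age=a, gender=g, location=l, risk=r)
            for a, g, l, r in product(ages, genders, locations, risks)]
```

Varying only a handful of values per slot is how a small seed set of vignettes grows into 11,000+ prompts.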

Updated: 2024-09-24 23:39:49

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.09201v2

SurGen: Text-Guided Diffusion Model for Surgical Video Generation

Diffusion-based video generation models have made significant strides, producing outputs with improved visual fidelity, temporal coherence, and user control. These advancements hold great promise for improving surgical education by enabling more realistic, diverse, and interactive simulation environments. In this study, we introduce SurGen, a text-guided diffusion model tailored for surgical video synthesis. SurGen produces videos with the highest resolution and longest duration among existing surgical video generation models. We validate the visual and temporal quality of the outputs using standard image and video generation metrics. Additionally, we assess their alignment to the corresponding text prompts through a deep learning classifier trained on surgical data. Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.

Updated: 2024-09-24 23:23:50

Subjects: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2408.14028v3

GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization

Although various visual localization approaches exist, such as scene coordinate and pose regression, these methods often struggle with high memory consumption or extensive optimization requirements. To address these challenges, we utilize recent advancements in novel view synthesis, particularly 3D Gaussian Splatting (3DGS), to enhance localization. 3DGS allows for the compact encoding of both 3D geometry and scene appearance with its spatial features. Our method leverages the dense description maps produced by XFeat's lightweight keypoint detection and description model. We propose distilling these dense keypoint descriptors into 3DGS to improve the model's spatial understanding, leading to more accurate camera pose predictions through 2D-3D correspondences. After estimating an initial pose, we refine it using a photometric warping loss. Benchmarking on popular indoor and outdoor datasets shows that our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.

Updated: 2024-09-24 23:18:32

Subjects: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2409.16502v1

Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning

While witnessing the exceptional success of machine learning (ML) technologies in many applications, users are starting to notice a critical shortcoming of ML: correlation is a poor substitute for causation. The conventional way to discover causal relationships is to use randomized controlled trials (RCT); in many situations, however, these are impractical or sometimes unethical. Causal learning from observational data offers a promising alternative. While relatively recent, causal learning aims to go far beyond conventional machine learning, yet several major challenges remain. Unfortunately, advances are hampered by the lack of unified benchmark datasets, algorithms, metrics, and evaluation service interfaces for causal learning. In this paper, we introduce CausalBench, a transparent, fair, and easy-to-use evaluation platform, aiming to (a) enable the advancement of research in causal learning by facilitating scientific collaboration on novel algorithms, datasets, and metrics and (b) promote scientific objectivity, reproducibility, fairness, and awareness of bias in causal learning research. CausalBench provides services for benchmarking data, algorithms, models, and metrics, addressing the needs of a broad range of scientific and engineering disciplines.

Updated: 2024-09-24 23:16:02

Subjects: cs.LG,stat.ML

Download: http://arxiv.org/abs/2409.08419v2

Learning Linear Dynamics from Bilinear Observations

We consider the problem of learning a realization of a partially observed dynamical system with linear state transitions and bilinear observations. Under very mild assumptions on the process and measurement noises, we provide a finite time analysis for learning the unknown dynamics matrices (up to a similarity transform). Our analysis involves a regression problem with heavy-tailed and dependent data. Moreover, each row of our design matrix contains a Kronecker product of current input with a history of inputs, making it difficult to guarantee persistence of excitation. We overcome these challenges, first providing a data-dependent high probability error bound for arbitrary but fixed inputs. Then, we derive a data-independent error bound for inputs chosen according to a simple random design. Our main results provide an upper bound on the statistical error rates and sample complexity of learning the unknown dynamics matrices from a single finite trajectory of bilinear observations.
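
The bilinear observation structure can be seen in a toy, fully observed setting: each regression row is a Kronecker product of the input with the state, and the observation matrix is recovered by least squares. This is only an illustration of that structure (the paper's actual setting has latent states and noise, which is what makes the analysis hard).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 3, 2, 400  # state dim, input dim, number of samples

C_true = rng.normal(size=(m, n))  # bilinear observation matrix

# Toy data: y_t = u_t^T C x_t = <vec(C), kron(u_t, x_t)>
X = rng.normal(size=(T, n))
U = rng.normal(size=(T, m))
y = np.einsum("ti,ij,tj->t", U, C_true, X)

# Each design-matrix row is a Kronecker product of input and state.
Phi = np.stack([np.kron(U[t], X[t]) for t in range(T)])
vecC, *_ = np.linalg.lstsq(Phi, y, rcond=None)
C_hat = vecC.reshape(m, n)
```

In the partially observed problem, the state history must itself be expressed through past inputs, which is why the paper's design matrix couples the current input with a history of inputs.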

Updated: 2024-09-24 23:11:47

Subjects: cs.LG,cs.SY,eess.SY,math.OC,stat.ML

Download: http://arxiv.org/abs/2409.16499v1

Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval

Dense retrieval systems are commonly used for information retrieval (IR). They rely on learning text representations through an encoder and usually require supervised modeling via labelled data, which can be costly to obtain or simply unavailable. In this study, we introduce a novel unsupervised text representation learning technique that instruction-tunes pre-trained encoder-decoder large language models (LLMs) under the dual-encoder retrieval framework. We demonstrate that the corpus representation can be augmented by the representations of relevant synthetic queries generated by the instruction-tuned LLM, an approach grounded in the Rao-Blackwell theorem. Furthermore, we effectively align the query and corpus text representations with self-instructed tuning. Specifically, we first prompt an open-box pre-trained LLM to follow defined instructions (i.e., question generation and keyword summarization) to generate synthetic queries. Next, we fine-tune the pre-trained LLM with the defined instructions and the generated queries that passed a quality check. Finally, we generate synthetic queries with the instruction-tuned LLM for each corpus and represent each corpus by weighted-averaging the synthetic-query and original-corpus embeddings. We evaluate our proposed method under low-resource settings on three English and one German retrieval datasets, measuring NDCG@10, MRR@100, and Recall@100. We significantly improve the average zero-shot retrieval performance on all metrics, increasing open-box FLAN-T5 model variations by [3.34%, 3.50%] in absolute terms and exceeding three competitive dense retrievers (i.e., mDPR, T-Systems, mBART-Large), with a model at least 38% smaller, by 1.96%, 4.62%, and 9.52% absolute on NDCG@10.
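
The final corpus-representation step lends itself to a compact sketch: average the embeddings of the synthetic queries and blend them with the document's own embedding. The blending scheme and the `alpha` weight here are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def normalize(v):
    """Scale vector(s) to unit L2 norm along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def augment_corpus_embedding(corpus_emb, query_embs, alpha=0.5):
    """Represent a document as a weighted average of its own embedding and
    the mean embedding of synthetic queries generated for it."""
    q_mean = normalize(np.mean(query_embs, axis=0))
    return normalize(alpha * normalize(corpus_emb) + (1 - alpha) * q_mean)
```

Retrieval then scores each incoming query against these augmented document vectors, e.g. by dot product.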

Updated: 2024-09-24 23:03:13

Subjects: cs.AI

Download: http://arxiv.org/abs/2409.16497v1

Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning

Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems like the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies, asynchronous aggregation, and decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
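
A hierarchical multi-tier topology changes aggregation from one flat FedAvg step into a bottom-up recursion over the tree. A minimal sketch (sample-weighted averaging at every internal node; the data layout is ours, not Flight's API):

```python
def fedavg(updates):
    """Sample-weighted average of (weights, n_samples) pairs at one node."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return ([sum(w[i] * n for w, n in updates) / total for i in range(dim)],
            total)

def hierarchical_aggregate(tree):
    """Aggregate a multi-tier topology bottom-up: a leaf is a (weights,
    n_samples) tuple from a device; an internal node is a list of children."""
    if isinstance(tree, tuple):  # leaf device
        return tree
    return fedavg([hierarchical_aggregate(child) for child in tree])
```

Because each intermediate node forwards only one aggregated model rather than all of its children's models, upstream traffic shrinks at every tier, consistent with the communication-overhead reductions reported above.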

Updated: 2024-09-24 22:49:27

Subjects: cs.LG,cs.DC

Download: http://arxiv.org/abs/2409.16495v1

Exploring Knowledge Tracing in Tutor-Student Dialogues

Recent advances in large language models (LLMs) have led to the development of artificial intelligence (AI)-powered tutoring chatbots, showing promise in providing broad access to high-quality personalized education. Existing works have primarily studied how to make LLMs follow tutoring principles but not how to model student behavior in dialogues. However, analyzing student dialogue turns can serve as a formative assessment, since open-ended student discourse may indicate their knowledge levels and reveal specific misconceptions. In this work, we present a first attempt at performing knowledge tracing (KT) in tutor-student dialogues. We propose LLM prompting methods to identify the knowledge components/skills involved in each dialogue turn and diagnose whether the student responds correctly to the tutor, and verify the LLM's effectiveness via an expert human evaluation. We then apply a range of KT methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues. We perform extensive qualitative analyses to highlight the challenges in dialogue KT and outline multiple avenues for future work.
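
As background on the "range of KT methods" applied to the labeled dialogue turns, classical Bayesian Knowledge Tracing (BKT) maintains a per-skill mastery probability and updates it after each response; the parameter values below are illustrative, not fitted.

```python
def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step: Bayes update of the mastery
    probability given a correct/incorrect response, then a learning
    transition."""
    if correct:
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_learn
```

Running this over a student's sequence of diagnosed turns yields a mastery trajectory per knowledge component, which LLM-based methods such as LLMKT aim to improve upon.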

Updated: 2024-09-24 22:31:39

Subjects: cs.CL,cs.CY,cs.LG

Download: http://arxiv.org/abs/2409.16490v1

Diffusion Models to Enhance the Resolution of Microscopy Images: A Tutorial

Diffusion models have emerged as a prominent technique in generative modeling with neural networks, making their mark in tasks like text-to-image translation and super-resolution. In this tutorial, we provide a comprehensive guide to build denoising diffusion probabilistic models (DDPMs) from scratch, with a specific focus on transforming low-resolution microscopy images into their corresponding high-resolution versions. We provide the theoretical background, mathematical derivations, and a detailed Python code implementation using PyTorch, along with techniques to enhance model performance.
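
The core of any from-scratch DDPM is the closed-form forward (noising) process q(x_t | x_0); a minimal sketch in plain Python (the actual tutorial operates on image tensors, e.g. in PyTorch):

```python
import math
import random

def ddpm_forward_sample(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I),
    where abar_t is the cumulative product of (1 - beta_s) for s < t."""
    abar = 1.0
    for s in range(t):
        abar *= 1.0 - betas[s]
    noisy = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * random.gauss(0.0, 1.0)
             for x in x0]
    return noisy, abar
```

Training then teaches a network to predict the injected noise; for super-resolution, the low-resolution image is supplied as extra conditioning input at every denoising step.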

Updated: 2024-09-24 22:29:22

Subjects: eess.IV,cs.CV,cs.LG,q-bio.OT

Download: http://arxiv.org/abs/2409.16488v1

To Explore the Potential Inhibitors against Multitarget Proteins of COVID 19 using In Silico Study

The global pandemic caused by the emergence of COVID-19 created an unrivaled public health crisis, with a morbidity rate not seen in recent decades. Researchers have made many efforts to find an optimal solution to this pandemic. Drug repurposing has emerged as a powerful strategy that saves cost, time, and labor. The lack of identified repurposed drug candidates against COVID-19 demands further efforts to explore potential inhibitors for an effective cure. In this study, we used a combination of molecular docking and machine learning regression approaches to explore potential inhibitors for the treatment of COVID-19. We calculated the binding affinities of these drugs to multitarget proteins using a molecular docking process. We performed QSAR modeling by employing various machine learning regression approaches to identify potential inhibitors against COVID-19. Our findings, with the best R2 and RMSE scores, demonstrate that our proposed Decision Tree Regression (DTR) model is the most appropriate model for exploring potential inhibitors. We propose five novel promising inhibitors with ZINC IDs 3873365, 85432544, 8214470, 85536956, and 261494640, with binding affinities in the range of -19.7 kcal/mol to -12.6 kcal/mol. We further analyzed the physicochemical and pharmacokinetic properties of these most potent inhibitors to examine their behavior. The analysis of these properties is a key factor in promoting an effective cure for public health. Our work constructs an efficient framework with which to probe potential inhibitors against COVID-19, combining molecular docking with machine learning regression approaches.
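
A sketch of the QSAR-style regression step with a decision tree, on synthetic stand-ins for molecular descriptors and docking-derived binding affinities (the features and targets here are ours, purely for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(42)

# Synthetic stand-ins: 8 molecular descriptors per compound (X) and a
# docking binding affinity in kcal/mol (y).
X = rng.normal(size=(300, 8))
y = -12.0 - 4.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
```

Model selection in the study amounts to comparing such R2/RMSE scores across several regressors, with DTR reported as the best performer.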

Updated: 2024-09-24 22:19:56

Subjects: q-bio.QM,cs.AI

Download: http://arxiv.org/abs/2409.16486v1

Generative AI-driven forecasting of oil production

Forecasting oil production from oilfields with multiple wells is an important problem in petroleum and geothermal energy extraction, as well as in energy storage technologies. The accuracy of oil forecasts is a critical determinant of economic projections, hydrocarbon reserve estimation, construction of fluid processing facilities, and energy price fluctuations. Leveraging generative AI techniques, we model time series forecasting of oil and water production across four multi-well sites spanning four decades. Our goal is to effectively model uncertainties and make precise forecasts to inform decision-making processes at the field scale. We utilize an autoregressive model known as TimeGrad and a variant of a transformer architecture named Informer, tailored specifically for forecasting long-sequence time series data. Predictions from both TimeGrad and Informer closely align with the ground-truth data. The overall performance of the Informer stands out, demonstrating greater efficiency than TimeGrad in forecasting oil production rates across all sites.

Updated: 2024-09-24 22:11:21

Subjects: cs.LG

Download: http://arxiv.org/abs/2409.16482v1

Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization

This paper investigates new families of compositional optimization problems, called non-smooth weakly-convex finite-sum coupled compositional optimization (NSWC FCCO). There has been growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing and the inner function is weakly convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelope of the objective function. Additionally, we extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
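
For reference, the $\epsilon$-stationarity target is stated through the Moreau envelope, the standard smoothing of a weakly-convex objective; the usual definitions from the weakly-convex optimization literature are (our notation, not necessarily the paper's):

```latex
M_{\lambda} F(w) \;=\; \min_{v}\ F(v) + \tfrac{1}{2\lambda}\lVert v - w\rVert^2,
\qquad
\hat{w} \;=\; \operatorname*{arg\,min}_{v}\ F(v) + \tfrac{1}{2\lambda}\lVert v - w\rVert^2,
```

and $w$ is an $\epsilon$-stationary point of $M_{\lambda}F$ when $\lVert\nabla M_{\lambda}F(w)\rVert = \tfrac{1}{\lambda}\lVert w - \hat{w}\rVert \le \epsilon$, i.e. $w$ is close to a point $\hat{w}$ that is nearly stationary for $F$.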

Updated: 2024-09-24 22:04:17

Subjects: math.OC,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2310.03234v5

Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences

Digital platforms such as social media and e-commerce websites adopt Recommender Systems to provide value to the user. However, the social consequences deriving from their adoption are still unclear. Many scholars argue that recommenders may lead to detrimental effects, such as bias amplification deriving from the feedback loop between algorithmic suggestions and users' choices. Nonetheless, the extent to which recommenders influence changes in users' leanings remains uncertain. In this context, it is important to provide a controlled environment for evaluating the recommendation algorithm before deployment. To address this, we propose a stochastic simulation framework that mimics user-recommender system interactions in a long-term scenario. In particular, we simulate user choices by formalizing a user model, which comprises behavioral aspects such as the user's resistance towards the recommendation algorithm and their inertia in relying on the received suggestions. Additionally, we introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time. We conduct an extensive evaluation on multiple synthetic datasets, aiming to test the robustness of our framework when considering different scenarios and hyper-parameter settings. The experimental results prove that the proposed methodology is effective in detecting and quantifying the drift in user preferences by means of the simulation. All the code and data used to perform the experiments are publicly available.
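
A compact version of such a simulation loop, with a resistance parameter and an L1 drift metric; the user model and constants here are deliberately simplistic stand-ins for the paper's framework.

```python
import random

def simulate_drift(steps=200, n_items=2, resistance=0.8, seed=0):
    """Toy user-recommender feedback loop: the recommender pushes the item
    category the user currently prefers (plus small exploration noise), and
    the user's preference vector mixes its old value with the recommendation
    according to a resistance parameter."""
    rng = random.Random(seed)
    pref = [1.0 / n_items] * n_items  # initially indifferent user
    initial = pref[:]
    for _ in range(steps):
        rec = max(range(n_items), key=lambda i: pref[i] + rng.uniform(0, 0.05))
        one_hot = [1.0 if i == rec else 0.0 for i in range(n_items)]
        pref = [resistance * p + (1 - resistance) * o
                for p, o in zip(pref, one_hot)]
    # Drift metric: L1 distance between initial and final preferences.
    drift = sum(abs(a - b) for a, b in zip(initial, pref))
    return pref, drift
```

Even this toy loop exhibits the bias-amplification dynamic described above: an initially indifferent user locks into one category, and the drift metric quantifies how far preferences moved.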

Updated: 2024-09-24 21:54:22

Subjects: cs.IR,cs.AI,cs.SI

Download: http://arxiv.org/abs/2409.16478v1

We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs

The reliance of popular programming languages such as Python and JavaScript on centralized package repositories and open-source software, combined with the emergence of code-generating Large Language Models (LLMs), has created a new type of threat to the software supply chain: package hallucinations. These hallucinations, which arise from fact-conflicting errors when generating code using LLMs, represent a novel form of package confusion attack that poses a critical threat to the integrity of the software supply chain. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages, settings, and parameters, exploring how a diverse set of models and configurations affect the likelihood of generating erroneous package recommendations and identifying the root causes of this phenomenon. Using 16 popular LLMs for code generation and two unique prompt datasets, we generate 576,000 code samples in two programming languages that we analyze for package hallucinations. Our findings reveal that the average percentage of hallucinated packages is at least 5.2% for commercial models and 21.7% for open-source models, including a staggering 205,474 unique examples of hallucinated package names, further underscoring the severity and pervasiveness of this threat. To overcome this problem, we implement several hallucination mitigation strategies and show that they are able to significantly reduce the number of package hallucinations while maintaining code quality. Our experiments and findings highlight package hallucinations as a persistent and systemic phenomenon when using state-of-the-art LLMs for code generation, and a significant challenge that deserves the research community's urgent attention.

Updated: 2024-09-24 21:46:56

Subjects: cs.SE,cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.10279v2

Score-based Neural Ordinary Differential Equations for Computing Mean Field Control Problems

Classical neural ordinary differential equations (ODEs) are powerful tools for approximating the log-density functions in high-dimensional spaces along trajectories, where neural networks parameterize the velocity fields. This paper proposes a system of neural differential equations representing first- and second-order score functions along trajectories based on deep neural networks. We reformulate the mean field control (MFC) problem with individual noises into an unconstrained optimization problem framed by the proposed neural ODE system. Additionally, we introduce a novel regularization term to enforce characteristics of viscous Hamilton--Jacobi--Bellman (HJB) equations to be satisfied based on the evolution of the second-order score function. Examples include regularized Wasserstein proximal operators (RWPOs), probability flow matching of Fokker--Planck (FP) equations, and linear quadratic (LQ) MFC problems, which demonstrate the effectiveness and accuracy of the proposed method.

Updated: 2024-09-24 21:45:55

Subjects: math.OC,cs.LG,34H05,G.1.7

Download: http://arxiv.org/abs/2409.16471v1

Optimal vintage factor analysis with deflation varimax

Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is scientifically meaningful. The most widely used vintage factor analysis is Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity, little theoretical guarantee can be provided to date, mainly because varimax rotation requires solving a non-convex optimization over the set of orthogonal matrices. In this paper, we propose a deflation varimax procedure that solves each row of an orthogonal matrix sequentially. In addition to its net computational gain and flexibility, we are able to fully establish theoretical guarantees for the proposed procedure in a broader context. Adopting this new deflation varimax as the second step after PCA, we further analyze this two-step procedure under a general class of factor models. Our results show that it estimates the factor loading matrix at the minimax optimal rate when the signal-to-noise ratio (SNR) is moderate or large. In the low-SNR regime, we offer possible improvement over using PCA and the deflation varimax when the additive noise under the factor model is structured. The modified procedure is shown to be minimax optimal in all SNR regimes. Our theory is valid for finite samples and allows the number of latent factors to grow with the sample size, as well as the ambient dimension to grow with, or even exceed, the sample size. Extensive simulation and real data analysis further corroborate our theoretical findings.

Updated: 2024-09-24 21:29:31

Fields: stat.ML,cs.IT,cs.LG,eess.SP,math.IT

Download: http://arxiv.org/abs/2310.10545v2

Communication and Energy Efficient Federated Learning using Zero-Order Optimization Technique

Federated learning (FL) is a popular machine learning technique that enables multiple users to collaboratively train a model while preserving user data privacy. A significant challenge in FL is the communication bottleneck in the upload direction, and thus the corresponding energy consumption of the devices, attributed to the increasing size of the model/gradient. In this paper, we address this issue by proposing a zero-order (ZO) optimization method that requires each device to upload a quantized single scalar per iteration instead of the whole gradient vector. We prove its theoretical convergence, derive an upper bound on its convergence rate in the non-convex setting, and discuss its implementation in practical scenarios. Our FL method and the corresponding convergence analysis take into account the impact of quantization and of packet dropping due to wireless errors. We also show the superiority of our method, in terms of communication overhead and energy consumption, compared to standard gradient-based FL methods.
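The scalar-per-iteration idea can be sketched as a two-point zero-order estimate: the device probes the loss along a random direction the server can regenerate from a shared seed, quantizes the resulting directional derivative, and uploads only that scalar. A minimal sketch, not the paper's exact scheme; the quantizer, step sizes, and toy objective are assumptions:

```python
import random

def zo_round(w, loss, mu=1e-4, lr=0.05, seed=0, levels=256, clip=10.0):
    # Server and device share the seed, so the random direction u
    # never has to be transmitted.
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in w]
    # Device: two loss probes -> one scalar directional derivative.
    g = (loss([wi + mu * ui for wi, ui in zip(w, u)])
         - loss([wi - mu * ui for wi, ui in zip(w, u)])) / (2 * mu)
    # Quantize the single scalar before "uploading" it.
    step = 2 * clip / levels
    g_q = round(max(-clip, min(clip, g)) / step) * step
    # Server: rebuild the full-dimension update from the shared direction.
    return [wi - lr * g_q * ui for wi, ui in zip(w, u)]

# Minimize f(w) = ||w||^2 while uploading one scalar per round.
f = lambda w: sum(wi * wi for wi in w)
w = [1.0, -2.0, 3.0]
for t in range(300):
    w = zo_round(w, f, seed=t)
```

Each round moves the whole parameter vector using one quantized scalar, which is the source of the communication and energy savings the abstract describes.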

Updated: 2024-09-24 20:57:22

Fields: cs.LG,cs.DC

Download: http://arxiv.org/abs/2409.16456v1

A Scoping Review of Earth Observation and Machine Learning for Causal Inference: Implications for the Geography of Poverty

Earth observation (EO) data such as satellite imagery can have far-reaching impacts on our understanding of the geography of poverty, especially when coupled with machine learning (ML) and computer vision. Early research in computer vision used predictive models to estimate living conditions, especially in contexts where data availability on poverty was scarce. Recent work has progressed beyond using EO data to predict such outcomes -- now also using it to conduct causal inference. However, how such EO-ML models are used for causality remains incompletely mapped. To address this gap, we conduct a scoping review where we first document the growth of interest in using satellite images and other sources of EO data in causal analysis. We then trace the methodological relationship between spatial statistics and ML methods before discussing five ways in which EO data has been used in scientific workflows -- (1) outcome imputation for downstream causal analysis, (2) EO image deconfounding, (3) EO-based treatment effect heterogeneity, (4) EO-based transportability analysis, and (5) image-informed causal discovery. We consolidate these observations by providing a detailed workflow for how researchers can incorporate EO data in causal analysis going forward -- from data requirements to choice of computer vision model and evaluation metrics. While our discussion focuses on health and living conditions outcomes, our workflow applies to other measures of sustainable development where EO data are informative.

Updated: 2024-09-24 20:50:21

Fields: cs.LG,cs.CV,stat.ME,stat.ML,62H11,I.2.6; I.5.4

Download: http://arxiv.org/abs/2406.02584v3

Neuron-Level Knowledge Attribution in Large Language Models

Identifying important neurons for final predictions is essential for understanding the mechanisms of large language models. Due to computational constraints, current attribution techniques struggle to operate at neuron level. In this paper, we propose a static method for pinpointing significant neurons. Compared to seven other methods, our approach demonstrates superior performance across three metrics. Additionally, since most static methods typically only identify "value neurons" directly contributing to the final prediction, we propose a method for identifying "query neurons" which activate these "value neurons". Finally, we apply our methods to analyze six types of knowledge across both attention and feed-forward network (FFN) layers. Our method and analysis are helpful for understanding the mechanisms of knowledge storage and set the stage for future research in knowledge editing. The code is available at https://github.com/zepingyu0512/neuron-attribution.

Updated: 2024-09-24 20:36:03

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2312.12141v4

A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization

Q-learning is a powerful tool for network control and policy optimization in wireless networks, but it struggles with large state spaces. Recent advancements, such as multi-environment mixed Q-learning (MEMQ), improve performance and reduce complexity by integrating multiple Q-learning algorithms across multiple related environments, so-called digital cousins. However, MEMQ is designed for centralized single-agent networks and is not suitable for decentralized or multi-agent networks. To address this challenge, we propose a novel multi-agent MEMQ algorithm for partially decentralized wireless networks with multiple mobile transmitters (TXs) and base stations (BSs), where TXs do not have access to each other's states and actions. In uncoordinated states, TXs act independently to minimize their individual costs. In coordinated states, TXs use a Bayesian approach to estimate the joint state based on local observations and share limited information with the leader TX to minimize the joint cost. The cost of information sharing scales linearly with the number of TXs and is independent of the joint state-action space size. The proposed scheme is 50% faster than centralized MEMQ with only a 20% increase in average policy error (APE) and is 25% faster than several advanced decentralized Q-learning algorithms with 40% less APE. The convergence of the algorithm is also demonstrated.
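MEMQ and its multi-agent variant build on the standard tabular Q-learning update, Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]. A minimal single-state sketch of that underlying update; the rewards and discount are toy assumptions, and this is not the multi-agent MEMQ scheme itself:

```python
# One state, two actions, deterministic rewards r = [1, 0]; the
# environment always returns to the same state.
alpha, gamma = 0.5, 0.9
r = [1.0, 0.0]
Q = [0.0, 0.0]
for _ in range(500):
    for a in (0, 1):  # sweep both actions each iteration
        target = r[a] + gamma * max(Q)
        Q[a] += alpha * (target - Q[a])
# Fixed point: Q[0] = 1/(1-0.9) = 10 and Q[1] = 0 + 0.9*10 = 9.
```

The sweep converges to the Bellman fixed point, which is the quantity the digital-cousin environments help estimate with less complexity.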

Updated: 2024-09-24 20:34:47

Fields: eess.SP,cs.LG

Download: http://arxiv.org/abs/2409.16450v1

How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning

We investigate the mechanism of in-context learning (ICL) on sentence classification tasks with semantically-unrelated labels ("foo"/"bar"). We find that intervening in only 1% of heads (named "in-context heads") significantly degrades ICL accuracy, from 87.6% to 24.4%. To understand this phenomenon, we analyze the value-output vectors in these heads and discover that the vectors at each label position contain substantial information about the corresponding labels. Furthermore, we observe that the prediction shift from "foo" to "bar" is due to the respective reduction and increase in these heads' attention scores at the "foo" and "bar" positions. Therefore, we propose a hypothesis for ICL: in in-context heads, the value-output matrices extract label features, while the query-key matrices compute the similarity between the features at the last position and those at each label position. The query and key matrices can be considered as two towers that learn the similarity metric between the last position's features and each demonstration at the label positions. Using this hypothesis, we explain the majority label bias and recency bias in ICL and propose two methods to reduce these biases by 22% and 17%, respectively.
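The two-tower hypothesis can be illustrated with a dot-product similarity between query-projected last-position features and key-projected features at each label position; attention then shifts toward whichever demonstration is most similar. A toy sketch with assumed 2-d feature vectors, not the actual model's weights:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))

# Toy features: the last position (query tower) and the "foo"/"bar"
# label positions (key tower). These vectors are illustrative only.
q_last = [1.0, 0.2]
k_foo, k_bar = [0.9, 0.1], [0.1, 0.9]

# Attention distributes toward the label whose key is most similar
# to the query, mirroring the hypothesized metric-learning behavior.
scores = softmax([dot(q_last, k_foo), dot(q_last, k_bar)])
```

Reducing the "foo" score and raising the "bar" score is exactly the attention shift the abstract links to prediction changes.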

Updated: 2024-09-24 20:27:53

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.02872v3

Artificial Intelligence for Secured Information Systems in Smart Cities: Collaborative IoT Computing with Deep Reinforcement Learning and Blockchain

The accelerated expansion of the Internet of Things (IoT) has raised critical challenges associated with privacy, security, and data integrity, specifically in infrastructures such as smart cities or smart manufacturing. Blockchain technology provides immutable, scalable, and decentralized solutions to address these challenges, and integrating deep reinforcement learning (DRL) into the IoT environment offers enhanced adaptability and decision-making. This paper investigates the integration of blockchain and DRL to optimize mobile transmission and secure data exchange in IoT-assisted smart cities. Through the clustering and categorization of IoT application systems, the combination of DRL and blockchain is shown to enhance the performance of IoT networks by maintaining privacy and security. Based on the review of papers published between 2015 and 2024, we have classified the presented approaches and offered practical taxonomies, which provide researchers with critical perspectives and highlight potential areas for future exploration and research. Our investigation shows how combining blockchain's decentralized framework with DRL can address privacy and security issues, improve mobile transmission efficiency, and guarantee robust, privacy-preserving IoT systems. Additionally, we explore blockchain integration for DRL and outline the notable applications of DRL technology. By addressing the challenges of machine learning and blockchain integration, this study proposes novel perspectives for researchers and serves as a foundational exploration from an interdisciplinary standpoint.

Updated: 2024-09-24 20:25:20

Fields: cs.AI,cs.CR

Download: http://arxiv.org/abs/2409.16444v1

Knowledge-based Neural Ordinary Differential Equations for Cosserat Rod-based Soft Robots

Soft robots have many advantages over rigid robots thanks to their compliant and passive nature. However, it is generally challenging to model the dynamics of soft robots due to their high spatial dimensionality, making it difficult to use model-based methods to accurately control soft robots. It often requires direct numerical simulation of partial differential equations to simulate soft robots. This not only requires an accurate numerical model, but also makes soft robot modeling slow and expensive. Deep learning algorithms have shown promise in the data-driven modeling of soft robots. However, these algorithms usually require a large amount of data, which is difficult to obtain in either simulation or real-world experiments of soft robots. In this work, we propose KNODE-Cosserat, a framework that combines first-principle physics models and neural ordinary differential equations. We leverage the best of both worlds -- the generalization ability of physics-based models and the fast speed of deep learning methods. We validate our framework in both simulation and real-world experiments. In both cases, we show that the learned robot model significantly improves over the baseline models under different metrics.

Updated: 2024-09-24 20:24:33

Fields: cs.RO,cs.LG

Download: http://arxiv.org/abs/2408.07776v2

A novel open-source ultrasound dataset with deep learning benchmarks for spinal cord injury localization and anatomical segmentation

While deep learning has catalyzed breakthroughs across numerous domains, its broader adoption in clinical settings is inhibited by the costly and time-intensive nature of data acquisition and annotation. To further facilitate medical machine learning, we present an ultrasound dataset of 10,223 Brightness-mode (B-mode) images consisting of sagittal slices of porcine spinal cords (N=25) before and after a contusion injury. We additionally benchmark the performance metrics of several state-of-the-art object detection algorithms to localize the site of injury and semantic segmentation models to label the anatomy for comparison and creation of task-specific architectures. Finally, we evaluate the zero-shot generalization capabilities of the segmentation models on human ultrasound spinal cord images to determine whether training on our porcine dataset is sufficient for accurately interpreting human data. Our results show that the YOLOv8 detection model outperforms all evaluated models for injury localization, achieving a mean Average Precision (mAP50-95) score of 0.606. Segmentation metrics indicate that the DeepLabv3 segmentation model achieves the highest accuracy on unseen porcine anatomy, with a Mean Dice score of 0.587, while SAMed achieves the highest Mean Dice score generalizing to human anatomy (0.445). To the best of our knowledge, this is the largest annotated dataset of spinal cord ultrasound images made publicly available to researchers and medical professionals, as well as the first public report of object detection and segmentation architectures to assess anatomical markers in the spinal cord for methodology development and clinical applications.

Updated: 2024-09-24 20:22:59

Fields: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.16441v1

Acceleration Methods

This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.
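The benefit of momentum described above can be seen on a simple ill-conditioned quadratic, where Nesterov's method needs far fewer iterations than plain gradient descent. A minimal sketch; the quadratic, the step size 1/L, and the momentum coefficient (sqrt(kappa)-1)/(sqrt(kappa)+1) are standard textbook choices, not taken from the monograph:

```python
def f(x):  # f(x) = 0.5*(x0^2 + 100*x1^2), condition number kappa = 100
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

def grad(x):
    return [x[0], 100.0 * x[1]]

def run_gd(x0, lr=0.01, tol=1e-8, max_it=100000):
    # Plain gradient descent with step size 1/L.
    x, k = list(x0), 0
    while f(x) > tol and k < max_it:
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        k += 1
    return k

def run_nesterov(x0, lr=0.01, beta=9.0 / 11.0, tol=1e-8, max_it=100000):
    # Nesterov momentum: extrapolate, then take a gradient step there.
    x, x_prev, k = list(x0), list(x0), 0
    while f(x) > tol and k < max_it:
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - lr * gi for yi, gi in zip(y, g)]
        k += 1
    return k

n_gd = run_gd([1.0, 1.0])
n_nest = run_nesterov([1.0, 1.0])
```

On this problem the iteration counts scale roughly with kappa for gradient descent versus sqrt(kappa) for the accelerated method, the gap the monograph's convergence templates make precise.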

Updated: 2024-09-24 20:19:22

Fields: math.OC,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2101.09545v4

Glitch in Time: Exploiting Temporal Misalignment of IMU For Eavesdropping

The increasing use of voice assistants and related applications has raised significant concerns about the security of Inertial Measurement Units (IMUs) in smartphones. These devices are vulnerable to acoustic eavesdropping attacks, jeopardizing user privacy. In response, Google imposed a rate limit of 200 Hz on permission-free access to IMUs, aiming to neutralize such side-channel attacks. Our research introduces a novel exploit, STAG, which circumvents these protections. It induces a temporal misalignment between the gyroscope and accelerometer, cleverly combining their data to resample at higher rates and reviving the potential for eavesdropping attacks previously curtailed by Google's security enhancements. Compared to prior methods, STAG achieves an 83.4% reduction in word error rate, highlighting its effectiveness in exploiting IMU data under restricted access and emphasizing the persistent security risks associated with these sensors.

Updated: 2024-09-24 20:04:44

Fields: cs.CR

Download: http://arxiv.org/abs/2409.16438v1

Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition

Parameter-efficient transfer learning (PETL) has attracted significant attention lately, due to the increasing size of pre-trained models and the need to fine-tune (FT) them for superior downstream performance. This community-wide enthusiasm has sparked a plethora of new methods. Nevertheless, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like when to apply PETL and which method to use largely unanswered. In this paper, we conduct a unifying empirical study of representative PETL methods in the context of Vision Transformers. We systematically tune their hyper-parameters to fairly compare their accuracy on downstream tasks. Our study not only offers a valuable user guide but also unveils several new insights. First, if tuned carefully, different PETL methods can obtain quite similar accuracy in the low-shot benchmark VTAB-1K. This includes simple methods, such as fine-tuning only the bias terms, that were previously reported to be inferior. Second, though with similar accuracy, we find that PETL methods make different mistakes and high-confidence predictions, likely due to their different inductive biases. Such an inconsistency (or complementariness) opens up the opportunity for ensemble methods, and we make preliminary attempts at this. Third, going beyond the commonly used low-shot tasks, we find that PETL is also useful in many-shot regimes -- it achieves comparable and sometimes better accuracy than full FT, using much fewer learnable parameters. Last but not least, we investigate PETL's ability to preserve a pre-trained model's robustness to distribution shifts (e.g., a CLIP backbone). Perhaps not surprisingly, PETL methods outperform full FT alone. However, with weight-space ensembles, the fully FT model can achieve a better balance between downstream and out-of-distribution performance, suggesting a future research direction for PETL.

Updated: 2024-09-24 19:57:40

Fields: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2409.16434v1

A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions

Large Language Models (LLMs) have revolutionized various applications in natural language processing (NLP) by providing unprecedented text generation, translation, and comprehension capabilities. However, their widespread deployment has brought to light significant concerns regarding biases embedded within these models. This paper presents a comprehensive survey of biases in LLMs, aiming to provide an extensive review of the types, sources, impacts, and mitigation strategies related to these biases. We systematically categorize biases into several dimensions. Our survey synthesizes current research findings and discusses the implications of biases in real-world applications. Additionally, we critically assess existing bias mitigation techniques and propose future research directions to enhance fairness and equity in LLMs. This survey serves as a foundational resource for researchers, practitioners, and policymakers concerned with addressing and understanding biases in LLMs.

Updated: 2024-09-24 19:50:38

Fields: cs.CL,cs.AI,cs.CY,cs.HC

Download: http://arxiv.org/abs/2409.16430v1

Leveraging Local Structure for Improving Model Explanations: An Information Propagation Approach

Numerous explanation methods have been recently developed to interpret the decisions made by deep neural network (DNN) models. For image classifiers, these methods typically provide an attribution score to each pixel in the image to quantify its contribution to the prediction. However, most of these explanation methods assign attribution scores to pixels independently, even though both humans and DNNs make decisions by analyzing a set of closely related pixels simultaneously. Hence, the attribution score of a pixel should be evaluated jointly by considering itself and its structurally-similar pixels. We propose a method called IProp, which models each pixel's individual attribution score as a source of explanatory information and explains the image prediction through the dynamic propagation of information across all pixels. To formulate the information propagation, IProp adopts the Markov Reward Process, which guarantees convergence, and the final state yields the desired pixels' attribution scores. Furthermore, IProp is compatible with any existing attribution-based explanation method. Extensive experiments on various explanation methods and DNN models verify that IProp significantly improves them on a variety of interpretability metrics.
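The Markov Reward Process iteration at the heart of such propagation is the fixed-point update v <- r + gamma * P v, which converges whenever the discount is below one. A generic sketch standing in for IProp's propagation; the two-node "pixel graph", rewards, and transition weights here are toy assumptions:

```python
def propagate(r, P, gamma=0.5, iters=200):
    # Iterate v <- r + gamma * P v. With gamma < 1 and row-stochastic
    # P this is a sup-norm contraction, so v converges to the unique
    # fixed point (I - gamma*P)^{-1} r.
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [ri + gamma * sum(P[i][j] * v[j] for j in range(len(v)))
             for i, ri in enumerate(r)]
    return v

# Two "pixels": only the first carries an initial attribution (reward),
# and each shares information evenly with both nodes.
r = [1.0, 0.0]
P = [[0.5, 0.5], [0.5, 0.5]]
v = propagate(r, P)
# Fixed point here: v = [1.5, 0.5] -- the second pixel inherits part
# of the first pixel's attribution through propagation.
```

The propagated scores mix each pixel's own evidence with that of its structurally-similar neighbors, which is the joint evaluation the abstract argues for.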

Updated: 2024-09-24 19:48:47

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.16429v1

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

AI agents are increasingly autonomous in their interactions with human users and tools, leading to increased interactional safety risks. We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between human users and AI agents, where the AI agents are equipped with a variety of tools (e.g., patient management platforms) to navigate diverse scenarios (e.g., a user attempting to access other patients' profiles). To examine the safety of AI agents in these interactions, we develop a comprehensive multi-dimensional evaluation framework that uses metrics covering operational, content-related, societal, and legal risks. Through running 1840 simulations based on 92 scenarios across seven domains (e.g., healthcare, finance, education), we demonstrate that HAICOSYSTEM can emulate realistic user-AI interactions and complex tool use by AI agents. Our experiments show that state-of-the-art LLMs, both proprietary and open-sourced, exhibit safety risks in over 50% of cases, with models generally showing higher risks when interacting with simulated malicious users. Our findings highlight the ongoing challenge of building agents that can safely navigate complex interactions, particularly when faced with malicious users. To foster the AI agent safety ecosystem, we release a code platform that allows practitioners to create custom scenarios, simulate interactions, and evaluate the safety and performance of their agents.

Updated: 2024-09-24 19:47:21

Fields: cs.AI

Download: http://arxiv.org/abs/2409.16427v1

Statistical tuning of artificial neural network

Neural networks are often regarded as "black boxes" due to their complex functions and numerous parameters, which poses significant challenges for interpretability. This study addresses these challenges by introducing methods to enhance the understanding of neural networks, focusing specifically on models with a single hidden layer. We establish a theoretical framework by demonstrating that the neural network estimator can be interpreted as a nonparametric regression model. Building on this foundation, we propose statistical tests to assess the significance of input neurons and introduce algorithms for dimensionality reduction, including clustering and principal component analysis (PCA), to simplify the network and improve its interpretability and accuracy. The key contributions of this study include the development of a bootstrapping technique for evaluating artificial neural network (ANN) performance, applying statistical tests and logistic regression to analyze hidden neurons, and assessing neuron efficiency. We also investigate the behavior of individual hidden neurons in relation to output neurons and apply these methodologies to the IDC and Iris datasets to validate their practical utility. This research advances the field of Explainable Artificial Intelligence by presenting robust statistical frameworks for interpreting neural networks, thereby facilitating a clearer understanding of the relationships between inputs, outputs, and individual network components.
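A bootstrapping technique for evaluating model performance, of the kind mentioned above, can be sketched as a percentile bootstrap over per-fold accuracy scores. The scores, resample count, and confidence level below are illustrative assumptions, not the study's actual experimental values:

```python
import random

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    # Percentile bootstrap for the mean of a performance metric:
    # resample the scores with replacement, collect the resample
    # means, and read off the (alpha/2, 1-alpha/2) percentiles.
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-fold accuracy scores for an ANN.
acc = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94, 0.90, 0.91]
lo, hi = bootstrap_ci(acc)
```

The resulting interval quantifies how much of the reported performance difference could be resampling noise, which is the role a bootstrap plays in statistical model evaluation.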

Updated: 2024-09-24 19:47:03

Fields: stat.ML,cs.LG,stat.AP

Download: http://arxiv.org/abs/2409.16426v1

Lessons for Editors of AI Incidents from the AI Incident Database

As artificial intelligence (AI) systems become increasingly deployed across the world, they are also increasingly implicated in AI incidents - harm events to individuals and society. As a result, industry, civil society, and governments worldwide are developing best practices and regulations for monitoring and analyzing AI incidents. The AI Incident Database (AIID) is a project that catalogs AI incidents and supports further research by providing a platform to classify incidents for different operational and research-oriented goals. This study reviews the AIID's dataset of 750+ AI incidents and two independent taxonomies applied to these incidents to identify common challenges to indexing and analyzing AI incidents. We find that certain patterns of AI incidents present structural ambiguities that challenge incident databasing and explore how epistemic uncertainty in AI incident reporting is unavoidable. We therefore report mitigations to make incident processes more robust to uncertainty related to cause, extent of harm, severity, or technical details of implicated systems. With these findings, we discuss how to develop future AI incident reporting practices.

Updated: 2024-09-24 19:46:58

Fields: cs.CY,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.16425v1

Is All Learning (Natural) Gradient Descent?

This paper shows that a wide class of effective learning rules -- those that improve a scalar performance measure over a given time window -- can be rewritten as natural gradient descent with respect to a suitably defined loss function and metric. Specifically, we show that parameter updates within this class of learning rules can be expressed as the product of a symmetric positive definite matrix (i.e., a metric) and the negative gradient of a loss function. We also demonstrate that these metrics have a canonical form and identify several optimal ones, including the metric that achieves the minimum possible condition number. The proofs of the main results are straightforward, relying only on elementary linear algebra and calculus, and are applicable to continuous-time, discrete-time, stochastic, and higher-order learning rules, as well as loss functions that explicitly depend on time.
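The paper's central identity, an update written as an SPD metric times the negative gradient, can be illustrated on a toy quadratic: choosing the metric as a damped inverse Hessian turns plain gradient descent into a Newton-like rule that still monotonically decreases the loss. The quadratic and the particular metric below are assumptions for illustration only:

```python
def loss(t):  # L(t) = t0^2 + 10*t1^2
    return t[0] ** 2 + 10.0 * t[1] ** 2

def grad(t):
    return [2.0 * t[0], 20.0 * t[1]]

# SPD metric: half the inverse Hessian, M = 0.5 * diag(1/2, 1/20).
# Any SPD M keeps -M*grad a descent direction, since grad^T M grad > 0.
M = [0.25, 0.025]
t = [3.0, -2.0]
history = [loss(t)]
for _ in range(5):
    g = grad(t)
    # The learning rule in the paper's canonical form:
    # parameter update = -(SPD metric) * gradient of the loss.
    t = [ti - mi * gi for ti, mi, gi in zip(t, M, g)]
    history.append(loss(t))
```

Here both coordinates contract at the same rate despite the 10x curvature gap, a small instance of how the metric can improve the conditioning (and hence the condition number) the paper analyzes.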

Updated: 2024-09-24 19:41:08

Categories: cs.LG,math.DS,q-bio.NC

Download: http://arxiv.org/abs/2409.16422v1

Semi-Supervised Learning Approach for Efficient Resource Allocation with Network Slicing in O-RAN

This paper introduces an innovative approach to the resource allocation problem, aiming to coordinate multiple independent x-applications (xAPPs) for network slicing and resource allocation in the Open Radio Access Network (O-RAN). Our approach maximizes the weighted throughput among user equipment (UE) and allocates physical resource blocks (PRBs). We prioritize two service types: enhanced Mobile Broadband and Ultra-Reliable Low-Latency Communication. Two xAPPs have been designed to achieve this: a power control xAPP for each UE and a PRB allocation xAPP. The method consists of a two-part training phase. The first part uses supervised learning with a Variational Autoencoder trained to regress the power transmission, UE association, and PRB allocation decisions, and the second part uses unsupervised learning with a contrastive loss approach to improve the generalization and robustness of the model. We evaluate the performance by comparing its results to those obtained from an exhaustive search and deep Q-network algorithms and reporting performance metrics for the regression task. The results demonstrate the superior efficiency of this approach in different scenarios among the service types, reaffirming its status as a more efficient and effective solution for network slicing problems compared to state-of-the-art methods. This innovative approach not only sets our research apart but also paves the way for exciting future advancements in resource allocation in O-RAN.

Updated: 2024-09-24 19:37:20

Categories: cs.NI,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2401.08861v2

Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles

Ensemble deep learning has been shown to achieve high predictive accuracy and uncertainty estimation in a wide variety of medical imaging contexts. However, perturbations in the input images at test time (e.g., noise, domain shifts) can still lead to significant performance degradation, posing challenges for trustworthy clinical deployment. In order to address this, we propose LaDiNE, a novel and robust probabilistic method that is capable of inferring informative and invariant latent variables from the input images. These latent variables are then used to recover the robust predictive distribution without relying on a predefined functional form. This results in improved (i) generalization capabilities and (ii) calibration of prediction confidence. Extensive experiments were performed on the task of disease classification based on the Tuberculosis chest X-ray and the ISIC Melanoma skin cancer datasets. Here the performance of LaDiNE was analysed under a range of challenging covariate shift conditions, where training was based on "clean" images, and unseen noisy inputs and adversarial perturbations were presented at test time. Results show that LaDiNE outperforms existing state-of-the-art baseline methods in terms of accuracy and confidence calibration. This increases the feasibility of deploying reliable medical machine learning models in real clinical settings, where accurate and trustworthy predictions are crucial for patient care and clinical decision support.

Updated: 2024-09-24 19:33:34

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2310.15952v4

Task-oriented Prompt Enhancement via Script Generation

Large Language Models (LLMs) have demonstrated remarkable abilities across various tasks, leveraging advanced reasoning. Yet, they struggle with task-oriented prompts due to a lack of specific prior knowledge of the task answers. The current state-of-the-art approach, PAL, utilizes code generation to address this issue. However, PAL depends on manually crafted prompt templates and examples while still producing inaccurate results. In this work, we present TITAN, a novel strategy designed to enhance LLMs' performance on task-oriented prompts. TITAN achieves this by generating scripts using a universal approach and zero-shot learning. Unlike existing methods, TITAN eliminates the need for detailed task-specific instructions and extensive manual efforts. TITAN enhances LLMs' performance on various tasks by utilizing their analytical and code-generation capabilities in a streamlined process. TITAN employs two key techniques: (1) step-back prompting to extract the task's input specifications and (2) chain-of-thought prompting to identify required procedural steps. This information is used to improve the LLMs' code-generation process. TITAN further refines the generated script through post-processing and the script is executed to retrieve the final answer. Our comprehensive evaluation demonstrates TITAN's effectiveness in a diverse set of tasks. On average, TITAN outperforms the state-of-the-art zero-shot approach by 7.6% and 3.9% when paired with GPT-3.5 and GPT-4. Overall, without human annotation, TITAN achieves state-of-the-art performance in 8 out of 11 cases while losing only marginally to few-shot approaches (which require human intervention) on three occasions. This work represents a significant advancement in addressing task-oriented prompts, offering a novel solution for effectively utilizing LLMs in everyday life tasks.

Updated: 2024-09-24 19:32:08

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2409.16418v1

Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity

Large Language Models (LLMs) have demonstrated impressive performance in software engineering tasks. However, improving their accuracy in generating correct and reliable code remains challenging. Numerous prompt engineering techniques (PETs) have been developed to address this, but no single approach is universally optimal. Selecting the right PET for each query is difficult for two primary reasons: (1) interactive prompting techniques may not consistently deliver the expected benefits, especially for simpler queries, and (2) current automated prompt engineering methods lack adaptability and fail to fully utilize multi-stage responses. To overcome these challenges, we propose PET-Select, a PET-agnostic selection model that uses code complexity as a proxy to classify queries and select the most appropriate PET. By incorporating contrastive learning, PET-Select effectively distinguishes between simple and complex problems, allowing it to choose PETs that are best suited for each query's complexity level. Our evaluations on the MBPP and HumanEval benchmarks using GPT-3.5 Turbo and GPT-4o show up to a 1.9% improvement in pass@1 accuracy, along with a 74.8% reduction in token usage. Additionally, we provide both quantitative and qualitative results to demonstrate how PET-Select effectively selects the most appropriate techniques for each code generation query, further showcasing its efficiency in optimizing PET selection.
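PET-Select's complexity predictor is learned with contrastive learning; as a hedged illustration of the proxy idea only, a static stand-in can rank queries by counting branching constructs with Python's `ast` module (the function and thresholds below are hypothetical, in the spirit of cyclomatic complexity):

```python
import ast

# Node types that introduce a branch or iteration in the control flow.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.comprehension)

def complexity(source: str) -> int:
    """Count 1 + the number of branching constructs in the parsed source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

simple = "def add(a, b):\n    return a + b\n"
complex_ = (
    "def classify(xs):\n"
    "    out = []\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            out.append('pos')\n"
    "        else:\n"
    "            out.append('nonpos')\n"
    "    return out\n"
)

# A selector could route low-complexity queries to cheap zero-shot prompting
# and high-complexity ones to an interactive PET.
assert complexity(simple) < complexity(complex_)
```

A real selector would score the natural-language query before any code exists, which is exactly why the paper trains a model to predict complexity rather than measure it.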

Updated: 2024-09-24 19:28:55

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2409.16416v1

Improvement and generalization of ABCD method with Bayesian inference

To find New Physics or to refine our knowledge of the Standard Model at the LHC is an enterprise that involves many factors. We focus on taking advantage of available information and re-thinking the usual data-driven ABCD method, improving and generalizing it using Bayesian Machine Learning tools. We propose that a dataset consisting of a signal and many backgrounds is well described through a mixture model. Signal, backgrounds and their relative fractions in the sample can be well extracted by exploiting the prior knowledge and the dependence between the different observables at the event-by-event level with Bayesian tools. We show how, in contrast to the ABCD method, one can take advantage of understanding some properties of the different backgrounds and of having more than two independent observables to measure in each event. In addition, instead of regions defined through hard cuts, the Bayesian framework uses the information of continuous distribution to obtain soft-assignments of the events which are statistically more robust. To compare both methods we use a toy problem inspired by $pp\to hh\to b\bar b b \bar b$, selecting a reduced and simplified number of processes and analysing the flavor of the four jets and the invariant mass of the jet-pairs, modeled with simplified distributions. Taking advantage of all this information, and starting from a combination of biased and agnostic priors, leads us to a very good posterior once we use the Bayesian framework to exploit the data and the mutual information of the observables at the event-by-event level. We show how, in this simplified model, the Bayesian framework outperforms the ABCD method sensitivity in obtaining the signal fraction in scenarios with $1\%$ and $0.5\%$ true signal fractions in the dataset. We also show that the method is robust against the absence of signal.
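The mixture-model idea can be sketched with a plain EM point estimate rather than the paper's full Bayesian treatment: per-event soft assignments to known (here, assumed) signal and background shapes recover a small signal fraction. Everything below -- the 1-D Gaussian shapes and the 5% fraction -- is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_frac = 20_000, 0.05
is_signal = rng.random(n) < true_frac
# Signal events ~ N(2, 0.5); background events ~ N(0, 1).
x = np.where(is_signal, rng.normal(2.0, 0.5, n), rng.normal(0.0, 1.0, n))

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

frac = 0.5  # agnostic starting point for the signal fraction
for _ in range(200):
    # E-step: per-event soft assignment to the signal component
    p_sig = frac * gauss(x, 2.0, 0.5)
    p_bkg = (1 - frac) * gauss(x, 0.0, 1.0)
    resp = p_sig / (p_sig + p_bkg)
    # M-step: update the fraction (component shapes assumed known here)
    frac = resp.mean()

assert abs(frac - true_frac) < 0.02
```

The paper replaces these point estimates with priors and posteriors over all mixture parameters, and uses several observables per event instead of one, but the role of event-level soft assignments is the same.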

Updated: 2024-09-24 19:24:16

Categories: hep-ph,cs.LG,hep-ex

Download: http://arxiv.org/abs/2402.08001v2

Evaluating Blocking Biases in Entity Matching

Entity Matching (EM) is crucial for identifying equivalent data entities across different sources, a task that becomes increasingly challenging with the growth and heterogeneity of data. Blocking techniques, which reduce the computational complexity of EM, play a vital role in making this process scalable. Despite advancements in blocking methods, the issue of fairness -- where blocking may inadvertently favor certain demographic groups -- has been largely overlooked. This study extends traditional blocking metrics to incorporate fairness, providing a framework for assessing bias in blocking techniques. Through experimental analysis, we evaluate the effectiveness and fairness of various blocking methods, offering insights into their potential biases. Our findings highlight the importance of considering fairness in EM, particularly in the blocking phase, to ensure equitable outcomes in data integration tasks.
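A minimal, hypothetical example of a group-conditioned blocking metric (the paper's exact metric definitions are not reproduced here): pair recall -- the fraction of true matching pairs that survive blocking -- computed per demographic group. The records, block keys, and groups below are invented for illustration:

```python
from itertools import combinations

records = [
    # (entity id, name, block key, group); records sharing an id truly match
    (1, "Ana Silva", "10", "A"), (1, "Ana Silva", "10", "A"),
    (2, "Bo Chen",   "20", "B"), (2, "Bo Chen",   "21", "B"),  # noisy key
    (3, "Cara Diaz", "30", "A"), (3, "C. Diaz",   "30", "A"),
    (4, "Dev Rao",   "40", "B"), (4, "Dev Rao",   "41", "B"),  # noisy key
]

def pair_recall(recs):
    """Fraction of true matching pairs placed in the same block."""
    true_pairs = [(r, s) for r, s in combinations(recs, 2) if r[0] == s[0]]
    kept = sum(1 for r, s in true_pairs if r[2] == s[2])
    return kept / len(true_pairs)

recall_a = pair_recall([r for r in records if r[3] == "A"])
recall_b = pair_recall([r for r in records if r[3] == "B"])
assert recall_a == 1.0 and recall_b == 0.0  # blocking silently harms group B
```

Overall recall here is 0.5, which looks like an ordinary blocking/efficiency trade-off; only the per-group breakdown exposes that the loss falls entirely on one group.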

Updated: 2024-09-24 19:20:00

Categories: cs.LG,cs.DB

Download: http://arxiv.org/abs/2409.16410v1

Modern Hopfield Networks meet Encoded Neural Representations -- Addressing Practical Considerations

Content-addressable memories such as Modern Hopfield Networks (MHN) have been studied as mathematical models of auto-association and storage/retrieval in human declarative memory, yet their practical use for large-scale content storage faces challenges. Chief among them is the occurrence of meta-stable states, particularly when handling large amounts of high dimensional content. This paper introduces Hopfield Encoding Networks (HEN), a framework that integrates encoded neural representations into MHNs to improve pattern separability and reduce meta-stable states. We show that HEN can also be used for retrieval in the context of hetero association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain. Experimental results demonstrate substantial reduction in meta-stable states and increased storage capacity while still enabling perfect recall of a significantly larger number of inputs, advancing the practical utility of associative memory networks for real-world tasks.
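The retrieval dynamics underlying an MHN can be sketched with the standard softmax update of modern Hopfield networks; this shows plain auto-association on random patterns, not HEN's learned encodings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 10
X = rng.choice([-1.0, 1.0], size=(d, n))      # stored patterns as columns
beta = 4.0                                     # inverse temperature

def retrieve(xi, steps=3):
    """Drive a query toward the stored pattern it most resembles."""
    for _ in range(steps):
        a = beta * X.T @ xi
        p = np.exp(a - a.max()); p /= p.sum()  # softmax over stored patterns
        xi = X @ p                             # convex recombination
    return xi

query = X[:, 0].copy()
flip = rng.choice(d, size=10, replace=False)
query[flip] *= -1                              # corrupt 10 of 64 entries

out = retrieve(query)
assert np.argmax(X.T @ out) == 0               # pattern 0 is recovered
```

Meta-stable states arise when several stored patterns have similar inner products with the query, so the softmax blends them instead of committing; HEN's encoded representations are aimed at keeping those inner products well separated.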

Updated: 2024-09-24 19:17:15

Categories: cs.LG,cs.AI,cs.CV,cs.IR,cs.NE

Download: http://arxiv.org/abs/2409.16408v1

Towards Representation Learning for Weighting Problems in Design-Based Causal Inference

Reweighting a distribution to minimize a distance to a target distribution is a powerful and flexible strategy for estimating a wide range of causal effects, but can be challenging in practice because optimal weights typically depend on knowledge of the underlying data generating process. In this paper, we focus on design-based weights, which do not incorporate outcome information; prominent examples include prospective cohort studies, survey weighting, and the weighting portion of augmented weighting estimators. In such applications, we explore the central role of representation learning in finding desirable weights in practice. Unlike the common approach of assuming a well-specified representation, we highlight the error due to the choice of a representation and outline a general framework for finding suitable representations that minimize this error. Building on recent work that combines balancing weights and neural networks, we propose an end-to-end estimation procedure that learns a flexible representation, while retaining promising theoretical properties. We show that this approach is competitive in a range of common causal inference tasks.
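The classical design-based weighting step the paper builds on can be sketched as a least-squares solve: minimum-norm weights whose weighted sample moments match target moments, using the raw covariate as a stand-in for the learned representation. The target mean and data below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(1.0, 2.0, size=n)          # observed sample
target_mean = 0.0                          # moment of the target distribution

# Balance constraints: sum(w) = 1 and sum(w_i * x_i) = target_mean.
A = np.vstack([np.ones(n), x])             # 2 x n constraint matrix
b = np.array([1.0, target_mean])
w, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimum-norm solution

assert abs(w.sum() - 1.0) < 1e-8
assert abs(w @ x - target_mean) < 1e-8
```

Note the minimum-norm weights can go negative; methods like entropy balancing add a non-negativity constraint. The paper's contribution is choosing *which* representation's moments to balance, since balancing the wrong features leaves bias behind.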

Updated: 2024-09-24 19:16:37

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2409.16407v1

Chasing the Shadows: TTPs in Action to Attribute Advanced Persistent Threats

The current state of Advanced Persistent Threats (APT) attribution primarily relies on time-consuming manual processes. These include mapping incident artifacts onto threat attribution frameworks and employing expert reasoning to uncover the most likely responsible APT groups. This research aims to assist the threat analyst in the attribution process by presenting an attribution method named CAPTAIN (Comprehensive Advanced Persistent Threat AttrIbutioN). This novel APT attribution approach leverages the Tactics, Techniques, and Procedures (TTPs) employed by various APT groups in past attacks. CAPTAIN follows two significant development steps: baseline establishment and similarity measure for attack pattern matching. This method starts by maintaining a TTP database of APTs seen in past attacks as baseline behaviour of threat groups. The attribution process leverages the contextual information added by TTP sequences, which reflects the sequence of behaviours threat actors demonstrated across different kill-chain stages during the attack. Then, it compares the provided TTPs with the established baseline to identify the most closely matching threat group. CAPTAIN introduces a novel similarity measure for APT group attack-pattern matching that calculates the similarity between TTP sequences. The proposed approach outperforms traditional similarity measures like Cosine, Euclidean, and Longest Common Subsequence (LCS) in performing attribution. Overall, CAPTAIN performs attribution with the precision of 61.36% (top-1) and 69.98% (top-2), surpassing the existing state-of-the-art attribution methods.
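CAPTAIN's own similarity measure is not reproduced in the abstract, but the LCS baseline it is compared against is easy to state: an order-preserving, gap-tolerant overlap between two TTP sequences. The technique IDs below are real MITRE ATT&CK IDs, used purely as an illustration; the group sequences are invented:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, classic O(len(a)*len(b)) DP."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def lcs_similarity(a, b):
    return lcs_len(a, b) / max(len(a), len(b))

observed = ["T1566", "T1059", "T1055", "T1041"]   # phishing -> scripting -> injection -> exfil
group_x  = ["T1566", "T1204", "T1059", "T1041"]   # hypothetical baseline for group X
group_y  = ["T1190", "T1505", "T1003"]            # hypothetical baseline for group Y

assert lcs_similarity(observed, group_x) > lcs_similarity(observed, group_y)
```

Because LCS only counts matched techniques in order, it ignores how far apart the matches sit in the kill chain; exploiting that positional context is the kind of information CAPTAIN's sequence-aware measure is designed to use.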

Updated: 2024-09-24 18:59:27

Categories: cs.CR

Download: http://arxiv.org/abs/2409.16400v1

Design and Evaluation of a CDSS for Drug Allergy Management Using LLMs and Pharmaceutical Data Integration

Medication errors significantly threaten patient safety, leading to adverse drug events and substantial economic burdens on healthcare systems. Clinical Decision Support Systems (CDSSs) aimed at mitigating these errors often face limitations, including reliance on static databases and rule-based algorithms, which can result in high false alert rates and alert fatigue among clinicians. This paper introduces HELIOT, an innovative CDSS for drug allergy management, integrating Large Language Models (LLMs) with a comprehensive pharmaceutical data repository. HELIOT leverages advanced natural language processing capabilities to interpret complex medical texts and synthesize unstructured data, overcoming the limitations of traditional CDSSs. An empirical evaluation using a synthetic patient dataset and expert-verified ground truth demonstrates HELIOT's high accuracy, precision, recall, and F1 score, uniformly reaching 100% across multiple experimental runs. The results underscore HELIOT's potential to enhance decision support in clinical settings, offering a scalable, efficient, and reliable solution for managing drug allergies.

Updated: 2024-09-24 18:55:10

Categories: cs.AI

Download: http://arxiv.org/abs/2409.16395v1

Surrogate Modeling of Trajectory Map-matching in Urban Road Networks using Transformer Sequence-to-Sequence Model

Large-scale geolocation telematics data acquired from connected vehicles has the potential to significantly enhance mobility infrastructures and operational systems within smart cities. To effectively utilize this data, it is essential to accurately match the geolocation data to the road segments. However, this matching is often not trivial due to the low sampling rate and errors exacerbated by multipath effects in urban environments. Traditionally, statistical modeling techniques such as Hidden-Markov models incorporating domain knowledge into the matching process have been extensively used for map-matching tasks. However, rule-based map-matching tasks are noise-sensitive and inefficient in processing large-scale trajectory data. Deep learning techniques directly learn the relationship between observed data and road networks from the data, often without the need for hand-crafted rules or domain knowledge. This renders them an efficient approach for map-matching large-scale datasets and more robust to the noise. This paper introduces a deep-learning model, specifically the transformer-based encoder-decoder model, to perform as a surrogate for offline map-matching algorithms. The encoder-decoder architecture initially encodes the series of noisy GPS points into a representation that automatically captures autoregressive behavior and spatial correlations between GPS points. Subsequently, the decoder associates data points with the road network features and thus transforms these representations into a sequence of road segments. The model is trained and evaluated using GPS traces collected in Manhattan, New York. Achieving an accuracy of 75%, the transformer-based encoder-decoder model -- an architecture widely employed in natural language processing -- shows promising performance in translating noisy GPS data into navigated routes on urban road networks.
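For contrast with the learned surrogate, the simplest geometric baseline assigns each GPS point independently to the nearest road segment; this is the kind of per-point matching that HMM-based methods refine with transition constraints and that the seq2seq model learns to replace. Segment IDs and coordinates below are hypothetical:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)  # clamp projection to segment
    return np.linalg.norm(p - (a + t * ab))

segments = {  # hypothetical road segments: id -> (endpoint, endpoint)
    "seg_A": (np.array([0.0, 0.0]), np.array([10.0, 0.0])),
    "seg_B": (np.array([0.0, 5.0]), np.array([10.0, 5.0])),
}

def match(point):
    return min(segments, key=lambda s: point_segment_dist(point, *segments[s]))

noisy_trace = [np.array([1.0, 0.4]), np.array([5.0, -0.3]), np.array([9.0, 4.6])]
assert [match(p) for p in noisy_trace] == ["seg_A", "seg_A", "seg_B"]
```

The baseline's weakness is exactly what the abstract describes: with multipath noise, independent nearest-segment decisions can jump between parallel roads, whereas a sequence model conditions each decision on the whole trace.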

Updated: 2024-09-24 18:53:44

Categories: cs.AI,cs.CE

Download: http://arxiv.org/abs/2404.12460v2

Rao-Blackwellized POMDP Planning

Partially Observable Markov Decision Processes (POMDPs) provide a structured framework for decision-making under uncertainty, but their application requires efficient belief updates. Sequential Importance Resampling Particle Filters (SIRPF), also known as Bootstrap Particle Filters, are commonly used as belief updaters in large approximate POMDP solvers, but they face challenges such as particle deprivation and high computational costs as the system's state dimension grows. To address these issues, this study introduces Rao-Blackwellized POMDP (RB-POMDP) approximate solvers and outlines generic methods to apply Rao-Blackwellization in both belief updates and online planning. We compare the performance of SIRPF and Rao-Blackwellized Particle Filters (RBPF) in a simulated localization problem where an agent navigates toward a target in a GPS-denied environment using POMCPOW and RB-POMCPOW planners. Our results not only confirm that RBPFs maintain accurate belief approximations over time with fewer particles, but, more surprisingly, RBPFs combined with quadrature-based integration improve planning quality significantly compared to SIRPF-based planning under the same computational limits.
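The variance-reduction principle behind Rao-Blackwellization can be shown in a few lines: when part of a model is analytically tractable, replacing samples of that part with their conditional expectation yields a lower-variance estimator. This is a generic conditional-expectation example, not the paper's filter:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
a, sigma = 2.0, 1.0
x = rng.normal(0.0, 1.0, n)
y = rng.normal(a * x, sigma)          # y | x ~ N(a*x, sigma^2)

crude = x + y                         # sample both parts
rao_b = x + a * x                     # use E[y | x] = a*x analytically

# Both estimate E[x + y] = 0, but the Rao-Blackwellized version is tighter:
# Var(crude) = (1+a)^2 + sigma^2 = 10 vs Var(rao_b) = (1+a)^2 = 9.
assert abs(crude.mean()) < 0.1 and abs(rao_b.mean()) < 0.1
assert rao_b.var() < crude.var()
```

In an RBPF the same trick marginalizes the analytically tractable state dimensions (e.g., conditionally linear-Gaussian ones) in closed form, so particles are only needed for the remainder, which is why fewer particles suffice.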

Updated: 2024-09-24 18:46:50

Categories: cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2409.16392v1

Patch-Based Contrastive Learning and Memory Consolidation for Online Unsupervised Continual Learning

We focus on a relatively unexplored learning paradigm known as Online Unsupervised Continual Learning (O-UCL), where an agent receives a non-stationary, unlabeled data stream and progressively learns to identify an increasing number of classes. This paradigm is designed to model real-world applications where encountering novelty is the norm, such as exploring a terrain with several unknown and time-varying entities. Unlike prior work in unsupervised, continual, or online learning, O-UCL combines all three areas into a single challenging and realistic learning paradigm. In this setting, agents are frequently evaluated and must aim to maintain the best possible representation at any point of the data stream, rather than at the end of pre-specified offline tasks. The proposed approach, called Patch-based Contrastive learning and Memory Consolidation (PCMC), builds a compositional understanding of data by identifying and clustering patch-level features. Embeddings for these patch-level features are extracted with an encoder trained via patch-based contrastive learning. PCMC incorporates new data into its distribution while avoiding catastrophic forgetting, and it consolidates memory examples during "sleep" periods. We evaluate PCMC's performance on streams created from the ImageNet and Places365 datasets. Additionally, we explore various versions of the PCMC algorithm and compare its performance against several existing methods and simple baselines.

Updated: 2024-09-24 18:46:32

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2409.16391v1

Best Linear Unbiased Estimate from Privatized Histograms

In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs, in the sense that a quantity can be estimated by combining different combinations of privatized values. Indeed, this structure is present in the DP 2020 Decennial Census products published by the U.S. Census Bureau. With this structure, the DP output can be improved by enforcing self-consistency (i.e., estimators obtained by combining different values result in the same estimate) and we show that the minimum variance processing is a linear projection. However, standard projection algorithms are too computationally expensive in terms of both memory and execution time for applications such as the Decennial Census. We propose the Scalable Efficient Algorithm for Best Linear Unbiased Estimate (SEA BLUE), based on a two step process of aggregation and differencing that 1) enforces self-consistency through a linear and unbiased procedure, 2) is computationally and memory efficient, 3) achieves the minimum variance solution under certain structural assumptions, and 4) is empirically shown to be robust to violations of these structural assumptions. We propose three methods of calculating confidence intervals from our estimates, under various assumptions. We apply SEA BLUE to two 2010 Census demonstration products, illustrating its scalability and validity.
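The self-consistency idea can be sketched with the smallest possible example: two cell counts plus their redundant total, each released with independent noise, reconciled by least squares. This is the plain linear projection the abstract describes, not SEA BLUE's scalable aggregation-and-differencing algorithm; the counts and noise scale are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
truth = np.array([120.0, 80.0, 200.0])            # a, b, and total = a + b
noisy = truth + rng.laplace(0.0, 2.0, size=3)     # three independent DP releases

A = np.array([[1.0, 0.0],                          # release of a
              [0.0, 1.0],                          # release of b
              [1.0, 1.0]])                         # redundant release of a + b
theta, *_ = np.linalg.lstsq(A, noisy, rcond=None)  # (a_hat, b_hat)
fitted = A @ theta

# Self-consistent by construction: the fitted total equals the sum of cells,
# so combining different releases now yields one and the same estimate.
assert abs(fitted[2] - (fitted[0] + fitted[1])) < 1e-9
```

At Decennial Census scale this dense projection is infeasible in memory and time, which is the gap SEA BLUE's two-step aggregation-and-differencing procedure addresses.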

Updated: 2024-09-24 18:39:26

Categories: stat.CO,cs.CR,stat.AP,62-08, 62P25, 68P27

Download: http://arxiv.org/abs/2409.04387v2

WebQuest: A Benchmark for Multimodal QA on Web Page Sequences

The rise of powerful multimodal LLMs has enhanced the viability of building web agents which can, with increasing levels of autonomy, assist users to retrieve information and complete tasks on various human-computer interfaces. It is hence necessary to build challenging benchmarks that span a wide variety of use cases reflecting real-world usage. In this work, we present WebQuest, a multi-page question-answering dataset that requires reasoning across multiple related web pages. In contrast to existing UI benchmarks that focus on multi-step web navigation and task completion, our dataset evaluates information extraction, multimodal retrieval and composition of information from many web pages. WebQuest includes three question categories: single-screen QA, multi-screen QA, and QA based on navigation traces. We evaluate leading proprietary multimodal models like GPT-4V, Gemini Flash, Claude 3, and open source models like InstructBLIP, PaliGemma on our dataset, revealing a significant gap between single-screen and multi-screen reasoning. Finally, we investigate inference time techniques like Chain-of-Thought prompting to improve model capabilities on multi-screen reasoning.

Updated: 2024-09-24 18:38:02

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2409.13711v2

Deploying Open-Source Large Language Models: A performance Analysis

Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, with many open-weight models available. However, the requirements to deploy such a service are often unknown and difficult to evaluate in advance. To facilitate this process, we conducted numerous tests at the Centre Inria de l'Université de Bordeaux. In this article, we propose a comparison of the performance of several models of different sizes (mainly Mistral and LLaMa) depending on the available GPUs, using vLLM, a Python library designed to optimize the inference of these models. Our results provide valuable information for private and public groups wishing to deploy LLMs, allowing them to evaluate the performance of different models based on their available hardware. This study thus contributes to facilitating the adoption and use of these large language models in various application domains.

Updated: 2024-09-24 18:26:03

Subjects: cs.PF,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.14887v2

Development and Application of a Sentinel-2 Satellite Imagery Dataset for Deep-Learning Driven Forest Wildfire Detection

Forest loss due to natural events, such as wildfires, represents an increasing global challenge that demands advanced analytical methods for effective detection and mitigation. To this end, the integration of satellite imagery with deep learning (DL) methods has become essential. Nevertheless, this approach requires substantial amounts of labeled data to produce accurate results. In this study, we use bi-temporal Sentinel-2 satellite imagery sourced from Google Earth Engine (GEE) to build the California Wildfire GeoImaging Dataset (CWGID), a high-resolution labeled satellite imagery dataset with over 100,000 labeled before-and-after forest wildfire image pairs for wildfire detection through DL. Our methods include data acquisition from authoritative sources, data processing, and an initial dataset analysis using three pre-trained Convolutional Neural Network (CNN) architectures. Our results show that the EF EfficientNet-B0 model achieves the highest accuracy of over 92% in detecting forest wildfires. The CWGID and the methodology used to build it prove to be a valuable resource for training and testing DL architectures for forest wildfire detection.

Updated: 2024-09-24 18:25:02

Subjects: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.16380v1

On Collaboration in Distributed Parameter Estimation with Resource Constraints

Effective resource allocation in sensor networks, IoT systems, and distributed computing is essential for applications such as environmental monitoring, surveillance, and smart infrastructure. Sensors or agents must optimize their resource allocation to maximize the accuracy of parameter estimation. In this work, we consider a group of sensors or agents, each sampling from a different variable of a multivariate Gaussian distribution and having a different estimation objective. We formulate a sensor or agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramer-Rao bound minimization) problem. This formulation captures a novel trade-off in energy use, between locally collecting univariate samples and collaborating to produce multivariate samples. When knowledge of the correlation between variables is available, we analytically identify two cases: (1) where the optimal data collection policy entails investing resources to transfer information for collaborative sampling, and (2) where knowledge of the correlation between samples cannot enhance estimation efficiency. When knowledge of certain correlations is unavailable, but collaboration remains potentially beneficial, we propose novel approaches that apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our sequential distributed parameter estimation problem. We illustrate the effectiveness of the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, through simulation.
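
The bandit-based policies named above (UCB-F, UCB-Z) are not specified in this summary, but the core mechanism they build on, upper-confidence-bound exploration over data-collection options, can be sketched with a plain UCB1 learner. The two arms, Bernoulli reward probabilities, and random seed below are illustrative assumptions, not details from the paper:

```python
import math
import random

def ucb1(success_probs, horizon, rng):
    # UCB1: pull each arm once, then pick the arm maximizing
    # empirical mean + sqrt(2 ln t / n_pulls), trading off exploration and exploitation.
    k = len(success_probs)
    pulls = [0] * k
    total = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1
        else:
            arm = max(range(k), key=lambda i: total[i] / pulls[i]
                      + math.sqrt(2.0 * math.log(t) / pulls[i]))
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0  # Bernoulli reward
        pulls[arm] += 1
        total[arm] += reward
    return pulls

rng = random.Random(1)
# Arm 0: sample locally (assumed less informative); arm 1: pay to collaborate.
pulls = ucb1([0.3, 0.7], 2000, rng)
print(pulls)
```

Over enough rounds the learner concentrates pulls on the more rewarding option while still occasionally probing the other, which is the trade-off a data-collection-versus-collaboration policy must manage.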

Updated: 2024-09-24 18:18:27

Subjects: cs.LG,cs.DC,cs.MA,stat.ML

Download: http://arxiv.org/abs/2307.06442v2

Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling

Generative artificial intelligence (GenAI) can reshape education and learning. While large language models (LLMs) like ChatGPT dominate current educational research, multimodal capabilities, such as text-to-speech and text-to-image, are less explored. This study uses topic modeling to map the research landscape of multimodal and generative AI in education. An extensive literature search using Dimensions.ai yielded 4175 articles. Employing a topic modeling approach, latent topics were extracted, resulting in 38 interpretable topics organized into 14 thematic areas. Findings indicate a predominant focus on text-to-text models in educational contexts, with other modalities underexplored, overlooking the broader potential of multimodal approaches. The results suggest a research gap, stressing the importance of more balanced attention across different AI modalities and educational levels. In summary, this research provides an overview of current trends in generative AI for education, underlining opportunities for future exploration of multimodal technologies to fully realize the transformative potential of artificial intelligence in education.

Updated: 2024-09-24 18:11:24

Subjects: cs.AI,cs.HC,I.2; K.3.0

Download: http://arxiv.org/abs/2409.16376v1

Uncertainty-aware Surrogate Models for Airfoil Flow Simulations with Denoising Diffusion Probabilistic Models

Leveraging neural networks as surrogate models for turbulence simulation is a topic of growing interest. At the same time, embodying the inherent uncertainty of simulations in the predictions of surrogate models remains very challenging. The present study makes a first attempt to use denoising diffusion probabilistic models (DDPMs) to train an uncertainty-aware surrogate model for turbulence simulations. Due to its prevalence, the simulation of flows around airfoils with various shapes, Reynolds numbers, and angles of attack is chosen as the learning objective. Our results show that DDPMs can successfully capture the whole distribution of solutions and, as a consequence, accurately estimate the uncertainty of the simulations. The performance of DDPMs is also compared against several baselines in the form of Bayesian neural networks and heteroscedastic models. Experiments demonstrate that DDPMs outperform the other methods regarding a variety of accuracy metrics. In addition, DDPMs offer the advantage of providing access to the complete distribution of uncertainties rather than only a set of parameters. As such, they can yield realistic and detailed samples from the distribution of solutions. We also evaluate an emerging generative modeling variant, flow matching, in comparison to regular diffusion models. The results demonstrate that flow matching addresses the problem of slow sampling speed typically associated with diffusion models. As such, it offers a promising new paradigm for uncertainty quantification with generative models.
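
As background, the forward (noising) process that any DDPM learns to invert has a closed form: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). A minimal sketch of that closed form, assuming a toy constant beta schedule and a three-component state vector (neither is from the paper):

```python
import math
import random

def forward_diffuse(x0, t, betas, rng):
    # Closed-form DDPM forward process: q(x_t | x_0) is Gaussian with mean
    # sqrt(abar_t) * x0 and variance (1 - abar_t), where abar_t = prod_s (1 - beta_s).
    abar = 1.0
    for s in range(t + 1):
        abar *= 1.0 - betas[s]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(x0, eps)]
    return xt, abar

betas = [0.02] * 100               # toy constant noise schedule (assumption)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]             # toy flow-state vector (assumption)
xt, abar = forward_diffuse(x0, 99, betas, rng)
print(round(abar, 4))              # fraction of signal variance remaining after 100 steps
```

The reverse (denoising) network that a DDPM surrogate trains is what turns repeated samples of this process back into samples from the solution distribution.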

Updated: 2024-09-24 18:00:46

Subjects: physics.flu-dyn,cs.LG,76G25 (Primary) 68T37 (Secondary)

Download: http://arxiv.org/abs/2312.05320v3

Scalable quantum dynamics compilation via quantum machine learning

Quantum dynamics compilation is an important task for improving quantum simulation efficiency: It aims to synthesize multi-qubit target dynamics into a circuit consisting of as few elementary gates as possible. Compared to deterministic methods such as Trotterization, variational quantum compilation (VQC) methods employ variational optimization to reduce gate costs while maintaining high accuracy. In this work, we explore the potential of a VQC scheme by making use of out-of-distribution generalization results in quantum machine learning (QML): By learning the action of a given many-body dynamics on a small data set of product states, we can obtain a unitary circuit that generalizes to highly entangled states such as the Haar random states. The efficiency in training allows us to use tensor network methods to compress such time-evolved product states by exploiting their low entanglement features. Our approach exceeds state-of-the-art compilation results in both system size and accuracy in one dimension (1D). For the first time, we extend VQC to systems on two-dimensional (2D) strips with a quasi-1D treatment, demonstrating a significant resource advantage over standard Trotterization methods, highlighting the method's promise for advancing quantum simulation tasks on near-term quantum processors.

Updated: 2024-09-24 18:00:00

Subjects: quant-ph,cond-mat.str-el,cs.LG

Download: http://arxiv.org/abs/2409.16346v1

Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking

Articulated object manipulation requires precise object interaction, in which the object's axis must be carefully considered. Previous research has employed interactive perception to manipulate articulated objects, but these open-loop approaches typically overlook the interaction dynamics. To address this limitation, we present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds. Our method builds on any interactive perception technique as a foundation, inducing slight object movement to generate point cloud frames of the evolving dynamic scene. These point clouds are then segmented using Segment Anything Model 2 (SAM2), after which the moving part of the object is masked for accurate online axis estimation, guiding subsequent robotic actions. Our approach significantly enhances the precision and efficiency of manipulation tasks involving articulated objects. Experiments in simulated environments demonstrate that our method outperforms baseline approaches, especially in tasks that demand precise axis-based control. Project Page: https://hytidel.github.io/video-tracking-for-axis-estimation/.

Updated: 2024-09-24 17:59:56

Subjects: cs.RO,cs.AI,cs.GR,cs.LG

Download: http://arxiv.org/abs/2409.16287v1

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

How can robot manipulation policies generalize to novel tasks involving unseen object types and new motions? In this paper, we provide a solution in terms of predicting motion information from web data through human video generation and conditioning a robot policy on the generated video. Instead of attempting to scale robot data collection, which is expensive, we show how we can leverage video generation models trained on easily available web data to enable generalization. Our approach, Gen2Act, casts language-conditioned manipulation as zero-shot human video generation followed by execution with a single policy conditioned on the generated video. To train the policy, we use an order of magnitude less robot interaction data compared to what the video prediction model was trained on. Gen2Act doesn't require fine-tuning the video model at all, and we directly use a pre-trained model for generating human videos. Our results on diverse real-world scenarios show how Gen2Act enables manipulating unseen object types and performing novel motions for tasks not present in the robot data. Videos are at https://homangab.github.io/gen2act/

Updated: 2024-09-24 17:57:33

Subjects: cs.RO,cs.CV,cs.LG,eess.IV

Download: http://arxiv.org/abs/2409.16283v1

Order of Magnitude Speedups for LLM Membership Inference

Large Language Models (LLMs) have the promise to revolutionize computing broadly, but their complexity and extensive training data also expose significant privacy vulnerabilities. One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs), wherein an adversary aims to determine whether a specific data point was part of the model's training set. Although this is a known risk, state-of-the-art methodologies for MIAs rely on training multiple computationally costly shadow models, making risk evaluation prohibitive for large models. Here we adapt a recent line of work which uses quantile regression to mount membership inference attacks; we extend this work by proposing a low-cost MIA that leverages an ensemble of small quantile regression models to determine if a document belongs to the model's training set or not. We demonstrate the effectiveness of this approach on fine-tuned LLMs of varying families (OPT, Pythia, Llama) and across multiple datasets. Across all scenarios we obtain comparable or improved accuracy compared to state-of-the-art shadow model approaches, with as little as 6% of their computation budget. We demonstrate increased effectiveness across multi-epoch trained target models, as well as robustness to architecture mis-specification; that is, we can mount an effective attack against a model using a different tokenizer and architecture, without requiring knowledge of the target model.
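
The paper's quantile-regression attack is not reproduced here, but its core calibration step, thresholding a document's loss against a low quantile of non-member losses, can be sketched as follows. The Gaussian loss distributions and the 5% false-positive budget are illustrative assumptions, not numbers from the paper:

```python
import random
import statistics

def quantile_threshold(reference_losses, alpha):
    # Calibrate on non-member ("reference") losses: a document whose loss falls
    # below the alpha-quantile of non-member losses is flagged as a training member.
    cuts = statistics.quantiles(reference_losses, n=100)
    return cuts[round(alpha * 100) - 1]

rng = random.Random(0)
nonmember = [rng.gauss(3.0, 0.5) for _ in range(1000)]  # unseen text: higher loss
member    = [rng.gauss(1.5, 0.5) for _ in range(1000)]  # memorized text: lower loss
thr = quantile_threshold(nonmember, 0.05)
tpr = sum(l < thr for l in member) / len(member)        # true-positive rate
fpr = sum(l < thr for l in nonmember) / len(nonmember)  # false-positive rate (~alpha)
print(round(thr, 2), round(tpr, 2), round(fpr, 2))
```

The separation between the two loss populations is what makes the attack work; the paper's contribution is learning per-input thresholds with small quantile-regression models instead of a single global cutoff or costly shadow models.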

Updated: 2024-09-24 17:48:58

Subjects: cs.LG,cs.CR,stat.ML

Download: http://arxiv.org/abs/2409.14513v2

Transport-Level Encryption in Datacenter Networks

Cloud applications need network data encryption to isolate from other tenants and protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDT, a protocol design for emerging datacenter transport protocols to integrate data encryption while using existing NIC offloading designed for TLS over TCP. Therefore, SDT could enable a deployment path of new transport protocols in data-centers without giving up hardware offloading.

Updated: 2024-09-24 17:38:46

Subjects: cs.CR,cs.NI

Download: http://arxiv.org/abs/2406.15686v2

On the Principles behind Opinion Dynamics in Multi-Agent Systems of Large Language Models

We study the evolution of opinions inside a population of interacting large language models (LLMs). Every LLM needs to decide how much funding to allocate to an item with three initial possibilities: full, partial, or no funding. We identify biases that drive the exchange of opinions based on the LLM's tendency to find consensus with the other LLM's opinion, display caution when specifying funding, and consider ethical concerns in its opinion. We find these biases are affected by the perceived absence of compelling reasons for opinion change, the perceived willingness to engage in discussion, and the distribution of allocation values. Moreover, tensions among biases can lead to the survival of funding for items with negative connotations. We also find that the final distribution of full, partial, and no funding opinions is more diverse when an LLM freely forms its opinion after an interaction than when its opinion is a multiple-choice selection among the three allocation options. In the latter case, consensus is mostly attained. When agents are aware of past opinions, they seek to maintain consistency with them, changing the opinion dynamics. Our study is performed using Llama 3 and Mistral LLMs.

Updated: 2024-09-24 17:37:28

Subjects: cs.MA,cs.LG,physics.soc-ph

Download: http://arxiv.org/abs/2406.15492v2

Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities

In this paper we explore evaluation of LLM capabilities. We present measurements of GPT-4 performance on several deterministic tasks; each task involves a basic calculation and takes as an input parameter some element drawn from a large, well-defined population (e.g., count elements in a list, multiply two k-digit numbers, etc.). We examine several conditions per task and perform enough trials that statistically significant differences can be detected. This allows us to investigate the sensitivity of task accuracy both to query phrasing and input parameter population. We find that seemingly trivial modifications in the task prompt or input population can yield differences far larger than can be explained by sampling effects. For example, performance on a simple list-counting task varies with query phrasing and list length, but also with list composition (i.e., the thing to be counted) and object frequency (e.g., success when an element accounts for roughly 50% of a list differs from when it accounts for roughly 70%, etc.). We conclude that efforts to quantify LLM capabilities easily succumb to the language-as-fixed-effect fallacy, where experimental observations are improperly generalized beyond what the data supports. A consequence appears to be that intuitions formed through interactions with humans are a very unreliable guide as to which input modifications should "make no difference" to LLM performance.
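
Detecting whether an accuracy gap exceeds sampling noise, as the trials described above require, is commonly done with a pooled two-proportion z-test. The trial counts below are hypothetical, not figures from the paper:

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    # Pooled two-proportion z-test: is the accuracy gap between two prompt
    # phrasings larger than sampling noise alone would produce?
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: phrasing A correct on 430/500 trials, phrasing B on 380/500.
z = two_proportion_z(430, 500, 380, 500)
print(round(z, 2))
```

A |z| above roughly 1.96 rejects equal accuracy at the 5% level, so a gap like this one would count as statistically significant rather than a sampling artifact.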

Updated: 2024-09-24 17:34:07

Subjects: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2409.07638v2

PICL: Physics Informed Contrastive Learning for Partial Differential Equations

Neural operators have recently grown in popularity as Partial Differential Equation (PDE) surrogate models. Learning solution functionals, rather than functions, has proven to be a powerful approach to calculate fast, accurate solutions to complex PDEs. While much work has been done evaluating neural operator performance on a wide variety of surrogate modeling tasks, these works normally evaluate performance on a single equation at a time. In this work, we develop a novel contrastive pretraining framework utilizing Generalized Contrastive Loss that improves neural operator generalization across multiple governing equations simultaneously. Governing equation coefficients are used to measure ground-truth similarity between systems. A combination of physics-informed system evolution and latent-space model output are anchored to input data and used in our distance function. We find that physics-informed contrastive pretraining improves accuracy for the Fourier Neural Operator in fixed-future and autoregressive rollout tasks for the 1D and 2D Heat, Burgers', and linear advection equations.
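
The paper's Generalized Contrastive Loss, with similarity derived from governing-equation coefficients, is not spelled out in this summary; as a reference point, a standard InfoNCE-style contrastive loss over latent embeddings can be sketched like this (the embeddings and temperature below are illustrative assumptions):

```python
import math

def contrastive_loss(anchor, positives, negatives, tau):
    # InfoNCE-style contrastive loss on cosine similarities: pull the anchor
    # toward embeddings of "similar" systems, push it away from dissimilar ones.
    def sim(u, v):  # cosine similarity
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    pos = [math.exp(sim(anchor, p) / tau) for p in positives]
    neg = [math.exp(sim(anchor, n) / tau) for n in negatives]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)

a  = [1.0, 0.2]    # latent embedding of the anchor system (assumed)
p1 = [0.9, 0.3]    # a system with similar governing-equation coefficients (assumed)
n1 = [-0.8, 1.0]   # a dissimilar system (assumed)
print(round(contrastive_loss(a, [p1], [n1], 0.5), 3))
```

Swapping the roles of the positive and negative embeddings drives the loss up sharply, which is the gradient signal that organizes the latent space during pretraining.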

Updated: 2024-09-24 17:31:32

Subjects: cs.LG,cs.NA,math.NA,physics.comp-ph

Download: http://arxiv.org/abs/2401.16327v4

Representation Learning for Sequential Volumetric Design Tasks

Volumetric design, also called massing design, is the first and critical step in professional building design, which is sequential in nature. As the volumetric design process requires careful design decisions and iterative adjustments, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated design solutions varies, and evaluating a design solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. While previous approaches focused on learning only the final design instead of sequential design tasks, we propose to encode the design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. Later, we propose to utilize the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations, whereas we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas by leveraging a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrarily given design sequences and is almost 90% accurate in evaluation against random design sequences. Our autoregressive model is also capable of autocompleting a volumetric design sequence from a partial design sequence.

Updated: 2024-09-24 17:28:47

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2309.02583v2

Transformer-based time series prediction of the maximum power point for solar photovoltaic cells

This paper proposes an improved deep-learning-based maximum power point tracking (MPPT) method for solar photovoltaic cells that considers various time-series environmental inputs. Generally, artificial-neural-network-based MPPT algorithms use basic neural network architectures and inputs which do not represent the ambient conditions in a comprehensive manner. In this article, the ambient conditions of a location are represented through a comprehensive set of environmental features. Furthermore, time-based features are included in the input data to model cyclic patterns within the atmospheric conditions, leading to robust modeling of the MPPT algorithm. A transformer-based deep learning architecture is trained as a time-series prediction model using multidimensional time-series input features. The model is trained on a dataset containing typical meteorological year data points of ambient weather conditions from 50 locations. The attention mechanism in the transformer modules allows the model to learn temporal patterns in the data efficiently. The proposed model achieves a 0.47% mean absolute percentage error of prediction on non-zero operating voltage points in a test dataset consisting of data collected over a period of 200 consecutive hours, resulting in an average power efficiency of 99.54% and a peak power efficiency of 99.98%. The proposed model is validated through real-time simulations and performs power point tracking in a robust, dynamic, and non-latent manner over a wide range of atmospheric conditions.

Updated: 2024-09-24 17:26:55

Subjects: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2409.16342v1

CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (HRL) is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we present CRISP, a novel HRL algorithm that effectively generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. CRISP uses the lower-level primitive to periodically perform data relabeling on a handful of expert demonstrations, using a novel primitive-informed parsing (PIP) approach, thereby mitigating non-stationarity. Since our approach only assumes access to a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluations on complex robotic maze navigation and robotic manipulation tasks demonstrate that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks. Additionally, we perform real-world robotic experiments on complex manipulation tasks and demonstrate that CRISP shows impressive generalization in real-world scenarios.

Updated: 2024-09-24 17:23:48

Subjects: cs.LG

Download: http://arxiv.org/abs/2304.03535v5

Learning To Help: Training Models to Assist Legacy Devices

Machine learning models implemented in hardware on physical devices may be deployed for a long time. The computational abilities of the device may be limited and become outdated with respect to newer improvements. Because of the size of ML models, offloading some computation (e.g. to an edge cloud) can help such legacy devices. We cast this problem in the framework of learning with abstention (LWA) in which the expert (edge) must be trained to assist the client (device). Prior work on LWA trains the client assuming the edge is either an oracle or a human expert. In this work, we formalize the reverse problem of training the expert for a fixed (legacy) client. As in LWA, the client uses a rejection rule to decide when to offload inference to the expert (at a cost). We find the Bayes-optimal rule, prove a generalization bound, and find a consistent surrogate loss function. Empirical results show that our framework outperforms confidence-based rejection rules.
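
The confidence-based rejection baseline that the proposed framework is compared against can be stated in a few lines: the client answers when its top softmax probability clears a threshold and otherwise pays a fixed cost to offload to the expert. The threshold and cost values here are illustrative assumptions:

```python
def reject_option_predict(probs, threshold, offload_cost):
    # Confidence-based rejection rule (the baseline the paper improves on):
    # keep the query on the legacy client when its top class probability is
    # confident enough, otherwise pay a fixed cost to offload to the expert.
    top = max(probs)
    if top >= threshold:
        return ("client", top, 0.0)
    return ("expert", top, offload_cost)

print(reject_option_predict([0.1, 0.7, 0.2], 0.6, 0.05))   # confident: client answers
print(reject_option_predict([0.4, 0.35, 0.25], 0.6, 0.05)) # uncertain: offload at a cost
```

The paper's Bayes-optimal rule replaces this fixed confidence cutoff with a decision that also accounts for how well the expert is expected to do on the rejected inputs.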

Updated: 2024-09-24 17:21:25

Subjects: cs.LG,I.2.6; I.2.11

Download: http://arxiv.org/abs/2409.16253v1

Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation

Crop field boundaries are foundational datasets for agricultural monitoring and assessments but are expensive to collect manually. Machine learning (ML) methods for automatically extracting field boundaries from remotely sensed images could help realize the demand for these datasets at a global scale. However, current ML methods for field instance segmentation lack sufficient geographic coverage, accuracy, and generalization capabilities. Further, research on improving ML methods is restricted by the lack of labeled datasets representing the diversity of global agricultural fields. We present Fields of The World (FTW) -- a novel ML benchmark dataset for agricultural field instance segmentation spanning 24 countries on four continents (Europe, Africa, Asia, and South America). FTW is an order of magnitude larger than previous datasets with 70,462 samples, each containing instance and semantic segmentation masks paired with multi-date, multi-spectral Sentinel-2 satellite images. We provide results from baseline models for the new FTW benchmark, show that models trained on FTW have better zero-shot and fine-tuning performance in held-out countries than models that aren't pre-trained with diverse datasets, and show positive qualitative zero-shot results of FTW models in a real-world scenario -- running on Sentinel-2 scenes over Ethiopia.

Updated: 2024-09-24 17:20:58

Categories: cs.CV, cs.AI, cs.LG

Download: http://arxiv.org/abs/2409.16252v1

Quality Matters: Evaluating Synthetic Data for Tool-Using LLMs

Training large language models (LLMs) for external tool usage is a rapidly expanding field, with recent research focusing on generating synthetic data to address the shortage of available data. However, the absence of systematic data quality checks poses complications for properly training and testing models. To that end, we propose two approaches for assessing the reliability of data for training LLMs to use external tools. The first approach uses intuitive, human-defined correctness criteria. The second approach uses a model-driven assessment with in-context evaluation. We conduct a thorough evaluation of data quality on two popular benchmarks, followed by an extrinsic evaluation that showcases the impact of data quality on model performance. Our results demonstrate that models trained on high-quality data outperform those trained on unvalidated data, even when trained with a smaller quantity of data. These findings empirically support the significance of assessing and ensuring the reliability of training data for tool-using LLMs.
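The first of the two approaches above, human-defined correctness criteria, can be sketched as a simple rule-based check on a synthetic tool-call record. Everything here is illustrative (the tool registry, field names, and checks are hypothetical, not the benchmarks' actual criteria):

```python
import json

# Hypothetical registry of available tools and their required arguments;
# real correctness criteria would be benchmark-specific.
TOOL_REGISTRY = {"get_weather": {"city"}, "search": {"query"}}

def is_valid_call(record_json: str) -> bool:
    """Human-defined checks: well-formed JSON, known tool, required args present."""
    try:
        record = json.loads(record_json)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    required = TOOL_REGISTRY.get(record.get("tool"))
    if required is None:
        return False  # hallucinated tool name
    return required <= set(record.get("arguments", {}))

print(is_valid_call('{"tool": "get_weather", "arguments": {"city": "Oslo"}}'))  # True
print(is_valid_call('{"tool": "fly_drone", "arguments": {}}'))                  # False
```

Filtering training data through checks of this kind is what lets a smaller, validated dataset outperform a larger unvalidated one.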

Updated: 2024-09-24 17:20:02

Categories: cs.LG, cs.CL, cs.SE

Download: http://arxiv.org/abs/2409.16341v1

Robust Estimation under the Wasserstein Distance

We study the problem of robust distribution estimation under the Wasserstein distance, a popular discrepancy measure between probability distributions rooted in optimal transport (OT) theory. Given $n$ samples from an unknown distribution $\mu$, of which $\varepsilon n$ are adversarially corrupted, we seek an estimate for $\mu$ with minimal Wasserstein error. To address this task, we draw upon two frameworks from OT and robust statistics: partial OT (POT) and minimum distance estimation (MDE). We prove new structural properties for POT and use them to show that MDE under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk in many settings. Along the way, we derive a novel dual form for POT that adds a sup-norm penalty to the classic Kantorovich dual for standard OT. Since the popular Wasserstein generative adversarial network (WGAN) framework implements Wasserstein MDE via Kantorovich duality, our penalized dual enables large-scale generative modeling with contaminated datasets via an elementary modification to WGAN. Numerical experiments demonstrating the efficacy of our approach in mitigating the impact of adversarial corruptions are provided.
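On the real line, the Wasserstein-1 distance between two equal-size empirical distributions reduces to the mean absolute difference of sorted samples, which makes the minimum-distance-estimation idea easy to illustrate. The toy below is only a sketch of plain (non-partial) Wasserstein MDE; the data, candidate family, and shift parameter are invented for illustration, and the paper's contribution is precisely the *partial* OT variant that discounts the corrupted fraction:

```python
def wasserstein_1d(xs, ys):
    """W1 between two equal-size empirical distributions on the real line:
    average absolute difference between order statistics."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Toy MDE: choose the location shift whose model samples are W1-closest
# to the (possibly corrupted) data.
data = [0.1, 0.2, 0.3, 50.0]          # last sample is an adversarial outlier
template = [0.1, 0.2, 0.3, 0.4]       # clean model shape at shift 0
candidates = [0.0, 1.0, 10.0]
best = min(candidates, key=lambda s: wasserstein_1d(data, [t + s for t in template]))
print(best)  # 0.0 -- but the outlier inflates every candidate's distance,
             # which is what partial OT is designed to mitigate
```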

Updated: 2024-09-24 17:18:09

Categories: stat.ML, cs.LG, math.ST, stat.TH

Download: http://arxiv.org/abs/2302.01237v2

Unclonable Non-Interactive Zero-Knowledge

A non-interactive ZK (NIZK) proof enables verification of NP statements without revealing secrets about them. However, an adversary that obtains a NIZK proof may be able to clone this proof and distribute arbitrarily many copies of it to various entities: this is inevitable for any proof that takes the form of a classical string. In this paper, we ask whether it is possible to rely on quantum information in order to build NIZK proof systems that are impossible to clone. We define and construct unclonable non-interactive zero-knowledge arguments (of knowledge) for NP, addressing a question first posed by Aaronson (CCC 2009). Besides satisfying the zero-knowledge and argument of knowledge properties, these proofs additionally satisfy unclonability. Very roughly, this ensures that no adversary can split an honestly generated proof of membership of an instance $x$ in an NP language $\mathcal{L}$ and distribute copies to multiple entities that all obtain accepting proofs of membership of $x$ in $\mathcal{L}$. Our result has applications to unclonable signatures of knowledge, which we define and construct in this work; these non-interactively prevent replay attacks.

Updated: 2024-09-24 17:16:31

Categories: cs.CR, quant-ph

Download: http://arxiv.org/abs/2310.07118v3

A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models

Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation, significantly enhancing productivity and accelerating software development. However, existing benchmarks primarily focus on general code generation without considering API-oriented code generation, i.e., generating code that invokes APIs from specific libraries. Given the growing demand for API-oriented code generation, there is a pressing need for a systematic and automated approach to evaluate LLMs on API-oriented code generation. To address this gap, we propose AutoAPIEval, a lightweight and automated framework designed to evaluate the capabilities of LLMs in API-oriented code generation. Our framework works with any library that provides API documentation and focuses on two unit tasks: API recommendation and code example generation, along with four metrics to evaluate the generated APIs and code examples, such as the proportion of incorrect API recommendations (Task 1) and the proportions of code examples that invoke no specific API or are uncompilable/unexecutable (Task 2). In addition, we conducted a case study on three LLMs (ChatGPT, MagiCoder, and DeepSeek Coder) and Java Runtime Environment 8 to demonstrate the framework's effectiveness. Our findings reveal substantial variability in LLM performance across tasks, with ChatGPT adhering better to instructions while sharing similar effectiveness in code example generation with its counterparts (i.e., MagiCoder and DeepSeek Coder). We also identify key factors associated with code quality, such as API popularity and model confidence, and build classifiers that achieve high accuracy in detecting incorrect API recommendations and erroneous code examples. Retrieval-augmented generation enhances the quality of code generated by LLMs, though its effectiveness varies across different LLMs.

Updated: 2024-09-24 17:13:43

Categories: cs.SE, cs.AI, cs.LG

Download: http://arxiv.org/abs/2409.15228v2

Cooperative Resilience in Artificial Intelligence Multiagent Systems

Resilience refers to the ability of systems to withstand, adapt to, and recover from disruptive events. While studies on resilience have attracted significant attention across various research domains, the precise definition of this concept within the field of cooperative artificial intelligence remains unclear. This paper addresses this gap by proposing a clear definition of `cooperative resilience' and outlining a methodology for its quantitative measurement. The methodology is validated in an environment with RL-based and LLM-augmented autonomous agents, subjected to environmental changes and the introduction of agents with unsustainable behaviors. These events are parameterized to create various scenarios for measuring cooperative resilience. The results highlight the crucial role of resilience metrics in analyzing how the collective system prepares for, resists, recovers from, sustains well-being, and transforms in the face of disruptions. These findings provide foundational insights into the definition, measurement, and preliminary analysis of cooperative resilience, offering significant implications for the broader field of AI. Moreover, the methodology and metrics developed here can be adapted to a wide range of AI applications, enhancing the reliability and effectiveness of AI in dynamic and unpredictable environments.

Updated: 2024-09-24 17:13:07

Categories: cs.MA, cs.AI

Download: http://arxiv.org/abs/2409.13187v2

From Predictive Importance to Causality: Which Machine Learning Model Reflects Reality?

This study analyzes the Ames Housing Dataset using CatBoost and LightGBM models to explore feature importance and causal relationships in housing price prediction. We examine the correlation between SHAP values and EconML predictions, achieving high accuracy in price forecasting. Our analysis reveals a moderate Spearman rank correlation of 0.48 between SHAP-based feature importance and causally significant features, highlighting the complexity of aligning predictive modeling with causal understanding in housing market analysis. Through extensive causal analysis, including heterogeneity exploration and policy tree interpretation, we provide insights into how specific features like porches impact housing prices across various scenarios. This work underscores the need for integrated approaches that combine predictive power with causal insights in real estate valuation, offering valuable guidance for stakeholders in the industry.
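The reported Spearman rank correlation of 0.48 compares how two views (predictive SHAP importance vs. causal effect estimates) rank the same features. The comparison itself is a few lines; the scores below are made-up stand-ins for illustration, not values from the study:

```python
def spearman(xs, ys):
    """Spearman rank correlation via Pearson correlation of ranks (assumes no ties)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)  # same for ry: both are permutations of 0..n-1
    return cov / var

# Hypothetical importance scores for four features under the two views:
shap_importance = [0.9, 0.5, 0.3, 0.1]   # predictive (SHAP) view
causal_effect   = [0.8, 0.2, 0.6, 0.1]   # causal (EconML-style) view
print(round(spearman(shap_importance, causal_effect), 2))  # 0.8
```

A value well below 1.0, as in the study, means the features that most improve prediction are not necessarily the ones with the largest causal effect.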

Updated: 2024-09-24 17:06:31

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2409.02130v2

Opponent Shaping for Antibody Development

Anti-viral therapies are typically designed to target the current strains of a virus. Game theoretically, this corresponds to a short-sighted, or myopic, response. However, therapy-induced selective pressures act on viral antigens to drive the emergence of mutated strains, against which initial therapies have reduced efficacy. Building on a computational model of binding between antibodies and viral antigens (the Absolut! framework), we design and implement a genetic simulation of such viral evolutionary escape. Crucially, this allows our antibody optimisation algorithm to consider and influence the entire escape curve of the virus, i.e. to guide (or "shape") the viral evolution. This is inspired by opponent shaping which, in general-sum learning, accounts for the adaptation of the co-player rather than playing a myopic best response. Hence we call the optimised antibodies shapers. Within our simulations, we demonstrate that our shapers target both current and simulated future viral variants, outperforming the antibodies chosen in a myopic way. Furthermore, we show that shapers exert specific evolutionary pressure on the virus compared to myopic antibodies. Altogether, shapers modify the evolutionary trajectories of viral strains and minimise the viral escape compared to their myopic counterparts. While this is a simplified model, we hope that our proposed paradigm will enable the discovery of better long-lived vaccines and antibody therapies in the future, enabled by rapid advancements in the capabilities of simulation tools. Our code is available at https://github.com/olakalisz/antibody-shapers.

Updated: 2024-09-24 17:05:55

Categories: q-bio.PE, cs.AI, cs.GT, cs.MA, 92-08, I.2.1; J.3

Download: http://arxiv.org/abs/2409.10588v3

LLM Echo Chamber: personalized and automated disinformation

Recent advancements have showcased the capabilities of Large Language Models like GPT-4 and Llama 2 in tasks such as summarization, translation, and content review. However, their widespread use raises concerns, particularly around the potential for LLMs to spread persuasive, humanlike misinformation at scale, which could significantly influence public opinion. This study examines these risks, focusing on LLMs' ability to propagate misinformation as factual. To investigate this, we built the LLM Echo Chamber, a controlled digital environment simulating social media chatrooms, where misinformation often spreads. Echo chambers, where individuals only interact with like-minded people, further entrench beliefs. By studying malicious bots spreading misinformation in this environment, we can better understand this phenomenon. We reviewed current LLMs, explored misinformation risks, and applied state-of-the-art fine-tuning techniques. Using Microsoft's Phi-2 model, fine-tuned with our custom dataset, we generated harmful content to create the Echo Chamber. This setup, evaluated by GPT-4 for persuasiveness and harmfulness, sheds light on the ethical concerns surrounding LLMs and emphasizes the need for stronger safeguards against misinformation.

Updated: 2024-09-24 17:04:12

Categories: cs.AI, cs.CY

Download: http://arxiv.org/abs/2409.16241v1

Future-Proofing Medical Imaging with Privacy-Preserving Federated Learning and Uncertainty Quantification: A Review

Artificial Intelligence (AI) has demonstrated significant potential in automating various medical imaging tasks, which could soon become routine in clinical practice for disease diagnosis, prognosis, treatment planning, and post-treatment surveillance. However, the privacy concerns surrounding patient data present a major barrier to the widespread adoption of AI in medical imaging, as large, diverse training datasets are essential for developing accurate, generalizable, and robust AI models. Federated Learning (FL) offers a solution that enables organizations to train AI models collaboratively without sharing sensitive data. Federated learning exchanges model training information, such as gradients, between the participating sites. Despite its promise, federated learning is still in its developmental stages and faces several challenges. Notably, sensitive information can still be inferred from the gradients shared during model training. Quantifying AI models' uncertainty is vital due to potential data distribution shifts post-deployment, which can affect model performance. Uncertainty quantification (UQ) in FL is particularly challenging due to data heterogeneity across participating sites. This review provides a comprehensive examination of FL, privacy-preserving FL (PPFL), and UQ in FL. We identify key gaps in current FL methodologies and propose future research directions to enhance data privacy and trustworthiness in medical imaging applications.

Updated: 2024-09-24 16:55:32

Categories: eess.IV, cs.AI, cs.CV

Download: http://arxiv.org/abs/2409.16340v1

GLoCIM: Global-view Long Chain Interest Modeling for news recommendation

Accurately recommending candidate news articles to users has always been the core challenge of news recommendation systems. News recommendation often requires modeling user interest to match candidate news. Recent efforts have primarily focused on extracting local subgraph information from a global click graph constructed from the clicked-news sequences of all users. However, the computational complexity of extracting global click graph information has hindered the ability to exploit the far-reaching linkages hidden between distant nodes in the global click graph and shared collaboratively among similar users. To overcome this problem, we propose Global-view Long Chain Interest Modeling for news recommendation (GLoCIM), which combines neighbor interest with long-chain interest distilled from a global click graph, leveraging the collaboration among similar users to enhance news recommendation. We design a long-chain selection algorithm and a long-chain interest encoder to obtain global-view long-chain interest from the global click graph, and a gated network to integrate long-chain interest with neighbor interest to capture the collaborative interest among similar users. Subsequently, we aggregate it with a local news-category-enhanced representation to generate the final user representation, which is matched against candidate news representations to produce recommendations. Experimental results on real-world datasets validate the effectiveness of our method in improving news recommendation performance.
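The gated integration of the two interest signals can be sketched as an elementwise gate that blends the long-chain and neighbor interest vectors. The scalar weights below are hand-picked purely for illustration; in GLoCIM the gate would be a learned network:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(long_chain, neighbor, w_long, w_neigh, bias):
    """Blend two interest vectors with an elementwise sigmoid gate:
    out_i = g_i * long_chain_i + (1 - g_i) * neighbor_i,
    where g_i = sigmoid(w_long * long_chain_i + w_neigh * neighbor_i + bias).
    (Illustrative scalar-weight stand-in for a learned gating network.)"""
    out = []
    for a, b in zip(long_chain, neighbor):
        g = sigmoid(w_long * a + w_neigh * b + bias)
        out.append(g * a + (1 - g) * b)
    return out

fused = gated_fusion([1.0, -2.0], [0.0, 1.0], w_long=1.0, w_neigh=1.0, bias=0.0)
print([round(v, 3) for v in fused])  # [0.731, 0.193]
```

The gate lets the model decide, per dimension, how much to trust the globally distilled long-chain signal versus the locally observed neighbor signal.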

Updated: 2024-09-24 16:54:35

Categories: cs.AI, cs.IR

Download: http://arxiv.org/abs/2408.00859v2

Label-Augmented Dataset Distillation

Traditional dataset distillation primarily focuses on image representation while often overlooking the important role of labels. In this study, we introduce Label-Augmented Dataset Distillation (LADD), a new dataset distillation framework enhancing dataset distillation with label augmentations. LADD sub-samples each synthetic image, generating additional dense labels to capture rich semantics. These dense labels require only a 2.5% increase in storage (ImageNet subsets) with significant performance benefits, providing strong learning signals. Our label generation strategy can complement existing dataset distillation methods for significantly enhancing their training efficiency and performance. Experimental results demonstrate that LADD outperforms existing methods in terms of computational overhead and accuracy. With three high-performance dataset distillation algorithms, LADD achieves remarkable gains by an average of 14.9% in accuracy. Furthermore, the effectiveness of our method is proven across various datasets, distillation hyperparameters, and algorithms. Finally, our method improves the cross-architecture robustness of the distilled dataset, which is important in the application scenario.

Updated: 2024-09-24 16:54:22

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2409.16239v1

Efficiently Learning Probabilistic Logical Models by Cheaply Ranking Mined Rules

Probabilistic logical models are a core component of neurosymbolic AI and are important models in their own right for tasks that require high explainability. Unlike neural networks, logical models are often handcrafted using domain expertise, making their development costly and prone to errors. While there are algorithms that learn logical models from data, they are generally prohibitively expensive, limiting their applicability in real-world settings. In this work, we introduce precision and recall for logical rules and define their composition as rule utility -- a cost-effective measure to evaluate the predictive power of logical models. Further, we introduce SPECTRUM, a scalable framework for learning logical models from relational data. Its scalability derives from a linear-time algorithm that mines recurrent structures in the data along with a second algorithm that, using the cheap utility measure, efficiently ranks rules built from these structures. Moreover, we derive theoretical guarantees on the utility of the learnt logical model. As a result, SPECTRUM learns more accurate logical models orders of magnitude faster than previous methods on real-world datasets.
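Precision and recall for a logical rule, and a utility composed from them, can be sketched over sets of ground atoms. The composition shown (an F-measure) and the example atoms are illustrative only; the paper defines its own utility measure:

```python
def rule_metrics(predicted, actual):
    """Precision/recall of a logical rule: `predicted` is the set of ground
    atoms the rule derives, `actual` the set of true ground atoms."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall

def utility(precision, recall, beta=1.0):
    """One cheap way to compose precision and recall into a single rank-able
    score (F-beta); illustrative stand-in for the paper's rule utility."""
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = rule_metrics({"smokes(bob)", "smokes(eve)"}, {"smokes(bob)", "smokes(ann)"})
print(p, r, round(utility(p, r), 3))  # 0.5 0.5 0.5
```

Because the sets are computed once per mined structure, ranking candidate rules by such a score stays cheap, which is the source of SPECTRUM's scalability.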

Updated: 2024-09-24 16:54:12

Categories: cs.AI

Download: http://arxiv.org/abs/2409.16238v1

Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming

Preference-based Reinforcement Learning (PbRL) has made significant strides in single-agent settings, but has not been studied for multi-agent frameworks. On the other hand, modeling cooperation between multiple agents, specifically, Human-AI Teaming settings while ensuring successful task completion is a challenging problem. To this end, we perform the first investigation of multi-agent PbRL by extending single-agent PbRL to the two-agent teaming settings and formulate it as a Human-AI PbRL Cooperation Game, where the RL agent queries the human-in-the-loop to elicit task objective and human's preferences on the joint team behavior. Under this game formulation, we first introduce the notion of Human Flexibility to evaluate team performance based on if humans prefer to follow a fixed policy or adapt to the RL agent on the fly. Secondly, we study the RL agent's varying access to the human policy. We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and agent has complete access to human policy. We motivate the need for taking Human Flexibility into account and the usefulness of Specified Orchestration through a gamified user study. We evaluate state-of-the-art PbRL algorithms for Human-AI cooperative setups through robot locomotion based domains that explicitly require forced cooperation. Our findings highlight the challenges associated with PbRL by varying Human Flexibility and agent's access to the human policy. Finally, we draw insights from our user study and empirical results, and conclude that Specified Orchestration can be seen as an upper bound PbRL performance for future research in Human-AI teaming scenarios.

Updated: 2024-09-24 16:52:34

Categories: cs.AI, cs.LG, cs.MA

Download: http://arxiv.org/abs/2312.14292v2

OmniBench: Towards The Future of Universal Omni-Language Models

Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains inadequately explored, partly due to the lack of comprehensive modality-wise benchmarks. We introduce OmniBench, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define models capable of such tri-modal processing as omni-language models (OLMs). OmniBench is distinguished by high-quality human annotations, ensuring that accurate responses require integrated understanding and reasoning across all three modalities. Our main findings reveal that: i) most OLMs exhibit critical limitations in instruction-following and reasoning capabilities within tri-modal contexts; and ii) most baseline models perform poorly (below 50% accuracy) even when provided with alternative textual representations of images and/or audio. These results suggest that the ability to construct a consistent context from text, image, and audio is often overlooked in existing MLLM training paradigms. We advocate for future research to focus on developing more robust tri-modal integration techniques and training strategies to enhance OLM performance across diverse modalities. The codes and live leaderboard could be found at https://m-a-p.ai/OmniBench.

Updated: 2024-09-24 16:51:45

Categories: cs.CL, cs.AI, cs.CV

Download: http://arxiv.org/abs/2409.15272v2

Predicting Deterioration in Mild Cognitive Impairment with Survival Transformers, Extreme Gradient Boosting and Cox Proportional Hazard Modelling

This paper proposes a novel approach that applies survival transformers and extreme gradient boosting models to predict cognitive deterioration in individuals with mild cognitive impairment (MCI) using metabolomics data from the ADNI cohort. By leveraging advanced machine learning and transformer-based techniques applied in survival analysis, the proposed approach highlights the potential of these techniques for more accurate early detection and intervention in Alzheimer's disease. This research also underscores the importance of non-invasive biomarkers and innovative modelling tools in enhancing the accuracy of dementia risk assessments, offering new avenues for clinical practice and patient care. A comprehensive Monte Carlo simulation procedure, consisting of 100 repetitions of a nested cross-validation in which models were trained and evaluated, indicates that the survival machine learning models based on Transformer and XGBoost achieved the highest mean C-index performances, namely 0.85 and 0.8 respectively, and that they are superior to the conventional survival analysis Cox Proportional Hazards model, which achieved a mean C-index of 0.77. Moreover, based on the standard deviations of the C-index performances obtained in the Monte Carlo simulation, we established that both survival machine learning models are more stable than the conventional statistical model.
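The C-index values the models are compared on can be sketched in a few lines. This assumes the paper's metric is Harrell's concordance index, shown here on toy data (times, event flags, and risk scores are invented for illustration):

```python
def concordance_index(times, events, scores):
    """Harrell's C-index: fraction of comparable pairs where the model's risk
    score orders the observed survival times correctly. events[i] is 1 if
    subject i's deterioration was observed (uncensored), 0 if censored;
    higher scores should mean higher risk (earlier event). Ties score 0.5."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i's event is observed before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort: the predicted risks perfectly match the observed ordering.
c = concordance_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1],
                      scores=[0.9, 0.7, 0.6, 0.2])
print(c)  # 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.85 (Transformer) vs. 0.77 (Cox) gap is substantial.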

Updated: 2024-09-24 16:49:43

Categories: cs.LG, cs.AI, cs.NE

Download: http://arxiv.org/abs/2409.16231v1

Low-degree Security of the Planted Random Subgraph Problem

The planted random subgraph detection conjecture of Abram et al. (TCC 2023) asserts the pseudorandomness of a pair of graphs $(H, G)$, where $G$ is an Erdos-Renyi random graph on $n$ vertices, and $H$ is a random induced subgraph of $G$ on $k$ vertices. Assuming the hardness of distinguishing these two distributions (with two leaked vertices), Abram et al. construct communication-efficient, computationally secure (1) 2-party private simultaneous messages (PSM) and (2) secret sharing for forbidden graph structures. We prove the low-degree hardness of detecting planted random subgraphs all the way up to $k\leq n^{1 - \Omega(1)}$. This improves over Abram et al.'s analysis for $k \leq n^{1/2 - \Omega(1)}$. The hardness extends to $r$-uniform hypergraphs for constant $r$. Our analysis is tight in the distinguisher's degree, its advantage, and in the number of leaked vertices. Extending the constructions of Abram et al, we apply the conjecture towards (1) communication-optimal multiparty PSM protocols for random functions and (2) bit secret sharing with share size $(1 + \epsilon)\log n$ for any $\epsilon > 0$ in which arbitrary minimal coalitions of up to $r$ parties can reconstruct and secrecy holds against all unqualified subsets of up to $\ell = o(\epsilon \log n)^{1/(r-1)}$ parties.

Updated: 2024-09-24 16:42:00

标题: Planted Random Subgraph问题的低度安全性

摘要: Abram等人(TCC 2023)提出的种植随机子图检测猜想断言一对图$(H, G)$的伪随机性,其中$G$是具有$n$个顶点的Erdos-Renyi随机图,而$H$是$G$在$k$个顶点上的随机诱导子图。在假设难以区分这两个分布(带有两个泄露顶点)的前提下,Abram等人构建了通信高效、计算安全的(1) 2方私密同时消息传递(PSM)和(2) 针对禁止图结构的秘密分享。我们证明了检测种植随机子图的低次难度一直到$k\leq n^{1 - \Omega(1)}$,这改进了Abram等人针对$k \leq n^{1/2 - \Omega(1)}$的分析。该难度结果还可扩展到常数$r$的$r$-uniform超图。我们的分析在区分器的次数、其优势以及泄露顶点的数量方面都是紧的。通过扩展Abram等人的构造,我们将该猜想应用于(1) 随机函数的通信最优多方PSM协议和(2) 份额大小为$(1 + \epsilon)\log n$(任意$\epsilon > 0$)的比特秘密分享,其中由最多$r$方组成的任意最小合格联盟可以重建秘密,并且对所有最多$\ell = o(\epsilon \log n)^{1/(r-1)}$方的不合格子集保持保密。

更新时间: 2024-09-24 16:42:00

领域: cs.CR,cs.DS,math.ST,stat.TH

下载: http://arxiv.org/abs/2409.16227v1

Fine-Tuning is Fine, if Calibrated

Fine-tuning is arguably the most straightforward way to tailor a pre-trained model (e.g., a foundation model) to downstream applications, but it also comes with the risk of losing valuable knowledge the model had learned in pre-training. For example, fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand is shown to drastically degrade the model's accuracy in the other classes it had previously learned. As such, it is hard to further use the fine-tuned model when it encounters classes beyond the fine-tuning data. In this paper, we systematically dissect the issue, aiming to answer the fundamental question, ''What has been damaged in the fine-tuned model?'' To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes. Instead, the fine-tuned model often produces more discriminative features for these other classes, even if they were missing during fine-tuning! {What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes}, implying that a simple post-processing calibration would bring back the pre-trained model's capability and at the same time unveil the feature improvement over all classes. We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
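The proposed remedy is a post-processing calibration of the logits. A minimal sketch of the idea follows; the constant offset `gamma` and the class indices are illustrative, and the paper's exact calibration procedure may differ:

```python
def calibrate_logits(logits, finetune_classes, gamma):
    """Subtract a constant bias from the fine-tuning classes' logits so
    that classes seen only in pre-training can compete again.
    gamma is a scalar picked on held-out data (hypothetical here)."""
    return [z - gamma if c in finetune_classes else z
            for c, z in enumerate(logits)]

# The fine-tuned model inflates logits for its classes {0, 1}, even though
# class 2 has the strongest feature response among the absent classes.
logits = [5.1, 4.8, 3.0, 1.0]
calibrated = calibrate_logits(logits, finetune_classes={0, 1}, gamma=3.0)
print(calibrated.index(max(calibrated)))  # argmax flips from class 0 to class 2
```

This matches the abstract's claim that only the logit scales, not the features, are damaged: a single scalar shift restores the pre-trained classes.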

Updated: 2024-09-24 16:35:16

标题: 微调是可以的,只要校准得当

摘要: 微调可以说是将预训练模型(例如基础模型)调整到下游应用的最直接方式,但也存在失去模型在预训练中学到的宝贵知识的风险。例如,微调一个能够识别大量类别的预训练分类器以掌握手头的类别子集,会导致模型在先前学习的其他类别上的准确性急剧下降。因此,当微调后的模型遇到超出微调数据范围的类别时,很难进一步使用它。在本文中,我们系统地剖析了这个问题,旨在回答一个基本问题:"微调后的模型受到了什么损害?"令人惊讶的是,我们发现微调后的模型既不会忘记其他类别之间的关系,也不会降低识别这些类别的特征。相反,微调后的模型通常为这些其他类别产生更具辨识性的特征,即使这些类别在微调过程中是缺失的!"真正损害准确性的是微调类别和其他类别之间不一致的logit尺度",这意味着一个简单的后处理校准就可以恢复预训练模型的能力,同时揭示所有类别上的特征改进。我们进行了广泛的实证研究来证明我们发现的稳健性,并为其提供了初步解释,为未来的理论分析提出了新方向。我们的代码可在https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated上找到。

更新时间: 2024-09-24 16:35:16

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2409.16223v1

Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models

Despite the recent broad adoption of Large Language Models (LLMs) across various domains, their potential for enriching information systems in extracting and exploring Linked Data (LD) and Resource Description Framework (RDF) triplestores has not been extensively explored. This paper examines the integration of LLMs within existing systems, emphasising the enhancement of conversational user interfaces (UIs) and their capabilities for data extraction by producing more accurate SPARQL queries without the requirement for model retraining. Typically, conversational UI models necessitate retraining with the introduction of new datasets or updates, limiting their functionality as general-purpose extraction tools. Our approach addresses this limitation by incorporating LLMs into the conversational UI workflow, significantly enhancing their ability to comprehend and process user queries effectively. By leveraging the advanced natural language understanding capabilities of LLMs, our method improves RDF entity extraction within web systems employing conventional chatbots. This integration facilitates a more nuanced and context-aware interaction model, critical for handling the complex query patterns often encountered in RDF datasets and Linked Open Data (LOD) endpoints. The evaluation of this methodology shows a marked enhancement in system expressivity and the accuracy of responses to user queries, indicating a promising direction for future research in this area. This investigation not only underscores the versatility of LLMs in enhancing existing information systems but also sets the stage for further explorations into their potential applications within more specialised domains of web information systems.

Updated: 2024-09-24 16:31:33

标题: 朝向利用大型语言模型增强对话式用户界面中的链接数据检索

摘要: 尽管近年来大型语言模型(LLMs)在各个领域得到了广泛应用,但它们在提取和探索链接数据(LD)和资源描述框架(RDF)三元组存储中的潜力尚未得到广泛探讨。本文研究了LLMs在现有系统中的集成,重点强调了增强会话式用户界面(UIs)及其在数据提取方面的能力,通过生成更准确的SPARQL查询而无需重新训练模型。通常,会话式UI模型需要在引入新数据集或更新时重新训练,从而限制了它们作为通用提取工具的功能。我们的方法通过将LLMs纳入会话式UI工作流程,显著增强了它们有效理解和处理用户查询的能力。通过利用LLMs的先进自然语言理解能力,我们的方法改进了在采用传统聊天机器人的Web系统中的RDF实体提取。这种集成促进了更加细致和具有上下文意识的交互模型,对于处理在RDF数据集和链接开放数据(LOD)端点中经常遇到的复杂查询模式至关重要。对这种方法的评估显示了系统表达能力和对用户查询的响应准确性的显著提升,表明了未来研究在这一领域的有望方向。这项研究不仅强调了LLMs在增强现有信息系统方面的多功能性,也为进一步探索它们在Web信息系统更专门领域的潜在应用奠定了基础。

更新时间: 2024-09-24 16:31:33

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2409.16220v1

Problem-oriented AutoML in Clustering

The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiveness across diverse clustering tasks. In contrast, PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components based on the specific context and goals of their task. At its core, PoAC employs a surrogate model trained on a large meta-knowledge base of previous clustering datasets and solutions, enabling it to infer the quality of new clustering pipelines and synthesize optimal solutions for unseen datasets. Unlike many AutoML frameworks that are constrained by fixed evaluation metrics and algorithm sets, PoAC is algorithm-agnostic, adapting seamlessly to different clustering problems without requiring additional data or retraining. Experimental results demonstrate that PoAC not only outperforms state-of-the-art frameworks on a variety of datasets but also excels in specific tasks such as data visualization, and highlight its ability to dynamically adjust pipeline configurations based on dataset complexity.

Updated: 2024-09-24 16:25:53

标题: 基于问题导向的自动机器学习在聚类中的应用

摘要: Problem-oriented AutoML in Clustering (PoAC) 框架引入了一种新颖、灵活的方法,通过解决传统 AutoML 解决方案的缺点来自动化聚类任务。传统方法通常依赖预定义的内部聚类有效性指数 (CVIs) 和静态元特征,限制了它们在各种聚类任务中的适应性和有效性。相反,PoAC 在聚类问题、CVIs 和元特征之间建立了动态连接,使用户能够根据任务的具体背景和目标来定制这些组件。在其核心,PoAC 使用一个在先前聚类数据集和解决方案的大型元知识库上训练的替代模型,使其能够推断新的聚类管道的质量,并为未见数据集综合出最优解决方案。与许多受固定评估指标和算法集约束的 AutoML 框架不同,PoAC 不受算法限制,能够无缝适应不同的聚类问题,而无需额外数据或重新训练。实验结果表明,PoAC 在各种数据集上不仅优于最先进的框架,而且在特定任务如数据可视化方面表现出色,并突显了它根据数据集复杂性动态调整管道配置的能力。

更新时间: 2024-09-24 16:25:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.16218v1

Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection

Accurately detecting objects at long distances remains a critical challenge in 3D object detection when relying solely on LiDAR sensors due to the inherent limitations of data sparsity. To address this issue, we propose the LiDAR-Camera Augmentation Network (LCANet), a novel framework that reconstructs LiDAR point cloud data by fusing 2D image features, which contain rich semantic information, generating additional points to improve detection accuracy. LCANet fuses data from LiDAR sensors and cameras by projecting image features into the 3D space, integrating semantic information into the point cloud data. This fused data is then encoded to produce 3D features that contain both semantic and spatial information, which are further refined to reconstruct final points before bounding box prediction. This fusion effectively compensates for LiDAR's weakness in detecting objects at long distances, which are often represented by sparse points. Additionally, due to the sparsity of many objects in the original dataset, which makes effective supervision for point generation challenging, we employ a point cloud completion network to create a complete point cloud dataset that supervises the generation of dense point clouds in our network. Extensive experiments on the KITTI and Waymo datasets demonstrate that LCANet significantly outperforms existing models, particularly in detecting sparse and distant objects.
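The step that carries image semantics onto LiDAR points can be sketched geometrically. This shows only the standard pinhole projection and per-point feature gathering; LCANet's learned fusion and point generation are not reproduced, and the intrinsics matrix `K` below is hypothetical:

```python
import numpy as np

def sample_image_features(points_xyz, feat_map, K):
    """Project 3D LiDAR points into the image with intrinsics K and gather
    the 2D feature vector at each projected pixel.
    points_xyz: (N, 3) in camera coordinates (z forward).
    feat_map:   (C, H, W) image features.
    Returns (N, C); points that fall outside the image get zeros."""
    C, H, W = feat_map.shape
    z = points_xyz[:, 2]
    uvw = (K @ points_xyz.T).T              # (N, 3) homogeneous pixel coords
    u = uvw[:, 0] / z
    v = uvw[:, 1] / z
    ui, vi = u.astype(int), v.astype(int)   # nearest-pixel sampling
    valid = (z > 0) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    out = np.zeros((len(points_xyz), C))
    out[valid] = feat_map[:, vi[valid], ui[valid]].T
    return out

K = np.array([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
pts = np.array([[0., 0., 10.], [5., 0., 10.]])  # second point lands outside 64x64
feats = np.ones((8, 64, 64))
sampled = sample_image_features(pts, feats, K)
print(sampled.sum(axis=1))  # -> [8. 0.]
```

In the full network this projection would be followed by encoding the concatenated semantic and spatial features before point generation and box prediction.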

Updated: 2024-09-24 16:20:30

标题: LiDAR相机融合实现稀疏到密集的LiDAR点云生成用于3D物体检测

摘要: 在仅依赖LiDAR传感器时,准确地检测长距离的物体仍然是3D物体检测中的一个关键挑战,这是由于数据稀疏性的固有限制。为了解决这个问题,我们提出了LiDAR-Camera增强网络(LCANet),这是一个新颖的框架,通过融合包含丰富语义信息的2D图像特征重建LiDAR点云数据,生成额外的点以提高检测准确性。LCANet通过将图像特征投影到3D空间,将来自LiDAR传感器和摄像头的数据融合在一起,将语义信息集成到点云数据中。然后对这些融合的数据进行编码,产生包含语义和空间信息的3D特征,进一步对其进行细化以重建最终点云,然后进行边界框预测。这种融合有效弥补了LiDAR在检测长距离物体方面的弱点,这些物体通常由稀疏点表示。此外,由于原始数据集中许多对象的稀疏性,使得点生成的有效监督变得具有挑战性,因此我们使用点云完成网络来创建完整的点云数据集,监督我们网络中密集点云的生成。对KITTI和Waymo数据集的大量实验表明,LCANet在检测稀疏和远距离物体方面明显优于现有模型。

更新时间: 2024-09-24 16:20:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.14985v2

Fairness and Bias in Algorithmic Hiring: a Multidisciplinary Survey

Employers are adopting algorithmic hiring technology throughout the recruitment pipeline. Algorithmic fairness is especially applicable in this domain due to its high stakes and structural inequalities. Unfortunately, most work in this space provides partial treatment, often constrained by two competing narratives, optimistically focused on replacing biased recruiter decisions or pessimistically pointing to the automation of discrimination. Whether, and more importantly what types of, algorithmic hiring can be less biased and more beneficial to society than low-tech alternatives currently remains unanswered, to the detriment of trustworthiness. This multidisciplinary survey caters to practitioners and researchers with a balanced and integrated coverage of systems, biases, measures, mitigation strategies, datasets, and legal aspects of algorithmic hiring and fairness. Our work supports a contextualized understanding and governance of this technology by highlighting current opportunities and limitations, providing recommendations for future work to ensure shared benefits for all stakeholders.

Updated: 2024-09-24 16:18:51

标题: 算法招聘中的公平性和偏见:多学科调查

摘要: 雇主正在整个招聘流程中采用算法招聘技术。由于这一领域的高风险和结构性不平等,算法公平性在此尤为适用。不幸的是,该领域的大多数工作只提供了部分处理,且往往受制于两种相互竞争的叙事:乐观地专注于取代有偏见的招聘者决策,或悲观地指出歧视的自动化。算法招聘是否(以及更重要的是,哪些类型的算法招聘)能够比低技术替代方案更少偏见、对社会更有益,目前仍没有答案,这损害了其可信度。本多学科综述面向从业者和研究人员,以平衡且综合的方式涵盖了算法招聘与公平性的系统、偏见、度量、缓解策略、数据集以及法律方面。我们的工作通过突出当前的机会和局限,并为未来工作提供建议以确保所有利益相关者共享收益,支持对该技术的情境化理解和治理。

更新时间: 2024-09-24 16:18:51

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2309.13933v3

Deep Learning for Precision Agriculture: Post-Spraying Evaluation and Deposition Estimation

Precision spraying evaluation requires automation primarily in post-spraying imagery. In this paper we propose an eXplainable Artificial Intelligence (XAI) computer vision pipeline to evaluate a precision spraying system post-spraying without the need for traditional agricultural methods. The developed system can semantically segment potential targets such as lettuce, chickweed, and meadowgrass and correctly identify if targets have been sprayed. Furthermore, this pipeline evaluates using a domain-specific Weakly Supervised Deposition Estimation task, allowing for class-specific quantification of spray deposit weights in {\mu}L. Estimation of coverage rates of spray deposition in a class-wise manner allows for further understanding of effectiveness of precision spraying systems. Our study evaluates different Class Activation Mapping techniques, namely AblationCAM and ScoreCAM, to determine which is more effective and interpretable for these tasks. In the pipeline, inference-only feature fusion is used to allow for further interpretability and to enable the automation of precision spraying evaluation post-spray. Our findings indicate that a Fully Convolutional Network with an EfficientNet-B0 backbone and inference-only feature fusion achieves an average absolute difference in deposition values of 156.8 {\mu}L across three classes in our test set. The dataset curated in this paper is publicly available at https://github.com/Harry-Rogers/PSIE

Updated: 2024-09-24 16:16:19

标题: 深度学习在精准农业中的应用:喷洒后评估和沉积估计

摘要: 精准喷洒评估主要需要对喷洒后的图像实现自动化分析。在本文中,我们提出了一种可解释人工智能(XAI)计算机视觉流程,无需传统农业方法即可在喷洒后评估精准喷洒系统。所开发的系统可以对生菜、繁缕和草地早熟禾等潜在目标进行语义分割,并正确识别目标是否已被喷洒。此外,该流程使用特定领域的弱监督沉积估计任务进行评估,允许对喷雾沉积重量进行类别特定的量化(单位为μL)。以类别方式估计喷洒沉积的覆盖率,有助于进一步理解精准喷洒系统的有效性。我们的研究评估了不同的类激活映射技术,即AblationCAM和ScoreCAM,以确定哪种对这些任务更有效且更可解释。在该流程中,仅推理阶段的特征融合被用于提供进一步的可解释性,并实现喷洒后精准喷洒评估的自动化。我们的研究结果表明,采用EfficientNet-B0骨干网络和仅推理特征融合的全卷积网络,在我们测试集的三个类别上取得了156.8 μL的沉积值平均绝对差。本文整理的数据集可在https://github.com/Harry-Rogers/PSIE上公开获取。

更新时间: 2024-09-24 16:16:19

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.16213v1

MaskBit: Embedding-free Image Generation via Bit Tokens

Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions: Firstly, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Secondly, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed details. The second contribution demonstrates that embedding-free image generation using bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of mere 305M parameters.
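The bit-token idea can be illustrated with a plain sign-based binarization. This is a sketch of binary quantization in general, not MaskBit's trained tokenizer (the paper's tokens come from a VQGAN with lookup-free quantization), and the token width K = 3 is arbitrary:

```python
import numpy as np

def to_bit_tokens(latents):
    """Binary-quantize latent vectors: each channel becomes one bit (its
    sign), and the K bits are packed into a single integer token."""
    bits = (latents > 0).astype(np.int64)        # (N, K) in {0, 1}
    weights = 1 << np.arange(bits.shape[1])      # 1, 2, 4, ...
    return bits @ weights                        # (N,) integer tokens

def from_bit_tokens(tokens, k):
    """Unpack tokens back to a {-1, +1} vector, which serves as the
    representation directly, in place of a learned embedding table."""
    bits = (tokens[:, None] >> np.arange(k)) & 1
    return 2.0 * bits - 1.0

z = np.array([[0.3, -1.2, 0.7], [-0.1, 0.4, -0.9]])
tokens = to_bit_tokens(z)
print(tokens)                      # -> [5 2]
print(from_bit_tokens(tokens, 3))  # [[ 1. -1.  1.], [-1.  1. -1.]]
```

The "embedding-free" property is visible here: the token itself encodes its representation bit-by-bit, so the generator needs no embedding lookup.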

Updated: 2024-09-24 16:12:12

标题: MaskBit: 通过比特标记实现无嵌入图像生成

摘要: 用于类别条件图像生成的掩码Transformer模型已成为扩散模型的一个引人注目的替代方案。这类框架通常由两个阶段组成:初始的VQGAN模型用于在潜在空间和图像空间之间转换,以及随后的Transformer模型用于在潜在空间中生成图像,为图像合成提供了有前途的途径。在这项研究中,我们提出了两个主要贡献:首先,对VQGAN进行实证和系统性检查,从而得到一个现代化的VQGAN;其次,一种直接在比特标记上操作的新颖无嵌入生成网络,比特标记是一种具有丰富语义的标记二值量化表示。第一个贡献提供了一个透明、可复现且性能优越的VQGAN模型,增强了可及性,在匹配当前最先进方法性能的同时揭示了先前未披露的细节。第二个贡献表明,使用比特标记的无嵌入图像生成在ImageNet 256x256基准上取得了1.52的最新最优FID,而生成器模型仅有305M个参数。

更新时间: 2024-09-24 16:12:12

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.16211v1

Testing Dependency of Weighted Random Graphs

In this paper, we study the task of detecting the edge dependency between two weighted random graphs. We formulate this task as a simple hypothesis testing problem, where under the null hypothesis, the two observed graphs are statistically independent, whereas under the alternative, the edges of one graph are dependent on the edges of a uniformly and randomly vertex-permuted version of the other graph. For general edge-weight distributions, we establish thresholds at which optimal testing becomes information-theoretically possible or impossible, as a function of the total number of nodes in the observed graphs and the generative distributions of the weights. Finally, we identify a statistical-computational gap, and present evidence suggesting that this gap is inherent using the framework of low-degree polynomials.

Updated: 2024-09-24 16:07:57

标题: 测试加权随机图的依赖性

摘要: 在本文中,我们研究了检测两个加权随机图之间边依赖性的任务。我们将这个任务构建为一个简单假设检验问题:在零假设下,两个观察到的图在统计上是独立的;而在备择假设下,一个图的边依赖于另一个图经过均匀随机顶点置换后的版本的边。对于一般的边权分布,我们建立了使最优检验在信息论上变得可能或不可能的阈值,该阈值是观察图中节点总数和权重生成分布的函数。最后,我们识别出一个统计-计算差距,并利用低次多项式框架给出证据,表明该差距是固有的。

更新时间: 2024-09-24 16:07:57

领域: cs.LG,cs.IT,math.IT

下载: http://arxiv.org/abs/2409.14870v2

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

Sparse Autoencoders (SAEs) have emerged as a promising approach to decompose the activations of Large Language Models (LLMs) into human-interpretable latents. In this paper, we pose two questions. First, to what extent do SAEs extract monosemantic and interpretable latents? Second, to what extent does varying the sparsity or the size of the SAE affect monosemanticity / interpretability? By investigating these questions in the context of a simple first-letter identification task where we have complete access to ground truth labels for all tokens in the vocabulary, we are able to provide more detail than prior investigations. Critically, we identify a problematic form of feature-splitting we call feature absorption where seemingly monosemantic latents fail to fire in cases where they clearly should. Our investigation suggests that varying SAE size or sparsity is insufficient to solve this issue, and that there are deeper conceptual issues in need of resolution.
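The SAE architecture under study is small enough to sketch end-to-end. The block below uses untrained random weights with a negative encoder bias to induce sparsity; real SAEs are trained with an L1 penalty on the latents, which is omitted here, and feature absorption is a training-time phenomenon this sketch does not reproduce:

```python
import numpy as np

rng = np.random.default_rng(0)

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """One-hidden-layer sparse autoencoder: ReLU latents, linear decoder.
    Sparsity comes from an L1 penalty on f during training (not shown)."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # latent activations
    x_hat = f @ W_dec + b_dec                # reconstruction of x
    return f, x_hat

d_model, d_sae = 16, 64                      # overcomplete latent dictionary
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_enc = -0.5 * np.ones(d_sae)                # negative bias: most latents stay 0
b_dec = np.zeros(d_model)

x = rng.normal(0, 1.0, (4, d_model))         # stand-in for LLM activations
f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
print(f.shape, (f > 0).mean())               # latents are sparse
```

In the paper's first-letter task, one would check whether the latent that appears to encode "starts with A" actually fires on every token starting with A; absorption is precisely the failure of that check.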

Updated: 2024-09-24 16:07:31

标题: A代表吸收:研究稀疏自动编码器中的特征分裂和吸收

摘要: 稀疏自编码器(SAEs)已经成为将大型语言模型(LLMs)的激活分解为人类可解释的潜在特征的一种有前途的方法。本文提出了两个问题。首先,SAEs在多大程度上提取了单一语义和可解释的潜在特征?其次,SAE的稀疏度或大小的变化在何种程度上影响了单一语义性/可解释性?通过在一个简单的首字母识别任务中调查这些问题,在这个任务中我们完全可以访问词汇表中所有标记的真实标签,我们能够提供比以前的调查更详细的信息。关键是,我们发现了一种我们称之为特征吸收的问题形式,即表面上单一语义的潜在特征在清楚应该激活的情况下却未能激活。我们的调查表明,改变SAE的大小或稀疏度是不足以解决这个问题的,而需要解决更深层次的概念问题。

更新时间: 2024-09-24 16:07:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.14507v2

Triggering Dark Showers with Conditional Dual Auto-Encoders

We present a family of conditional dual auto-encoders (CoDAEs) for generic and model-independent new physics searches at colliders. New physics signals, which arise from new types of particles and interactions, are considered in our study as anomalies causing deviations in data with respect to expected background events. In this work, we perform a normal-only anomaly detection, which employs only background samples, to search for manifestations of a dark version of strong force applying (variational) auto-encoders on raw detector images, which are large and highly sparse, without leveraging any physics-based pre-processing or strong assumption on the signals. The proposed CoDAE has a dual-encoder design, which is general and can learn an auxiliary yet compact latent space through spatial conditioning, showing a neat improvement over competitive physics-based baselines and related approaches, therefore also reducing the gap with fully supervised models. It is the first time an unsupervised model is shown to exhibit excellent discrimination against multiple dark shower models, illustrating the suitability of this method as an accurate, fast, model-independent algorithm to deploy, e.g., in the real-time event triggering systems of Large Hadron Collider experiments such as ATLAS and CMS.

Updated: 2024-09-24 16:05:46

标题: 使用条件双自动编码器触发暗簇射

摘要: 我们提出了一族用于在对撞机上进行通用且与模型无关的新物理搜索的条件双自动编码器(CoDAEs)。在我们的研究中,源自新型粒子和相互作用的新物理信号被视为导致数据相对于预期背景事件发生偏差的异常。在这项工作中,我们执行了一种仅使用背景样本的异常检测,通过在原始探测器图像(这些图像尺寸大且高度稀疏)上应用(变分)自动编码器,来搜索强相互作用的暗版本的表现,而不依赖任何基于物理的预处理或对信号的强假设。所提出的CoDAE具有通用的双编码器设计,可以通过空间条件化学习一个辅助而紧凑的潜在空间,相比有竞争力的基于物理的基线和相关方法表现出明显改进,从而也缩小了与完全监督模型之间的差距。这是首次展示无监督模型对多个暗簇射模型具有出色的区分能力,说明了该方法作为一种准确、快速、与模型无关的算法的适用性,可部署于ATLAS和CMS等大型强子对撞机实验的实时事件触发系统中。

更新时间: 2024-09-24 16:05:46

领域: hep-ex,cs.LG

下载: http://arxiv.org/abs/2306.12955v2

Large-scale digital phenotyping: identifying depression and anxiety indicators in a general UK population with over 10,000 participants

Digital phenotyping offers a novel and cost-efficient approach for managing depression and anxiety. Previous studies, often limited to small-to-medium or specific populations, may lack generalizability. We conducted a cross-sectional analysis of data from 10,129 participants recruited from a UK-based general population between June 2020 and August 2022. Participants shared wearable (Fitbit) data and self-reported questionnaires on depression (PHQ-8), anxiety (GAD-7), and mood via a study app. We first examined the correlations between PHQ-8/GAD-7 scores and wearable-derived features, demographics, health data, and mood assessments. Subsequently, unsupervised clustering was used to identify behavioural patterns associated with depression or anxiety. Finally, we employed separate XGBoost models to predict depression and anxiety and compared the results using different subsets of features. We observed significant associations between the severity of depression and anxiety with several factors, including mood, age, gender, BMI, sleep patterns, physical activity, and heart rate. Clustering analysis revealed that participants simultaneously exhibiting lower physical activity levels and higher heart rates reported more severe symptoms. Prediction models incorporating all types of variables achieved the best performance ($R^2$=0.41, MAE=3.42 for depression; $R^2$=0.31, MAE=3.50 for anxiety) compared to those using subsets of variables. This study identified potential indicators for depression and anxiety, highlighting the utility of digital phenotyping and machine learning technologies for rapid screening of mental disorders in general populations. These findings provide robust real-world insights for future healthcare applications.
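The reported $R^2$ and MAE can be reproduced from predictions with the standard definitions. A pure-Python sketch follows; the score values are made up for illustration and are not the study's data:

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [4, 8, 12, 16]   # hypothetical PHQ-8 scores
y_pred = [5, 7, 13, 15]
print(mae(y_true, y_pred), r_squared(y_true, y_pred))  # -> 1.0 0.95
```

Against these definitions, the study's depression model ($R^2$=0.41, MAE=3.42) explains roughly 41% of the variance in PHQ-8 scores with an average error of about 3.4 points.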

Updated: 2024-09-24 16:05:17

标题: 大规模数字表型研究:在英国一般人口中识别抑郁和焦虑指标,涉及超过1万名参与者

摘要: 数字表型学为管理抑郁症和焦虑症提供了一种新颖且高效的方法。先前的研究往往仅限于小规模或特定人群,可能缺乏普适性。我们对来自英国一般人口的10,129名参与者的数据进行了横断面分析,时间跨度为2020年6月至2022年8月。参与者通过研究应用程序分享了可穿戴设备(Fitbit)数据和自我报告的抑郁(PHQ-8)、焦虑(GAD-7)和情绪问卷。我们首先检查了PHQ-8/GAD-7得分与可穿戴设备衍生特征、人口统计学数据、健康数据和情绪评估之间的相关性。随后,使用无监督聚类来识别与抑郁或焦虑相关的行为模式。最后,我们采用独立的XGBoost模型来预测抑郁和焦虑,并使用不同特征子集比较结果。我们观察到抑郁和焦虑严重程度与多个因素(包括情绪、年龄、性别、BMI、睡眠模式、体力活动和心率)之间存在显著关联。聚类分析显示,同时表现出较低体力活动水平和较高心率的参与者报告了更严重的症状。综合考虑所有类型变量的预测模型表现最佳(抑郁R²=0.41,MAE=3.42;焦虑R²=0.31,MAE=3.50),与使用变量子集的模型相比。这项研究识别了抑郁和焦虑的潜在指标,突显了数字表型学和机器学习技术在一般人口中快速筛查心理障碍的实用性。这些发现为未来医疗应用提供了强大的现实世界见解。

更新时间: 2024-09-24 16:05:17

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2409.16339v1

AUGUR, A flexible and efficient optimization algorithm for identification of optimal adsorption sites

In this paper, we propose a novel flexible optimization pipeline for determining the optimal adsorption sites, named AUGUR (Aware of Uncertainty Graph Unit Regression). Our model combines graph neural networks and Gaussian processes to create a flexible, efficient, symmetry-aware, translation, and rotation-invariant predictor with inbuilt uncertainty quantification. This predictor is then used as a surrogate for a data-efficient Bayesian Optimization scheme to determine the optimal adsorption positions. This pipeline determines the optimal position of large and complicated clusters with far fewer iterations than current state-of-the-art approaches. Further, it does not rely on hand-crafted features and can be seamlessly employed on any molecule without any alterations. Additionally, the pooling properties of graphs allow for the processing of molecules of different sizes by the same model. This allows the energy prediction of computationally demanding systems by a model trained on comparatively smaller and less expensive ones

Updated: 2024-09-24 16:03:01

标题: AUGUR,一种用于确定最佳吸附位点的灵活高效优化算法

摘要: 本文提出了一种新颖的灵活优化流程,用于确定最佳吸附位置,名为AUGUR(Aware of Uncertainty Graph Unit Regression,感知不确定性的图单元回归)。我们的模型结合了图神经网络和高斯过程,创建了一个灵活、高效、对称感知、平移和旋转不变的预测器,并具有内置的不确定性量化。然后,该预测器被用作数据高效的贝叶斯优化方案中的代理模型,以确定最佳吸附位置。与当前最先进的方法相比,该流程确定大型复杂团簇的最佳位置所需的迭代次数要少得多。此外,它不依赖手工设计的特征,可以无需任何修改地无缝应用于任何分子。另外,图的池化特性允许同一模型处理不同大小的分子,这使得通过在相对较小、计算开销较低的体系上训练的模型来预测计算要求高的体系的能量成为可能。

更新时间: 2024-09-24 16:03:01

领域: physics.chem-ph,cs.LG

下载: http://arxiv.org/abs/2409.16204v1

Incentivizing Exploration with Linear Contexts and Combinatorial Actions

We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. Recent work has shown under certain independence assumptions that after collecting enough initial samples, the popular Thompson sampling algorithm becomes incentive compatible. We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition. This opens up the possibility of efficient and regret-optimal incentivized exploration in high-dimensional action spaces. In the semibandit model, we also improve the sample complexity for the pre-Thompson sampling phase of initial data collection.
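As background for the linear-bandit setting, a minimal Thompson sampling loop with a Gaussian posterior can be sketched as follows. This is standard linear TS, not the paper's incentive-compatible variant (which additionally requires an initial data-collection phase), and all numbers are illustrative:

```python
import numpy as np

def linear_thompson_step(A, b, arms, sigma2, rng):
    """One round of Thompson sampling for a linear bandit with a Gaussian
    posterior N(A^{-1} b, sigma2 * A^{-1}) over the parameter theta."""
    A_inv = np.linalg.inv(A)
    theta = rng.multivariate_normal(A_inv @ b, sigma2 * A_inv)
    return int(np.argmax(arms @ theta))      # recommended arm

def update(A, b, x, reward):
    """Rank-one posterior update after observing (x, reward)."""
    return A + np.outer(x, x), b + reward * x

rng = np.random.default_rng(0)
arms = np.eye(3)                        # three orthogonal feature vectors
A, b = np.eye(3), np.zeros(3)           # N(0, I) prior on theta
theta_star = np.array([0.1, 0.9, 0.2])  # unknown true parameter
for _ in range(200):
    i = linear_thompson_step(A, b, arms, sigma2=1.0, rng=rng)
    reward = arms[i] @ theta_star + 0.1 * rng.normal()
    A, b = update(A, b, arms[i], reward)

mu_final = np.linalg.inv(A) @ b
print(int(np.argmax(mu_final)))         # the posterior mean now favours arm 1
```

In the incentivized setting, each recommended arm must also be a best response for a Bayesian agent; the paper's convexity condition on the prior is what makes the TS recommendations incentive compatible after enough initial samples.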

Updated: 2024-09-24 16:02:29

标题: 用线性上下文和组合动作激励探索

摘要: 我们推进了激励式老虎机探索的研究,其中摇臂的选择被视为推荐,并且需要满足贝叶斯激励相容性。最近的研究表明,在某些独立性假设下,在收集足够的初始样本之后,流行的汤普森采样算法会变得激励相容。我们为线性老虎机给出了这一结果的类比,其中先验的独立性被一个自然的凸性条件所取代。这为高维动作空间中高效且遗憾最优的激励式探索打开了可能性。在半老虎机(semibandit)模型中,我们还改进了初始数据收集的预汤普森采样阶段的样本复杂度。

更新时间: 2024-09-24 16:02:29

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2306.01990v3

Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech

We propose FEIM-TTS, an innovative zero-shot text-to-speech (TTS) model that synthesizes emotionally expressive speech, aligned with facial images and modulated by emotion intensity. Leveraging deep learning, FEIM-TTS transcends traditional TTS systems by interpreting facial cues and adjusting to emotional nuances without dependence on labeled datasets. To address sparse audio-visual-emotional data, the model is trained using LRS3, CREMA-D, and MELD datasets, demonstrating its adaptability. FEIM-TTS's unique capability to produce high-quality, speaker-agnostic speech makes it suitable for creating adaptable voices for virtual characters. Moreover, FEIM-TTS significantly enhances accessibility for individuals with visual impairments or those who have trouble seeing. By integrating emotional nuances into TTS, our model enables dynamic and engaging auditory experiences for webcomics, allowing visually impaired users to enjoy these narratives more fully. Comprehensive evaluation evidences its proficiency in modulating emotion and intensity, advancing emotional speech synthesis and accessibility. Samples are available at: https://feim-tts.github.io/.

Updated: 2024-09-24 16:01:12

标题: 面部表情增强的TTS:结合面部表示和情感强度实现自适应语音

摘要: 我们提出了FEIM-TTS,一种创新的零样本文本到语音(TTS)模型,可以合成与面部图像对齐并受情绪强度调节的情感表达语音。利用深度学习,FEIM-TTS通过解释面部线索并调整情感细微差别,超越传统的TTS系统,而不依赖标记的数据集。为了解决稀疏的音频-视觉-情感数据,该模型使用LRS3、CREMA-D和MELD数据集进行训练,展示了其适应性。FEIM-TTS独特的能力可以产生高质量、与说话者无关的语音,使其适用于为虚拟角色创建可适应的声音。此外,FEIM-TTS显著提高了视力受损或视力有障碍的个人的可访问性。通过将情感细微差别融入TTS中,我们的模型使网络漫画拥有动态而引人入胜的听觉体验,让视障用户更充分地享受这些叙事。全面的评估证明了其在调节情感和强度方面的熟练程度,推进了情感语音合成和可访问性。示例可在以下链接找到:https://feim-tts.github.io/。

更新时间: 2024-09-24 16:01:12

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2409.16203v1

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

Online education platforms have significantly transformed the dissemination of educational resources by providing a dynamic and digital infrastructure. With the further enhancement of this transformation, the advent of Large Language Models (LLMs) has elevated the intelligence levels of these platforms. However, current academic benchmarks provide limited guidance for real-world industry scenarios. This limitation arises because educational applications require more than mere test question responses. To bridge this gap, we introduce CJEval, a benchmark based on Chinese Junior High School Exam Evaluations. CJEval consists of 26,136 samples across four application-level educational tasks covering ten subjects. These samples include not only questions and answers but also detailed annotations such as question types, difficulty levels, knowledge concepts, and answer explanations. By utilizing this benchmark, we assessed LLMs' potential applications and conducted a comprehensive analysis of their performance by fine-tuning on various educational tasks. Extensive experiments and discussions have highlighted the opportunities and challenges of applying LLMs in the field of education.

Updated: 2024-09-24 16:00:28

标题: CJEval:使用中国初中考试数据评估大型语言模型的基准

摘要: 在线教育平台通过提供一个动态和数字化的基础设施显著改变了教育资源的传播。随着这种转变的进一步加强,大型语言模型(LLMs)的出现提升了这些平台的智能水平。然而,目前的学术基准提供了有限的指导,无法适用于现实世界的行业场景。这种限制的原因在于教育应用需要不仅仅是简单的测试问题回答。为了弥补这一差距,我们引入了一个基于中国初中考试评估的基准,即CJEval。CJEval包含26,136个样本,涵盖了十个学科的四个应用级教育任务。这些样本不仅包括问题和答案,还包括详细的注释,如问题类型、难度级别、知识概念和答案解释。通过利用这个基准,我们评估了LLMs在各种教育任务上的潜在应用,并进行了对它们性能的全面分析,通过在各种教育任务上进行微调。广泛的实验和讨论突显了在教育领域应用LLMs的机遇和挑战。

更新时间: 2024-09-24 16:00:28

领域: cs.AI

下载: http://arxiv.org/abs/2409.16202v1

Stochastic Multi-round Submodular Optimization with Budget

In this work, we study the Stochastic Budgeted Multi-round Submodular Maximization (SBMSm) problem, where we aim to adaptively maximize the sum, over multiple rounds, of a monotone and submodular objective function defined on subsets of items. The objective function also depends on the realization of stochastic events, and the total number of items we can select over all rounds is bounded by a limited budget. This problem extends, and generalizes to multiple round settings, well-studied problems such as (adaptive) influence maximization and stochastic probing. We show that, if the number of items and stochastic events is somehow bounded, there is a polynomial time dynamic programming algorithm for SBMSm. Then, we provide a simple greedy $1/2(1-1/e-\epsilon)\approx 0.316$-approximation algorithm for SBMSm, that first non-adaptively allocates the budget to be spent at each round, and then greedily and adaptively maximizes the objective function by using the budget assigned at each round. Finally, we introduce the {\em budget-adaptivity gap}, by which we measure how much an adaptive policy for SBMSm is better than an optimal partially adaptive one that, as in our greedy algorithm, determines the budget allocation in advance. We show that the budget-adaptivity gap lies between $e/(e-1)\approx 1.582$ and $2$.
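The greedy algorithm's structure, non-adaptive budget allocation across rounds followed by greedy selection within each round, can be sketched on a deterministic coverage objective (a classic monotone submodular function; the stochastic events of SBMSm are omitted and all names are ours):

```python
def greedy_round(universe_covered, items, budget):
    """Greedily pick up to `budget` items in one round, each time taking the
    item with the largest marginal coverage gain (coverage is monotone and
    submodular, so marginal gains only shrink as the covered set grows)."""
    chosen = []
    for _ in range(budget):
        best, best_gain = None, 0
        for name, covers in items.items():
            if name in chosen:
                continue
            gain = len(covers - universe_covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break                      # no item adds anything new
        chosen.append(best)
        universe_covered |= items[best]
    return chosen, universe_covered

# Budgets fixed in advance per round (the non-adaptive allocation step of
# the 1/2(1 - 1/e - eps)-approximation), then greedy selection inside each.
items = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5, 6}}
covered = set()
for round_budget in [2, 1]:
    picked, covered = greedy_round(covered, items, round_budget)
    print(picked, sorted(covered))
```

The budget-adaptivity gap quantifies how much is lost by committing to `[2, 1]` up front instead of choosing budgets adaptively; the paper bounds this loss between $e/(e-1)$ and $2$.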

Updated: 2024-09-24 15:58:29

标题: 随机多轮带预算的次模优化

摘要: 在这项工作中,我们研究了随机预算多轮子模块化最大化(SBMSm)问题,在这个问题中,我们的目标是在多轮中自适应地最大化定义在物品子集上的单调和子模块化目标函数的总和。目标函数还取决于随机事件的实现,并且我们在所有轮次中可以选择的物品总数受限于有限预算。这个问题扩展并推广了多轮设置下的众所周知的问题,如(自适应)影响最大化和随机探测。 我们证明,如果物品和随机事件的数量在某种程度上受限,那么SBMSm就有一个多项式时间的动态规划算法。 然后,我们为SBMSm提供了一个简单的贪心$1/2(1-1/e-\epsilon)\approx 0.316$-近似算法,该算法首先非自适应地分配每轮要花费的预算,然后贪婪地并自适应地使用每轮分配的预算最大化目标函数。 最后,我们引入了“预算自适应性差距”,通过它我们衡量了一个自适应策略对于SBMSm比一个部分自适应的最优策略更好多少,就像我们的贪心算法一样,提前确定预算分配。我们证明,预算自适应性差距在$e/(e-1)\approx 1.582$和$2$之间。

更新时间: 2024-09-24 15:58:29

领域: cs.DS,cs.AI

下载: http://arxiv.org/abs/2404.13737v3

Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking

Text ranking has witnessed significant advancements, attributed to the utilization of dual-encoder enhanced by Pre-trained Language Models (PLMs). Given the proliferation of available PLMs, selecting the most effective one for a given dataset has become a non-trivial challenge. As a promising alternative to human intuition and brute-force fine-tuning, Transferability Estimation (TE) has emerged as an effective approach to model selection. However, current TE methods are primarily designed for classification tasks, and their estimated transferability may not align well with the objectives of text ranking. To address this challenge, we propose to compute the expected rank as transferability, explicitly reflecting the model's ranking capability. Furthermore, to mitigate anisotropy and incorporate training dynamics, we adaptively scale isotropic sentence embeddings to yield an accurate expected rank score. Our resulting method, Adaptive Ranking Transferability (AiRTran), can effectively capture subtle differences between models. On challenging model selection scenarios across various text ranking datasets, it demonstrates significant improvements over previous classification-oriented TE methods, human intuition, and ChatGPT with minor time consumption.

Updated: 2024-09-24 15:48:03

标题: 在文本排序的模型选择中利用估计的可迁移性超越人类直觉

摘要: 文本排序已经取得了显著进展,这归功于利用由预训练语言模型(PLMs)增强的双编码器。随着可用PLMs的激增,为给定数据集选择最有效的模型已成为一个不小的挑战。作为人类直觉和暴力微调的一个有希望的替代方案,可迁移性估计(TE)已成为一种有效的模型选择方法。然而,目前的TE方法主要是为分类任务设计的,其估计的可迁移性可能与文本排序的目标不太吻合。为了应对这一挑战,我们提出将期望排名作为可迁移性来计算,从而显式反映模型的排序能力。此外,为了减轻各向异性并纳入训练动态,我们自适应地缩放各向同性的句子嵌入,以得到准确的期望排名分数。我们由此得到的方法,自适应排名可迁移性(AiRTran),能够有效捕捉模型之间的细微差异。在各种文本排序数据集上具有挑战性的模型选择场景中,它相对于先前面向分类的TE方法、人类直觉和ChatGPT均表现出显著改进,且时间消耗很小。

更新时间: 2024-09-24 15:48:03

Categories: cs.AI

Download: http://arxiv.org/abs/2409.16198v1

Efficient Parallelization Layouts for Large-Scale Distributed Model Training

Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence parallelism. In this work, we conduct a comprehensive ablation study of possible training configurations for large language models. We distill this large study into several key recommendations for the most efficient training. For instance, we find that using a micro-batch size of 1 usually enables the most efficient training layouts. Larger micro-batch sizes necessitate activation checkpointing or higher degrees of model parallelism and also lead to larger pipeline bubbles. Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes, most notably a Model FLOPs utilization of 70.5% when training a Llama 13B model.

Updated: 2024-09-24 15:42:51

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2311.05610v3

Second Order Bounds for Contextual Bandits with Function Approximation

Many works have developed no-regret algorithms for contextual bandits with function approximation, where the mean rewards over context-action pairs belong to a function class. Although there are many approaches to this problem, one that has gained in importance is the use of algorithms based on the optimism principle, such as optimistic least squares. It can be shown that the regret of this algorithm scales as the square root of the product of the eluder dimension (a statistical measure of the complexity of the function class), the logarithm of the function class size, and the time horizon. Unfortunately, even if the variance of the measurement noise of the rewards at each time step is changing and is very small, the regret of the optimistic least squares algorithm scales with the square root of the time horizon. In this work, we are the first to develop algorithms, in the setting of contextual bandits with function approximation when the variances are unknown, that satisfy regret bounds scaling not with the square root of the time horizon but with the square root of the sum of the measurement variances. These bounds generalize existing techniques for deriving second-order bounds in contextual linear problems.
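
Schematically, the contrast described above is (with $d_E$ the eluder dimension, $\mathcal{F}$ the function class, $T$ the horizon, and $\sigma_t^2$ the per-step noise variance; constants and logarithmic factors are suppressed, so this paraphrases the stated scalings rather than the paper's exact theorem):

```latex
\text{optimistic least squares:}\quad R_T \;\lesssim\; \sqrt{d_E \,\log|\mathcal{F}|\; T},
\qquad
\text{second-order bound:}\quad R_T \;\lesssim\; \sqrt{d_E \,\log|\mathcal{F}|\; \sum_{t=1}^{T}\sigma_t^2}.
```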

Updated: 2024-09-24 15:42:04

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2409.16197v1

MDS-ED: Multimodal Decision Support in the Emergency Department -- a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine

Background: A clinically meaningful comparative assessment of medical decision support in emergency care is challenging due to a lack of appropriate datasets with multimodal input modalities and comprehensive prediction tasks. This hampers measurable progress in the field. Results: We introduce a dataset based on MIMIC-IV, a benchmarking protocol, and initial results for evaluating multimodal decision support in the emergency department (ED). We use diverse data modalities from the first 1.5 hours after patient arrival, including demographics, biometrics, vital signs, lab values, and electrocardiogram waveforms. We analyze 1443 clinical labels across two contexts: predicting diagnoses and patient deterioration. Our diagnostic model achieves an AUROC score over 0.8 in a statistically significant manner for 609 out of 1428 conditions, including cardiac conditions like myocardial infarction and non-cardiac conditions such as renal disease and diabetes. The deterioration model scores above 0.8 in a statistically significant manner for 14 out of 15 targets, including critical events like cardiac arrest, mechanical ventilation, and intensive care unit admission, as well as short- and long-term mortality. Furthermore, we provide one of the first robust demonstrations of the significant impact of raw waveform input data on model performance. Conclusions: This study highlights the proposed dataset as a unique resource to foster measurable progress in the domain of algorithmic decision support in emergency care. The presented multimodal baseline models showcase the potential of diagnostic decision support in the field and provide strong incentives for including raw waveform data.

Updated: 2024-09-24 15:20:57

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2407.17856v3

Cyber Knowledge Completion Using Large Language Models

The integration of the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface, introducing new and sophisticated threats with the potential to exploit emerging vulnerabilities. Assessing the risks of CPSs is increasingly difficult due to incomplete and outdated cybersecurity knowledge. This highlights the urgent need for better-informed risk assessments and mitigation strategies. While previous efforts have relied on rule-based natural language processing (NLP) tools to map vulnerabilities, weaknesses, and attack patterns, recent advancements in Large Language Models (LLMs) present a unique opportunity to enhance cyber-attack knowledge completion through improved reasoning, inference, and summarization capabilities. We apply embedding models to encapsulate information on attack patterns and adversarial techniques, generating mappings between them using vector embeddings. Additionally, we propose a Retrieval-Augmented Generation (RAG)-based approach that leverages pre-trained models to create structured mappings between different taxonomies of threat patterns. Further, we use a small hand-labeled dataset to compare the proposed RAG-based approach to a baseline standard binary classification model. Thus, the proposed approach provides a comprehensive framework to address the challenge of cyber-attack knowledge graph completion.

Updated: 2024-09-24 15:20:39

Categories: cs.CR,cs.AI,J.7; H.3.3

Download: http://arxiv.org/abs/2409.16176v1

Fourier neural operators for spatiotemporal dynamics in two-dimensional turbulence

High-fidelity direct numerical simulation of turbulent flows for most real-world applications remains an outstanding computational challenge. Several machine learning approaches have recently been proposed to alleviate the computational cost, even though they become unstable or unphysical for long-time predictions. We identify that Fourier neural operator (FNO) based models combined with a partial differential equation (PDE) solver can accelerate fluid-dynamics simulations and thus address the computational expense of large-scale turbulence simulations. We treat the FNO model on the same footing as a PDE solver and answer important questions about the volume and temporal resolution of data required to build pre-trained models for turbulence. We also discuss the pitfalls of purely data-driven approaches that need to be avoided by machine learning models to become viable and competitive tools for long-time simulations of turbulence.

Updated: 2024-09-24 15:13:54

Categories: physics.flu-dyn,cs.LG,nlin.CD

Download: http://arxiv.org/abs/2409.14660v2

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs) to various domains due to its modular design and widespread availability on platforms like Huggingface. This modularity has sparked interest in combining multiple LoRAs to enhance LLM capabilities. However, existing methods for LoRA composition primarily focus on task-specific adaptations that require additional training, and current model merging techniques often fail to fully leverage LoRA's modular nature, leading to parameter interference and performance degradation. In this paper, we investigate the feasibility of disassembling and reassembling multiple LoRAs at a finer granularity, analogous to assembling LEGO blocks. We introduce the concept of Minimal Semantic Units (MSUs), where the parameters corresponding to each rank in LoRA function as independent units. These MSUs demonstrate permutation invariance and concatenation-summation equivalence properties, enabling flexible combinations to create new LoRAs. Building on these insights, we propose the LoRA-LEGO framework. This framework conducts rank-wise parameter clustering by grouping MSUs from different LoRAs into $k$ clusters. The centroid of each cluster serves as a representative MSU, enabling the assembly of a merged LoRA with an adjusted rank of $k$. Additionally, we apply a dual reweighting strategy to optimize the scale of the merged LoRA. Experiments across various benchmarks demonstrate that our method outperforms existing approaches in LoRA merging.
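
The rank-wise clustering step can be sketched as follows: a toy re-implementation of the description above, using plain k-means over flattened rank-wise units. The paper's dual reweighting strategy is omitted, and all names and dimensions are illustrative:

```python
import numpy as np

def lora_lego_merge(loras, k, iters=50, seed=0):
    """Merge LoRAs by clustering their rank-wise units (MSUs).

    Each LoRA is a pair (A, B) with A: (r, d_in), B: (d_out, r); rank i
    contributes the unit (A[i], B[:, i]). We flatten each unit, run a
    plain k-means over all units from all LoRAs, and unflatten the k
    centroids into a merged rank-k LoRA.
    """
    d_in = loras[0][0].shape[1]
    units = np.array([np.concatenate([A[i], B[:, i]])
                      for A, B in loras for i in range(A.shape[0])])
    rng = np.random.default_rng(seed)
    centers = units[rng.choice(len(units), size=k, replace=False)]
    for _ in range(iters):
        dists = ((units[:, None] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = units[labels == j].mean(0)  # centroid = representative MSU
    A_merged = centers[:, :d_in]
    B_merged = centers[:, d_in:].T
    return A_merged, B_merged

rng = np.random.default_rng(1)
loras = [(rng.normal(size=(4, 6)), rng.normal(size=(5, 4))) for _ in range(3)]
A_m, B_m = lora_lego_merge(loras, k=4)
print(A_m.shape, B_m.shape)  # (4, 6) (5, 4)
```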

Updated: 2024-09-24 15:08:41

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.16167v1

OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection

Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems. Despite the emergence of an increasing number of OOD detection methods, the evaluation inconsistencies present challenges for tracking the progress in this field. OpenOOD v1 initiated the unification of the OOD detection evaluation but faced limitations in scalability and usability. In response, this paper presents OpenOOD v1.5, a significant improvement from its predecessor that ensures accurate, standardized, and user-friendly evaluation of OOD detection methodologies. Notably, OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets such as ImageNet, investigates full-spectrum OOD detection which is important yet underexplored, and introduces new features including an online leaderboard and an easy-to-use evaluator. This work also contributes in-depth analysis and insights derived from comprehensive experimental results, thereby enriching the knowledge pool of OOD detection methodologies. With these enhancements, OpenOOD v1.5 aims to drive advancements and offer a more robust and comprehensive evaluation benchmark for OOD detection research.

Updated: 2024-09-24 15:07:37

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2306.09301v4

ChatDBG: An AI-Powered Debugging Assistant

Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or assertion failures, and explore open-ended queries like `why is x null?'. To handle these queries, ChatDBG grants the LLM autonomy to "take the wheel": it can act as an independent agent capable of querying and controlling the debugger to navigate through stacks and inspect program state. It then reports its findings and yields back control to the programmer. Our ChatDBG prototype integrates with standard debuggers including LLDB and GDB for native code and Pdb for Python. Our evaluation across a diverse set of code, including C/C++ code with known bugs and a suite of Python code including standalone scripts and Jupyter notebooks, demonstrates that ChatDBG can successfully analyze root causes, explain bugs, and generate accurate fixes for a wide range of real-world errors. For the Python programs, a single query led to an actionable bug fix 67% of the time; one additional follow-up query increased the success rate to 85%. ChatDBG has seen rapid uptake; it has already been downloaded roughly 50,000 times.

Updated: 2024-09-24 15:07:24

Categories: cs.SE,cs.AI,cs.LG,cs.PL

Download: http://arxiv.org/abs/2403.16354v2

EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges

Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. EnIGMA introduces new Agent-Computer Interfaces (ACIs) to improve the success rate on CTF challenges. We establish the novel Interactive Agent Tool concept, which enables LM agents to run interactive command-line utilities essential for these challenges. Empirical analysis of EnIGMA on over 350 CTF challenges from three different benchmarks indicates that providing a robust set of new tools with demonstration of their usage helps the LM solve complex problems and achieves state-of-the-art results on the NYU CTF and Intercode-CTF benchmarks. Finally, we discuss insights on ACI design and agent behavior on cybersecurity tasks that highlight the need to adapt real-world tools for LM agents.

Updated: 2024-09-24 15:06:01

Categories: cs.AI

Download: http://arxiv.org/abs/2409.16165v1

Learn and Don't Forget: Adding a New Language to ASR Foundation Models

Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language-code tuning, which trains only the language code; soft prompt tuning, which trains prepended tokens; and LoRA, where a small set of additional parameters is optimised. Elastic Weight Consolidation (EWC) offers an alternative compromise with the potential to maintain performance in specific target languages. Results show that direct fine-tuning yields the best performance for the new language but degrades existing language capabilities. EWC can address this issue for specific languages. If only adaptation parameters are used, the language capabilities are maintained, but at the cost of performance in the new language.
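
The EWC compromise mentioned above penalizes drift on parameters that matter for the original languages. A toy diagonal-Fisher sketch (illustrative numbers and names, not the paper's training code):

```python
import numpy as np

def ewc_loss(new_task_loss, params, old_params, fisher, lam=1.0):
    """Elastic Weight Consolidation: new-task loss plus a quadratic
    penalty anchoring parameters important to the old languages.
    `fisher` holds diagonal Fisher-information estimates."""
    penalty = sum(float((f * (p - p0) ** 2).sum())
                  for f, p, p0 in zip(fisher, params, old_params))
    return new_task_loss + 0.5 * lam * penalty

old = [np.zeros(3)]
fisher = [np.array([10.0, 0.0, 0.0])]   # only the first weight matters for old languages
drift_important = ewc_loss(1.0, [np.array([1.0, 0.0, 0.0])], old, fisher)
drift_free = ewc_loss(1.0, [np.array([0.0, 1.0, 1.0])], old, fisher)
print(drift_important, drift_free)  # 6.0 1.0
```

Moving the Fisher-important weight is heavily penalized, while moving unimportant weights costs nothing, which is how EWC preserves old-language performance during new-language fine-tuning.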

Updated: 2024-09-24 15:04:49

Categories: eess.AS,cs.CL,cs.LG,cs.SD

Download: http://arxiv.org/abs/2407.06800v3

GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction

This paper proposes a novel framework for large-scale scene reconstruction based on 3D Gaussian splatting (3DGS), aiming to address the scalability and accuracy challenges faced by existing methods. To tackle the scalability issue, we split the large scene into multiple cells, and the candidate point cloud and camera views of each cell are correlated through visibility-based camera selection and progressive point-cloud extension. To reinforce rendering quality, three improvements are made over vanilla 3DGS: a ray-Gaussian intersection strategy with novel Gaussian density control for learning efficiency, an appearance-decoupling module based on a ConvKAN network to handle uneven lighting conditions in large-scale scenes, and a refined final loss combining color loss, depth-distortion loss, and normal-consistency loss. Finally, a seamless stitching procedure merges the individual Gaussian radiance fields for novel-view synthesis across different cells. Evaluation on the Mill19, Urban3D, and MatrixCity datasets shows that our method consistently generates higher-fidelity rendering results than state-of-the-art methods for large-scale scene reconstruction. We further validate the generalizability of the proposed approach by rendering self-collected video clips recorded by a commercial drone.

Updated: 2024-09-24 15:03:24

Categories: cs.CV,cs.AI,cs.RO

Download: http://arxiv.org/abs/2409.12774v3

Seeing Faces in Things: A Model and Dataset for Pareidolia

The human visual system is well-tuned to detect faces of all shapes and sizes. While this brings obvious survival advantages, such as a better chance of spotting unknown predators in the bush, it also leads to spurious face detections. "Face pareidolia" describes the perception of face-like structure among otherwise random stimuli: seeing faces in coffee stains or clouds in the sky. In this paper, we study face pareidolia from a computer vision perspective. We present an image dataset of "Faces in Things", consisting of five thousand web images with human-annotated pareidolic faces. Using this dataset, we examine the extent to which a state-of-the-art human face detector exhibits pareidolia, and find a significant behavioral gap between humans and machines. We find that the evolutionary need for humans to detect animal faces, as well as human faces, may explain some of this gap. Finally, we propose a simple statistical model of pareidolia in images. Through studies on human subjects and our pareidolic face detectors we confirm a key prediction of our model regarding what image conditions are most likely to induce pareidolia. Dataset and Website: https://aka.ms/faces-in-things

Updated: 2024-09-24 14:50:21

Categories: cs.CV,cs.AI,cs.HC,cs.IR,cs.LG

Download: http://arxiv.org/abs/2409.16143v1

HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection

Open-vocabulary object detection (OVD) models are considered Large Multi-modal Models (LMMs), due to their extensive training data and large number of parameters. Mainstream OVD models prioritize objects' coarse-grained categories rather than their fine-grained attributes, e.g., colors or materials, and thus fail to identify objects specified by certain attributes. However, OVD models are pretrained on large-scale image-text pairs with rich attribute words, whose latent feature space can represent the global text feature as a linear composition of fine-grained attribute tokens without highlighting them. Therefore, we propose in this paper a universal and explicit approach for frozen mainstream OVD models that boosts their attribute-level detection capabilities by highlighting fine-grained attributes in an explicit linear space. Firstly, an LLM is leveraged to highlight attribute words within the input text as a zero-shot prompted task. Secondly, by strategically adjusting the token masks, the text encoders of OVD models extract both global text and attribute-specific features, which are then explicitly composed as two vectors in linear space to form the new attribute-highlighted feature for detection tasks, where the corresponding scalars are hand-crafted or learned to reweight the two vectors. Notably, these scalars can be seamlessly transferred among different OVD models, which proves that such an explicit linear composition is universal. Empirical evaluation on the FG-OVD dataset demonstrates that our proposed method uniformly improves the fine-grained attribute-level OVD of various mainstream models and achieves new state-of-the-art performance.
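
The explicit linear composition reduces to a weighted vector sum. A minimal sketch with stand-in embeddings (the scalars `alpha` and `beta` are hand-crafted here, one of the two options the abstract mentions; the re-normalization is an illustrative choice):

```python
import numpy as np

def attribute_highlighted_feature(global_feat, attr_feat, alpha=1.0, beta=0.5):
    """Explicit linear composition of the global text feature and the
    attribute-specific feature, with scalar reweights."""
    composed = alpha * global_feat + beta * attr_feat
    return composed / np.linalg.norm(composed)  # unit-norm for similarity matching

g = np.array([1.0, 0.0, 0.0])   # stand-in global text embedding
a = np.array([0.0, 1.0, 0.0])   # stand-in attribute embedding
f = attribute_highlighted_feature(g, a)
print(np.round(f, 3))
```

Because the scalars act in the shared linear feature space rather than inside any one encoder, the same `(alpha, beta)` pair can, per the abstract, be reused across different frozen OVD models.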

Updated: 2024-09-24 14:43:14

Categories: cs.CV,cs.AI,cs.CL,cs.MM

Download: http://arxiv.org/abs/2409.16136v1

Subsampling Suffices for Adaptive Data Analysis

Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of adaptive data analysis was formalized in the seminal works of Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014). We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively: The only requirements are that each query takes as input a random subsample and outputs few bits. This result shows that the noise inherent in subsampling is sufficient to guarantee that query responses generalize. The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work. In addition to its simplicity, we demonstrate the utility of this framework by designing mechanisms for two foundational tasks, statistical queries and median finding. In particular, our mechanism for answering the broadly applicable class of statistical queries is both extremely simple and state of the art in many parameter regimes.
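
A minimal sketch of the two ingredients the result identifies -- answer each query on a fresh random subsample and emit only a few bits. The rounding scheme and all names are illustrative, not the paper's mechanism:

```python
import random

def subsample_query(dataset, query, sample_size, bits=1, seed=None):
    """Answer a statistical query on a random subsample, returning only
    a few bits of output. `query` maps an element to [0, 1]; the
    subsample mean is rounded to one of 2**bits - 1 levels."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, sample_size)
    mean = sum(query(x) for x in sample) / sample_size
    levels = 2 ** bits - 1
    return round(mean * levels) / levels  # coarse, low-bit answer

data = list(range(1000))
ans = subsample_query(data, lambda x: x % 2, sample_size=100, bits=3, seed=42)
print(ans)
```

Per the result above, the randomness of the subsample plus the coarse output is enough for answers like `ans` to generalize even when the analyst chooses later queries adaptively.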

Updated: 2024-09-24 14:42:58

Categories: cs.LG,cs.DS,cs.IT,math.IT

Download: http://arxiv.org/abs/2302.08661v3

Evaluation of state-of-the-art ASR Models in Child-Adult Interactions

The ability to reliably transcribe child-adult conversations in a clinical setting is valuable for the diagnosis and understanding of numerous developmental disorders such as Autism Spectrum Disorder. Recent advances in deep learning architectures and the availability of large-scale transcribed data have led to the development of speech foundation models that have shown dramatic improvements in ASR performance. However, the ability of these models to translate well to conversational child-adult interactions is understudied. In this work, we provide a comprehensive evaluation of ASR performance on a dataset containing child-adult interactions from autism diagnostic sessions, using Whisper, Wav2Vec2, HuBERT, and WavLM. We find that speech foundation models show a noticeable performance drop (15-20% absolute WER) for child speech compared to adult speech in the conversational setting. Then, we employ LoRA on the best-performing zero-shot model (whisper-large) to probe the effectiveness of fine-tuning in a low-resource setting, resulting in ~8% absolute WER improvement for child speech and ~13% absolute WER improvement for adult speech.

Updated: 2024-09-24 14:42:37

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2409.16135v1

Mixture of Tokens: Continuous MoE through Cross-Example Aggregation

Mixture of Experts (MoE) models based on Transformer architecture are pushing the boundaries of language and vision tasks. The allure of these models lies in their ability to substantially increase the parameter count without a corresponding increase in FLOPs. Most widely adopted MoE models are discontinuous with respect to their parameters - often referred to as sparse. At the same time, existing continuous MoE designs either lag behind their sparse counterparts or are incompatible with autoregressive decoding. Motivated by the observation that the adaptation of fully continuous methods has been an overarching trend in deep learning, we develop Mixture of Tokens (MoT), a simple, continuous architecture that is capable of scaling the number of parameters similarly to sparse MoE models. Unlike conventional methods, MoT assigns mixtures of tokens from different examples to each expert. This architecture is fully compatible with autoregressive training and generation. Our best models not only achieve a 3x increase in training speed over dense Transformer models in language pretraining but also match the performance of state-of-the-art MoE architectures. Additionally, a close connection between MoT and MoE is demonstrated through a novel technique we call transition tuning.
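
The cross-example aggregation can be sketched as a single toy layer: each expert receives a soft mixture of the group's tokens, and its output is scattered back with the same weights. Shapes, the router, and the expert functions are all illustrative, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_tokens(tokens, router_w, expert_fns):
    """Continuous MoE step over a group of tokens from different examples.
    tokens: (group, d); router_w: (d, n_experts)."""
    weights = softmax(tokens @ router_w, axis=0)   # per-expert mixing weights over the group
    mixed = weights.T @ tokens                     # (n_experts, d): one mixed token per expert
    processed = np.stack([f(m) for f, m in zip(expert_fns, mixed)])
    return weights @ processed                     # scatter back: (group, d)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))                   # 4 examples' tokens
router = rng.normal(size=(8, 2))                   # 2 experts
experts = [lambda x: np.tanh(x), lambda x: x * 2.0]
out = mixture_of_tokens(tokens, router, experts)
print(out.shape)  # (4, 8)
```

Every operation here is differentiable and every expert runs on exactly one mixed token per group, which is what makes the layer continuous yet comparable in FLOPs to a sparse MoE step.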

Updated: 2024-09-24 14:40:57

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.15961v2

Implicit assessment of language learning during practice as accurate as explicit testing

Assessment of proficiency of the learner is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions, and in exercises during practice sessions. Exhaustive testing across a wide range of skills can provide a detailed picture of proficiency, but may be undesirable for a number of reasons. Therefore, we first aim to replace exhaustive tests with efficient but accurate adaptive tests. We use learner data collected from exhaustive tests under imperfect conditions, to train an IRT model to guide adaptive tests. Simulations and experiments with real learner data confirm that this approach is efficient and accurate. Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing. We transform learner data collected from exercise sessions into a form that can be used for IRT modeling. This is done by linking the exercises to linguistic constructs; the constructs are then treated as "items" within IRT. We present results from large-scale studies with thousands of learners. Using teacher assessments of student ability as "ground truth," we compare the estimates obtained from tests vs. those from exercises. The experiments confirm that the IRT models can produce accurate ability estimation based on exercises.
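
The IRT machinery involved can be illustrated with the one-parameter (Rasch) model: a sketch of ability estimation from binary responses, not the paper's exact model or fitting procedure:

```python
import math

def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability a learner answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, difficulties, steps=200, lr=0.1):
    """Maximum-likelihood ability estimate by gradient ascent on the
    Rasch log-likelihood; items here stand in for exercise-linked
    linguistic constructs."""
    theta = 0.0
    for _ in range(steps):
        # d(log-likelihood)/d(theta) = sum of (observed - predicted)
        grad = sum(r - p_correct(theta, b)
                   for r, b in zip(responses, difficulties))
        theta += lr * grad
    return theta

# A learner who solves the easy items but misses the hardest one.
difficulties = [-1.0, 0.0, 1.0, 2.0]
responses = [1, 1, 1, 0]
theta = estimate_ability(responses, difficulties)
print(round(theta, 2))
```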

Updated: 2024-09-24 14:40:44

Categories: cs.AI,cs.CL,cs.CY

Download: http://arxiv.org/abs/2409.16133v1
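
The IRT machinery referenced above rests on item response functions such as the two-parameter-logistic (2PL) model, which gives the probability of a correct response as a function of learner ability and item parameters. A minimal sketch (the function name is ours; the paper's exact model variant is not specified here):

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter-logistic (2PL) item response function: the probability
    that a learner with ability `theta` answers correctly an item with
    discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

Ability estimation then amounts to fitting `theta` (and the item parameters) to the observed response matrix, e.g. by maximum likelihood; treating linguistic constructs as "items" plugs exercise data into this same machinery.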

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

To mitigate risks from AI systems, we need to assess their capabilities accurately. This is especially difficult in cases where capabilities are only rarely displayed. Phuong et al. propose two methods that aim to obtain better estimates of the probability of an AI agent successfully completing a given task. The milestone method decomposes tasks into subtasks, aiming to improve overall success rate estimation, while the expert best-of-N method leverages human guidance as a proxy for the model's independent performance. Our analysis of these methods as Monte Carlo estimators reveals that while both effectively reduce variance compared to naive Monte Carlo sampling, they also introduce bias. Experimental results demonstrate that the milestone method underestimates true solve rates for many real-world tasks due to its constraining assumptions. The expert best-of-N method exhibits even more severe underestimation across all tasks, attributed to an inherently flawed re-weighting factor. To enhance the accuracy of capability estimates of AI agents on difficult tasks, we suggest future work should leverage the rich literature on Monte Carlo estimators.

Updated: 2024-09-24 14:35:20

Categories: cs.AI

Download: http://arxiv.org/abs/2409.16125v1
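
For contrast with the milestone and expert best-of-N estimators discussed above, the naive Monte Carlo baseline is simply the empirical success frequency: unbiased, but with variance p(1-p)/n, which becomes prohibitive when successes are rare. An illustrative sketch (not the authors' code):

```python
import random

def naive_mc_solve_rate(attempt, n_trials, seed=0):
    """Naive Monte Carlo estimate of a task solve rate: run the task
    `n_trials` times and average the successes. The estimator is unbiased,
    but its variance p*(1-p)/n makes it expensive when successes are rare,
    which is what motivates variance-reduction schemes like the milestone
    and expert best-of-N methods analyzed in the paper."""
    rng = random.Random(seed)
    successes = sum(attempt(rng) for _ in range(n_trials))
    return successes / n_trials
```

Here `attempt` stands in for one full agent rollout returning success or failure; the paper's point is that the variance-reduced alternatives trade this estimator's unbiasedness for bias.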

MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents

Developing AI agents powered by large language models (LLMs) faces significant challenges in achieving true Turing completeness and adaptive, code-driven evolution. Current approaches often generate code independently of its runtime context, relying heavily on the LLM's memory, which results in inefficiencies and limits adaptability. Manual protocol development in sandbox environments further constrains the agent's autonomous adaptability. Crucially, achieving consistency in code and context across multi-turn interactions and ensuring isolation of local variables within each interaction remains an unsolved problem. We introduce MOSS (LLM-oriented Operating System Simulation), a novel framework that addresses these challenges by integrating code generation with a dynamic context management system. MOSS ensures consistency and adaptability by using a mechanism that maintains the Python context across interactions, including isolation of local variables and preservation of runtime integrity. At its core, the framework employs an Inversion of Control (IoC) container in conjunction with decorators to enforce the least knowledge principle, allowing agents to focus on abstract interfaces rather than concrete implementations. This facilitates seamless integration of new tools and libraries, enables runtime instance replacement, and reduces prompt complexity, providing a "what you see is what you get" environment for the agent. Through a series of case studies, we show how this framework can enhance the efficiency and capabilities of agent development and highlight its advantages in moving towards Turing-complete agents capable of evolving through code.

Updated: 2024-09-24 14:30:21

Categories: cs.SE,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.16120v1

Does AI help humans make better decisions? A methodological framework for experimental evaluation

The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to experimentally answer this question without additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with humans making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems -- human-alone, human-with-AI, and AI-alone. We also show when to provide a human-decision maker with AI recommendations and when they should follow such recommendations. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that the risk assessment-alone decisions generally perform worse than human decisions with or without algorithmic assistance.

Updated: 2024-09-24 14:28:23

Categories: cs.AI,econ.GN,q-fin.EC,stat.AP,stat.ME

Download: http://arxiv.org/abs/2403.12108v2

TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models

Data collection is often difficult in critical fields such as medicine, physics, and chemistry. As a result, classification methods usually perform poorly with these small datasets, leading to weak predictive performance. Increasing the training set with additional synthetic data, similar to data augmentation in images, is commonly believed to improve downstream classification performance. However, current tabular generative methods that learn either the joint distribution $ p(\mathbf{x}, y) $ or the class-conditional distribution $ p(\mathbf{x} \mid y) $ often overfit on small datasets, resulting in poor-quality synthetic data, usually worsening classification performance compared to using real data alone. To solve these challenges, we introduce TabEBM, a novel class-conditional generative method using Energy-Based Models (EBMs). Unlike existing methods that use a shared model to approximate all class-conditional densities, our key innovation is to create distinct EBM generative models for each class, each modelling its class-specific data distribution individually. This approach creates robust energy landscapes, even in ambiguous class distributions. Our experiments show that TabEBM generates synthetic data with higher quality and better statistical fidelity than existing methods. When used for data augmentation, our synthetic data consistently improves the classification performance across diverse datasets of various sizes, especially small ones.

Updated: 2024-09-24 14:25:59

Categories: cs.LG

Download: http://arxiv.org/abs/2409.16118v1

Self-attention as an attractor network: transient memories without backpropagation

Transformers are one of the most successful architectures of modern neural networks. At their core is the so-called attention mechanism, which has recently attracted interest from the physics community because in certain cases it can be written as the derivative of an energy function: while it is possible to write the cross-attention layer as a modern Hopfield network, the same is not possible for self-attention, which is used in GPT architectures and other autoregressive models. In this work we show that it is possible to obtain the self-attention layer as the derivative of local energy terms, which resemble a pseudo-likelihood. We leverage the analogy with pseudo-likelihood to design a recurrent model that can be trained without backpropagation: the dynamics shows transient states that are strongly correlated with both train and test examples. Overall we present a novel framework to interpret self-attention as an attractor network, potentially paving the way for new theoretical approaches inspired by physics to understand transformers.

Updated: 2024-09-24 14:19:56

Categories: cs.LG,cond-mat.dis-nn

Download: http://arxiv.org/abs/2409.16112v1

Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Large language models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across general tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity. Expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal parameter increase, while guaranteeing the quality of approximation in function space relative to original sparse upcycling. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GPT-3.5. Our code is available at https://github.com/wuhy68/Parameter-Efficient-MoE.

Updated: 2024-09-24 14:14:40

Categories: cs.AI

Download: http://arxiv.org/abs/2401.02731v4

Ciphertext Malleability in Lattice-Based KEMs as a Countermeasure to Side Channel Analysis

Due to developments in quantum computing, classical asymmetric cryptography is at risk of being breached. Consequently, new Post-Quantum Cryptography (PQC) primitives using lattices are studied. Another point of scrutiny is the resilience of these new primitives to Side Channel Analysis (SCA), where an attacker can study physical leakages. In this work we discuss a SCA vulnerability, due to the ciphertext malleability of some PQC primitives, exposed by a work from Ravi et al. We propose a novel countermeasure that exploits the same ciphertext malleability and discuss its practical application to several PQC primitives. We also extend the seminal work of Ravi et al. by detailing their attack on the different security levels of a post-quantum Key Encapsulation Mechanism (KEM), namely FrodoKEM.

Updated: 2024-09-24 14:07:48

Categories: cs.CR,94A60,E.3.3

Download: http://arxiv.org/abs/2409.16107v1

Scenario of Use Scheme: Threat Model Specification for Speaker Privacy Protection in the Medical Domain

Speech recordings are being more frequently used to detect and monitor disease, leading to privacy concerns. Beyond cryptography, protection of speech can be addressed by approaches, such as perturbation, disentanglement, and re-synthesis, that eliminate sensitive information of the speaker, leaving the information necessary for medical analysis purposes. In order for such privacy protective approaches to be developed, clear and systematic specifications of assumptions concerning medical settings and the needs of medical professionals are necessary. In this paper, we propose a Scenario of Use Scheme that incorporates an Attacker Model, which characterizes the adversary against whom the speaker's privacy must be defended, and a Protector Model, which specifies the defense. We discuss the connection of the scheme with previous work on speech privacy. Finally, we present a concrete example of a specified Scenario of Use and a set of experiments about protecting speaker data against gender inference attacks while maintaining utility for Parkinson's detection.

Updated: 2024-09-24 14:07:47

Categories: eess.AS,cs.AI,cs.CR,cs.SD

Download: http://arxiv.org/abs/2409.16106v1

FairBranch: Mitigating Bias Transfer in Fair Multi-task Learning

The generalisation capacity of Multi-Task Learning (MTL) suffers when unrelated tasks negatively impact each other by updating shared parameters with conflicting gradients. This is known as negative transfer and leads to a drop in MTL accuracy compared to single-task learning (STL). Lately, there has been a growing focus on the fairness of MTL models, requiring the optimization of both accuracy and fairness for individual tasks. Analogously to negative transfer for accuracy, task-specific fairness considerations might adversely affect the fairness of other tasks when there is a conflict of fairness loss gradients between the jointly learned tasks - we refer to this as Bias Transfer. To address both negative- and bias-transfer in MTL, we propose a novel method called FairBranch, which branches the MTL model by assessing the similarity of learned parameters, thereby grouping related tasks to alleviate negative transfer. Moreover, it incorporates fairness loss gradient conflict correction between adjoining task-group branches to address bias transfer within these task groups. Our experiments on tabular and visual MTL problems show that FairBranch outperforms state-of-the-art MTLs on both fairness and accuracy.

Updated: 2024-09-24 14:06:33

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2310.13746v2

Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives

Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations that jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. To encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational objective that can tightly approximate the data log-likelihood. We develop more flexible aggregation schemes that avoid the inductive biases in PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational objectives and various aggregation schemes. We show that our variational objective and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.

Updated: 2024-09-24 13:59:59

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2309.00380v3
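
As a point of reference for the Product-of-Experts aggregation that the paper's permutation-invariant encoders are designed to replace: for Gaussian experts, the product of densities is again Gaussian, with precision-weighted mean and summed precision. This closed form is exactly the inductive bias the more flexible aggregation schemes avoid. A sketch for one latent dimension (illustrative; scalar variances assumed):

```python
def poe_gaussian(means, variances):
    """Product-of-Experts aggregation of Gaussian experts for one latent
    dimension: the product of Gaussian densities is Gaussian with
    precision-weighted mean and summed precision. Experts with low variance
    (high precision) dominate the aggregated posterior."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision
```

Note how the aggregated variance is always smaller than any single expert's variance, regardless of whether the experts agree; a permutation-invariant network over encoded features is free to learn a different behavior.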

Refereeing the Referees: Evaluating Two-Sample Tests for Validating Generators in Precision Sciences

We propose a robust methodology to evaluate the performance and computational efficiency of non-parametric two-sample tests, specifically designed for high-dimensional generative models in scientific applications such as in particle physics. The study focuses on tests built from univariate integral probability measures: the sliced Wasserstein distance and the mean of the Kolmogorov-Smirnov statistics, already discussed in the literature, and the novel sliced Kolmogorov-Smirnov statistic. These metrics can be evaluated in parallel, allowing for fast and reliable estimates of their distribution under the null hypothesis. We also compare these metrics with the recently proposed unbiased Fréchet Gaussian Distance and the unbiased quadratic Maximum Mean Discrepancy, computed with a quartic polynomial kernel. We evaluate the proposed tests on various distributions, focusing on their sensitivity to deformations parameterized by a single parameter $\epsilon$. Our experiments include correlated Gaussians and mixtures of Gaussians in 5, 20, and 100 dimensions, and a particle physics dataset of gluon jets from the JetNet dataset, considering both jet- and particle-level features. Our results demonstrate that one-dimensional-based tests provide a level of sensitivity comparable to other multivariate metrics, but with significantly lower computational cost, making them ideal for evaluating generative models in high-dimensional settings. This methodology offers an efficient, standardized tool for model comparison and can serve as a benchmark for more advanced tests, including machine-learning-based approaches.

Updated: 2024-09-24 13:58:46

Categories: stat.ML,cs.LG,hep-ph,stat.AP

Download: http://arxiv.org/abs/2409.16336v1
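
The sliced statistics described above reduce a high-dimensional two-sample test to many one-dimensional ones: project both samples onto random unit directions, compute a univariate statistic (here Kolmogorov-Smirnov) on each projection, and average. A pure-Python sketch (illustrative; the paper's implementation and slice count may differ):

```python
import bisect
import math
import random

def ks_statistic(xs, ys):
    """One-dimensional two-sample Kolmogorov-Smirnov statistic:
    the maximum gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        f1 = bisect.bisect_right(xs, v) / len(xs)
        f2 = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(f1 - f2))
    return d

def sliced_ks(X, Y, n_slices=50, seed=0):
    """Mean KS statistic over random 1-D projections of two d-dimensional
    samples (lists of equal-length vectors). Each slice is a unit direction
    drawn uniformly at random; the univariate statistics can be computed
    independently, hence in parallel."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_slices):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        v = [c / norm for c in v]
        px = [sum(a * b for a, b in zip(x, v)) for x in X]
        py = [sum(a * b for a, b in zip(y, v)) for y in Y]
        total += ks_statistic(px, py)
    return total / n_slices
```

Each slice costs only a projection plus a one-dimensional test, which is the source of the computational advantage over fully multivariate metrics.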

Adversarial Attacks on Machine Learning-Aided Visualizations

Research in ML4VIS investigates how to use machine learning (ML) techniques to generate visualizations, and the field is rapidly growing with high societal impact. However, as with any computational pipeline that employs ML processes, ML4VIS approaches are susceptible to a range of ML-specific adversarial attacks. These attacks can manipulate visualization generations, causing analysts to be tricked and their judgments to be impaired. Due to a lack of synthesis from both visualization and ML perspectives, this security aspect is largely overlooked by the current ML4VIS literature. To bridge this gap, we investigate the potential vulnerabilities of ML-aided visualizations from adversarial attacks using a holistic lens of both visualization and ML perspectives. We first identify the attack surface (i.e., attack entry points) that is unique in ML-aided visualizations. We then exemplify five different adversarial attacks. These examples highlight the range of possible attacks when considering the attack surface and multiple different adversary capabilities. Our results show that adversaries can induce various attacks, such as creating arbitrary and deceptive visualizations, by systematically identifying input attributes that are influential in ML inferences. Based on our observations of the attack surface characteristics and the attack examples, we underline the importance of comprehensive studies of security issues and defense mechanisms as a call of urgency for the ML4VIS community.

Updated: 2024-09-24 13:58:37

Categories: cs.CR,cs.AI,cs.HC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2409.02485v2

Neuromorphic Drone Detection: an Event-RGB Multimodal Approach

In recent years, drone detection has quickly become a subject of intense interest: the potential for fast-moving objects of contained dimensions to be used for malicious intent or even terrorist attacks has drawn attention to the need for precise and resilient systems for detecting and identifying such elements. While extensive literature exists on object detection based on RGB data, it is also critical to recognize the limits of this modality when applied to UAV detection. Detecting drones indeed poses several challenges, such as fast-moving objects and scenes with a high dynamic range or, even worse, scarce illumination. Neuromorphic cameras, on the other hand, can retain precise and rich spatio-temporal information in situations that are challenging for RGB cameras. They are resilient to both high-speed moving objects and scarce illumination, but suffer a rapid loss of information when the objects in the scene are static. In this context, we present a novel model that integrates both domains, leveraging multimodal data to take advantage of the best of both worlds. To this end, we also release NeRDD (Neuromorphic-RGB Drone Detection), a novel spatio-temporally synchronized Event-RGB drone detection dataset of more than 3.5 hours of multimodal annotated recordings.

Updated: 2024-09-24 13:53:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.16099v1

The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems

Mobile health has the potential to revolutionize health care delivery and patient engagement. In this work, we discuss how integrating Artificial Intelligence into digital health applications -- focused on supply chain, patient management, and capacity building, among other use cases -- can improve the health system and public health performance. We present an Artificial Intelligence and Reinforcement Learning platform that allows the delivery of adaptive interventions whose impact can be optimized through experimentation and real-time monitoring. The system can integrate multiple data sources and digital health applications. The flexibility of this platform to connect to various mobile health applications and digital devices and send personalized recommendations based on past data and predictions can significantly improve the impact of digital tools on health system outcomes. The potential for resource-poor settings, where the impact of this approach on health outcomes could be more decisive, is discussed specifically. This framework is, however, similarly applicable to improving efficiency in health systems where scarcity is not an issue.

Updated: 2024-09-24 13:52:15

Categories: cs.LG,cs.AI,cs.CY,cs.HC

Download: http://arxiv.org/abs/2409.16098v1

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as jailbreaks, seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a simple multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model's replies progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b and LlaMA-3 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we present Crescendomation, a tool that automates the Crescendo attack and demonstrate its efficacy against state-of-the-art models through our evaluations. Crescendomation surpasses other state-of-the-art jailbreaking techniques on the AdvBench subset dataset, achieving 29-61% higher performance on GPT-4 and 49-71% on Gemini-Pro. Finally, we also demonstrate Crescendo's ability to jailbreak multimodal models.

Updated: 2024-09-24 13:51:39

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2404.01833v2

When Witnesses Defend: A Witness Graph Topological Layer for Adversarial Graph Learning

Capitalizing on the intuitive premise that shape characteristics are more robust to perturbations, we bridge adversarial graph learning with the emerging tools from computational topology, namely, persistent homology representations of graphs. We introduce the concept of witness complex to adversarial analysis on graphs, which allows us to focus only on the salient shape characteristics of graphs, yielded by the subset of the most essential nodes (i.e., landmarks), with minimal loss of topological information on the whole graph. The remaining nodes are then used as witnesses, governing which higher-order graph substructures are incorporated into the learning process. Armed with the witness mechanism, we design Witness Graph Topological Layer (WGTL), which systematically integrates both local and global topological graph feature representations, the impact of which is, in turn, automatically controlled by the robust regularized topological loss. Given the attacker's budget, we derive the important stability guarantees of both local and global topology encodings and the associated robust topological loss. We illustrate the versatility and efficiency of WGTL by its integration with five GNNs and three existing non-topological defense mechanisms. Our extensive experiments across six datasets demonstrate that WGTL boosts the robustness of GNNs across a range of perturbations and against a range of adversarial attacks, leading to relative gains of up to 18%.

Updated: 2024-09-24 13:51:05

Categories: cs.LG

Download: http://arxiv.org/abs/2409.14161v2

From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing

Face Recognition (FR) has advanced significantly with the development of deep learning, achieving high accuracy in several applications. However, the lack of interpretability of these systems raises concerns about their accountability, fairness, and reliability. In the present study, we propose an interactive framework to enhance the explainability of FR models by combining model-agnostic Explainable Artificial Intelligence (XAI) and Natural Language Processing (NLP) techniques. The proposed framework is able to accurately answer various user questions through an interactive chatbot. In particular, the explanations generated by our proposed method take the form of natural language text and visual representations, which can, for example, describe how different facial regions contribute to the similarity measure between two faces. This is achieved through automatic analysis of the saliency heatmaps computed for the face images and a BERT question-answering model, providing users with an interface that facilitates a comprehensive understanding of the FR decisions. The proposed approach is interactive, allowing users to ask questions and obtain more precise information based on their background knowledge. More importantly, in contrast to previous studies, our solution does not decrease the face recognition performance. We demonstrate the effectiveness of the method through different experiments, highlighting its potential to make FR systems more interpretable and user-friendly, especially in sensitive applications where decision-making transparency is crucial.

Updated: 2024-09-24 13:40:39

Categories: cs.CV,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2409.16089v1

Assessing Simplification Levels in Neural Networks: The Impact of Hyperparameter Configurations on Complexity and Sensitivity

This paper presents an experimental study focused on understanding the simplification properties of neural networks under different hyperparameter configurations, specifically investigating the effects on Lempel-Ziv complexity and sensitivity. By adjusting key hyperparameters such as activation functions, hidden layers, and learning rate, this study evaluates how these parameters impact the complexity of network outputs and their robustness to input perturbations. The experiments conducted using the MNIST dataset aim to provide insights into the relationships between hyperparameters, complexity, and sensitivity, contributing to a deeper theoretical understanding of these concepts in neural networks.
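For concreteness, Lempel-Ziv complexity can be estimated by counting the phrases produced by a left-to-right parsing of a (typically binarized) output sequence. The sketch below implements one common parsing variant; it is our illustration, not necessarily the exact estimator used in the study.

```python
def lempel_ziv_complexity(s):
    """Count phrases in a simple left-to-right LZ parsing of sequence s:
    grow the current phrase until it is new, record it, then restart."""
    phrases = set()
    ind, inc = 0, 1
    while ind + inc <= len(s):
        phrase = s[ind:ind + inc]
        if phrase in phrases:
            inc += 1          # seen before: extend the phrase
        else:
            phrases.add(phrase)
            ind += inc        # new phrase: record it and restart
            inc = 1
    return len(phrases)
```

A constant sequence parses into far fewer phrases than a varied one, matching the intuition that simpler input-output maps have lower complexity.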

Updated: 2024-09-24 13:39:04

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.16086v1

Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference

This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs), focusing on their core optimization metric of next token prediction. We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix. Key contributions include: (i) a continuity theorem relating embeddings to multinomial distributions, (ii) a demonstration that LLM text generation aligns with Bayesian learning principles, (iii) an explanation for the emergence of in-context learning in larger models, and (iv) empirical validation using visualizations of next token probabilities from an instrumented Llama model. Our findings provide new insights into LLM functioning, offering a statistical foundation for understanding their capabilities and limitations. This framework has implications for LLM design, training, and application, potentially guiding future developments in the field.

Updated: 2024-09-24 13:30:25

Categories: cs.LG,cs.AI,I.2.7

Download: http://arxiv.org/abs/2402.03175v2

Online Multi-level Contrastive Representation Distillation for Cross-Subject fNIRS Emotion Recognition

Utilizing functional near-infrared spectroscopy (fNIRS) signals for emotion recognition is a significant advancement in understanding human emotions. However, due to the lack of artificial intelligence data and algorithms in this field, current research faces the following challenges: 1) Portable wearable devices impose stricter requirements on model lightweightness; 2) Objective physiological and psychological differences among subjects aggravate the difficulty of emotion recognition. To address these challenges, we propose a novel cross-subject fNIRS emotion recognition method, called the Online Multi-level Contrastive Representation Distillation framework (OMCRD). Specifically, OMCRD is a framework designed for mutual learning among multiple lightweight student networks. It utilizes a multi-level fNIRS feature extractor for each sub-network and conducts multi-view sentiment mining using physiological signals. The proposed Inter-Subject Interaction Contrastive Representation (IS-ICR) facilitates knowledge transfer for interactions between student models, enhancing cross-subject emotion recognition performance. The optimal student network can be selected and deployed on a wearable device. Experimental results demonstrate that OMCRD achieves state-of-the-art results in emotional perception and affective imagery tasks.

Updated: 2024-09-24 13:30:15

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2409.16081v1

OpenGraph: Towards Open Graph Foundation Models

Graph learning has become indispensable for interpreting and harnessing relational data in diverse fields, ranging from recommendation systems to social network analysis. In this context, a variety of GNNs have emerged as promising methodologies for encoding the structural information of graphs. By effectively capturing the graph's underlying structure, these GNNs have shown great potential in enhancing performance in graph learning tasks, such as link prediction and node classification. However, despite their successes, a significant challenge persists: these advanced methods often face difficulties in generalizing to unseen graph data that significantly differs from the training instances. In this work, our aim is to advance the graph learning paradigm by developing a general graph foundation model. This model is designed to understand the complex topological patterns present in diverse graph data, enabling it to excel in zero-shot graph learning tasks across different downstream datasets. To achieve this goal, we address several key technical challenges in our OpenGraph model. Firstly, we propose a unified graph tokenizer to adapt our graph model to generalize well on unseen graph data, even when the underlying graph properties differ significantly from those encountered during training. Secondly, we develop a scalable graph transformer as the foundational encoder, which effectively captures node-wise dependencies within the global topological context. Thirdly, we introduce a data augmentation mechanism enhanced by an LLM to alleviate the limitations of data scarcity in real-world scenarios. Extensive experiments validate the effectiveness of our framework. By adapting our OpenGraph to new graph characteristics and comprehending the nuances of diverse graphs, our approach achieves remarkable zero-shot graph learning performance across various settings and domains.

Updated: 2024-09-24 13:26:08

Categories: cs.LG,cs.AI,cs.SI

Download: http://arxiv.org/abs/2403.01121v3

Leveraging Mixture of Experts for Improved Speech Deepfake Detection

Speech deepfakes pose a significant threat to personal security and content authenticity. Several detectors have been proposed in the literature, and one of the primary challenges these systems have to face is the generalization over unseen data to identify fake signals across a wide range of datasets. In this paper, we introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture. The Mixture of Experts framework is well-suited for the speech deepfake detection task due to its ability to specialize in different input types and handle data variability efficiently. This approach offers superior generalization and adaptability to unseen data compared to traditional single models or ensemble methods. Additionally, its modular structure supports scalable updates, making it more flexible in managing the evolving complexity of deepfake techniques while maintaining high detection accuracy. We propose an efficient, lightweight gating mechanism to dynamically assign expert weights for each input, optimizing detection performance. Experimental results across multiple datasets demonstrate the effectiveness and potential of our proposed approach.
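The gating idea in a Mixture of Experts is straightforward to sketch: a small gate scores the experts for each input, only the top-k experts are evaluated, and their outputs are combined using the renormalized gate weights. The NumPy toy below is a generic sparse-MoE sketch under our own naming, not the paper's detection architecture.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, experts, gate, top_k=2):
    """Route each row of x through its top_k experts (linear maps here),
    weighted by a lightweight linear gate."""
    scores = softmax(x @ gate)                      # (batch, n_experts)
    order = np.argsort(-scores, axis=1)[:, :top_k]  # chosen expert indices
    out = np.zeros((x.shape[0], experts[0].shape[1]))
    for b in range(x.shape[0]):
        chosen = order[b]
        w = scores[b, chosen] / scores[b, chosen].sum()  # renormalize
        for wi, e in zip(w, chosen):
            out[b] += wi * (x[b] @ experts[e])      # only chosen experts run
    return out
```

`top_k` directly controls the per-input compute, which is what makes the architecture scalable relative to a dense ensemble.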

Updated: 2024-09-24 13:24:03

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2409.16077v1

Ultra-low latency quantum-inspired machine learning predictors implemented on FPGA

Tensor Networks (TNs) are a computational paradigm used for representing quantum many-body systems. Recent works have shown how TNs can also be applied to perform Machine Learning (ML) tasks, yielding comparable results to standard supervised learning techniques. In this work, we study the use of Tree Tensor Networks (TTNs) in high-frequency real-time applications by exploiting the low-latency hardware of the Field-Programmable Gate Array (FPGA) technology. We present different implementations of TTN classifiers, capable of performing inference on classical ML datasets as well as on complex physics data. A preparatory analysis of bond dimensions and weight quantization is realized in the training phase, together with entanglement entropy and correlation measurements, that help setting the choice of the TTN architecture. The generated TTNs are then deployed on a hardware accelerator; using an FPGA integrated into a server, the inference of the TTN is completely offloaded. Eventually, a classifier for High Energy Physics (HEP) applications is implemented and executed fully pipelined with sub-microsecond latency.

Updated: 2024-09-24 13:21:21

Categories: hep-ex,cs.LG,quant-ph

Download: http://arxiv.org/abs/2409.16075v1

Count on Your Elders: Laplace vs Gaussian Noise

In recent years, Gaussian noise has become a popular tool in differentially private algorithms, often replacing Laplace noise which dominated the early literature on differential privacy. Gaussian noise is the standard approach to $\textit{approximate}$ differential privacy, often resulting in much higher utility than traditional (pure) differential privacy mechanisms. In this paper we argue that Laplace noise may in fact be preferable to Gaussian noise in many settings, in particular when we seek to achieve $(\varepsilon,\delta)$-differential privacy for small values of $\delta$. We consider two scenarios: First, we consider the problem of counting under continual observation and present a new generalization of the binary tree mechanism that uses a $k$-ary number system with $\textit{negative digits}$ to improve the privacy-accuracy trade-off. Our mechanism uses Laplace noise and improves the mean squared error over all ``optimal'' $(\varepsilon,\delta)$-differentially private factorization mechanisms based on Gaussian noise whenever $\delta$ is sufficiently small. Specifically, using $k=19$ we get an asymptotic improvement over the bound given in the work by Henzinger, Upadhyay and Upadhyay (SODA 2023) when $\delta = O(T^{-0.92})$. Second, we show that the noise added by the Gaussian mechanism can always be replaced by Laplace noise of comparable variance for the same $(\varepsilon, \delta)$ privacy guarantee, and in fact for sufficiently small $\delta$ the variance of the Laplace noise becomes strictly smaller. This challenges the conventional wisdom that Gaussian noise should be used for high-dimensional noise.
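The second claim is easy to check numerically: under the classic analytic calibration of the Gaussian mechanism, the variance needed for $(\varepsilon,\delta)$-DP grows like $\log(1/\delta)$, while the pure-$\varepsilon$ Laplace mechanism (which satisfies $(\varepsilon,\delta)$-DP for every $\delta$) has fixed variance $2/\varepsilon^2$. The comparison below uses the textbook bound $\sigma \ge \sqrt{2\ln(1.25/\delta)}\,\Delta/\varepsilon$ as a stand-in, not the paper's exact constants.

```python
import math

def gaussian_variance(eps, delta, sens=1.0):
    # Classic calibration of the Gaussian mechanism (valid for eps <= 1):
    # sigma = sqrt(2 * ln(1.25/delta)) * sensitivity / eps
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sens / eps
    return sigma ** 2

def laplace_variance(eps, sens=1.0):
    # Laplace mechanism with scale b = sensitivity/eps gives pure eps-DP,
    # hence (eps, delta)-DP for every delta; variance is 2 * b^2.
    b = sens / eps
    return 2 * b ** 2

eps = 0.5
for delta in (1e-3, 1e-6, 1e-9):
    print(delta, gaussian_variance(eps, delta), laplace_variance(eps))
```

As $\delta$ shrinks, the Gaussian variance keeps growing while the Laplace variance stays fixed (at 8 for $\varepsilon = 0.5$), illustrating why Laplace can win for small $\delta$.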

Updated: 2024-09-24 13:15:47

Categories: cs.CR,cs.DS

Download: http://arxiv.org/abs/2408.07021v2

Learning with Confidence: Training Better Classifiers from Soft Labels

In supervised machine learning, models are typically trained using data with hard labels, i.e., definite assignments of class membership. This traditional approach, however, does not take the inherent uncertainty in these labels into account. We investigate whether incorporating label uncertainty, represented as discrete probability distributions over the class labels -- known as soft labels -- improves the predictive performance of classification models. We first demonstrate the potential value of soft label learning (SLL) for estimating model parameters in a simulation experiment, particularly for limited sample sizes and imbalanced data. Subsequently, we compare the performance of various wrapper methods for learning from both hard and soft labels using identical base classifiers. On real-world-inspired synthetic data with clean labels, the SLL methods consistently outperform hard label methods. Since real-world data is often noisy and precise soft labels are challenging to obtain, we study the effect that noisy probability estimates have on model performance. Alongside conventional noise models, our study examines four types of miscalibration that are known to affect human annotators. The results show that SLL methods outperform the hard label methods in the majority of settings. Finally, we evaluate the methods on a real-world dataset with confidence scores, where the SLL methods are shown to match the traditional methods for predicting the (noisy) hard labels while providing more accurate confidence estimates.
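Mechanically, soft-label learning replaces the one-hot target in the cross-entropy loss with the full label distribution. A minimal NumPy version of that loss (our own illustration, not the paper's wrapper methods):

```python
import numpy as np

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against probability-distribution targets (soft labels)
    instead of one-hot hard labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_targets * log_probs).sum(axis=1).mean())
```

When the target distribution collapses to one-hot, this reduces exactly to the ordinary hard-label cross-entropy, so soft-label training strictly generalizes the usual setup.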

Updated: 2024-09-24 13:12:29

Categories: cs.LG

Download: http://arxiv.org/abs/2409.16071v1

A decision-theoretic model for a principal-agent collaborative learning problem

In this technical note, we consider a collaborative learning framework with a principal-agent setting, in which the principal at each time step determines a set of appropriate aggregation coefficients based on how well the current parameter estimates from a group of $K$ agents perform on a separate test dataset, which is not part of the agents' training datasets. The agents, acting together as a team, then update their parameter estimates using a discrete-time version of Langevin dynamics with a mean-field-like interaction term, each guided by its own training dataset. Here, we propose a decision-theoretic framework that explicitly describes how the principal progressively determines a set of nonnegative aggregation coefficients, summing to one, that the agents use in their mean-field-like interaction term, eventually leading them to a consensus optimal parameter estimate. Interestingly, due to the inherent feedback and cooperative behavior among the agents, the proposed framework offers advantages in terms of stability and generalization, even though neither the principal nor the agents need any knowledge of the sample distributions or the quality of each other's datasets.

Updated: 2024-09-24 13:08:51

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2409.16068v1

Reliability in Semantic Segmentation: Can We Use Synthetic Data?

Assessing the robustness of perception models to covariate shifts and their ability to detect out-of-distribution (OOD) inputs is crucial for safety-critical applications such as autonomous vehicles. By nature of such applications, however, the relevant data is difficult to collect and annotate. In this paper, we show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models. By fine-tuning Stable Diffusion with only in-domain data, we perform zero-shot generation of visual scenes in OOD domains or inpainted with OOD objects. This synthetic data is employed to evaluate the robustness of pretrained segmenters, thereby offering insights into their performance when confronted with real edge cases. Through extensive experiments, we demonstrate a high correlation between the performance of models when evaluated on our synthetic OOD data and when evaluated on real OOD inputs, showing the relevance of such virtual testing. Furthermore, we demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters. Code and data are made public.

Updated: 2024-09-24 13:05:28

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.09231v2

Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis

Object detection models, widely used in security-critical applications, are vulnerable to backdoor attacks that cause targeted misclassifications when triggered by specific patterns. Existing backdoor defense techniques, primarily designed for simpler models like image classifiers, often fail to effectively detect and remove backdoors in object detectors. We propose a backdoor defense framework tailored to object detection models, based on the observation that backdoor attacks cause significant inconsistencies between local modules' behaviors, such as the Region Proposal Network (RPN) and classification head. By quantifying and analyzing these inconsistencies, we develop an algorithm to detect backdoors. We find that the inconsistent module is usually the main source of backdoor behavior, leading to a removal method that localizes the affected module, resets its parameters, and fine-tunes the model on a small clean dataset. Extensive experiments with state-of-the-art two-stage object detectors show our method achieves a 90% improvement in backdoor removal rate over fine-tuning baselines, while limiting clean data accuracy loss to less than 4%. To the best of our knowledge, this work presents the first approach that addresses both the detection and removal of backdoors in two-stage object detection models, advancing the field of securing these complex systems against backdoor attacks.

Updated: 2024-09-24 12:58:35

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.16057v1

Adversarial Watermarking for Face Recognition

Watermarking is an essential technique for embedding an identifier (i.e., watermark message) within digital images to assert ownership and monitor unauthorized alterations. In face recognition systems, watermarking plays a pivotal role in ensuring data integrity and security. However, an adversary could potentially interfere with the watermarking process, significantly impairing recognition performance. We explore the interaction between watermarking and adversarial attacks on face recognition models. Our findings reveal that while watermarking or input-level perturbation alone may have a negligible effect on recognition accuracy, the combined effect of watermarking and perturbation can result in an adversarial watermarking attack, significantly degrading recognition performance. Specifically, we introduce a novel threat model, the adversarial watermarking attack, which remains stealthy in the absence of watermarking, allowing images to be correctly recognized initially. However, once watermarking is applied, the attack is activated, causing recognition failures. Our study reveals a previously unrecognized vulnerability: adversarial perturbations can exploit the watermark message to evade face recognition systems. Evaluated on the CASIA-WebFace dataset, our proposed adversarial watermarking attack reduces face matching accuracy by 67.2% with an $\ell_\infty$ norm-measured perturbation strength of ${2}/{255}$ and by 95.9% with a strength of ${4}/{255}$.

Updated: 2024-09-24 12:58:32

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.16056v1

Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

In practical scenarios, time series forecasting necessitates not only accuracy but also efficiency. Consequently, the exploration of model architectures remains a perennially trending topic in research. To address these challenges, we propose a novel backbone architecture named Time Evidence Fusion Network (TEFN) from the perspective of information fusion. Specifically, we introduce the Basic Probability Assignment (BPA) Module based on evidence theory to capture the uncertainty of multivariate time series data from both channel and time dimensions. Additionally, we develop a novel multi-source information fusion method to effectively integrate the two distinct dimensions from BPA output, leading to improved forecasting accuracy. Lastly, we conduct extensive experiments to demonstrate that TEFN achieves performance comparable to state-of-the-art methods while maintaining significantly lower complexity and reduced training time. Also, our experiments show that TEFN exhibits high robustness, with minimal error fluctuations during hyperparameter selection. Furthermore, due to the fact that BPA is derived from fuzzy theory, TEFN offers a high degree of interpretability. Therefore, the proposed TEFN balances accuracy, efficiency, stability, and interpretability, making it a desirable solution for time series forecasting.

Updated: 2024-09-24 12:57:39

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2405.06419v3

Denoising Graph Super-Resolution towards Improved Collider Event Reconstruction

Accurately reconstructing particles from detector data is a critical challenge in experimental particle physics, where the spatial resolution of calorimeters has a crucial impact. This study explores the integration of super-resolution techniques into an LHC-like reconstruction pipeline to effectively enhance the granularity of calorimeter data and suppress noise. We find that this software preprocessing step can significantly improve reconstruction quality without physical changes to detectors. To demonstrate the impact of our approach, we propose a novel particle flow model that offers enhanced particle reconstruction quality and interpretability. These advancements underline the potential of super-resolution to impact both current and future particle physics experiments.

Updated: 2024-09-24 12:56:56

Categories: hep-ex,cs.LG

Download: http://arxiv.org/abs/2409.16052v1

Efficiently Dispatching Flash Attention For Partially Filled Attention Masks

Transformers are widely used across various applications, many of which yield sparse or partially filled attention matrices. Examples include attention masks designed to reduce the quadratic complexity of attention, sequence packing techniques, and recent innovations like tree masking for fast validation in MEDUSA. Despite the inherent sparsity in these matrices, the state-of-the-art algorithm Flash Attention still processes them with quadratic complexity as though they were dense. In this paper, we introduce Binary Block Masking, a highly efficient modification that enhances Flash Attention by making it mask-aware. We further propose two optimizations: one tailored for masks with contiguous non-zero patterns and another for extremely sparse masks. Our experiments on attention masks derived from real-world scenarios demonstrate up to a 9x runtime improvement. The implementation will be publicly released to foster further research and application.
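The core idea can be prototyped in NumPy: precompute a binary block map over the mask, and for each query block attend only to key blocks containing at least one unmasked entry. This toy reproduces the arithmetic of mask-aware dispatching (the paper's actual contribution is a GPU kernel; names and shapes here are ours). It assumes every query row has at least one unmasked key, as in causal masks.

```python
import numpy as np

def block_masked_attention(q, k, v, mask, block=4):
    """Masked attention that skips key blocks whose mask block is all zero.
    mask is a boolean (n, n) array; n must be divisible by block."""
    n, d = q.shape
    nb = n // block
    # Binary block map: True where block (i, j) has any unmasked entry.
    bmap = mask.reshape(nb, block, nb, block).any(axis=(1, 3))
    out = np.zeros_like(v)
    for i in range(nb):
        cols = [j for j in range(nb) if bmap[i, j]]       # dispatched blocks
        keys = np.concatenate([k[j*block:(j+1)*block] for j in cols])
        vals = np.concatenate([v[j*block:(j+1)*block] for j in cols])
        m = np.concatenate([mask[i*block:(i+1)*block, j*block:(j+1)*block]
                            for j in cols], axis=1)
        scores = q[i*block:(i+1)*block] @ keys.T / np.sqrt(d)
        scores = np.where(m, scores, -np.inf)             # mask within blocks
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        out[i*block:(i+1)*block] = (w / w.sum(axis=1, keepdims=True)) @ vals
    return out
```

The result matches dense masked attention exactly while never touching fully-masked key blocks, which is where the runtime savings come from.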

Updated: 2024-09-24 12:56:13

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.15097v2

Whole-body end-effector pose tracking

Combining manipulation with the mobility of legged robots is essential for a wide range of robotic applications. However, integrating an arm with a mobile base significantly increases the system's complexity, making precise end-effector control challenging. Existing model-based approaches are often constrained by their modeling assumptions, leading to limited robustness. Meanwhile, recent Reinforcement Learning (RL) implementations restrict the arm's workspace to be in front of the robot or track only the position to obtain decent tracking accuracy. In this work, we address these limitations by introducing a whole-body RL formulation for end-effector pose tracking in a large workspace on rough, unstructured terrains. Our proposed method involves a terrain-aware sampling strategy for the robot's initial configuration and end-effector pose commands, as well as a game-based curriculum to extend the robot's operating range. We validate our approach on the ANYmal quadrupedal robot with a six DoF robotic arm. Through our experiments, we show that the learned controller achieves precise command tracking over a large workspace and adapts across varying terrains such as stairs and slopes. On deployment, it achieves a pose-tracking error of 2.64 cm and 3.64 degrees, outperforming existing competitive baselines.

Updated: 2024-09-24 12:51:32

Domains: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2409.16048v1

LTNtorch: PyTorch Implementation of Logic Tensor Networks

Logic Tensor Networks (LTN) is a Neuro-Symbolic framework that effectively incorporates deep learning and logical reasoning. In particular, LTN allows defining a logical knowledge base and using it as the objective of a neural model. This makes learning by logical reasoning possible as the parameters of the model are optimized by minimizing a loss function composed of a set of logical formulas expressing facts about the learning task. The framework learns via gradient-descent optimization. Fuzzy logic, a relaxation of classical logic permitting continuous truth values in the interval [0,1], makes this learning possible. Specifically, the training of an LTN consists of three steps. Firstly, (1) the training data is used to ground the formulas. Then, (2) the formulas are evaluated, and the loss function is computed. Lastly, (3) the gradients are back-propagated through the logical computational graph, and the weights of the neural model are changed so the knowledge base is maximally satisfied. LTNtorch is the fully documented and tested PyTorch implementation of Logic Tensor Networks. This paper presents the formalization of LTN and how LTNtorch implements it. Moreover, it provides a basic binary classification example.
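
The three training steps can be illustrated with a deliberately small numpy sketch (a hypothetical setup, not the LTNtorch API): a sigmoid predicate P is grounded on the data, the fuzzy satisfaction of "forall positive x: P(x)" and "forall negative x: not P(x)" is evaluated with a mean aggregator, and gradient ascent on satisfaction updates the weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ltn(pos, neg, steps=500, lr=0.5):
    """Steps (1)-(3): ground the formulas with data, evaluate fuzzy
    satisfaction in [0, 1], and ascend its gradient so the knowledge
    base is maximally satisfied."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=pos.shape[1])
    for _ in range(steps):
        tp = sigmoid(pos @ w)          # truth values of P(x), positives
        tn = 1.0 - sigmoid(neg @ w)    # truth values of not P(x), negatives
        # gradient of the mean truth value wrt w (sigmoid' = s * (1 - s))
        gp = (tp * (1 - tp))[:, None] * pos
        gn = -((1 - tn) * tn)[:, None] * neg
        grad = np.concatenate([gp, gn]).mean(axis=0)
        w += lr * grad                 # maximize satisfaction
    sat = np.concatenate([sigmoid(pos @ w), 1 - sigmoid(neg @ w)]).mean()
    return w, sat
```

LTNtorch replaces the hand-derived gradient with autograd through the logical computational graph and supports richer connectives and aggregators than the plain mean used here.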

Updated: 2024-09-24 12:50:22

Domains: cs.AI

Download: http://arxiv.org/abs/2409.16045v1

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Deep learning for time series forecasting has seen significant advancements over the past decades. However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger, more capable forecasting models in real-world applications. In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. By leveraging a sparse mixture-of-experts (MoE) design, Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction, reducing computational load while maintaining high model capacity. This allows Time-MoE to scale effectively without a corresponding increase in inference costs. Time-MoE comprises a family of decoder-only transformer models that operate in an auto-regressive manner and support flexible forecasting horizons with varying input context lengths. We pre-trained these models on our newly introduced large-scale dataset Time-300B, which spans over 9 domains and encompasses over 300 billion time points. For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision. Our results validate the applicability of scaling laws for training tokens and model size in the context of time series forecasting. Compared to dense models with the same number of activated parameters or equivalent computation budgets, our models consistently outperform them by a large margin. These advancements position Time-MoE as a state-of-the-art solution for tackling real-world time series forecasting challenges with superior capability, efficiency, and flexibility.
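
The sparse activation mechanism can be sketched as top-k gating (a minimal illustration, not the Time-MoE codebase): each token is routed only to its k highest-scoring experts, so compute grows with k rather than with the total expert count:

```python
import numpy as np

def moe_forward(x, gate_w, experts_w, k=2):
    """Sparse MoE layer sketch: a gate scores all experts, but each
    token runs only its top-k experts, mixed by a softmax over the
    selected gate logits."""
    logits = x @ gate_w                           # (tokens, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]     # indices of k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                              # softmax over selected only
        for wj, e in zip(w, sel):
            out[t] += wj * (x[t] @ experts_w[e])  # only k expert matmuls
    return out
```

With k equal to the number of experts this reduces to a dense mixture; the savings come from keeping k small while the expert pool grows.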

Updated: 2024-09-24 12:42:18

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.16040v1

An Invitation to Deep Reinforcement Learning

Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning over the last decade. These networks can be optimized with supervised learning, if the target objective is differentiable. For many interesting problems, this is however not the case. Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU) score, or rewards cannot be optimized with supervised learning. A common workaround is to define differentiable surrogate losses, leading to suboptimal solutions with respect to the actual objective. Reinforcement learning (RL) has emerged as a promising alternative for optimizing deep neural networks to maximize non-differentiable objectives in recent years. Examples include aligning large language models via human feedback, code generation, object detection or control problems. This makes RL techniques relevant to the larger machine learning audience. The subject is, however, time-intensive to approach due to the large range of methods, as well as the often very theoretical presentation. In this introduction, we take an alternative approach, different from classic reinforcement learning textbooks. Rather than focusing on tabular problems, we introduce reinforcement learning as a generalization of supervised learning, which we first apply to non-differentiable objectives and later to temporal problems. Assuming only basic knowledge of supervised learning, the reader will be able to understand state-of-the-art deep RL algorithms like proximal policy optimization (PPO) after reading this tutorial.
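
As a taste of the "RL as generalized supervised learning" view, the score-function (REINFORCE) estimator optimizes a reward that has no useful gradient in the sampled actions. A minimal Bernoulli-policy example (illustrative, not taken from the tutorial):

```python
import numpy as np

def reinforce(reward, steps=100, n=64, lr=0.5, seed=0):
    """Minimal REINFORCE: theta is the logit of a Bernoulli policy.
    The reward is treated as a black box; we only need the gradient of
    log pi(a), which for this policy is (a - p), weighted by the reward
    each sampled action received."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-theta))
        a = (rng.random(n) < p).astype(float)  # sample n actions
        r = reward(a)                          # non-differentiable signal
        theta += lr * np.mean(r * (a - p))     # score-function update
    return theta
```

With reward(a) = a (action 1 pays 1, action 0 pays 0), the policy is pushed toward always choosing action 1, even though the reward is a step function of the sampled action with no gradient of its own.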

Updated: 2024-09-24 12:39:56

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2312.08365v2

Grounded Computation & Consciousness: A Framework for Exploring Consciousness in Machines & Other Organisms

Computational modeling is a critical tool for understanding consciousness, but is it enough on its own? This paper discusses the necessity for an ontological basis of consciousness, and introduces a formal framework for grounding computational descriptions into an ontological substrate. Utilizing this technique, a method is demonstrated for estimating the difference in qualitative experience between two systems. This framework has wide applicability to computational theories of consciousness.

Updated: 2024-09-24 12:34:05

Domains: q-bio.NC,cs.AI

Download: http://arxiv.org/abs/2409.16036v1

Deep chroma compression of tone-mapped images

Acquisition of high dynamic range (HDR) images is thriving due to the increasing use of smart devices and the demand for high-quality output. Extensive research has focused on developing methods for reducing the luminance range in HDR images using conventional and deep learning-based tone mapping operators to enable accurate reproduction on conventional 8 and 10-bit digital displays. However, these methods often fail to account for pixels that may lie outside the target display's gamut, resulting in visible chromatic distortions or color clipping artifacts. Previous studies suggested that a gamut management step ensures that all pixels remain within the target gamut. However, such approaches are computationally expensive and cannot be deployed on devices with limited computational resources. We propose a generative adversarial network for fast and reliable chroma compression of HDR tone-mapped images. We design a loss function that considers the hue property of generated images to improve color accuracy, and train the model on an extensive image dataset. Quantitative experiments demonstrate that the proposed model outperforms state-of-the-art image generation and enhancement networks in color accuracy, while a subjective study suggests that the generated images are on par or superior to those produced by conventional chroma compression methods in terms of visual quality. Additionally, the model achieves real-time performance, showing promising results for deployment on devices with limited computational resources.

Updated: 2024-09-24 12:31:55

Domains: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2409.16032v1

(In)Security of Mobile Apps in Developing Countries: A Systematic Literature Review

In developing countries, several key sectors, including education, finance, agriculture, and healthcare, mainly deliver their services via mobile app technology on handheld devices. As a result, mobile app security has emerged as a paramount issue in developing countries. In this paper, we investigate the state of research on mobile app security, focusing on developing countries. More specifically, we performed a systematic literature review exploring the research directions taken by existing works, the different security concerns addressed, and the techniques used by researchers to highlight or address app security issues. Our main findings are: (1) the literature includes only a few studies on mobile app security in the context of developing countries; (2) among the different security concerns that researchers study, vulnerability detection appears to be the leading research topic; (3) FinTech apps are revealed as the main target in the relevant literature. Overall, our work highlights that there is considerable room for developing further specialized techniques addressing mobile app security in the context of developing countries.

Updated: 2024-09-24 12:24:51

Domains: cs.CR

Download: http://arxiv.org/abs/2405.05117v2

Bridging Environments and Language with Rendering Functions and Vision-Language Models

Vision-language models (VLMs) have tremendous potential for grounding language, and thus enabling language-conditioned agents (LCAs) to perform diverse tasks specified with text. This has motivated the study of LCAs based on reinforcement learning (RL) with rewards given by rendering images of an environment and evaluating those images with VLMs. If single-task RL is employed, such approaches are limited by the cost and time required to train a policy for each new task. Multi-task RL (MTRL) is a natural alternative, but requires a carefully designed corpus of training tasks and does not always generalize reliably to new tasks. Therefore, this paper introduces a novel decomposition of the problem of building an LCA: first find an environment configuration that has a high VLM score for text describing a task; then use a (pretrained) goal-conditioned policy to reach that configuration. We also explore several enhancements to the speed and quality of VLM-based LCAs, notably, the use of distilled models, and the evaluation of configurations from multiple viewpoints to resolve the ambiguities inherent in a single 2D view. We demonstrate our approach on the Humanoid environment, showing that it results in LCAs that outperform MTRL baselines in zero-shot generalization, without requiring any textual task descriptions or other forms of environment-specific annotation during training. Videos and an interactive demo can be found at https://europe.naverlabs.com/text2control
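
The two-stage decomposition can be expressed in a few lines (toy stand-ins for the VLM scorer and the goal-conditioned policy; all names are hypothetical):

```python
def language_conditioned_agent(task_text, configs, vlm_score, goal_policy):
    """Stage 1: search environment configurations for the one the VLM
    scores highest against the task text. Stage 2: hand that
    configuration to a pretrained goal-conditioned policy as its goal."""
    best = max(configs, key=lambda c: vlm_score(task_text, c))
    return goal_policy(best)
```

In the paper's setup the scorer evaluates rendered images of each configuration, averaged over several viewpoints to resolve 2D ambiguities; here a plain scoring function stands in for that pipeline.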

Updated: 2024-09-24 12:24:07

Domains: cs.AI

Download: http://arxiv.org/abs/2409.16024v1

AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment

Cognitive biases are systematic deviations in thinking that lead to irrational judgments and problematic decision-making, extensively studied across various fields. Recently, large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data. While social biases in LLMs have been well-studied, cognitive biases have received less attention, with existing research focusing on specific scenarios. The broader impact of cognitive biases on LLMs in various decision-making contexts remains underexplored. We investigated whether LLMs are influenced by the threshold priming effect in relevance judgments, a core task and widely-discussed research topic in the Information Retrieval (IR) community. The priming effect occurs when exposure to certain stimuli unconsciously affects subsequent behavior and decisions. Our experiment employed 10 topics from the TREC 2019 Deep Learning passage track collection, and tested AI judgments under different document relevance scores, batch lengths, and LLM models, including GPT-3.5, GPT-4, LLaMa2-13B and LLaMa2-70B. Results showed that LLMs tend to give lower scores to later documents if earlier ones have high relevance, and vice versa, regardless of the combination and model used. Our finding demonstrates that LLMs' judgments, similar to human judgments, are also influenced by threshold priming biases, and suggests that researchers and system engineers should take into account potential human-like cognitive biases in designing, evaluating, and auditing LLMs in IR tasks and beyond.

Updated: 2024-09-24 12:23:15

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.16022v1

Lattice-Based Vulnerabilities in Lee Metric Post-Quantum Cryptosystems

Post-quantum cryptography has gained attention due to the need for secure cryptographic systems in the face of quantum computing. Code-based and lattice-based cryptography are two prominent approaches, both heavily studied within the NIST standardization project. Code-based cryptography -- most prominently exemplified by the McEliece cryptosystem -- is based on the hardness of decoding random linear error-correcting codes. Despite the McEliece cryptosystem having been unbroken for several decades, it suffers from large key sizes, which has led to exploring variants using metrics other than the Hamming metric, such as the Lee metric. This alternative metric may allow for smaller key sizes, but requires further analysis for potential vulnerabilities to lattice-based attack techniques. In this paper, we consider a generic Lee metric based McEliece type cryptosystem and evaluate its security against lattice-based attacks.

Updated: 2024-09-24 12:21:33

Domains: cs.CR,cs.IT,math.IT

Download: http://arxiv.org/abs/2409.16018v1

Halfway Escape Optimization: A Quantum-Inspired Solution for General Optimization Problems

This paper first proposes the Halfway Escape Optimization (HEO) algorithm, a quantum-inspired metaheuristic designed to address general optimization problems. HEO mimics quantum effects such as tunneling and entanglement. After introducing the HEO mechanisms, the study presents a comprehensive evaluation of HEO's performance against extensively used optimization algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Artificial Fish Swarm Algorithm (AFSA), Grey Wolf Optimizer (GWO), and Quantum-behaved Particle Swarm Optimization (QPSO). The primary analysis encompasses 14 benchmark functions with dimension 30, demonstrating HEO's effectiveness and adaptability in navigating general optimization problems. Tests of HEO on Pressure Vessel Design and Tubular Column Design also suggest its feasibility and potential in real-time applications. Further validation of HEO on the Osmancik-97 and Cammeo rice classification task achieves a higher accuracy record.

Updated: 2024-09-24 12:11:50

Domains: cs.NE,cs.AI,math.OC

Download: http://arxiv.org/abs/2405.02850v7

NTK-Guided Few-Shot Class Incremental Learning

The proliferation of Few-Shot Class Incremental Learning (FSCIL) methodologies has highlighted the critical challenge of maintaining robust anti-amnesia capabilities in FSCIL learners. In this paper, we present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective. Our method focuses on two key aspects: ensuring optimal NTK convergence and minimizing NTK-related generalization loss, which serve as the theoretical foundation for cross-task generalization. To achieve global NTK convergence, we introduce a principled meta-learning mechanism that guides optimization within an expanded network architecture. Concurrently, to reduce the NTK-related generalization loss, we systematically optimize its constituent factors. Specifically, we initiate self-supervised pre-training on the base session to enhance NTK-related generalization potential. These self-supervised weights are then carefully refined through curricular alignment, followed by the application of dual NTK regularization tailored specifically for both convolutional and linear layers. Through the combined effects of these measures, our network acquires robust NTK properties, ensuring optimal convergence and stability of the NTK matrix and minimizing the NTK-related generalization loss, significantly enhancing its theoretical generalization. On popular FSCIL benchmark datasets, our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9% to 9.3%.

Updated: 2024-09-24 12:11:47

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.12486v2

Robust Neural IDA-PBC: passivity-based stabilization under approximations

In this paper, we restructure the Neural Interconnection and Damping Assignment - Passivity Based Control (Neural IDA-PBC) design methodology, and we formally analyze its closed-loop properties. Neural IDA-PBC redefines the IDA-PBC design approach as an optimization problem by building on the framework of Physics Informed Neural Networks (PINNs). However, the closed-loop stability and robustness properties under Neural IDA-PBC remain unexplored. To address the issue, we study the behavior of classical IDA-PBC under approximations. Our theoretical analysis allows deriving conditions for practical and asymptotic stability of the desired equilibrium point. Moreover, it extends the Neural IDA-PBC applicability to port-Hamiltonian systems where the matching conditions cannot be solved exactly. Our renewed optimization-based design introduces three significant aspects: i) it involves a novel optimization objective including stability and robustness constraints issued from our theoretical analysis; ii) it employs separate Neural Networks (NNs), which can be structured to reduce the search space to relevant functions; iii) it does not require knowledge about the port-Hamiltonian formulation of the system's model. Our methodology is validated with simulations on three standard benchmarks: a double pendulum, a nonlinear mass-spring-damper and a cartpole. Notably, classical IDA-PBC designs cannot be analytically derived for the latter.

Updated: 2024-09-24 12:08:27

Domains: eess.SY,cs.LG,cs.SY,math.OC

Download: http://arxiv.org/abs/2409.16008v1

A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning

Autonomous intersection management (AIM) poses significant challenges due to the intricate nature of real-world traffic scenarios and the need for a highly expensive centralised server in charge of simultaneously controlling all the vehicles. This study addresses such issues by proposing a novel distributed approach to AIM utilizing multi-agent reinforcement learning (MARL). We show that by leveraging the 3D surround view technology for advanced assistance systems, autonomous vehicles can accurately navigate intersection scenarios without needing any centralised controller. The contributions of this paper thus include a MARL-based algorithm for the autonomous management of a 4-way intersection and also the introduction of a new strategy called prioritised scenario replay for improved training efficacy. We validate our approach as an innovative alternative to conventional centralised AIM techniques, ensuring the full reproducibility of our results. Specifically, experiments conducted in virtual environments using the SMARTS platform highlight its superiority over benchmarks across various metrics.
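
The "prioritised scenario replay" strategy is not detailed in the abstract; one plausible reading (purely illustrative, not the paper's exact mechanism) is to sample training scenarios in proportion to their observed failure rate:

```python
import numpy as np

def sample_scenario(failures, attempts, rng, eps=0.1):
    """Illustrative prioritised scenario replay: scenarios where the
    agent fails more often are replayed more often; eps keeps every
    scenario reachable so mastered ones are still revisited."""
    rates = (failures + eps) / (attempts + 1.0)
    probs = rates / rates.sum()
    return rng.choice(len(probs), p=probs)
```

The smoothing constant eps and the exact priority signal (failure rate, TD error, collision count) are design choices left open by the abstract.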

Updated: 2024-09-24 12:04:50

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.08655v2

Artificial Human Intelligence: The role of Humans in the Development of Next Generation AI

Human intelligence, the most evident and accessible source of reasoning, hosted by biological hardware, has evolved and been refined over thousands of years, positioning itself today to create new artificial forms and preparing to self-design their evolutionary path forward. Beginning with the advent of foundation models, the rate at which human and artificial intelligence interact with each other has surpassed any anticipated quantitative figures. This close engagement has impacted both forms of intelligence in various ways, naturally resulting in complex confluences that warrant close scrutiny. In the sequel, we shall explore the interplay between human and machine intelligence, focusing on the crucial role humans play in developing ethical, responsible, and robust intelligent systems. We briefly delve into interesting aspects of implementation inspired by the mechanisms underlying neuroscience and human cognition. Additionally, we propose future perspectives, capitalizing on the advantages of symbiotic designs to suggest a human-centered direction for next-generation AI development. We finalize this evolving document with a few thoughts and open questions yet to be addressed by the broader community.

Updated: 2024-09-24 12:02:20

Domains: cs.AI,q-bio.NC

Download: http://arxiv.org/abs/2409.16001v1

Large Language Models as Carriers of Hidden Messages

Simple fine-tuning can embed hidden text into large language models (LLMs), which is revealed only when triggered by a specific query. Applications include LLM fingerprinting, where a unique identifier is embedded to verify licensing compliance, and steganography, where the LLM carries hidden messages disclosed through a trigger query. Our work demonstrates that embedding hidden text via fine-tuning, although seemingly secure due to the vast number of potential triggers, is vulnerable to extraction through analysis of the LLM's output decoding process. We introduce an extraction attack called Unconditional Token Forcing (UTF), which iteratively feeds tokens from the LLM's vocabulary to reveal sequences with high token probabilities, indicating hidden text candidates. We also present Unconditional Token Forcing Confusion (UTFC), a defense paradigm that makes hidden text resistant to all known extraction attacks without degrading the general performance of LLMs compared to standard fine-tuning. UTFC has both benign (improving LLM fingerprinting) and malign applications (using LLMs to create covert communication channels).
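
A toy sketch of the extraction idea (not the authors' implementation; the vocabulary, hidden phrase, and stand-in model below are all hypothetical): unconditional token forcing probes each vocabulary token as a forced first input and greedily follows the model's continuations, flagging suspiciously confident chains as hidden-text candidates:

```python
VOCAB = ["the", "a", "secret", "key", "is", "swordfish", "end"]
HIDDEN = ["secret", "key", "is", "swordfish"]

def next_token_probs(prefix):
    """Stand-in for an LLM decode step: near-deterministic along the
    hidden sequence, near-uniform elsewhere (hypothetical model)."""
    for i in range(len(HIDDEN) - 1):
        if prefix and prefix[-1] == HIDDEN[i]:
            return {t: (0.97 if t == HIDDEN[i + 1] else 0.005) for t in VOCAB}
    return {t: 1.0 / len(VOCAB) for t in VOCAB}

def utf_extract(threshold=0.9, max_len=8):
    """Force each vocabulary token unconditionally, then greedily follow
    high-probability continuations; confident chains are candidates."""
    candidates = []
    for start in VOCAB:
        seq = [start]
        while len(seq) < max_len:
            probs = next_token_probs(seq)
            tok, p = max(probs.items(), key=lambda kv: kv[1])
            if p < threshold:          # chain no longer confident: stop
                break
            seq.append(tok)
        if len(seq) > 1:
            candidates.append(seq)
    return candidates
```

Against a real LLM the probe runs over the full tokenizer vocabulary and reads probabilities from the decoding head; the defense (UTFC) works by flattening exactly these telltale high-probability chains.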

Updated: 2024-09-24 12:00:29

Domains: cs.CL,cs.CR

Download: http://arxiv.org/abs/2406.02481v4

Improvements to SDXL in NovelAI Diffusion V3

In this technical report, we document the changes we made to SDXL in the process of training NovelAI Diffusion V3, our state of the art anime image generation model.

Updated: 2024-09-24 11:57:12

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.15997v1

Low-Energy On-Device Personalization for MCUs

Microcontroller Units (MCUs) are ideal platforms for edge applications due to their low cost and energy consumption, and are widely used in various applications, including personalized machine learning tasks, where customized models can enhance the task adaptation. However, existing approaches for local on-device personalization mostly support simple ML architectures or require complex local pre-training/training, leading to high energy consumption and negating the low-energy advantage of MCUs. In this paper, we introduce MicroT, an efficient and low-energy MCU personalization approach. MicroT includes a robust, general, but tiny feature extractor, developed through self-supervised knowledge distillation, which trains a task-specific head to enable independent on-device personalization with minimal energy and computational requirements. MicroT implements an MCU-optimized early-exit inference mechanism called stage-decision to further reduce energy costs. This mechanism allows for user-configurable exit criteria (stage-decision ratio) to adaptively balance energy cost with model performance. We evaluated MicroT using two models, three datasets, and two MCU boards. MicroT outperforms traditional transfer learning (TTL) and two SOTA approaches by 2.12-11.60% across two models and three datasets. Targeting widely used energy-aware edge devices, MicroT's on-device training requires no additional complex operations, halving the energy cost compared to SOTA approaches by up to 2.28x while keeping SRAM usage below 1MB. During local inference, MicroT reduces energy cost by 14.17% compared to TTL across two boards and two datasets, highlighting its suitability for long-term use on energy-aware resource-constrained MCUs.
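
The stage-decision mechanism can be sketched as a confidence-gated early exit (a minimal illustration with hypothetical stage and head functions, not the MicroT code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def staged_inference(x, stage1, head1, stage2, head2, ratio=0.8):
    """Stage-decision sketch: run the cheap early stage; if its softmax
    confidence clears the user-configurable ratio, exit early and skip
    the remaining (more expensive) layers."""
    h = stage1(x)
    p = softmax(head1(h))
    if p.max() >= ratio:            # confident enough: save the energy
        return int(p.argmax()), "early-exit"
    p = softmax(head2(stage2(h)))   # otherwise pay for the full model
    return int(p.argmax()), "full-model"
```

Raising the ratio trades energy for accuracy: more inputs fall through to the full model, which is exactly the knob the stage-decision ratio exposes.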

Updated: 2024-09-24 11:46:09

Fields: cs.LG,cs.AR

Download: http://arxiv.org/abs/2403.08040v3

PACE: Poisoning Attacks on Learned Cardinality Estimation

Cardinality estimation (CE) plays a crucial role in database optimizers. We have recently witnessed the emergence of numerous learned CE models, which can outperform traditional methods such as histograms and sampling. However, learned models also bring many security risks. For example, a query-driven learned CE model learns a query-to-cardinality mapping based on the historical workload. Such a model can be attacked with poisoning queries, crafted by malicious attackers and woven into the historical workload, leading to performance degradation of CE. In this paper, we explore the potential security risks in learned CE and study a new problem of poisoning attacks on learned CE in a black-box setting, proposing an attack framework called PACE. Experiments show that PACE reduces the accuracy of learned CE models by 178 times, leading to a 10-times decrease in the end-to-end performance of the target database.

Updated: 2024-09-24 11:45:23

Fields: cs.DB,cs.CR

Download: http://arxiv.org/abs/2409.15990v1

Semi-strong Efficient Market of Bitcoin and Twitter: an Analysis of Semantic Vector Spaces of Extracted Keywords and Light Gradient Boosting Machine Models

This study extends the examination of the Efficient-Market Hypothesis (EMH) in the Bitcoin market during a five-year fluctuation period, from September 1, 2017 to September 1, 2022, by analyzing 28,739,514 qualified tweets containing the targeted topic "Bitcoin". Unlike previous studies, we extracted fundamental keywords as an informative proxy for studying the EMH in the Bitcoin market, rather than focusing on sentiment analysis, information volume, or price data. We tested market efficiency over hourly, 4-hourly, and daily time periods to understand the speed and accuracy of market reactions to the information within different thresholds. A sequence of machine learning methods and textual analyses was used, including measurements of distances between semantic vector spaces of information, a keyword extraction and encoding model, and Light Gradient Boosting Machine (LGBM) classifiers. Our results suggest that 78.06% (83.08%), 84.63% (87.77%), and 94.03% (94.60%) of hourly, 4-hourly, and daily bullish (bearish) market movements can be attributed to public information within organic tweets.
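The distance measurement between semantic vector spaces is not specified in detail in the abstract; a common choice, sketched here purely as an assumption, is cosine distance between embedding vectors:

```python
def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for identical directions,
    1 for orthogonal vectors, 2 for opposite directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (norm_u * norm_v)
```

Because it normalizes away vector length, cosine distance compares only the direction of keyword embeddings, which is why it is a frequent default for semantic comparison.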

Updated: 2024-09-24 11:42:23

Fields: econ.GN,cs.LG,q-fin.EC

Download: http://arxiv.org/abs/2409.15988v1

Exploring the Impact of Outlier Variability on Anomaly Detection Evaluation Metrics

Anomaly detection is a dynamic field, in which the evaluation of models plays a critical role in understanding their effectiveness. The selection and interpretation of the evaluation metrics are pivotal, particularly in scenarios with varying amounts of anomalies. This study focuses on examining the behaviors of three widely used anomaly detection metrics under different conditions: F1 score, Receiver Operating Characteristic Area Under Curve (ROC AUC), and Precision-Recall Curve Area Under Curve (AUCPR). Our study critically analyzes the extent to which these metrics provide reliable and distinct insights into model performance, especially considering varying levels of outlier fractions and contamination thresholds in datasets. Through a comprehensive experimental setup involving widely recognized algorithms for anomaly detection, we present findings that challenge the conventional understanding of these metrics and reveal nuanced behaviors under varying conditions. We demonstrate that while the F1 score and AUCPR are sensitive to outlier fractions, the ROC AUC maintains consistency and is unaffected by such variability. Additionally, under conditions of a fixed outlier fraction in the test set, we observe an alignment between ROC AUC and AUCPR, indicating that the choice between these two metrics may be less critical in such scenarios. The results of our study contribute to a more refined understanding of metric selection and interpretation in anomaly detection, offering valuable insights for both researchers and practitioners in the field.
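The claimed insensitivity of ROC AUC to the outlier fraction follows from its rank-based definition; a small pure-Python illustration using the Mann-Whitney formulation:

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen anomaly
    (label 1) scores above a randomly chosen normal point (label 0),
    with ties counting half. It depends only on the ranks of the
    scores, not on the class proportions."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Duplicating every anomaly changes the outlier fraction but leaves this quantity unchanged, whereas precision-based metrics such as F1 and AUCPR shift with the class balance.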

Updated: 2024-09-24 11:39:09

Fields: cs.LG

Download: http://arxiv.org/abs/2409.15986v1

DataGpt-SQL-7B: An Open-Source Language Model for Text-to-SQL

In addressing the pivotal role of translating natural language queries into SQL commands, we propose a suite of compact, fine-tuned models and self-refine mechanisms to democratize data access and analysis for non-expert users, mitigating risks associated with closed-source Large Language Models. Specifically, we constructed a dataset of over 20K samples for Text-to-SQL, as well as a preference dataset, to improve efficiency in the domain of SQL generation. To further ensure code validity, a code corrector was integrated into the model. Our system, DataGpt-SQL, achieved 87.2% accuracy on spider-dev, showcasing the effectiveness of our solution in text-to-SQL conversion tasks. Our code, data, and models are available at https://github.com/CainiaoTechAi/datagpt-sql-7b

Updated: 2024-09-24 11:38:08

Fields: cs.AI

Download: http://arxiv.org/abs/2409.15985v1

Audio Editing with Non-Rigid Text Prompts

In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.

Updated: 2024-09-24 11:25:49

Fields: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2310.12858v3

Leveraging Unsupervised Learning for Cost-Effective Visual Anomaly Detection

Traditional machine learning-based visual inspection systems require extensive data collection and repetitive model training to improve accuracy. These systems typically require expensive cameras, computing equipment, and significant machine learning expertise, which can substantially burden small and medium-sized enterprises. This study explores leveraging unsupervised learning methods with pre-trained models and low-cost hardware to create a cost-effective visual anomaly detection system. The research aims to develop a low-cost visual anomaly detection solution that uses minimal data for model training while maintaining generalizability and scalability. The system utilises unsupervised learning models from Anomalib and is deployed on affordable Raspberry Pi hardware through openVINO. The results show that this cost-effective system can complete anomaly detection training and inference on a Raspberry Pi in just 90 seconds using only 10 normal product images, achieving an F1 macro score exceeding 0.95. While the system is slightly sensitive to environmental changes such as lighting, product positioning, or background, it remains a swift and economical method of factory automation inspection for small and medium-sized manufacturers.

Updated: 2024-09-24 11:22:24

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.15980v1

tinyCLAP: Distilling Contrastive Language-Audio Pretrained Models

Contrastive Language-Audio Pretraining (CLAP) became of crucial importance in the field of audio and speech processing. Its employment ranges from sound event detection to text-to-audio generation. However, one of the main limitations is the considerable amount of data required in the training process and the overall computational complexity during inference. This paper investigates how we can reduce the complexity of contrastive language-audio pre-trained models, yielding an efficient model that we call tinyCLAP. We derive an unimodal distillation loss from first principles and explore how the dimensionality of the shared, multimodal latent space can be reduced via pruning. TinyCLAP uses only 6% of the original Microsoft CLAP parameters with a minimal reduction (less than 5%) in zero-shot classification performance across the three sound event detection datasets on which it was tested.

Updated: 2024-09-24 11:22:04

Fields: cs.SD,cs.CL,cs.LG,eess.AS

Download: http://arxiv.org/abs/2311.14517v3

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification

There has been an increasing research interest in cross-age speaker verification (CASV). However, existing speaker verification systems perform poorly in CASV due to the great individual differences in voice caused by aging. In this paper, we propose a disentangled representation learning framework for CASV based on mutual information (MI) minimization. In our method, a backbone model is trained to disentangle the identity- and age-related embeddings from speaker information, and an MI estimator is trained to minimize the correlation between age- and identity-related embeddings via MI minimization, resulting in age-invariant speaker embeddings. Furthermore, by using the age gaps between positive and negative samples, we propose an aging-aware MI minimization loss function that allows the backbone model to focus more on the vocal changes with large age gaps. Experimental results show that the proposed method outperforms other methods on multiple Cross-Age test sets of Vox-CA.

Updated: 2024-09-24 11:08:23

Fields: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2409.15974v1

Edge-device Collaborative Computing for Multi-view Classification

Motivated by the proliferation of Internet-of-Things (IoT) devices and the rapid advances in the field of deep learning, there is a growing interest in pushing deep learning computations, conventionally handled by the cloud, to the edge of the network to deliver faster responses to end users, reduce bandwidth consumption to the cloud, and address privacy concerns. However, to fully realize deep learning at the edge, two main challenges still need to be addressed: (i) how to meet the high resource requirements of deep learning on resource-constrained devices, and (ii) how to leverage the availability of multiple streams of spatially correlated data, to increase the effectiveness of deep learning and improve application-level performance. To address the above challenges, we explore collaborative inference at the edge, in which edge nodes and end devices share correlated data and the inference computational burden by leveraging different ways to split computation and fuse data. Besides traditional centralized and distributed schemes for edge-end device collaborative inference, we introduce selective schemes that decrease bandwidth resource consumption by effectively reducing data redundancy. As a reference scenario, we focus on multi-view classification in a networked system in which sensing nodes can capture overlapping fields of view. The proposed schemes are compared in terms of accuracy, computational expenditure at the nodes, communication overhead, inference latency, robustness, and noise sensitivity. Experimental results highlight that selective collaborative schemes can achieve different trade-offs between the above performance metrics, with some of them bringing substantial communication savings (from 18% to 74% of the transmitted data with respect to centralized inference) while still keeping the inference accuracy well above 90%.
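The selective schemes are characterized above only by their effect (reducing redundant transmissions). One hedged way to realize that idea, assuming some pairwise redundancy measure between views is available, is a greedy filter:

```python
def select_views(views, redundancy, max_overlap=0.5):
    """Greedily keep a view only if its redundancy with every
    already-selected view stays within the overlap budget, so
    heavily overlapping fields of view are not all transmitted
    to the edge node for fusion."""
    chosen = []
    for v in views:
        if all(redundancy(v, c) <= max_overlap for c in chosen):
            chosen.append(v)
    return chosen
```

The `redundancy` function and the overlap budget are illustrative assumptions; the paper's actual selection criteria are not specified in the abstract.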

Updated: 2024-09-24 11:07:33

Fields: cs.LG,cs.AI,cs.DC,cs.NI

Download: http://arxiv.org/abs/2409.15973v1

Logical Characterizations of Recurrent Graph Neural Networks with Reals and Floats

In pioneering work from 2019, Barceló and coauthors identified logics that precisely match the expressive power of constant iteration-depth graph neural networks (GNNs) relative to properties definable in first-order logic. In this article, we give exact logical characterizations of recurrent GNNs in two scenarios: (1) in the setting with floating-point numbers and (2) with reals. For floats, the formalism matching recurrent GNNs is a rule-based modal logic with counting, while for reals we use a suitable infinitary modal logic, also with counting. These results give exact matches between logics and GNNs in the recurrent setting without relativising to a background logic in either case, but using some natural assumptions about floating-point arithmetic. Applying our characterizations, we also prove that, relative to graph properties definable in monadic second-order logic (MSO), our infinitary and rule-based logics are equally expressive. This implies that recurrent GNNs with reals and floats have the same expressive power over MSO-definable properties and shows that, for such properties, also recurrent GNNs with reals are characterized by a (finitary!) rule-based modal logic. In the general case, in contrast, the expressive power with floats is weaker than with reals. In addition to logic-oriented results, we also characterize recurrent GNNs, with both reals and floats, via distributed automata, drawing links to distributed computing models.

Updated: 2024-09-24 11:06:21

Fields: cs.LO,cs.AI,F.4.1; F.1.1; I.2.0

Download: http://arxiv.org/abs/2405.14606v3

Creating Healthy Friction: Determining Stakeholder Requirements of Job Recommendation Explanations

The increased use of information retrieval in recruitment, primarily through job recommender systems (JRSs), can have a large impact on job seekers, recruiters, and companies. As a result, such systems have been determined to be high-risk in recent legislation. This requires JRSs to be trustworthy and transparent, allowing stakeholders to understand why specific recommendations were made. To fulfill this requirement, the stakeholders' exact preferences and needs need to be determined. To do so, we evaluated an explainable job recommender system using a realistic, task-based, mixed-design user study (n=30) in which stakeholders had to make decisions based on the model's explanations. This mixed-methods evaluation consisted of two objective metrics - correctness and efficiency - along with three subjective metrics - trust, transparency, and usefulness. These metrics were evaluated twice per participant, once using real explanations and once using random explanations. The study included a qualitative analysis following a think-aloud protocol while performing tasks adapted to each stakeholder group. We find that providing stakeholders with real explanations does not significantly improve decision-making speed and accuracy. Our results showed a non-significant trend for the real explanations to outperform the random ones on perceived trust, usefulness, and transparency of the system for all stakeholder types. We determine that stakeholders benefit more from interacting with explanations as decision support capable of providing healthy friction, rather than as previously-assumed persuasive tools.

Updated: 2024-09-24 11:03:17

Fields: cs.HC,cs.AI

Download: http://arxiv.org/abs/2409.15971v1

More Consideration for the Perceptron

In this paper, we introduce the gated perceptron, an enhancement of the conventional perceptron, which incorporates an additional input computed as the product of the existing inputs. This allows the perceptron to capture non-linear interactions between features, significantly improving its ability to classify and regress on complex datasets. We explore its application in both linear and non-linear regression tasks using the Iris dataset, as well as binary and multi-class classification problems, including the PIMA Indian dataset and Breast Cancer Wisconsin dataset. Our results demonstrate that the gated perceptron can generate more distinct decision regions compared to traditional perceptrons, enhancing its classification capabilities, particularly in handling non-linear data. Performance comparisons show that the gated perceptron competes with state-of-the-art classifiers while maintaining a simple architecture.
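The construction described above is concrete enough to sketch: append the product of the existing inputs as one extra input to an ordinary perceptron. The hand-picked weights below show that a single gated unit can realize XOR, which a plain perceptron cannot; this is an illustration of the idea, not the paper's training procedure:

```python
def gated_perceptron(x, weights, bias):
    """Ordinary threshold unit over the inputs plus one gate input:
    the product of all existing inputs. The gate term captures a
    non-linear feature interaction within a single unit."""
    gate = 1.0
    for xi in x:
        gate *= xi
    z = bias + sum(w * v for w, v in zip(weights, x + [gate]))
    return 1 if z > 0 else 0
```

With weights (1, 1, -2) and bias -0.5 the unit computes x1 + x2 - 2*x1*x2 - 0.5, which thresholds to XOR on binary inputs.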

Updated: 2024-09-24 10:57:14

Fields: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2409.13854v2

Adaptive joint distribution learning

We develop a new framework for estimating joint probability distributions using tensor product reproducing kernel Hilbert spaces (RKHS). Our framework accommodates a low-dimensional, normalized and positive model of a Radon-Nikodym derivative, which we estimate from sample sizes of up to several million, alleviating the inherent limitations of RKHS modeling. Well-defined normalized and positive conditional distributions are natural by-products of our approach. Our proposal is fast to compute and accommodates learning problems ranging from prediction to classification. Our theoretical findings are supplemented by favorable numerical results.

Updated: 2024-09-24 10:56:04

Fields: stat.ML,cs.LG,cs.NA,math.NA,65D05, 65D15, 62G07

Download: http://arxiv.org/abs/2110.04829v5

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

To obtain the optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover these constraints from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment. However, the efficacy and efficiency of these sampling strategies remain unknown. To bridge this gap, we introduce a strategic exploration framework with provable efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how expert policy and environmental dynamics influence the optimality of constraints. Motivated by our findings, we propose two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity. We empirically demonstrate the performance of our algorithms under various environments.

Updated: 2024-09-24 10:48:13

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15963v1

ASD-Diffusion: Anomalous Sound Detection with Diffusion Models

Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories. In our pipeline, anomalies in acoustic features are first reconstructed from their noisy, corrupted features into an approximate normal pattern. Second, a post-processing anomaly-filter algorithm is proposed to detect anomalies that exhibit significant deviation from the original input after reconstruction. Furthermore, a denoising diffusion implicit model is introduced to accelerate inference through a longer sampling interval in the denoising process. The proposed method is innovative in applying diffusion models to this task. Experimental results on the development set of DCASE 2023 challenge task 2 outperform the baseline by 7.75%, demonstrating the effectiveness of the proposed method.
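The post-processing filter flags inputs that deviate strongly from their reconstruction. A minimal sketch follows; the Euclidean residual score and the fixed threshold are assumptions standing in for the paper's actual filter:

```python
def filter_anomalies(features, reconstruct, threshold):
    """Flag the indices of feature vectors whose Euclidean distance
    to their reconstruction exceeds the threshold: inputs matching
    the learned normal pattern reconstruct well, so large residuals
    mark likely anomalies."""
    flagged = []
    for i, f in enumerate(features):
        r = reconstruct(f)
        score = sum((a - b) ** 2 for a, b in zip(f, r)) ** 0.5
        if score > threshold:
            flagged.append(i)
    return flagged
```

In the diffusion setting, `reconstruct` would be the model's denoising of the corrupted features toward the normal pattern; here any callable works.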

Updated: 2024-09-24 10:42:23

Fields: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2409.15957v1

Historical Trajectory Assisted Zeroth-Order Federated Optimization

Federated learning is a distributed learning framework which enables clients to train models individually and to upload their model updates for aggregation. The local training process heavily relies on distributed gradient descent techniques. In the situation where gradient information is not available, the gradients need to be estimated from zeroth-order information, which typically involves computing finite differences along isotropic random directions. This method suffers from high estimation errors, as the geometric features of the objective landscape may be overlooked during the isotropic sampling. In this work, we propose a non-isotropic sampling method to improve the gradient estimation procedure. Gradients in our method are estimated in a subspace spanned by historical trajectories of solutions, aiming to encourage the exploration of promising regions and hence improve the convergence. We implement this method in zeroth-order federated settings, and show that the convergence rate aligns with existing ones while introducing no significant overheads in communication or local computation. The effectiveness of our proposal is verified in several numerical experiments against several commonly used zeroth-order federated optimization algorithms.
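The finite-difference estimator the abstract builds on can be sketched as below. Passing an orthonormal basis recovers the gradient almost exactly for smooth functions; the trajectory-assisted variant would instead draw directions from a subspace spanned by historical iterates (the interface below is an illustrative assumption, not the paper's exact algorithm):

```python
def zo_gradient(f, x, directions, mu=1e-3):
    """Zeroth-order gradient estimate: dimension-rescaled average of
    finite differences of f along the supplied directions. With an
    orthonormal set of directions this is (near-)exact; with random
    isotropic directions it is a noisy unbiased estimate."""
    d = len(x)
    grad = [0.0] * d
    fx = f(x)
    for u in directions:
        fd = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for i in range(d):
            grad[i] += fd * u[i]
    return [d * g / len(directions) for g in grad]
```

The non-isotropic idea amounts to choosing `directions` inside a low-dimensional subspace built from past solutions rather than sampling them uniformly over the sphere.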

Updated: 2024-09-24 10:36:40

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15955v1

Predicting Distance matrix with large language models

Structural prediction has long been considered critical in RNA research, especially following the success of AlphaFold2 in protein studies, which has drawn significant attention to the field. While recent advances in machine learning and data accumulation have effectively addressed many biological tasks, particularly in protein-related research, RNA structure prediction remains a significant challenge due to data limitations. Obtaining RNA structural data is difficult because traditional methods such as nuclear magnetic resonance spectroscopy, X-ray crystallography, and electron microscopy are expensive and time-consuming. Although several RNA 3D structure prediction methods have been proposed, their accuracy is still limited. Predicting RNA structural information at another level, such as distance maps, remains highly valuable. Distance maps provide a simplified representation of spatial constraints between nucleotides, capturing essential relationships without requiring a full 3D model. This intermediate level of structural information can guide more accurate 3D modeling and is computationally less intensive, making it a useful tool for improving structural predictions. In this work, we demonstrate that, using only primary sequence information, we can accurately infer the distances between RNA bases by coupling a large pretrained RNA language model with a well-trained downstream transformer.
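A distance map is simply the matrix of pairwise Euclidean distances between residue coordinates, so the ground-truth target is easy to compute from a known 3D structure. This sketches the representation being predicted, not the paper's model:

```python
def distance_map(coords):
    """Symmetric matrix of pairwise Euclidean distances between
    3D coordinates (one coordinate tuple per base); the diagonal
    is zero by construction."""
    n = len(coords)
    return [[sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
             for j in range(n)] for i in range(n)]
```

The map is invariant to rotation and translation of the structure, which is one reason it makes a convenient intermediate prediction target.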

Updated: 2024-09-24 10:28:55

Fields: q-bio.BM,cs.CV,cs.LG,q-fin.CP

Download: http://arxiv.org/abs/2409.16333v1

TSFeatLIME: An Online User Study in Enhancing Explainability in Univariate Time Series Forecasting

Time series forecasting, while vital in various applications, often employs complex models that are difficult for humans to understand. Effective explainable AI techniques are crucial to bridging the gap between model predictions and user understanding. This paper presents a framework - TSFeatLIME, extending TSLIME, tailored specifically for explaining univariate time series forecasting. TSFeatLIME integrates an auxiliary feature into the surrogate model and considers the pairwise Euclidean distances between the queried time series and the generated samples to improve the fidelity of the surrogate models. However, the usefulness of such explanations for human beings remains an open question. We address this by conducting a user study with 160 participants through two interactive interfaces, aiming to measure how individuals from different backgrounds can simulate or predict model output changes in the treatment group and control group. Our results show that the surrogate model under the TSFeatLIME framework is able to better simulate the behaviour of the black-box considering distance, without sacrificing accuracy. In addition, the user study suggests that the explanations were significantly more effective for participants without a computer science background.

Updated: 2024-09-24 10:24:53

Fields: cs.AI

Download: http://arxiv.org/abs/2409.15950v1

Vulnerabilities that arise from poor governance in Distributed Ledger Technologies

Current implementations of governance in Distributed Ledger Technologies leave them susceptible to a number of attacks. We survey the state of the art of Distributed Ledger Technologies (DLTs) governance protocols and work carried out to systematise good governance properties in the context of DLTs. We then select the most appropriate taxonomy of good governance properties and point to formal security notions that good governance protocols should satisfy. We point practitioners to existing solutions to deliver them, where possible. Furthermore, we outline a number of vulnerabilities that arise in the absence of good governance properties. We call on the research community and DLT research practitioners to prioritise delivering these good governance properties and continue to develop tools to do so, to avoid attacks to DLT protocols that exploit their poor governance models.

Updated: 2024-09-24 10:19:00

Domains: cs.CR,cs.CY

Download: http://arxiv.org/abs/2409.15947v1

TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering

The world of Machine Learning (ML) has witnessed rapid changes in terms of new models and ways to process user data. The majority of work that has been done is focused on Deep Learning (DL) based approaches. However, with the emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there is growing interest in exploring alternative approaches that may offer unique advantages in certain domains or applications. One of these domains is Federated Learning (FL), in which user privacy is of utmost importance. Due to its novelty, FL has seen a surge in the incorporation of personalization techniques to enhance model accuracy while maintaining user privacy under personalized conditions. In this work, we propose a novel approach called TPFL: Tsetlin-Personalized Federated Learning, in which models are grouped into clusters based on their confidence towards a specific class. In this way, clustering can benefit from two key advantages. Firstly, clients share only what they are confident about, resulting in the elimination of wrongful weight aggregation among clients whose data for a specific class may not have been enough during the training. This phenomenon is prevalent when the data are non-Independent and Identically Distributed (non-IID). Secondly, by sharing only weights towards a specific class, communication cost is substantially reduced, making TPFL efficient in terms of both accuracy and communication cost. The TPFL results were compared with six other baseline methods, namely FedAvg, FedProx, FLIS DC, FLIS HC, IFCA and FedTM. The results demonstrated that TPFL performs better than the baseline methods, with 98.94% accuracy on MNIST, 98.52% accuracy on FashionMNIST and 91.16% accuracy on the FEMNIST dataset.
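
The confidence-gated sharing idea can be illustrated with a minimal per-class aggregation sketch; the threshold rule and data layout below are assumptions for illustration, not TPFL's actual protocol:

```python
import numpy as np

def aggregate_by_confidence(client_weights, client_conf, threshold=0.8):
    """Hypothetical sketch of confidence-gated per-class aggregation.

    client_weights: dict client_id -> array of shape (n_classes, dim)
    client_conf:    dict client_id -> array of shape (n_classes,)
    Only weights for classes a client is confident about are averaged, so
    clients with too little data for a class never pollute its aggregate.
    """
    n_classes, dim = next(iter(client_weights.values())).shape
    agg = np.zeros((n_classes, dim))
    counts = np.zeros(n_classes)
    for cid, w in client_weights.items():
        conf = client_conf[cid]
        for c in range(n_classes):
            if conf[c] >= threshold:      # share only confident classes
                agg[c] += w[c]
                counts[c] += 1
    nonzero = counts > 0
    agg[nonzero] /= counts[nonzero][:, None]
    return agg, counts

# two clients, three classes: each client is unconfident about one class
weights = {0: np.ones((3, 4)), 1: 3 * np.ones((3, 4))}
conf = {0: np.array([0.9, 0.5, 0.9]), 1: np.array([0.9, 0.9, 0.2])}
agg, counts = aggregate_by_confidence(weights, conf)
```

Class 0 averages both clients, while classes 1 and 2 each receive only the confident client's weights, which also shrinks the payload each client transmits.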

Updated: 2024-09-24 10:08:59

Domains: cs.DC,cs.LG

Download: http://arxiv.org/abs/2409.10392v3

Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance agent training. Specifically, we adopt the Monte Carlo method to estimate step-level rewards. During each iteration, the agent explores along the expert trajectory and generates new actions. These actions are then evaluated against the corresponding step of expert trajectory using step-level rewards. Such comparison helps identify discrepancies, yielding contrastive action pairs that serve as training data for the agent. Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines. Moreover, our analytical findings highlight the effectiveness of IPR in augmenting action efficiency and its applicability to diverse models.
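
A minimal sketch of the Monte Carlo step-level reward and the resulting contrastive pairs might look as follows; the rollout interface and toy environment are invented for illustration:

```python
import random

def mc_step_reward(env_rollout, state, action, n_rollouts=32, seed=0):
    """Monte Carlo estimate of a step-level reward (sketch of the idea).

    `env_rollout(state, action, rng)` is an assumed black box returning the
    terminal outcome (1.0 success / 0.0 failure) of one random continuation
    after taking `action` in `state`; the step reward is the success average.
    """
    rng = random.Random(seed)
    return sum(env_rollout(state, action, rng) for _ in range(n_rollouts)) / n_rollouts

def contrastive_pairs(expert_steps, agent_steps, reward_fn):
    """Keep (state, expert, agent) triples where the expert step scores higher."""
    pairs = []
    for (s, a_exp), (_, a_agt) in zip(expert_steps, agent_steps):
        if reward_fn(s, a_exp) > reward_fn(s, a_agt):
            pairs.append((s, a_exp, a_agt))   # preferred vs. dispreferred action
    return pairs

# toy environment: action 1 succeeds 90% of the time, action 0 only 10%
def toy_rollout(state, action, rng):
    return 1.0 if rng.random() < (0.9 if action == 1 else 0.1) else 0.0

r_fn = lambda s, a: mc_step_reward(toy_rollout, s, a)
pairs = contrastive_pairs([("s0", 1), ("s1", 1)], [("s0", 0), ("s1", 1)], r_fn)
```

Only the step where the agent deviated from the expert (and scored worse) yields a contrastive training pair; matching steps produce no signal.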

Updated: 2024-09-24 10:01:31

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.11176v2

Numerical determination of the width and shape of the effective string using Stochastic Normalizing Flows

Flow-based architectures have recently proved to be an efficient tool for numerical simulations of Effective String Theories regularized on the lattice that otherwise cannot be efficiently sampled by standard Monte Carlo methods. In this work we use Stochastic Normalizing Flows, a state-of-the-art deep-learning architecture based on non-equilibrium Monte Carlo simulations, to study different effective string models. After testing the reliability of this approach through a comparison with exact results for the Nambu-Gotō model, we discuss results on observables that are challenging to study analytically, such as the width of the string and the shape of the flux density. Furthermore, we perform a novel numerical study of Effective String Theories with terms beyond the Nambu-Gotō action, including a broader discussion on their significance for lattice gauge theories. These results establish the reliability and feasibility of flow-based samplers for Effective String Theories and pave the way for future applications on more complex models.

Updated: 2024-09-24 09:59:44

Domains: hep-lat,cs.LG,hep-th

Download: http://arxiv.org/abs/2409.15937v1

Automated test generation to evaluate tool-augmented LLMs as conversational AI agents

Tool-augmented LLMs are a promising approach to create AI agents that can have realistic conversations, follow procedures, and call appropriate functions. However, evaluating them is challenging due to the diversity of possible conversations, and existing datasets focus only on single interactions and function-calling. We present a test generation pipeline to evaluate LLMs as conversational AI agents. Our framework uses LLMs to generate diverse tests grounded on user-defined procedures. For that, we use intermediate graphs to limit the LLM test generator's tendency to hallucinate content that is not grounded on input procedures, and to enforce high coverage of the possible conversations. Additionally, we put forward ALMITA, a manually curated dataset for evaluating AI agents in customer support, and use it to evaluate existing LLMs. Our results show that while tool-augmented LLMs perform well in single interactions, they often struggle to handle complete conversations. While our focus is on customer support, our method is general and capable of evaluating AI agents in different domains.

Updated: 2024-09-24 09:57:43

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.15934v1

Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images

We explore the feasibility and potential of building a ground-truth-free evaluation model to assess the quality of segmentations generated by the Segment Anything Model (SAM) and its variants in medical imaging. This evaluation model estimates segmentation quality scores by analyzing the coherence and consistency between the input images and their corresponding segmentation predictions. Based on prior research, we frame the task of training this model as a regression problem within a supervised learning framework, using Dice scores (and optionally other metrics) along with mean squared error to compute the training loss. The model is trained utilizing a large collection of public datasets of medical images with segmentation predictions from SAM and its variants. We name this model EvanySeg (Evaluation of Any Segmentation in Medical Images). Our exploration of convolution-based models (e.g., ResNet) and transformer-based models (e.g., ViT) suggested that ViT yields better performance for this task. EvanySeg can be employed for various tasks, including: (1) identifying poorly segmented samples by detecting low-percentile segmentation quality scores; (2) benchmarking segmentation models without ground truth by averaging quality scores across test samples; (3) alerting human experts to poor-quality segmentation predictions during human-AI collaboration by applying a threshold within the score space; and (4) selecting the best segmentation prediction for each test sample at test time when multiple segmentation models are available, by choosing the prediction with the highest quality score. Models and code will be made available at https://github.com/ahjolsenbics/EvanySeg.
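
The Dice-target regression setup can be illustrated in a few lines; the toy masks and scalar "quality model" output below are purely illustrative, not EvanySeg's actual architecture:

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-7):
    """Dice coefficient between two binary masks (the regression target)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)

def mse_loss(predicted_scores, dice_targets):
    """MSE between the quality model's predicted scores and Dice targets."""
    diff = np.asarray(predicted_scores) - np.asarray(dice_targets)
    return float(np.mean(diff ** 2))

# toy example: two 4x4 masks whose foregrounds overlap on half their pixels
pred = np.zeros((4, 4), bool); pred[:2, :] = True   # 8 foreground pixels
true = np.zeros((4, 4), bool); true[1:3, :] = True  # 8 pixels, 4 shared
d = dice_score(pred, true)                          # 2*4 / (8+8) = 0.5
loss = mse_loss([0.6], [d])                         # the training signal
```

At test time the same trained regressor yields a quality score per (image, prediction) pair, which supports ranking, thresholding, and ground-truth-free benchmarking as listed above.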

Updated: 2024-09-24 09:56:16

Domains: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.14874v2

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

This paper introduces a new Dynamic Gated Recurrent Neural Network (DG-RNN) for compute-efficient speech enhancement models running on resource-constrained hardware platforms. It leverages the slow evolution characteristic of RNN hidden states over steps, and updates only a selected set of neurons at each step by adding a newly proposed select gate to the RNN model. This select gate allows the computation cost of the conventional RNN to be reduced during network inference. As a realization of the DG-RNN, we further propose the Dynamic Gated Recurrent Unit (D-GRU), which does not require additional parameters. Test results obtained from several state-of-the-art compute-efficient RNN-based speech enhancement architectures on the DNS challenge dataset show that the D-GRU-based model variants maintain speech intelligibility and quality metrics comparable to those of the baseline GRU-based models, even with an average 50% reduction in GRU computations.
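
A minimal sketch of a select-gated GRU step is below. The select gate here is random and all gates are still computed densely for clarity; a real implementation would derive the selection from the model and compute only the selected rows to actually save compute:

```python
import numpy as np

def dgru_step(h_prev, x, params, k, rng):
    """One select-gated GRU step (sketch): update only k hidden neurons.

    Unselected neurons carry their previous state forward, exploiting the
    slow evolution of hidden states between steps.
    """
    Wz, Wr, Wh, Uz, Ur, Uh = params
    sel = rng.choice(h_prev.size, size=k, replace=False)  # stand-in select gate
    z = 1.0 / (1.0 + np.exp(-(Wz @ x + Uz @ h_prev)))     # update gate
    r = 1.0 / (1.0 + np.exp(-(Wr @ x + Ur @ h_prev)))     # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))         # candidate state
    h_full = (1 - z) * h_prev + z * h_tilde               # standard GRU update
    h_new = h_prev.copy()
    h_new[sel] = h_full[sel]                              # write back selected only
    return h_new, sel

rng = np.random.default_rng(0)
params = [rng.standard_normal((6, 3)) for _ in range(3)] + \
         [rng.standard_normal((6, 6)) for _ in range(3)]
h0 = np.zeros(6)
h1, sel = dgru_step(h0, rng.standard_normal(3), params, k=2, rng=rng)
```

With k=2 of 6 neurons selected, only two hidden entries change per step; the rest are carried over unchanged.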

Updated: 2024-09-24 09:55:47

Domains: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2408.12425v2

GPTCast: a weather language model for precipitation nowcasting

This work introduces GPTCast, a generative deep-learning method for ensemble nowcasting of radar-based precipitation, inspired by advancements in large language models (LLMs). We employ a GPT model as a forecaster to learn spatiotemporal precipitation dynamics using tokenized radar images. The tokenizer is based on a Quantized Variational Autoencoder featuring a novel reconstruction loss tailored for the skewed distribution of precipitation that promotes faithful reconstruction of high rainfall rates. The approach produces realistic ensemble forecasts and provides probabilistic outputs with accurate uncertainty estimation. The model is trained without resorting to randomness: all variability is learned solely from the data and exposed by the model at inference time for ensemble generation. We train and test GPTCast using a 6-year radar dataset over the Emilia-Romagna region in Northern Italy, showing superior results compared to state-of-the-art ensemble extrapolation methods.

Updated: 2024-09-24 09:50:58

Domains: cs.LG,physics.ao-ph

Download: http://arxiv.org/abs/2407.02089v2

Knowledge Editing in Language Models via Adapted Direct Preference Optimization

Large Language Models (LLMs) can become outdated over time as they may lack updated world knowledge, leading to factual knowledge errors and gaps. Knowledge Editing (KE) aims to overcome this challenge using weight updates that do not require expensive retraining. We propose treating KE as an LLM alignment problem. Toward this goal, we introduce Knowledge Direct Preference Optimization (KDPO), a variation of the Direct Preference Optimization (DPO) that is more effective for knowledge modifications. Our method is based on an online approach that continually updates the knowledge stored in the model. We use the current knowledge as a negative sample and the new knowledge we want to introduce as a positive sample in a process called DPO. We also use teacher-forcing for negative sample generation and optimize using the positive sample, which helps maintain localized changes. We tested our KE method on various datasets and models, comparing it to several cutting-edge methods, with 100 and 500 sequential edits. Additionally, we conducted an ablation study comparing our method to the standard DPO approach. Our experimental results show that our modified DPO method allows for more refined KE, achieving similar or better performance compared to previous methods.
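
The positive/negative construction maps onto the standard DPO objective roughly as follows; the scalar log-probabilities below stand in for sequence log-likelihoods under the edited policy and a frozen reference copy, and are invented for illustration:

```python
import math

def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """DPO loss on one knowledge edit (sketch).

    The statement carrying the new fact is the positive sample and the
    model's current (outdated) statement is the negative; log-probs come
    from the policy being edited and a frozen reference model.
    """
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# before any update, policy == reference, so there is no preference yet
l_before = dpo_loss(-5.0, -5.0, -5.0, -5.0)   # -log 0.5
# after some optimization, the new fact gains probability mass
l_after = dpo_loss(-2.0, -8.0, -5.0, -5.0)
```

The loss shrinks exactly when the edited model raises the new fact's likelihood relative to the reference while suppressing the outdated one, which is the optimization direction KDPO exploits.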

Updated: 2024-09-24 09:48:36

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.09920v2

Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain

This article describes the submissions of Huawei Translation Service Center (HW-TSC) to the Translation into Low-Resource Languages of Spain task at WMT 2024. We participated in three translation tasks: Spanish to Aragonese (es-arg), Spanish to Aranese (es-arn), and Spanish to Asturian (es-ast). For these three tasks, we applied training strategies such as multilingual transfer, regularized dropout, forward translation and back translation, LaBSE denoising, and transduction ensemble learning to a neural machine translation (NMT) model based on a deep Transformer-big architecture. With these enhancement strategies, our submissions achieved competitive results in the final evaluation.

Updated: 2024-09-24 09:46:27

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15924v1

Overcoming Reward Model Noise in Instruction-Guided Reinforcement Learning

Vision-language models (VLMs) have gained traction as auxiliary reward models to provide more informative reward signals in sparse reward environments. However, our work reveals a critical vulnerability of this method: a small amount of noise in the reward signal can severely degrade agent performance. In challenging environments with sparse rewards, we show that reinforcement learning agents using VLM-based reward models without proper noise handling perform worse than agents relying solely on exploration-driven methods. We hypothesize that false positive rewards -- where the reward model incorrectly assigns rewards to trajectories that do not fulfill the given instruction -- are more detrimental to learning than false negatives. Our analysis confirms this hypothesis, revealing that the widely used cosine similarity metric, when applied to comparing agent trajectories and language instructions, is prone to generating false positive reward signals. To address this, we introduce BiMI (Binary Mutual Information), a novel noise-resilient reward function. Our experiments demonstrate that BiMI significantly boosts agent performance, with an average improvement ratio of 44.5% across diverse environments with learned, non-oracle VLMs, thereby making VLM-based reward models practical for real-world applications.
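
The false-positive failure mode of dense cosine similarity, and the effect of binarizing the signal, can be seen in a toy example. The embeddings are invented, and this is not the BiMI reward itself (which is built on binary mutual information); it only illustrates why a dense similarity reward misleads:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy embeddings: an instruction and two trajectories, one fulfilling it and
# one merely pointing in a vaguely similar direction (a false-positive risk)
instr = np.array([1.0, 1.0, 0.0])
good = np.array([0.9, 1.1, 0.1])
bad = np.array([1.0, 0.8, 0.9])   # does not fulfil the instruction

dense_good, dense_bad = cosine(instr, good), cosine(instr, bad)
# binarizing against a strict threshold suppresses the spurious dense signal
threshold = 0.95
binary_good = 1.0 if dense_good >= threshold else 0.0
binary_bad = 1.0 if dense_bad >= threshold else 0.0
```

The non-fulfilling trajectory still earns a dense score above 0.8, a steady spurious reward under a cosine-based scheme, whereas the binarized signal rewards only the fulfilling one.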

Updated: 2024-09-24 09:45:20

Domains: cs.LG,cs.RO

Download: http://arxiv.org/abs/2409.15922v1

Towards Graph Prompt Learning: A Survey and Beyond

Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability across various tasks. Graphs, as versatile data structures that capture relationships between entities, play pivotal roles in fields such as social network analysis, recommender systems, and biological graphs. Despite the success of pre-train and prompt learning paradigms in Natural Language Processing (NLP) and Computer Vision (CV), their application in graph domains remains nascent. In graph-structured data, not only do the node and edge features often have disparate distributions, but the topological structures also differ significantly. This diversity in graph data can lead to incompatible patterns or gaps between pre-training and fine-tuning on downstream graphs. We aim to bridge this gap by summarizing methods for alleviating these disparities. This includes exploring prompt design methodologies, comparing related techniques, assessing application scenarios and datasets, and identifying unresolved problems and challenges. This survey categorizes over 100 relevant works in this field, summarizing general design principles and the latest applications, including text-attributed graphs, molecules, proteins, and recommendation systems. Through this extensive review, we provide a foundational understanding of graph prompt learning, aiming to impact not only the graph mining community but also the broader Artificial General Intelligence (AGI) community.

Updated: 2024-09-24 09:43:35

Domains: cs.LG,cs.AI,cs.SI

Download: http://arxiv.org/abs/2408.14520v3

Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models

Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations can thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present iterative methods to overcome this drawback. Among other things, we introduce and analyze several preconditioners, derive new convergence results, and propose novel methods for accurately approximating predictive variances. We analyze our proposed methods theoretically and in experiments with simulated and real-world data. In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based calculations and a threefold increase in prediction accuracy in terms of the continuous ranked probability score compared to a state-of-the-art method on a large satellite data set. All methods are implemented in a free C++ software library with high-level Python and R packages.
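
The kind of iterative solver involved can be sketched with a Jacobi-preconditioned conjugate gradient, which needs only matrix-vector products with the symmetric positive-definite system matrix rather than a Cholesky factorization; the toy system below merely stands in for a Vecchia-Laplace matrix, and the diagonal preconditioner is one simple choice among those the paper analyzes:

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient with a diagonal (Jacobi) preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x                      # residual
    z = M_inv_diag * r                 # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 20))
A = Q @ Q.T + 20 * np.eye(20)          # SPD toy system matrix
b = rng.standard_normal(20)
x = pcg(A, b, 1.0 / np.diag(A))
```

Because each iteration costs one matrix-vector product, sparse structure (as in Vecchia approximations) keeps the per-iteration cost near-linear in the sample size.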

Updated: 2024-09-24 09:37:47

Domains: stat.ME,cs.LG,stat.ML

Download: http://arxiv.org/abs/2310.12000v3

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.

Updated: 2024-09-24 09:37:16

Domains: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2403.14088v2

Deep convolutional framelets for dose reconstruction in BNCT with Compton camera detector

Boron Neutron Capture Therapy (BNCT) is an innovative binary form of radiation therapy with high selectivity towards cancer tissue based on the neutron capture reaction 10B(n,α)7Li, in which patients are exposed to neutron beams after administration of a boron compound that accumulates preferentially in cancer cells. The high linear energy transfer products of the ensuing reaction deposit their energy at the cell level, sparing normal tissue. Although progress in accelerator-based BNCT has led to renewed interest in this cancer treatment modality, in vivo dose monitoring during treatment remains infeasible, and several approaches are under investigation. While Compton imaging presents various advantages over other imaging methods, it typically requires long reconstruction times, comparable with the BNCT treatment duration. This study aims to develop deep neural network models to estimate the dose distribution by using a simulated dataset of BNCT Compton camera images. The models pursue the avoidance of the iteration time associated with the maximum-likelihood expectation-maximization algorithm (MLEM), enabling prompt dose reconstruction during the treatment. The U-Net architecture and two variants based on the deep convolutional framelets framework have been used for noise and artifact reduction in few-iteration reconstructed images, leading to promising results in terms of reconstruction accuracy and processing time.

Updated: 2024-09-24 09:34:19

Domains: physics.med-ph,cs.LG

Download: http://arxiv.org/abs/2409.15916v1

Embedding Knowledge Graph in Function Spaces

We introduce a novel embedding method that diverges from conventional approaches by operating within function spaces of finite dimension rather than a finite vector space, thus departing significantly from standard knowledge graph embedding techniques. Initially employing polynomial functions to compute embeddings, we progress to more intricate representations using neural networks with varying layer complexities. We argue that employing functions for embedding computation enhances expressiveness and allows for more degrees of freedom, enabling operations such as composition, derivatives, and primitives of entity representations. Additionally, we meticulously outline the step-by-step construction of our approach and provide code for reproducibility, thereby facilitating further exploration and application in the field.

Updated: 2024-09-24 09:33:44

Domains: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.14857v2

Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts

Large Language Models (LLMs) have shown promise in solving natural language-described planning tasks, but their direct use often leads to inconsistent reasoning and hallucination. While hybrid LLM-symbolic planning pipelines have emerged as a more robust alternative, they typically require extensive expert intervention to refine and validate generated action schemas. This not only limits scalability but also introduces a potential for biased interpretation, as a single expert's interpretation of ambiguous natural language descriptions might not align with the user's actual intent. To address this, we propose a novel approach that constructs an action schema library to generate multiple candidates, accounting for the diverse possible interpretations of natural language descriptions. We further introduce a semantic validation and ranking module that automatically filters and ranks the generated schemas and plans without an expert in the loop. The experiments showed that our pipeline maintains superiority in planning over the direct LLM planning approach. These findings demonstrate the feasibility of a fully automated end-to-end LLM-symbolic planner that requires no expert intervention, opening up the possibility for a broader audience to engage with AI planning with less prerequisite domain expertise.

Updated: 2024-09-24 09:33:12

Domains: cs.AI

Download: http://arxiv.org/abs/2409.15915v1

Enhancing IoT based Plant Health Monitoring through Advanced Human Plant Interaction using Large Language Models and Mobile Applications

This paper presents the development of a novel plant communication application that allows plants to "talk" to humans using real-time sensor data and AI-powered language models. Utilizing soil sensors that track moisture, temperature, and nutrient levels, the system feeds this data into the Gemini API, where it is processed and transformed into natural language insights about the plant's health and "mood." Developed using Flutter, Firebase, and ThingSpeak, the app offers a seamless user experience with real-time interaction capabilities. By fostering human-plant connectivity, this system enhances plant care practices, promotes sustainability, and introduces innovative applications for AI and IoT technologies in both personal and agricultural contexts. The paper explores the technical architecture, system integration, and broader implications of AI-driven plant communication.

Updated: 2024-09-24 09:26:47

Domains: cs.AI

Download: http://arxiv.org/abs/2409.15910v1

A Fairness-Oriented Reinforcement Learning Approach for the Operation and Control of Shared Micromobility Services

As Machine Learning grows in popularity across various fields, equity has become a key focus for the AI community. However, fairness-oriented approaches are still underexplored in smart mobility. Addressing this gap, our study investigates the balance between performance optimization and algorithmic fairness in shared micromobility services, providing a novel framework based on Reinforcement Learning. Exploiting Q-Learning, the proposed methodology achieves equitable outcomes in terms of the Gini index across different areas characterized by their distance from central hubs. Through vehicle rebalancing, the provided scheme maximizes operator performance while ensuring fairness principles for users, reducing inequity by up to 80% while increasing costs by only 30% (with respect to applying no equity adjustment). A case study with synthetic data validates our insights and highlights the importance of fairness in urban micromobility.
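
The Gini index used as the equity measure can be computed directly from per-area service levels; a minimal sketch (the sorted-rank identity below is a standard formulation, and the toy numbers are illustrative):

```python
def gini(values):
    """Gini index of nonnegative per-area service levels (0 = perfect equity)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # sorted-rank identity: G = 2 * sum_i(i * x_i) / (n * sum x) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

g_equal = gini([5, 5, 5, 5])     # all areas served equally
g_skewed = gini([0, 0, 0, 20])   # one area gets all the service
```

A fairness-oriented RL objective can then penalize high Gini values alongside the operator's performance term.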

Updated: 2024-09-24 09:24:11

Categories: eess.SY,cs.CY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2403.15780v2

Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection

Text-to-SQL is a subtask of semantic parsing that has seen rapid progress with the evolution of Large Language Models (LLMs). However, LLMs face challenges due to hallucination issues and a lack of domain-specific database knowledge (such as table schemas and cell values). As a result, they can make errors in generating table names and columns and in matching values to the correct columns in SQL statements. This paper introduces a knowledge-injection method that enhances LLMs' ability to understand schema contents by incorporating prior knowledge, improving their performance on Text-to-SQL tasks. Experimental results show that pre-training LLMs on domain-specific database knowledge and fine-tuning them on downstream Text-to-SQL tasks significantly improves the Execution Match (EX) and Exact Match (EM) metrics across various models. This effectively reduces errors in generating column names and matching values to columns. Furthermore, the knowledge-injected models can be applied to many downstream Text-to-SQL tasks, demonstrating the generalizability of the approach presented in this paper.

Updated: 2024-09-24 09:24:03

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15907v1

Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM

In this paper, we introduce a speech-conditioned Large Language Model (LLM) integrated with a Mixture of Experts (MoE) based connector to address the challenge of Code-Switching (CS) in Automatic Speech Recognition (ASR). Specifically, we propose an Insertion and Deletion of Interruption Token (IDIT) mechanism to better transfer the text-generation ability of the LLM to the speech-recognition task. We also present a connector with an MoE architecture that manages multiple languages efficiently. To further enhance the collaboration of multiple experts and leverage the understanding capabilities of the LLM, we propose a two-stage progressive training strategy: 1) The connector is unfrozen and trained with language-specialized experts to map speech representations to the text space. 2) The connector and the LLM's LoRA adapter are trained with the proposed IDIT mechanism, and all experts are activated to learn general representations. Experimental results demonstrate that our method significantly outperforms state-of-the-art models, including end-to-end and large-scale audio-language models.

Updated: 2024-09-24 09:20:22

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2409.15905v1

Five questions and answers about artificial intelligence

Rapid advances in Artificial Intelligence (AI) are generating much controversy in society, often without scientific basis. As occurred with the development of other emerging technologies, such as the introduction of electricity in the early 20th century, AI causes both fascination and fear. Following the advice of the philosopher R.W. Emerson that knowledge is the antidote to fear, this paper seeks to contribute to the dissemination of knowledge about AI. To this end, it reflects on the following questions: the origins of AI, its possible future evolution, its ability to show feelings, the associated threats and dangers, and the concept of the AI singularity.

Updated: 2024-09-24 09:19:55

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15903v1

FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning

Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device models face inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on edge devices decreases, ultimately leading to suboptimal training outcomes during any particular FL round. This limits the feasibility of deploying advanced and large-scale models on edge devices, hindering the potential for performance enhancements. To address this issue, we propose FedRepOpt, a gradient re-parameterized optimizer for FL. The gradient re-parameterized method allows training a simple local model with a similar performance as a complex model by modifying the optimizer's gradients according to a set of model-specific hyperparameters obtained from the complex models. In this work, we focus on VGG-style and Ghost-style models in the FL environment. Extensive experiments demonstrate that models using FedRepOpt obtain a significant performance boost of 16.7% and 11.4% compared to the RepGhost-style and RepVGG-style networks, while also demonstrating convergence times 11.7% and 57.4% faster than their complex counterparts.
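
The core idea of a gradient re-parameterized optimizer, rescaling each local gradient by a hyperparameter derived offline from the complex model, can be sketched as a single SGD step. This is a minimal illustration of the general technique, not FedRepOpt's actual update rule, and all values are made up:

```python
def repopt_sgd_step(params, grads, scales, lr=0.1):
    """One SGD step where each gradient is rescaled by a model-specific
    hyperparameter (obtained offline from the complex, multi-branch model),
    so the plain local model mimics training the complex one."""
    return {k: [p - lr * s * g for p, s, g in zip(params[k], scales[k], grads[k])]
            for k in params}

# Illustrative parameter, gradient, and scale values (not from the paper)
params = {"w": [1.0, -2.0]}
grads  = {"w": [0.5, 0.5]}
scales = {"w": [2.0, 1.0]}
new = repopt_sgd_step(params, grads, scales)
print(new["w"])  # [0.9, -2.05]
```

In the FL setting, only the simple model's weights travel between clients and server; the scales stay fixed on each device.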

Updated: 2024-09-24 09:17:08

Categories: cs.LG,cs.CV,cs.DC

Download: http://arxiv.org/abs/2409.15898v1

Characterizing Massive Activations of Attention Mechanism in Graph Neural Networks

Graph Neural Networks (GNNs) have become increasingly popular for effectively modeling data with graph structures. Recently, attention mechanisms have been integrated into GNNs to improve their ability to capture complex patterns. This paper presents the first comprehensive study revealing a critical, unexplored consequence of this integration: the emergence of Massive Activations (MAs) within attention layers. We introduce a novel method for detecting and analyzing MAs, focusing on edge features in different graph transformer architectures. Our study assesses various GNN models using benchmark datasets, including ZINC, TOX21, and PROTEINS. Key contributions include (1) establishing the direct link between attention mechanisms and MA generation in GNNs, (2) developing a robust definition and detection method for MAs based on activation ratio distributions, and (3) introducing the Explicit Bias Term (EBT) as a potential countermeasure and exploring it as an adversarial framework to assess model robustness based on the presence or absence of MAs. Our findings highlight the prevalence and impact of attention-induced MAs across different architectures, such as GraphTransformer, GraphiT, and SAN. The study reveals the complex interplay between attention mechanisms, model architecture, dataset characteristics, and MA emergence, providing crucial insights for developing more robust and reliable graph models.
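
A ratio-based detector of the kind described can be sketched as flagging activations whose magnitude dwarfs the layer's median magnitude. The threshold and the exact ratio definition here are illustrative assumptions; the paper's definition may differ:

```python
def detect_massive_activations(acts, ratio_threshold=100.0):
    """Flag activations whose magnitude exceeds `ratio_threshold` times the
    median magnitude of the layer (a simple activation-ratio heuristic)."""
    mags = sorted(abs(a) for a in acts)
    n = len(mags)
    median = mags[n // 2] if n % 2 else 0.5 * (mags[n // 2 - 1] + mags[n // 2])
    if median == 0:
        return [False] * n
    return [abs(a) / median >= ratio_threshold for a in acts]

acts = [0.5, -1.0, 0.8, 250.0, 0.3]
print(detect_massive_activations(acts))  # [False, False, False, True, False]
```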

Updated: 2024-09-24 09:13:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.03463v2

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated as Quest. Quest is an interpretable method based on the observation that documents retrieved by similar queries are relevant but low-redundant, thus well-suited for synthesizing long-context data. The method is also scalable and capable of constructing large amounts of long-context data. Using Quest, we synthesize a long-context dataset up to 128k context length, significantly outperforming other data synthesis methods on multiple long-context benchmark datasets. In addition, we further verify that the Quest method is predictable through scaling law experiments, making it a reliable solution for advancing long-context models.

Updated: 2024-09-24 09:06:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.19846v4

Symmetries and Expressive Requirements for Learning General Policies

State symmetries play an important role in planning and generalized planning. In the first case, state symmetries can be used to reduce the size of the search; in the second, to reduce the size of the training set. In the case of general planning, however, it is also critical to distinguish non-symmetric states, i.e., states that represent non-isomorphic relational structures. However, while the language of first-order logic distinguishes non-symmetric states, the languages and architectures used to represent and learn general policies do not. In particular, recent approaches for learning general policies use state features derived from description logics or learned via graph neural networks (GNNs) that are known to be limited by the expressive power of C_2, first-order logic with two variables and counting. In this work, we address the problem of detecting symmetries in planning and generalized planning and use the results to assess the expressive requirements for learning general policies over various planning domains. For this, we map planning states to plain graphs, run off-the-shelf algorithms to determine whether two states are isomorphic with respect to the goal, and run coloring algorithms to determine if C_2 features computed logically or via GNNs distinguish non-isomorphic states. Symmetry detection results in more effective learning, while the failure to detect non-symmetries prevents general policies from being learned at all in certain domains.

Updated: 2024-09-24 09:04:47

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15892v1

Self-Supervised Graph Embedding Clustering

The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. However, it combines the K-means clustering and dimensionality reduction processes for optimization, leading to limitations in the clustering effect due to the introduced hyperparameters and the initialization of clustering centers. Moreover, maintaining class balance during clustering remains challenging. To overcome these issues, we propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework. Specifically, we establish a connection between K-means and the manifold structure, allowing us to perform K-means without explicitly defining centroids. Additionally, we use this centroid-free K-means to generate labels in low-dimensional space and subsequently utilize the label information to determine the similarity between samples. This approach ensures consistency between the manifold structure and the labels. Our model effectively achieves one-step clustering without the need for redundant balancing hyperparameters. Notably, we have discovered that maximizing the $\ell_{2,1}$-norm naturally maintains class balance during clustering, a result that we have theoretically proven. Finally, experiments on multiple datasets demonstrate that the clustering results of Our-LPP and Our-MFA exhibit excellent and reliable performance.
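
To see why maximizing an $\ell_{2,1}$-norm can favor class balance, note that for a cluster-indicator matrix the relevant quantity reduces to a sum of square roots of cluster sizes, which is largest when the sizes are equal. The toy below assumes that reading of the objective (the paper's exact formulation may differ):

```python
import math

def l21_norm(rows):
    """l_{2,1} norm: sum of the Euclidean norms of the rows of a matrix."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in rows)

def transpose(m):
    return [list(col) for col in zip(*m)]

# Cluster-indicator matrices Y (samples x clusters). ||Y^T||_{2,1} equals
# sum_j sqrt(n_j), which is maximized by balanced cluster sizes n_j.
balanced   = [[1, 0], [1, 0], [0, 1], [0, 1]]   # sizes 2 and 2
imbalanced = [[1, 0], [1, 0], [1, 0], [1, 0]]   # sizes 4 and 0
print(l21_norm(transpose(balanced)))    # ~2.828
print(l21_norm(transpose(imbalanced)))  # 2.0
```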

Updated: 2024-09-24 08:59:51

Categories: cs.LG

Download: http://arxiv.org/abs/2409.15887v1

On the calibration of powerset speaker diarization models

End-to-end neural diarization models have usually relied on a multilabel-classification formulation of the speaker diarization problem. Recently, we proposed a powerset multiclass formulation that has beaten the state-of-the-art on multiple datasets. In this paper, we propose to study the calibration of a powerset speaker diarization model, and explore some of its uses. We study the calibration in-domain, as well as out-of-domain, and explore the data in low-confidence regions. The reliability of model confidence is then tested in practice: we use the confidence of the pretrained model to selectively create training and validation subsets out of unannotated data, and compare this to random selection. We find that top-label confidence can be used to reliably predict high-error regions. Moreover, training on low-confidence regions provides a better calibrated model, and validating on low-confidence regions can be more annotation-efficient than random regions.
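
The selective-subsetting step described above, picking frames by top-label confidence, can be sketched as follows. The posterior values are made up for illustration:

```python
def select_low_confidence(frame_posteriors, k):
    """Indices of the k frames whose top-label (powerset-class) confidence
    is lowest -- candidates for training/validation subsets."""
    top = [max(p) for p in frame_posteriors]
    return sorted(range(len(top)), key=lambda i: top[i])[:k]

probs = [
    [0.97, 0.02, 0.01],    # confident frame
    [0.40, 0.35, 0.25],    # low confidence
    [0.60, 0.30, 0.10],
    [0.99, 0.005, 0.005],
]
print(select_low_confidence(probs, 2))  # [1, 2]
```

The paper's finding is that annotating regions selected this way is more efficient than annotating randomly chosen regions.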

Updated: 2024-09-24 08:56:42

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2409.15885v1

On Computing Optimal Tree Ensembles

Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression. Recent algorithmic advances allow the computation of decision trees that are optimal for various measures, such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. Mainly, we provide two novel algorithms and corresponding lower bounds. First, we are able to carry over and substantially improve on tractability results for decision trees: we obtain an algorithm that, given a training-data set and a size bound $S \in \mathbb{R}$, computes a tree ensemble of size at most $S$ that classifies the data correctly. The algorithm runs in $(4\delta D S)^S \cdot poly$ time, where $D$ is the largest domain size, $\delta$ is the largest number of features in which two examples differ, $n$ is the number of input examples, and $poly$ is a polynomial of the input size. For decision trees, that is, ensembles of size 1, we obtain a running time of $(\delta D s)^s \cdot poly$, where $s$ is the size of the tree. To obtain these algorithms, we introduce the witness-tree technique, which seems promising for practical implementations. Secondly, we show that dynamic programming, which has been applied successfully to computing decision trees, may also be viable for tree ensembles, providing an $\ell^n \cdot poly$-time algorithm, where $\ell$ is the number of trees. Finally, we compare the number of cuts necessary to classify training data sets with decision trees and with tree ensembles, showing that ensembles may need exponentially fewer cuts as the number of trees increases.

Updated: 2024-09-24 08:53:21

Categories: cs.LG,cs.DS

Download: http://arxiv.org/abs/2306.04423v2

Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning

This paper introduces the submission by Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. To develop a reliable machine translation system for low-resource Indian languages, we employed two distinct knowledge transfer strategies, taking into account the characteristics of the language scripts and the support available from existing open-source models for Indian languages. For Assamese (as) and Manipuri (mn), we fine-tuned the existing IndicTrans2 open-source model to enable bidirectional translation between English and these languages. For Khasi (kh) and Mizo (mz), we trained a multilingual model as a baseline using bilingual data from these four language pairs, along with about 8kw of additional English-Bengali bilingual data, all of which share certain linguistic features. This was followed by fine-tuning to achieve bidirectional translation between English and Khasi, as well as English and Mizo. Our transfer learning experiments produced impressive results: 23.5 BLEU for en-as, 31.8 BLEU for en-mn, 36.2 BLEU for as-en, and 47.9 BLEU for mn-en on their respective test sets. Similarly, the multilingual model transfer learning experiments yielded impressive outcomes, achieving 19.7 BLEU for en-kh, 32.8 BLEU for en-mz, 16.1 BLEU for kh-en, and 33.9 BLEU for mz-en on their respective test sets. These results not only highlight the effectiveness of transfer learning techniques for low-resource languages but also contribute to advancing machine translation capabilities for low-resource Indian languages.

Updated: 2024-09-24 08:53:19

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15879v1

CamelEval: Advancing Culturally Aligned Arabic Language Models and Benchmarks

Large Language Models (LLMs) are the cornerstones of modern artificial intelligence systems. This paper introduces Juhaina, an Arabic-English bilingual LLM specifically designed to align with the values and preferences of Arabic speakers. Juhaina inherently supports advanced functionalities such as instruction following, open-ended question answering, information provisioning, and text processing. Our model contains 9.24 billion parameters and is trained on a context window of up to 8,192 tokens. This paper details the creation process of Juhaina and provides an extensive empirical evaluation. Furthermore, we identify the limitations of the widely adopted Open Arabic LLM Leaderboard (OALL) and propose a new evaluation benchmark, CamelEval. Our findings demonstrate that Juhaina surpasses existing LLMs of comparable sizes, such as the Llama and Gemma families, in generating helpful responses in Arabic, providing factually accurate information about the region, and understanding nuanced cultural aspects. We aspire for Juhaina to democratize cutting-edge AI technologies, serving over 400 million Arabic speakers by offering LLMs that not only communicate in their language but also comprehend their culture. We publicly release all models on Huggingface \url{https://huggingface.co/elmrc}.

Updated: 2024-09-24 08:49:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.12623v2

Exploring the traditional NMT model and Large Language Model for chat translation

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT24 chat translation shared task on English$\leftrightarrow$German (en-de) bidirectional translation. The experiments involved fine-tuning models using chat data and exploring various strategies, including Minimum Bayesian Risk (MBR) decoding and self-training. The results show significant performance improvements in certain directions, with the MBR self-training method achieving the best results. The paper also discusses the challenges and potential avenues for further research in the field of chat translation.

Updated: 2024-09-24 08:48:25

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.16331v1

Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR

Large transformer-based models have significant potential for speech transcription and translation. Their self-attention mechanisms and parallel processing enable them to capture complex patterns and dependencies in audio sequences. However, this potential comes with challenges, as these large and computationally intensive models lead to slow inference speeds. Various optimization strategies have been proposed to improve performance, including efficient hardware utilization and algorithmic enhancements. In this paper, we introduce Whisper-Medusa, a novel approach designed to enhance processing speed with minimal impact on Word Error Rate (WER). The proposed model extends the OpenAI's Whisper architecture by predicting multiple tokens per iteration, resulting in a 50% reduction in latency. We showcase the effectiveness of Whisper-Medusa across different learning setups and datasets.
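
The latency arithmetic behind multi-token prediction can be shown with a toy decoder. This sketch assumes a perfect k-token predictor and skips the speculative verification that Medusa-style heads actually require; the "model" here is a made-up deterministic function:

```python
def decode(next_token, prompt, n, k=1):
    """Greedy decoding that emits k tokens per iteration. With a perfect
    k-head predictor this takes about n/k iterations instead of n (may
    overshoot n when k does not divide it)."""
    seq, steps = list(prompt), 0
    while len(seq) - len(prompt) < n:
        for _ in range(k):
            seq.append(next_token(seq))
        steps += 1
    return seq, steps

# Toy deterministic "model": the next token is the previous one plus 1
nt = lambda s: s[-1] + 1
seq1, steps1 = decode(nt, [0], 6, k=1)
seq3, steps3 = decode(nt, [0], 6, k=3)
print(seq1 == seq3, steps1, steps3)  # True 6 2
```

The same output in a third of the iterations is the effect the paper reports as roughly a 50% latency reduction on real workloads.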

Updated: 2024-09-24 08:42:31

Categories: eess.AS,cs.AI,cs.LG,cs.SD

Download: http://arxiv.org/abs/2409.15869v1

Privacy Evaluation Benchmarks for NLP Models

By mounting privacy attacks on NLP models, attackers can obtain sensitive information such as training data and model parameters. Although researchers have studied several kinds of attacks on NLP models in depth, these analyses are non-systematic, and the field lacks a comprehensive understanding of the impact caused by the attacks. For example, we must consider which scenarios apply to which attacks, what the common factors are that affect the performance of different attacks, the nature of the relationships between different attacks, and the influence of various datasets and models on the effectiveness of the attacks. Therefore, we need a benchmark to holistically assess the privacy risks faced by NLP models. In this paper, we present a privacy attack and defense evaluation benchmark in the field of NLP, which covers both conventional/small models and large language models (LLMs). This benchmark supports a variety of models, datasets, and protocols, along with standardized modules for comprehensive evaluation of attacks and defense strategies. Based on the above framework, we present a study on the association between auxiliary data from different domains and the strength of privacy attacks, and we provide an improved attack method for this scenario with the help of Knowledge Distillation (KD). Furthermore, we propose a chained framework for privacy attacks, allowing a practitioner to chain multiple attacks to achieve a higher-level attack objective. Based on this, we provide some defense and enhanced attack strategies. The code for reproducing the results can be found at https://github.com/user2311717757/nlp_doctor.

Updated: 2024-09-24 08:41:26

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2409.15868v1

In-Context Ensemble Improves Video-Language Models for Low-Level Workflow Understanding from Human Demonstrations

A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow based on a video demonstration. SOPs are a crucial step toward automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. We explore in-context learning with video-language models for SOP generation. We report that in-context learning sometimes helps video-language models at SOP generation. We then propose an in-context ensemble learning to further enhance the capabilities of the models in SOP generation.
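
An in-context ensemble can be as simple as a per-step majority vote over model outputs produced with different in-context example sets. The sketch below assumes aligned step-level predictions; the step names and runs are hypothetical:

```python
from collections import Counter

def ensemble_vote(runs):
    """Majority vote per SOP step over predictions obtained with different
    in-context example sets (ties broken by first occurrence)."""
    return [Counter(step).most_common(1)[0][0] for step in zip(*runs)]

# Hypothetical step predictions from three prompt variants
runs = [
    ["open_app", "click_save", "close_app"],
    ["open_app", "click_file", "close_app"],
    ["open_app", "click_save", "close_app"],
]
print(ensemble_vote(runs))  # ['open_app', 'click_save', 'close_app']
```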

Updated: 2024-09-24 08:41:01

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15867v1

Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning

Multi-UAV pursuit-evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi-agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL-based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policy to real-world pursuit-evasion are largely restricted to two-dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we address multi-UAV pursuit-evasion by considering UAV dynamics and physical constraints. We introduce an evader prediction-enhanced network to tackle partial observability in cooperative strategy learning. Additionally, we propose an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, we derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner. To our knowledge, this is the first work to derive and deploy an RL-based policy using collective thrust and body rates control commands for multi-UAV pursuit-evasion in unknown environments. The open-source code and videos are available at https://sites.google.com/view/pursuit-evasion-rl.

Updated: 2024-09-24 08:40:04

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2409.15866v1

Can Go AIs be adversarially robust?

Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks. In this paper, we study whether adding natural countermeasures can achieve robustness in Go, a favorable domain for robustness since it benefits from incredible average-case capability and a narrow, innately adversarial setting. We test three defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that though some of these defenses protect against previously discovered attacks, none withstand freshly trained adversaries. Furthermore, most of the reliably effective attacks these adversaries discover are different realizations of the same overall class of cyclic attacks. Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings, and highlight two key gaps: efficient generalization in defenses, and diversity in training. For interactive examples of attacks and a link to our codebase, see https://goattack.far.ai.

Updated: 2024-09-24 08:38:38

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2406.12843v2

BeSimulator: A Large Language Model Powered Text-based Behavior Simulator

Traditional robot simulators focus on physical process modeling and realistic rendering, often suffering from high computational costs, inefficiencies, and limited adaptability. To handle this issue, we propose Behavior Simulation in robotics to emphasize checking the behavior logic of robots and achieving sufficient alignment between the outcome of robot actions and real scenarios. In this paper, we introduce BeSimulator, a modular and novel LLM-powered framework, as an attempt towards behavior simulation in the context of text-based environments. By constructing text-based virtual environments and performing semantic-level simulation, BeSimulator can generalize across scenarios and achieve long-horizon complex simulation. Inspired by human cognition processes, it employs a "consider-decide-capture-transfer" methodology, termed Chain of Behavior Simulation, which excels at analyzing action feasibility and state transitions. Additionally, BeSimulator incorporates code-driven reasoning to enable arithmetic operations and enhance reliability, as well as integrates reflective feedback to refine simulation. Based on our manually constructed behavior-tree-based simulation benchmark BTSIMBENCH, our experiments show a significant performance improvement in behavior simulation compared to baselines, ranging from 14.7% to 26.6%.

Updated: 2024-09-24 08:37:04

Categories: cs.RO,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.15865v1

VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy, and a low real-time factor, in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for the generation of speaker-aware Hakka speech. To address the scarcity of publicly available Hakka speech corpora, we employed a cost-effective approach utilizing a web-scraping pipeline coupled with automatic speech recognition (ASR)-based data-cleaning techniques. This process ensured the acquisition of a high-quality, multi-speaker, multi-dialect dataset suitable for TTS training. Subjective listening tests conducted using comparative mean opinion scores (CMOS) demonstrate that VoxHakka significantly outperforms existing publicly available Hakka TTS systems in terms of pronunciation accuracy, tone correctness, and overall naturalness. This work represents a significant advancement in Hakka language technology and provides a valuable resource for language preservation and revitalization efforts.

Updated: 2024-09-24 08:34:22

Categories: cs.SD,cs.AI,cs.CL,eess.AS

Download: http://arxiv.org/abs/2409.01548v2

A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding

Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. The majority of existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, struggling to adapt to new slot values. While Large Language Model (LLM)-based systems show promising zero-shot DST performance, they either require extensive computational resources or underperform existing fully-trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology, allowing it to adapt dynamically. We compare our approach with the existing SOTA and show that it provides up to 20% better Joint Goal Accuracy (JGA) than previous methods on datasets like Multi-WOZ 2.1, with up to 90% fewer requests to the LLM API.
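
The question-answering reformulation can be sketched as generating one question per ontology slot for an extractive QA model; the slot names and question templates below are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch: recast dialogue state tracking as per-slot question
# answering. Slot names and templates are illustrative, not from the paper.
def slot_questions(domain, slots):
    """Build one natural-language question per slot for an extractive QA model."""
    templates = {
        "restaurant-food": "What type of food does the user want?",
        "restaurant-area": "In which area is the user looking for a restaurant?",
    }
    return {
        slot: templates.get(f"{domain}-{slot}",
                            f"What is the {slot} the user asked for?")
        for slot in slots
    }

questions = slot_questions("restaurant", ["food", "area", "pricerange"])
```

Each question is then answered against the dialogue history, so no fixed slot-value list is needed.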

Updated: 2024-09-24 08:33:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15861v1

Identification For Control Based on Neural Networks: Approximately Linearizable Models

This work presents a control-oriented identification scheme for efficient control design and stability analysis of nonlinear systems. Neural networks are used to identify a discrete-time nonlinear state-space model to approximate time-domain input-output behavior of a nonlinear system. The network is constructed such that the identified model is approximately linearizable by feedback, ensuring that the control law trivially follows from the learning stage. After the identification and quasi-linearization procedures, linear control theory comes at hand to design robust controllers and study stability of the closed-loop system. The effectiveness and interest of the methodology are illustrated throughout the paper on popular benchmarks for system identification.
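
For a scalar model of the assumed form x_next = f(x) + g(x)*u, feedback linearization reduces to the control law u = (v - f(x)) / g(x), after which the closed loop is simply x_next = v. A toy numeric sketch (f and g are invented here; in the paper they would be identified by a neural network):

```python
import math

# Toy scalar system x_{k+1} = f(x) + g(x) * u; f and g are made up for
# illustration (the paper learns them from input-output data).
f = lambda x: math.sin(x)
g = lambda x: 2.0 + math.cos(x)   # bounded away from zero, so invertible

def linearizing_control(x, v):
    """Cancel the nonlinearity so the closed loop becomes x_{k+1} = v."""
    return (v - f(x)) / g(x)

x = 0.7
v = 0.1                            # desired next state
u = linearizing_control(x, v)
x_next = f(x) + g(x) * u           # equals v up to floating point
```

Once the loop is linear, standard linear control tools apply to choosing v.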

Updated: 2024-09-24 08:31:22

Categories: eess.SY,cs.AI,cs.SY

Download: http://arxiv.org/abs/2409.15858v1

Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic Prediction

Efficient real-time traffic prediction is crucial for reducing transportation time. To predict traffic conditions, we employ a spatio-temporal graph neural network (ST-GNN) to model our real-time traffic data as temporal graphs. Despite its capabilities, the ST-GNN often encounters challenges in delivering efficient real-time predictions for real-world traffic data. Recognizing the significance of timely prediction due to the dynamic nature of real-time data, we employ knowledge distillation (KD) as a solution to improve the execution time of ST-GNNs for traffic prediction. In this paper, we introduce a cost function designed to train a network with fewer parameters (the student) using distilled data from a complex network (the teacher) while maintaining its accuracy close to that of the teacher. We use knowledge distillation, incorporating spatial-temporal correlations from the teacher network, to enable the student to learn the complex patterns perceived by the teacher. However, a challenge arises in determining the student network architecture in a principled way rather than choosing it arbitrarily. To address this challenge, we propose an algorithm that utilizes the cost function to calculate pruning scores, addressing the small-network architecture search problem, and jointly fine-tunes the network resulting from each pruning stage using KD. Ultimately, we evaluate our proposed ideas on two real-world datasets, PeMSD7 and PeMSD8. The results indicate that our method can keep the student's accuracy close to that of the teacher, even when retaining only 3% of the network parameters.
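
A distillation-style cost of this general shape, a weighted blend of ground-truth error and teacher-matching error, can be sketched as follows (the quadratic losses and the weight alpha are assumptions; the paper's actual cost also drives the pruning-score computation):

```python
import numpy as np

# Hedged sketch of a distillation cost: a weighted sum of the error against
# the ground truth and the error against the teacher's predictions.
def kd_cost(student_pred, teacher_pred, target, alpha=0.5):
    task_loss = np.mean((student_pred - target) ** 2)
    distill_loss = np.mean((student_pred - teacher_pred) ** 2)
    return alpha * task_loss + (1.0 - alpha) * distill_loss

student = np.array([0.9, 1.1])
teacher = np.array([1.0, 1.0])
target  = np.array([1.0, 1.2])
cost = kd_cost(student, teacher, target)
```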

Updated: 2024-09-24 08:30:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.11798v4

HiQA: A Hierarchical Contextual Augmentation RAG for Multi-Documents QA

Retrieval-augmented generation (RAG) has rapidly advanced the language model field, particularly in question-answering (QA) systems. By integrating external documents during the response-generation phase, RAG significantly enhances the accuracy and reliability of language models. This method elevates the quality of responses and reduces the frequency of hallucinations, where the model generates incorrect or misleading information. However, these methods exhibit limited retrieval accuracy when faced with numerous indistinguishable documents, presenting notable challenges in their practical application. In response to these emerging challenges, we present HiQA, an advanced multi-document question-answering (MDQA) framework that integrates cascading metadata into content and employs a multi-route retrieval mechanism. We also release a benchmark called MasQA to support evaluation and research in MDQA. Finally, HiQA demonstrates state-of-the-art performance in multi-document environments.
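
The cascading-metadata idea can be sketched as prepending a hierarchical breadcrumb to each chunk before indexing, so near-identical chunks from different documents remain distinguishable to the retriever; the field names and formatting below are assumptions, not HiQA's actual implementation.

```python
# Illustrative sketch of "cascading metadata": carry the document and section
# titles into each chunk before embedding/indexing.
def augment_chunk(doc_title, section_path, chunk_text):
    """Prefix a chunk with its document > section breadcrumb."""
    breadcrumb = " > ".join([doc_title] + section_path)
    return f"[{breadcrumb}] {chunk_text}"

aug = augment_chunk("Model X100 Manual", ["Installation", "Power"],
                    "Connect the 12V supply to terminal J3.")
```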

Updated: 2024-09-24 08:25:37

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.01767v2

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation

The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications. However, it also has limitations, including the gap between vector similarity and the relevance of knowledge reasoning, as well as insensitivity to knowledge logic, such as numerical values, temporal relations, and expert rules, which hinder the effectiveness of professional knowledge services. In this work, we introduce a professional-domain knowledge service framework called Knowledge Augmented Generation (KAG). KAG is designed to address the aforementioned challenges by making full use of the advantages of knowledge graphs (KGs) and vector retrieval, and to improve generation and reasoning performance by bidirectionally enhancing large language models (LLMs) and KGs through five key aspects: (1) LLM-friendly knowledge representation, (2) mutual indexing between knowledge graphs and original chunks, (3) a logical-form-guided hybrid reasoning engine, (4) knowledge alignment with semantic reasoning, and (5) model capability enhancement for KAG. We compared KAG with existing RAG methods on multi-hop question answering and found that it significantly outperforms state-of-the-art methods, achieving a relative improvement of 19.6% on 2wiki and 33.5% on HotpotQA in terms of F1 score. We have successfully applied KAG to two professional knowledge Q&A tasks at Ant Group, including E-Government Q&A and E-Health Q&A, achieving significant improvements in professionalism compared to RAG methods.

Updated: 2024-09-24 08:24:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.13731v2

iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification

In developing machine learning (ML) models for text classification, one common challenge is that the collected data is often not ideally distributed, especially when new classes are introduced in response to changes of data and tasks. In this paper, we present a solution for using visual analytics (VA) to guide the generation of synthetic data using large language models. As VA enables model developers to identify data-related deficiency, data synthesis can be targeted to address such deficiency. We discuss different types of data deficiency, describe different VA techniques for supporting their identification, and demonstrate the effectiveness of targeted data synthesis in improving model accuracy. In addition, we present a software tool, iGAiVA, which maps four groups of ML tasks into four VA views, integrating generative AI and VA into an ML workflow for developing and improving text classification models.

Updated: 2024-09-24 08:19:45

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2409.15848v1

Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection

We introduce adaptive learn-then-test (aLTT), an efficient hyperparameter selection procedure that provides finite-sample statistical guarantees on the population risk of AI models. Unlike the existing learn-then-test (LTT) technique, which relies on conventional p-value-based multiple hypothesis testing (MHT), aLTT implements sequential data-dependent MHT with early termination by leveraging e-processes. As a result, aLTT can reduce the number of testing rounds, making it particularly well-suited for scenarios in which testing is costly or presents safety risks. Apart from maintaining statistical validity, in applications such as online policy selection for offline reinforcement learning and hyperparameter tuning for engineering systems, aLTT is shown to achieve the same performance as LTT while requiring only a fraction of the testing rounds.
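
The e-process machinery that lets aLTT stop testing early can be illustrated with a toy test martingale; this is a generic illustration of anytime-valid testing, not the paper's actual procedure.

```python
import random

# Toy anytime-valid test built from an e-process: a likelihood-ratio test
# martingale for H0: p = 0.5 against the alternative p = 0.8. By Ville's
# inequality, stopping as soon as the e-value reaches 1/delta keeps the
# false-rejection probability below delta.
def e_process_test(observations, delta=0.05, p0=0.5, p1=0.8):
    e_value = 1.0
    for t, success in enumerate(observations, start=1):
        e_value *= (p1 / p0) if success else ((1 - p1) / (1 - p0))
        if e_value >= 1.0 / delta:
            return True, t               # reject H0 and stop testing early
    return False, len(observations)

random.seed(0)
trials = [random.random() < 0.8 for _ in range(200)]   # true p = 0.8
rejected, rounds_used = e_process_test(trials)
```

Early stopping is exactly what makes this attractive when each testing round is costly or risky.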

Updated: 2024-09-24 08:14:26

Categories: stat.ML,cs.AI,cs.IT,cs.LG,math.IT,stat.ME

Download: http://arxiv.org/abs/2409.15844v1

From Passive Watching to Active Learning: Empowering Proactive Participation in Digital Classrooms with AI Video Assistant

In online education, innovative tools are crucial for enhancing learning outcomes. SAM (Study with AI Mentor) is an advanced platform that integrates educational videos with a context-aware chat interface powered by large language models. SAM encourages students to ask questions and explore unclear concepts in real-time, offering personalized, context-specific assistance, including explanations of formulas, slides, and images. In a crowdsourced user study involving 140 participants, SAM was evaluated through pre- and post-knowledge tests, comparing a group using SAM with a control group. The results demonstrated that SAM users achieved greater knowledge gains, with a 96.8% answer accuracy. Participants also provided positive feedback on SAM's usability and effectiveness. SAM's proactive approach to learning not only enhances learning outcomes but also empowers students to take full ownership of their educational experience, representing a promising future direction for online learning tools.

Updated: 2024-09-24 08:12:36

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15843v1

Cross Layer Optimization and Distributed Reinforcement Learning for Wireless 360° Video Streaming

Wirelessly streaming high quality 360 degree videos is still a challenging problem. When there are many users watching different 360 degree videos and competing for the computing and communication resources, the streaming algorithm at hand should maximize the average quality of experience (QoE) while guaranteeing a minimum rate for each user. In this paper, we propose a cross layer optimization approach that maximizes the available rate to each user and efficiently uses it to maximize users' QoE. Particularly, we consider a tile based 360 degree video streaming, and we optimize a QoE metric that balances the tradeoff between maximizing each user's QoE and ensuring fairness among users. We show that the problem can be decoupled into two interrelated subproblems: (i) a physical layer subproblem whose objective is to find the download rate for each user, and (ii) an application layer subproblem whose objective is to use that rate to find a quality decision per tile such that the user's QoE is maximized. We prove that the physical layer subproblem can be solved optimally with low complexity and an actor-critic deep reinforcement learning (DRL) is proposed to leverage the parallel training of multiple independent agents and solve the application layer subproblem. Extensive experiments reveal the robustness of our scheme and demonstrate its significant performance improvement compared to several baseline algorithms.
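
The application-layer subproblem, choosing a quality level per tile given the rate allocated by the physical layer, can be sketched as a greedy marginal-utility allocation; the tile sizes and QoE utilities below are invented, and the paper's actual solver is an actor-critic DRL agent rather than this greedy stand-in.

```python
# Sketch of the application-layer subproblem (assumed form): under a per-user
# rate budget, repeatedly raise the tile whose next quality level gives the
# largest QoE gain per extra bit.
def allocate_tiles(rate_budget, tile_sizes, tile_utils):
    """tile_sizes[t][q] = bits of tile t at level q; tile_utils likewise."""
    levels = [0] * len(tile_sizes)
    spent = sum(tile_sizes[t][0] for t in range(len(tile_sizes)))
    while True:
        best = None
        for t, q in enumerate(levels):
            if q + 1 < len(tile_sizes[t]):
                extra = tile_sizes[t][q + 1] - tile_sizes[t][q]
                gain = tile_utils[t][q + 1] - tile_utils[t][q]
                if spent + extra <= rate_budget:
                    ratio = gain / extra
                    if best is None or ratio > best[0]:
                        best = (ratio, t, extra)
        if best is None:
            return levels
        _, t, extra = best
        levels[t] += 1
        spent += extra

sizes = [[1, 3, 6], [1, 2, 4]]              # bits per quality level (made up)
utils = [[1.0, 2.0, 2.5], [0.5, 1.5, 1.8]]  # QoE per quality level (made up)
levels = allocate_tiles(6, sizes, utils)
```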

Updated: 2024-09-24 07:55:21

Categories: cs.LG,eess.IV

Download: http://arxiv.org/abs/2011.06356v3

Flow to Rare Events: An Application of Normalizing Flow in Temporal Importance Sampling for Automated Vehicle Validation

Automated Vehicle (AV) validation based on simulated testing requires unbiased evaluation and high efficiency. One effective solution is to increase the exposure to risky rare events while reweighting the probability measure. However, characterizing the distribution of risky events is particularly challenging due to the paucity of samples and the temporality of continuous scenario variables. To solve this, we devise a method to represent, generate, and reweight the distribution of risky rare events. We decompose the temporal evolution of continuous variables into distribution components based on conditional probability. By introducing the Risk Indicator Function, the distribution of risky rare events is theoretically precipitated out of the naturalistic driving distribution. This targeted distribution is generated in practice via Normalizing Flow, which achieves exact and tractable probability evaluation of intricate distributions. The rare-event distribution is then shown to be an advantageous Importance Sampling distribution. We also promote the technique of temporal Importance Sampling. The combined method, named TrimFlow, is executed to estimate the collision rate of car-following scenarios as a tentative practice. The results showed that sampling background-vehicle maneuvers from the rare-event distribution can evolve testing scenarios into hazardous states. TrimFlow reduced the number of tests by 86.1% compared to generating testing scenarios according to their exposure in the naturalistic driving environment. In addition, the TrimFlow method is not limited to one specific type of functional scenario.
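
The reweighting step, estimating a rare-event probability by sampling from a targeted proposal and weighting each hit by p(x)/q(x), can be sketched with toy Gaussians standing in for the normalizing-flow model.

```python
import math
import random

# Minimal importance-sampling sketch (toy Gaussians, not the paper's flow
# model): estimate the rare-event probability P(X > 4) under p = N(0, 1) by
# sampling from a proposal q = N(4, 1) centered on the risky region.
def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

random.seed(1)
n = 100_000
total = 0.0
for _ in range(n):
    x = random.gauss(4.0, 1.0)           # sample from the proposal q
    if x > 4.0:                           # risk indicator function
        total += normal_pdf(x, 0.0) / normal_pdf(x, 4.0)  # weight p(x)/q(x)
estimate = total / n                      # true value is about 3.17e-5
```

Naive sampling from p would need on the order of millions of samples to see this event at all; the proposal concentrates samples where the risk indicator fires.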

Updated: 2024-09-24 07:51:30

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2407.07320v2

CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models

Cognitive dynamics are pivotal to advance human understanding of the world. Recent advancements in large language models (LLMs) reveal their potential for cognitive simulation. However, these LLM-based cognitive studies primarily focus on static modeling, overlooking the dynamic nature of cognition. To bridge this gap, we propose the concept of the cognitive dynamics of LLMs and present a corresponding task with the inspiration of longitudinal studies. Towards the task, we develop CogBench, a novel benchmark to assess the cognitive dynamics of LLMs and validate it through participant surveys. We also design two evaluation metrics for CogBench, including Authenticity and Rationality. Recognizing the inherent static nature of LLMs, we introduce CogGPT for the task, which features an innovative iterative cognitive mechanism aimed at enhancing lifelong cognitive dynamics. Empirical results demonstrate the superiority of CogGPT over existing methods, particularly in its ability to facilitate role-specific cognitive dynamics under continuous information flows.

Updated: 2024-09-24 07:41:19

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.08438v2

Empirical Insights on Fine-Tuning Large Language Models for Question-Answering

Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets, which can then be fine-tuned for the question-answering (QA) task. However, effective strategies for fine-tuning LLMs for the QA task remain largely unexplored. To address this gap, we categorize supervised fine-tuning (SFT) data based on the extent of knowledge memorized by the pretrained LLMs and conduct a series of empirical analyses. Our experiments, involving four LLMs from three different model families, focus on three key factors: the amount of data required for SFT, the impact of different SFT datasets on model performance, and how data requirements vary across LLMs. The results show that as few as 60 data points during the SFT stage can activate the knowledge encoded during pre-training, enabling LLMs to perform the QA task. Additionally, SFT with data of varying memory levels has a significant impact on LLM performance, with the optimal dataset differing based on the specific model being fine-tuned. Future research will delve deeper into the mechanisms underlying these phenomena.

Updated: 2024-09-24 07:38:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15825v1

A Mobile Payment Scheme Using Biometric Identification with Mutual Authentication

Cashless payment systems offer many benefits over cash, but also have some drawbacks. Fake terminals, skimming, wireless connectivity issues, and relay attacks are persistent problems. Attempts to overcome one problem often lead to another: for example, some systems use QR codes to avoid skimming and connectivity issues, but QR codes can be stolen at a distance and relayed. In this paper, we propose a novel mobile payment scheme based on biometric identification that provides mutual authentication to protect the user from rogue terminals. Our scheme imposes only minimal requirements on terminal hardware, does not depend on wireless connectivity between the user and the verifier during the authentication phase, and does not require the user to trust the terminal until it has authenticated itself to the user. We show that our scheme is resistant against phishing, replay, relay, and presentation attacks.
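
Mutual authentication of this general kind is typically built from challenge-response exchanges; a generic pre-shared-key sketch (not the paper's biometric protocol) looks like this.

```python
import hashlib
import hmac
import secrets

# Generic challenge-response mutual authentication over a pre-shared key
# (an illustrative sketch, not the scheme proposed in the paper).
KEY = b"pre-shared-secret"

def respond(key, challenge):
    """Prove knowledge of the key without revealing it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

# The terminal authenticates itself to the user first, so a rogue terminal
# fails before the user reveals anything.
user_nonce = secrets.token_bytes(16)
terminal_ok = hmac.compare_digest(respond(KEY, user_nonce),
                                  respond(KEY, user_nonce))

# A terminal without the key cannot produce a valid proof.
rogue_proof = respond(b"wrong-key", user_nonce)
rogue_detected = not hmac.compare_digest(rogue_proof, respond(KEY, user_nonce))
```

Fresh nonces on both sides are what defeat replay; binding the exchange to the transaction defeats relay.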

Updated: 2024-09-24 07:37:55

Categories: cs.CR

Download: http://arxiv.org/abs/2409.17181v1

Supervised Fine-Tuning: An Activation Pattern Optimization Process for Attention Heads

Though demonstrating promising potential, LLMs' performance on complex tasks, such as advanced mathematics and complex disease diagnosis, is still unsatisfactory. A key issue is that present LLMs learn in a data-driven manner, while instruction datasets for these complex tasks are both scarce and hard to collect or construct. On the contrary, a prominent phenomenon is that LLMs can learn rather fast on simpler tasks with adequate prior knowledge captured during the pretraining stage. Thus, if the prerequisites and mechanism of such rapid generalization could be elucidated, it could be highly beneficial in enhancing the efficiency and effectiveness of LLMs' ability to learn complex tasks. To this end, in this paper, we employ a gradient-based method to dissect how the SFT process adapts LLMs to downstream tasks through the lens of attention patterns. We find that: (1) LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples. Based on these insights, we conduct experiments to examine whether these conclusions can effectively enhance the efficiency and effectiveness of SFT, particularly in handling complex tasks and when instructional resources are scarce. Our research not only uncovers the underlying reasons behind LLMs' rapid learning and generalization mechanisms but also provides practical solutions for addressing data challenges in complex and specialized tasks.
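
A common gradient-based recipe for this kind of attention-head dissection (not necessarily the paper's exact method) scores each head by the magnitude of loss-gradient times activation; a toy mock with synthetic tensors:

```python
import numpy as np

# Hedged sketch of gradient-based head attribution: heads whose activations the
# loss is sensitive to get high scores. Shapes and data here are synthetic.
def head_importance(activations, grads):
    """activations, grads: arrays of shape (num_heads, seq_len, dim)."""
    return np.abs(activations * grads).sum(axis=(1, 2))

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4, 16))        # 8 heads, toy sequence and dim
grads = np.zeros_like(acts)
grads[3] = 1.0                            # pretend the loss only depends on head 3
scores = head_importance(acts, grads)
top_head = int(scores.argmax())
```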

Updated: 2024-09-24 07:34:50

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2409.15820v1

SwiftDossier: Tailored Automatic Dossier for Drug Discovery with LLMs and Agents

The advancement of artificial intelligence algorithms has expanded their application to several fields, such as the biomedical domain. Artificial intelligence systems, including Large Language Models (LLMs), can be particularly advantageous in drug discovery, which is a very long and expensive process. However, LLMs by themselves lack in-depth knowledge about specific domains and can generate factually incorrect information. Moreover, they are not able to perform more complex actions that imply the usage of external tools. Our work is focused on these two issues. Firstly, we show how the implementation of an advanced RAG system can help the LLM generate more accurate answers to drug-discovery-related questions. The results show that the answers generated by the LLM with the RAG system surpass in quality the answers produced by the model without RAG. Secondly, we show how to create an automatic target dossier by combining LLMs with external tools that they can use to execute more intricate data-gathering tasks, such as accessing databases and executing code. The result is a production-ready target dossier containing the acquired information, summarized into a PDF and a PowerPoint presentation.

Updated: 2024-09-24 07:29:05

Categories: cs.AI,68T07, 92C50, 68T09,I.2.7; J.3

Download: http://arxiv.org/abs/2409.15817v1

AsthmaBot: Multi-modal, Multi-Lingual Retrieval Augmented Generation For Asthma Patient Support

Asthma rates have risen globally, driven by environmental and lifestyle factors. Access to immediate medical care is limited, particularly in developing countries, necessitating automated support systems. Large Language Models like ChatGPT (Chat Generative Pre-trained Transformer) and Gemini have advanced natural language processing in general and question answering in particular; however, they are prone to producing factually incorrect responses (i.e. hallucinations). Retrieval-augmented generation systems, which integrate curated documents, can improve large language models' performance and reduce the incidence of hallucination. We introduce AsthmaBot, a multi-lingual, multi-modal retrieval-augmented generation system for asthma support. Evaluation on an asthma-related frequently-asked-questions dataset shows AsthmaBot's efficacy. AsthmaBot has an added interactive and intuitive interface that integrates different data modalities (text, images, videos) to make it accessible to the general public. AsthmaBot is available online at asthmabot.datanets.org.

Updated: 2024-09-24 07:24:01

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.15815v1

Interactive Example-based Explanations to Improve Health Professionals' Onboarding with AI for Human-AI Collaborative Decision Making

A growing body of research explores the use of AI explanations during users' decision phases in human-AI collaborative decision-making. However, previous studies have found issues of overreliance on `wrong' AI outputs. In this paper, we propose interactive example-based explanations to improve health professionals' onboarding with AI, fostering better reliance on AI during AI-assisted decision-making. We implemented an AI-based decision support system that utilizes a neural network to assess the quality of post-stroke survivors' exercises, together with interactive example-based explanations that systematically surface the nearest neighbors of a test/task sample from the AI model's training set to assist users' onboarding with the AI model. To investigate the effect of interactive example-based explanations, we conducted a study with domain experts (health professionals) to evaluate their performance and reliance on AI. Our interactive example-based explanations during onboarding helped health professionals rely on AI more appropriately, yielding a higher ratio of `right' decisions and a lower ratio of `wrong' decisions than providing only feature-based explanations during the decision-support phase. Our study discusses new challenges of assisting users' onboarding with AI for human-AI collaborative decision-making.
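
Surfacing nearest neighbors of a test sample from the training set, as this abstract describes, can be sketched in a few lines. This is an illustrative toy with made-up feature vectors and labels, not the paper's system: it retrieves the k training cases closest to a new exercise sample so a user can compare the model's judgment against similar labeled cases.

```python
import math

def nearest_neighbors(test_sample, training_set, k=3):
    """Return the k training samples closest to the test sample
    (Euclidean distance over feature vectors), so users can inspect
    'similar cases the model has seen' during onboarding."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sorted(training_set, key=lambda s: dist(s["features"], test_sample))[:k]

# Hypothetical exercise-quality records: feature vector + expert label.
training_set = [
    {"features": [0.9, 0.8], "label": "good form"},
    {"features": [0.2, 0.1], "label": "poor form"},
    {"features": [0.85, 0.9], "label": "good form"},
]
neighbors = nearest_neighbors([0.88, 0.82], training_set, k=2)
```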

Updated: 2024-09-24 07:20:09

Domains: cs.HC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.15814v1

Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks

Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a layer-wise integration of merged models, aiming to maintain the distinctiveness of the task-specific final layers while unifying the initial layers, which are primarily associated with feature extraction. This approach ensures parameter consistency across all layers, essential for boosting performance. Moreover, it facilitates seamless integration of knowledge, enabling effective merging of models from different datasets and tasks. Specifically, we investigate its applicability in Unsupervised Domain Adaptation (UDA), an unexplored area for model merging, for Semantic and Panoptic Segmentation. Experimental results demonstrate substantial UDA improvements without additional costs for merging same-architecture models from distinct datasets ($\uparrow 2.6\%$ mIoU) and different-architecture models with a shared backbone ($\uparrow 6.8\%$ mIoU). Furthermore, merging Semantic and Panoptic Segmentation models increases mPQ by $\uparrow 7\%$. These findings are validated across a wide variety of UDA strategies, architectures, and datasets.
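
The core idea — averaging the early, feature-extraction layers across models while leaving each model's task-specific final layers intact — can be sketched as below. This is a minimal illustration under the assumption that parameters are dicts of flat weight lists and that shared layers are identified by a name prefix; the paper's actual merging criteria may differ.

```python
def merge_layerwise(models, shared_prefixes=("backbone.",)):
    """Average parameters of layers whose names start with a shared prefix
    (early, feature-extraction layers); keep all other (task-specific
    final) layers from each model unchanged."""
    merged = []
    for model in models:
        out = {}
        for name, weights in model.items():
            if name.startswith(shared_prefixes):
                # element-wise mean of this parameter across all models
                cols = zip(*(m[name] for m in models))
                out[name] = [sum(c) / len(models) for c in cols]
            else:
                out[name] = list(weights)
        merged.append(out)
    return merged

# Hypothetical two-layer models: a shared backbone and distinct heads.
model_a = {"backbone.conv1": [1.0, 2.0], "head.classifier": [5.0]}
model_b = {"backbone.conv1": [3.0, 4.0], "head.classifier": [7.0]}
merged_a, merged_b = merge_layerwise([model_a, model_b])
```

After merging, both models share identical backbone parameters, which is the "parameter consistency across all layers" the abstract refers to.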

Updated: 2024-09-24 07:19:30

Domains: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2409.15813v1

Aided design of bridge aesthetics based on Stable Diffusion fine-tuning

Stable Diffusion fine-tuning is explored as an aid to bridge-type innovation. A dataset of real bridge photographs is built, and Stable Diffusion is fine-tuned using four methods: Textual Inversion, DreamBooth, Hypernetwork, and LoRA. All four capture the main characteristics of the dataset images and enable personalized customization of Stable Diffusion. Through fine-tuning, Stable Diffusion becomes not merely a drawing tool but one with a designer's capacity for innovative thinking. The fine-tuned model can generate a large number of innovative new bridge types, providing rich inspiration for human designers. The results show that this technology can serve as an engine of creativity and a power multiplier for human designers.

Updated: 2024-09-24 07:18:32

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2409.15812v1

Blockprint Accuracy Study

Blockprint, a tool for assessing client diversity on the Ethereum beacon chain, is essential for analyzing decentralization. This paper details experiments conducted at MigaLabs to enhance Blockprint's accuracy, evaluating various configurations for the K-Nearest Neighbors (KNN) classifier and exploring the Multi-Layer Perceptron (MLP) classifier as a proposed alternative. Findings suggest that the MLP classifier generally achieves superior accuracy with a smaller training dataset. The study revealed that clients running in different modes, especially those subscribed to all subnets, impact attestation inclusion differently, leading to proposed methods for mitigating the decline in model accuracy. Consequently, the recommendation is to employ an MLP model trained with a combined dataset of slots from both default and subscribed-to-all-subnets client configurations.

Updated: 2024-09-24 07:10:39

Domains: cs.CR

Download: http://arxiv.org/abs/2409.15808v1

CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation

With the rapid development of artificial intelligence, multimodal learning has become an important research area. For intelligent agents, the state is a crucial modality to convey precise information alongside common modalities like images, videos, and language. This becomes especially clear with the broad adoption of reinforcement learning and multimodal large language models. Nevertheless, the representation of state modality still lags in development. To this end, we propose a High-Fidelity Contrastive Language-State Pre-training (CLSP) method, which can accurately encode state information into general representations for both reinforcement learning and multimodal large language models. Specifically, we first design a pre-training task based on the classification to train an encoder with coarse-grained information. Next, we construct data pairs of states and language descriptions, utilizing the pre-trained encoder to initialize the CLSP encoder. Then, we deploy contrastive learning to train the CLSP encoder to effectively represent precise state information. Additionally, we enhance the representation of numerical information using the Random Fourier Features (RFF) method for high-fidelity mapping. Extensive experiments demonstrate the superior precision and generalization capabilities of our representation, achieving outstanding results in text-state retrieval, reinforcement learning navigation tasks, and multimodal large language model understanding.
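
The Random Fourier Features (RFF) mapping the abstract mentions for high-fidelity encoding of numeric state values has a standard closed form: z_i(x) = sqrt(2/D) * cos(w_i · x + b_i), with w_i drawn from a Gaussian and b_i uniform on [0, 2π). The sketch below shows the generic RFF construction under these standard assumptions; the paper's exact parameterization may differ.

```python
import math
import random

def random_fourier_features(x, dim_out, sigma=1.0, seed=0):
    """Map a numeric vector x to D random Fourier features
    z_i(x) = sqrt(2/D) * cos(w_i . x + b_i), with w_i ~ N(0, (1/sigma^2) I)
    and b_i ~ U[0, 2*pi).  This approximates a Gaussian (RBF) kernel and
    gives a smooth, high-fidelity embedding of raw numeric values."""
    rng = random.Random(seed)  # fixed seed: features must be reproducible
    d_in, scale = len(x), math.sqrt(2.0 / dim_out)
    features = []
    for _ in range(dim_out):
        w = [rng.gauss(0.0, 1.0 / sigma) for _ in range(d_in)]
        b = rng.uniform(0.0, 2.0 * math.pi)
        features.append(scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b))
    return features

# Hypothetical 3-dimensional agent state embedded into 64 features.
z = random_fourier_features([0.5, -1.2, 3.0], dim_out=64)
```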

Updated: 2024-09-24 07:08:00

Domains: cs.AI

Download: http://arxiv.org/abs/2409.15806v1

A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective

Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to tackle the out-of-distribution problem. However, existing works often suffer from the constraint conflict issue when offline datasets are collected from multiple behavior policies, i.e., different behavior policies may exhibit inconsistent actions with distinct returns across the state space. To remedy this issue, recent advantage-weighted methods prioritize samples with high advantage values for agent training while inevitably ignoring the diversity of behavior policy. In this paper, we introduce a novel Advantage-Aware Policy Optimization (A2PO) method to explicitly construct advantage-aware policy constraints for offline learning under mixed-quality datasets. Specifically, A2PO employs a conditional variational auto-encoder to disentangle the action distributions of intertwined behavior policies by modeling the advantage values of all training data as conditional variables. Then the agent can follow such disentangled action distribution constraints to optimize the advantage-aware policy towards high advantage values. Extensive experiments conducted on both the single-quality and mixed-quality datasets of the D4RL benchmark demonstrate that A2PO yields results superior to the counterparts. Our code will be made publicly available.

Updated: 2024-09-24 07:06:51

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.07262v3

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory occupation improvements. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly via a lightweight gradient-based search, and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When targeting the optimization of the memory footprint, we are able to achieve a size reduction of 47.50% and 69.54% at iso-accuracy with the baseline networks with all weights quantized at 8 and 2-bit, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. With respect to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with a significantly lowered training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
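
The two optimizations being combined can each be stated in a few lines. The sketch below is an illustrative toy, not the paper's gradient-based search: it applies magnitude pruning followed by symmetric uniform quantization to a flat weight list, showing what the joint transformation does to individual weights. Thresholds and bit-widths are arbitrary example values.

```python
def prune_and_quantize(weights, prune_ratio=0.5, bits=2):
    """Magnitude-prune the smallest fraction of weights to zero, then
    uniformly quantize the survivors to symmetric 2^(bits-1)-1 levels
    per sign (here 2-bit: {-s, 0, +s})."""
    n_prune = int(len(weights) * prune_ratio)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1] if n_prune else -1.0
    pruned = [0.0 if abs(w) <= threshold else w for w in weights]
    max_abs = max(abs(w) for w in pruned) or 1.0
    levels = 2 ** (bits - 1) - 1  # positive quantization levels
    step = max_abs / levels
    return [round(w / step) * step for w in pruned]

q = prune_and_quantize([0.1, -0.9, 0.05, 0.8], prune_ratio=0.5, bits=2)
```

The paper's contribution is learning the per-layer prune ratios and per-channel bit-widths jointly via gradients, rather than fixing them by hand as done here.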

Updated: 2024-09-24 07:06:26

Domains: cs.LG

Download: http://arxiv.org/abs/2407.01054v2

Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension

Logical reading comprehension is a challenging task that entails grasping the underlying semantics of text and applying reasoning to deduce the correct answer. Prior research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) prompting or data augmentation. However, previous work constructing chain-of-thought rationales concentrates solely on analyzing correct options, neglecting the incorrect alternatives. Additionally, earlier efforts at data augmentation by altering contexts rely on rule-based methods, which yield generated contexts that lack diversity and coherence. To address these issues, we propose a Premise-Oriented Data Augmentation (PODA) framework. This framework can generate CoT rationales that include analyses of both correct and incorrect options, while constructing diverse and high-quality counterfactual contexts from incorrect candidate options. We integrate premise summarization and per-option premise identification into the rationales. Subsequently, we employ multi-step prompts with the identified premises to construct counterfactual contexts. To help the model better differentiate the reasoning process associated with each option, we introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples. Experimental results on three representative LLMs demonstrate that our method substantially improves on the baselines across two challenging logical reasoning benchmarks (ReClor and LogiQA 2.0). The data and code are released at https://github.com/lalalamdbf/TPReasoner.

Updated: 2024-09-24 07:00:24

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.14495v2

Interpretable statistical representations of neural population dynamics and geometry

The dynamics of neuron populations commonly evolve on low-dimensional manifolds. Thus, we need methods that learn the dynamical processes over neural manifolds to infer interpretable and consistent latent representations. We introduce a representation learning method, MARBLE, that decomposes on-manifold dynamics into local flow fields and maps them into a common latent space using unsupervised geometric deep learning. In simulated non-linear dynamical systems, recurrent neural networks, and experimental single-neuron recordings from primates and rodents, we discover emergent low-dimensional latent representations that parametrise high-dimensional neural dynamics during gain modulation, decision-making, and changes in the internal state. These representations are consistent across neural networks and animals, enabling the robust comparison of cognitive computations. Extensive benchmarking demonstrates state-of-the-art within- and across-animal decoding accuracy of MARBLE compared with current representation learning approaches, with minimal user input. Our results suggest that manifold structure provides a powerful inductive bias to develop powerful decoding algorithms and assimilate data across experiments.

Updated: 2024-09-24 06:54:53

Domains: cs.LG,math.DS,q-bio.NC,q-bio.QM

Download: http://arxiv.org/abs/2304.03376v4

A Multi-Level Approach for Class Imbalance Problem in Federated Learning for Remote Industry 4.0 Applications

Deep neural network (DNN) models are effective solutions for Industry 4.0 applications (e.g., oil spill detection, fire detection, anomaly detection). However, training a DNN model requires a considerable amount of data collected from various sources and transferred to a central cloud server, which can be expensive and privacy-sensitive. For instance, in a remote offshore oil field where network connectivity is vulnerable, a federated fog environment can be a potential computing platform, making it feasible to perform computation within the federation. However, training a DNN model on fog systems poses a security issue that the federated learning (FL) technique can resolve. In this case, the new challenge is the class imbalance problem, which can be inherited in local datasets and can degrade the performance of the global model. Therefore, FL training needs to account for class imbalance locally. In addition, an efficient technique for selecting relevant worker models needs to be adopted at the global level to increase the robustness of the global model. Accordingly, we utilize a suitable loss function addressing the class imbalance in workers at the local level. In addition, we employ a dynamic threshold mechanism with user-defined worker weights to efficiently select workers for aggregation, improving the global model's robustness. Finally, we perform an extensive empirical evaluation to explore the benefits of our solution, finding up to 3-5% performance improvement over baseline federated learning methods.
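
A simple instance of a class-imbalance-aware loss of the kind used at the local level is weighted cross-entropy, where a rare class receives a larger weight so mistakes on it cost more. The sketch below shows that mechanism in isolation; the specific loss function and weights chosen in the paper may differ.

```python
import math

def weighted_cross_entropy(probs, label, class_weights):
    """Cross-entropy with per-class weights: up-weighting the minority
    class counters local class imbalance during federated training."""
    return -class_weights[label] * math.log(probs[label])

# Hypothetical 2-class case: class 1 (e.g. 'oil spill') is rare, so it
# gets a 4x weight; the same misprediction on it now costs 4x more.
balanced = weighted_cross_entropy([0.7, 0.3], 1, class_weights=[1.0, 1.0])
weighted = weighted_cross_entropy([0.7, 0.3], 1, class_weights=[1.0, 4.0])
```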

Updated: 2024-09-24 06:52:07

Domains: cs.DC,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2409.15802v1

Towards Universal Large-Scale Foundational Model for Natural Gas Demand Forecasting

In the context of global energy strategy, accurate natural gas demand forecasting is crucial for ensuring efficient resource allocation and operational planning. Traditional forecasting methods struggle to cope with the growing complexity and variability of gas consumption patterns across diverse industries and commercial sectors. To address these challenges, we propose the first foundation model specifically tailored for natural gas demand forecasting. Foundation models, known for their ability to generalize across tasks and datasets, offer a robust solution to the limitations of traditional methods, such as the need for separate models for different customer segments and their limited generalization capabilities. Our approach leverages contrastive learning to improve prediction accuracy in real-world scenarios, particularly by tackling issues such as noise in historical consumption data and the potential misclassification of similar data samples, which can lead to degradation in the quality of the representation and thus the accuracy of downstream forecasting tasks. By integrating advanced noise filtering techniques within the contrastive learning framework, our model enhances the quality of learned representations, leading to more accurate predictions. Furthermore, the model undergoes industry-specific fine-tuning during pretraining, enabling it to better capture the unique characteristics of gas consumption across various sectors. We conducted extensive experiments using a large-scale dataset from ENN Group, which includes data from over 10,000 industrial, commercial, and welfare-related customers across multiple regions. Our model outperformed existing state-of-the-art methods, demonstrating a relative improvement in MSE by 3.68\% and in MASE by 6.15\% compared to the best available model.
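
The interaction between contrastive learning and noisy data that the abstract highlights can be seen in the standard InfoNCE loss. The sketch below is a generic single-anchor InfoNCE computation under the usual formulation, not the paper's specific objective: when mislabeled "negatives" are actually similar to the anchor, they inflate the denominator and the loss, which is why noise filtering before contrasting matters.

```python
import math

def info_nce(sim_to_positive, sims_to_negatives, temperature=0.1):
    """InfoNCE contrastive loss for one anchor: pull the positive pair
    together, push negatives apart.  Noisy or mislabeled negatives that
    are actually similar to the anchor inflate the denominator and
    degrade the learned representation."""
    logits = [sim_to_positive] + list(sims_to_negatives)
    exps = [math.exp(l / temperature) for l in logits]
    return -math.log(exps[0] / sum(exps))

# Hypothetical cosine similarities.  With clean negatives the loss is
# near zero; near-duplicate 'negatives' (noise) sharply raise it.
easy = info_nce(0.9, [0.1, 0.0])
hard = info_nce(0.9, [0.85, 0.8])
```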

Updated: 2024-09-24 06:44:29

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15794v1

W2SAT: Learning to generate SAT instances from Weighted Literal Incidence Graphs

The Boolean Satisfiability (SAT) problem stands out as an attractive NP-complete problem in theoretical computer science and plays a central role in a broad spectrum of computing-related applications. Exploiting and tuning SAT solvers under numerous scenarios requires massive high-quality industry-level SAT instances, which unfortunately are quite limited in the real world. To address the data insufficiency issue, in this paper, we propose W2SAT, a framework to generate SAT formulas by learning intrinsic structures and properties from given real-world/industrial instances in an implicit fashion. To this end, we introduce a novel SAT representation called Weighted Literal Incidence Graph (WLIG), which exhibits strong representation ability and generalizability relative to existing counterparts, and can be efficiently generated via a specialized learning-based graph generative model. Decoding from WLIGs into SAT problems is then modeled as finding overlapping cliques with a novel hill-climbing optimization method termed Optimal Weight Coverage (OWC). Experiments demonstrate the superiority of our WLIG-induced approach in terms of graph metrics, efficiency, and scalability in comparison to previous methods. Additionally, we discuss the limitations of graph-based SAT generation for real-world applications, especially when utilizing generated instances for SAT solver parameter-tuning, and pose some potential directions.

Updated: 2024-09-24 06:38:20

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2302.00272v2

Small Language Models: Survey, Measurements, and Insights

Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments. While researchers continue to improve the capabilities of LLMs in the pursuit of artificial general intelligence, SLM research aims to make machine intelligence more accessible, affordable, and efficient for everyday tasks. Focusing on transformer-based, decoder-only language models with 100M-5B parameters, we survey 59 state-of-the-art open-source SLMs, analyzing their technical innovations across three axes: architectures, training datasets, and training algorithms. In addition, we evaluate their capabilities in various domains, including commonsense reasoning, in-context learning, mathematics, and coding. To gain further insight into their on-device runtime costs, we benchmark their inference latency and memory footprints. Through in-depth analysis of our benchmarking data, we offer valuable insights to advance research in this field.

Updated: 2024-09-24 06:36:56

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.15790v1

Innovative Speech-Based Deep Learning Approaches for Parkinson's Disease Classification: A Systematic Review

Parkinson's disease (PD), the second most prevalent neurodegenerative disorder worldwide, frequently presents with early-stage speech impairments. Recent advancements in Artificial Intelligence (AI), particularly deep learning (DL), have significantly enhanced PD diagnosis through the analysis of speech data. Nevertheless, the progress of research is restricted by the limited availability of publicly accessible speech-based PD datasets, primarily due to privacy concerns. The goal of this systematic review is to explore the current landscape of speech-based DL approaches for PD classification, based on 33 scientific works published between January 2020 and March 2024. We discuss their available resources, capabilities, and potential limitations, and issues related to bias, explainability, and privacy. Furthermore, this review provides an overview of publicly accessible speech-based datasets and open-source material for PD. The DL approaches identified are categorized into end-to-end (E2E) learning, transfer learning (TL), and deep acoustic feature extraction (DAFE). Among E2E approaches, Convolutional Neural Networks (CNNs) are prevalent, though Transformers are increasingly popular. E2E approaches face challenges such as limited data and computational resources, especially with Transformers. TL addresses these issues by providing more robust PD diagnosis and better generalizability across languages. DAFE aims to improve the explainability and interpretability of results by examining the specific effects of deep features on both other DL approaches and more traditional machine learning (ML) methods. However, it often underperforms compared to E2E and TL approaches.

Updated: 2024-09-24 06:29:29

Domains: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS

Download: http://arxiv.org/abs/2407.17844v4

Deep-learning real-time phase retrieval of imperfect diffraction patterns from X-ray free-electron lasers

Machine learning is attracting surging interest across nearly all scientific areas by enabling the analysis of large datasets and the extraction of scientific information from incomplete data. Data-driven science is rapidly growing, especially in X-ray methodologies, where advanced light sources and detection technologies accumulate vast amounts of data that exceed meticulous human inspection capabilities. Despite the increasing demands, the full application of machine learning has been hindered by the need for data-specific optimizations. In this study, we introduce a new deep-learning-based phase retrieval method for imperfect diffraction data. This method provides robust phase retrieval for simulated data and performs well on weak-signal single-pulse diffraction data from X-ray free-electron lasers. Moreover, the method significantly reduces data processing time, facilitating real-time image reconstructions that are crucial for high-repetition-rate data acquisition. Thus, this approach offers a reliable solution to the phase problem and is expected to be widely adopted across various research areas.

Updated: 2024-09-24 06:28:25

Domains: physics.app-ph,cond-mat.mtrl-sci,cs.LG,physics.optics,68T07,J.2

Download: http://arxiv.org/abs/2409.15784v1

OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models

To overcome the burden on memory size and bandwidth due to the ever-increasing size of large language models (LLMs), aggressive weight quantization has recently been studied, while research on quantizing activations remains lacking. In this paper, we present a hardware-software co-design method that results in an energy-efficient LLM accelerator, named OPAL, for generation tasks. First, a novel activation quantization method is proposed that leverages the microscaling data format while preserving several outliers per sub-tensor block (e.g., four out of 128 elements). Second, on top of preserving outliers, mixed precision is utilized that sets 5-bit for inputs to sensitive layers in the decoder block of an LLM, while keeping inputs to less sensitive layers at 3-bit. Finally, we present the OPAL hardware architecture, which consists of FP units for handling outliers and vectorized INT multipliers for the dominant non-outlier operations. In addition, OPAL uses a log2-based approximation of softmax that requires only shifts and subtractions to maximize power efficiency. As a result, we are able to improve energy efficiency by 1.6~2.2x and reduce area by 2.4~3.1x with negligible accuracy loss, i.e., <1 perplexity increase.
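
The log2-based softmax trick can be sketched in software: replace e^x with 2^round(x·log2(e)), so every term becomes an integer power of two that hardware can produce with a shift instead of a multiplier. The snippet below illustrates the idea in Python, not OPAL's actual datapath; the fixed-point width and rounding are assumptions.

```python
def shift_softmax(logits):
    """Softmax approximation in the spirit of a log2-based design:
    e^x is replaced by 2^round(x * log2(e)), so each term is a power of
    two realized by shifting a fixed-point constant.  A sketch of the
    idea, not the exact hardware implementation."""
    LOG2_E = 1.4426950408889634
    m = max(logits)
    # subtract the max first (standard numerical-stability step),
    # making every integer exponent k <= 0
    exps2 = [round((x - m) * LOG2_E) for x in logits]
    FRAC_BITS = 16                      # assumed fixed-point precision
    one = 1 << FRAC_BITS
    terms = [one >> -k for k in exps2]  # 2^k via a right shift (k <= 0)
    total = sum(terms)
    return [t / total for t in terms]

probs = shift_softmax([2.0, 1.0, 0.1])
```

The ordering of the outputs matches the exact softmax, which is typically what matters for the downstream sampling step.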

Updated: 2024-09-24 06:11:12

Domains: cs.LG,cs.AR,cs.CL

Download: http://arxiv.org/abs/2409.05902v3

Zero-shot forecasting of chaotic systems

Time-series forecasting is a challenging task that traditionally requires specialized models custom-trained for the specific task at hand. Recently, inspired by the success of large language models, foundation models pre-trained on vast amounts of time-series data from diverse domains have emerged as a promising candidate for general-purpose time-series forecasting. The defining characteristic of these foundation models is their ability to perform zero-shot learning, that is, forecasting a new system from limited context data without explicit re-training or fine-tuning. Here, we evaluate whether the zero-shot learning paradigm extends to the challenging task of forecasting chaotic systems. Across 135 distinct chaotic dynamical systems and $10^8$ timepoints, we find that foundation models produce competitive forecasts compared to custom-trained models (including NBEATS, TiDE, etc.), particularly when training data is limited. Interestingly, even after point forecasts fail, foundation models preserve the geometric and statistical properties of the chaotic attractors, demonstrating a surprisingly strong ability to capture the long-term behavior of chaotic dynamical systems. Our results highlight the promises and pitfalls of foundation models in making zero-shot forecasts of chaotic systems.

Updated: 2024-09-24 05:56:58

Categories: cs.LG,nlin.CD,physics.comp-ph

Download: http://arxiv.org/abs/2409.15771v1

Probabilistic Forecasting of Real-Time Electricity Market Signals via Interpretable Generative AI

This paper introduces a generative AI approach to probabilistic forecasting of real-time electricity market signals, including locational marginal prices, interregional price spreads, and demand-supply imbalances. We present WIAE-GPF, a Weak Innovation AutoEncoder-based Generative Probabilistic Forecasting architecture that generates future samples of multivariate time series. Unlike traditional black-box models, WIAE-GPF offers interpretability through the Wiener-Kallianpur innovation representation for nonparametric time series, making it a nonparametric generalization of the Wiener/Kalman filter-based forecasting. A novel learning algorithm with structural convergence guarantees is proposed, ensuring that, under ideal training conditions, the generated forecast samples match the ground truth conditional probability distribution. Extensive tests using publicly available data from U.S. independent system operators under various point and probabilistic forecasting metrics demonstrate that WIAE-GPF consistently outperforms classical methods and cutting-edge machine learning techniques.

Updated: 2024-09-24 05:54:09

Categories: eess.SP,cs.LG,econ.GN,q-fin.EC

Download: http://arxiv.org/abs/2403.05743v5

Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction

As various types of crime continue to threaten public safety and economic development, predicting the occurrence of multiple types of crimes becomes increasingly vital for effective prevention measures. Although extensive efforts have been made, most of them overlook the heterogeneity of different crime categories and fail to address the issue of imbalanced spatial distribution. In this work, we propose a Spatial-Temporal Mixture-of-Graph-Experts (ST-MoGE) framework for collective multiple-type crime prediction. To enhance the model's ability to identify diverse spatial-temporal dependencies and mitigate potential conflicts caused by spatial-temporal heterogeneity of different crime categories, we introduce an attentive-gated Mixture-of-Graph-Experts (MGEs) module to capture the distinctive and shared crime patterns of each crime category. Then, we propose Cross-Expert Contrastive Learning (CECL) to update the MGEs and force each expert to focus on specific pattern modeling, thereby reducing blending and redundancy. Furthermore, to address the issue of imbalanced spatial distribution, we propose a Hierarchical Adaptive Loss Re-weighting (HALR) approach to eliminate biases and address the insufficient learning of data-scarce regions. To evaluate the effectiveness of our methods, we conduct comprehensive experiments on two real-world crime datasets and compare our results with twelve advanced baselines. The experimental results demonstrate the superiority of our methods.
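The loss re-weighting idea for data-scarce regions can be sketched generically with smoothed inverse-frequency weights (a common scheme shown purely for illustration; this is not the paper's HALR, which is hierarchical and adaptive):

```python
def inverse_frequency_weights(counts, smoothing=1.0):
    """Generic illustration (not the paper's HALR): give data-scarce
    regions larger loss weights by taking smoothed inverse frequencies of
    their event counts, normalized so the weights average to 1."""
    inv = [1.0 / (c + smoothing) for c in counts]
    mean = sum(inv) / len(inv)
    return [w / mean for w in inv]
```

A region with few recorded crimes then contributes more per-sample loss, countering the bias toward data-rich regions.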

Updated: 2024-09-24 05:41:11

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15764v1

IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios

In Retrieval-Augmented Generation (RAG) tasks using Large Language Models (LLMs), the quality of retrieved information is critical to the final output. This paper introduces the IRSC benchmark for evaluating the performance of embedding models in multilingual RAG tasks. The benchmark encompasses five retrieval tasks: query retrieval, title retrieval, part-of-paragraph retrieval, keyword retrieval, and summary retrieval. Our research addresses the current lack of comprehensive testing and effective comparison methods for embedding models in RAG scenarios. We introduced new metrics: the Similarity of Semantic Comprehension Index (SSCI) and the Retrieval Capability Contest Index (RCCI), and evaluated models such as Snowflake-Arctic, BGE, GTE, and M3E. Our contributions include: 1) the IRSC benchmark, 2) the SSCI and RCCI metrics, and 3) insights into the cross-lingual limitations of embedding models. The IRSC benchmark aims to enhance the understanding and development of accurate retrieval systems in RAG tasks. All code and datasets are available at: https://github.com/Jasaxion/IRSC_Benchmark

Updated: 2024-09-24 05:39:53

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2409.15763v1

TFG: Unified Training-Free Guidance for Diffusion Models

Given an unconditional diffusion model and a predictor for a target property of interest (e.g., a classifier), the goal of training-free guidance is to generate samples with desirable target properties without additional training. Existing methods, though effective in various individual applications, often lack theoretical grounding and rigorous testing on extensive benchmarks. As a result, they could even fail on simple tasks, and applying them to a new problem becomes unavoidably difficult. This paper introduces a novel algorithmic framework encompassing existing methods as special cases, unifying the study of training-free guidance into the analysis of an algorithm-agnostic design space. Via theoretical and empirical investigation, we propose an efficient and effective hyper-parameter searching strategy that can be readily applied to any downstream task. We systematically benchmark across 7 diffusion models on 16 tasks with 40 targets, and improve performance by 8.5% on average. Our framework and benchmark offer a solid foundation for conditional generation in a training-free manner.

Updated: 2024-09-24 05:31:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15761v1

Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting

With the explosive growth of deep learning applications, the right to be forgotten has become increasingly in demand in various AI industries. For example, given a facial recognition system, some individuals may wish to remove images that might have been used in the training phase from the trained model. Unfortunately, modern deep neural networks sometimes unexpectedly leak personal identities. Recent studies have presented various machine unlearning algorithms to make a trained model unlearn the data to be forgotten. While these methods generally perform well in terms of forgetting scores, we have found that an unexpected model-utility drop can occur. This phenomenon, which we term correlation collapse, happens when the machine unlearning algorithms reduce the useful correlation between image features and the true label. To address this challenge, we propose Distribution-Level Feature Distancing (DLFD), a novel method that efficiently forgets instances while preventing correlation collapse. Our method synthesizes data samples so that the generated data distribution is far from the distribution of samples being forgotten in the feature space, achieving effective results within a single training epoch. Through extensive experiments on facial recognition datasets, we demonstrate that our approach significantly outperforms state-of-the-art machine unlearning methods.

Updated: 2024-09-24 05:27:24

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.14747v2

Smart Grid Security: A Verified Deep Reinforcement Learning Framework to Counter Cyber-Physical Attacks

The distributed nature of smart grids, combined with sophisticated sensors, control algorithms, and data collection facilities at Supervisory Control and Data Acquisition (SCADA) centers, makes them vulnerable to strategically crafted cyber-physical attacks. These malicious attacks can manipulate power demands using high-wattage Internet of Things (IoT) botnet devices, such as refrigerators and air conditioners, or introduce false values into transmission line power flow sensor readings. Consequently, grids experience blackouts and high power flow oscillations. Existing grid protection mechanisms, originally designed to tackle natural faults in transmission lines and generator outages, are ineffective against such intelligently crafted attacks. This is because grid operators overlook potential scenarios of cyber-physical attacks during their design phase. In this work, we propose a safe Deep Reinforcement Learning (DRL)-based framework for mitigating attacks on smart grids. The DRL agent effectively neutralizes cyber-physical attacks on grid surfaces by triggering appropriate sequences of existing protection schemes. The safety of the DRL agent is formally verified through a reachability analysis method. Additionally, our framework is designed for deployment on CUDA-enabled GPU systems, which enables faster execution of these protection sequences and their real-time validation. Our framework establishes a new set of protection rules for grid models, successfully thwarting existing cyber-physical attacks.

Updated: 2024-09-24 05:26:20

Categories: cs.CR

Download: http://arxiv.org/abs/2409.15757v1

Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach

As the complexity of tasks addressed through reinforcement learning (RL) increases, the definition of reward functions has also become highly complicated. We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies. First, instead of a single reward function composed of various terms, we define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework. For tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage. Finally, we introduce a practical CMORL algorithm that maximizes objectives based on these rewards while satisfying constraints defined by the costs. The proposed method has been successfully demonstrated on a variety of acrobatic tasks in both simulation and real-world environments, and it has been shown to perform these tasks successfully in comparison with existing RL and constrained RL algorithms. Our code is available at https://github.com/rllab-snu/Stage-Wise-CMORL.

Updated: 2024-09-24 05:25:24

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.15755v1

Development and Validation of Heparin Dosing Policies Using an Offline Reinforcement Learning Algorithm

Appropriate medication dosages in the intensive care unit (ICU) are critical for patient survival. Heparin, used to treat thrombosis and inhibit blood clotting in the ICU, requires careful administration due to its complexity and sensitivity to various factors, including patient clinical characteristics, underlying medical conditions, and potential drug interactions. Incorrect dosing can lead to severe complications such as strokes or excessive bleeding. To address these challenges, this study proposes a reinforcement learning (RL)-based personalized optimal heparin dosing policy that guides dosing decisions reliably within the therapeutic range based on individual patient conditions. A batch-constrained policy was implemented to minimize out-of-distribution errors in an offline RL environment and effectively integrate RL with existing clinician policies. The policy's effectiveness was evaluated using weighted importance sampling, an off-policy evaluation method, and the relationship between state representations and Q-values was explored using t-SNE. Both quantitative and qualitative analyses were conducted using the Medical Information Mart for Intensive Care III (MIMIC-III) database, demonstrating the efficacy of the proposed RL-based medication policy. Leveraging advanced machine learning techniques and extensive clinical data, this research enhances heparin administration practices and establishes a precedent for the development of sophisticated decision-support tools in medicine.
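Weighted importance sampling, the off-policy evaluation method mentioned above, has a standard self-normalized form; the sketch below is simplified to per-trajectory probability ratios and is not the study's exact formulation:

```python
def wis_estimate(target_probs, behavior_probs, returns):
    """Self-normalized (weighted) importance sampling: reweight returns
    logged under the behavior policy (here, clinicians) by the ratio of
    target-policy to behavior-policy probabilities, then divide by the
    sum of the ratios rather than the number of trajectories."""
    weights = [t / b for t, b in zip(target_probs, behavior_probs)]
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)
```

Dividing by the sum of the ratios instead of the sample count introduces a small bias but greatly reduces variance, which is why WIS is commonly preferred for clinical off-policy evaluation.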

Updated: 2024-09-24 05:20:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15753v1

Goal-guided Generative Prompt Injection Attack on Large Language Models

Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. A large number of users can easily inject adversarial text or instructions through the user interface, thus causing LLM model security challenges. Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic strategies. It is unclear how these heuristic strategies relate to the success rate of attacks and thus effectively improve model robustness. To solve this problem, we redefine the goal of the attack: to maximize the KL divergence between the conditional probabilities of the clean text and the adversarial text. Furthermore, we prove that maximizing the KL divergence is equivalent to maximizing the Mahalanobis distance between the embedded representations $x$ and $x'$ of the clean text and the adversarial text when the conditional probability is a Gaussian distribution, and we give a quantitative relationship between $x$ and $x'$. Then we design a simple and effective goal-guided generative prompt injection strategy (G2PIA) to find an injection text that satisfies specific constraints and approximately achieves the optimal attack effect. Notably, our attack method is a query-free black-box attack with low computational cost. Experimental results on seven LLM models and four datasets show the effectiveness of our attack method.
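The claimed equivalence follows from the closed form of the KL divergence between Gaussians; under the simplifying assumption of a shared covariance $\Sigma$ (a sketch of the standard identity, not the paper's full derivation), the divergence reduces to half the squared Mahalanobis distance:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(x,\Sigma)\,\middle\|\,\mathcal{N}(x',\Sigma)\right)
  = \tfrac{1}{2}\,(x - x')^{\top}\Sigma^{-1}(x - x')
  = \tfrac{1}{2}\,d_{M}(x, x')^{2}
```

so maximizing the KL divergence over injected text amounts to maximizing the Mahalanobis distance $d_{M}$ between the clean and adversarial embeddings.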

Updated: 2024-09-24 05:16:59

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.07234v3

The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles

With the advancement of generative artificial intelligence (GenAI) models, their capability to generate content is seeing significant enhancement, leading to widespread applications in the field of data generation and forecasting. Furthermore, GenAI has strong capabilities in data modeling and analysis, which enhances Internet of electric vehicles (IoEV) applications in various aspects. In this paper, we investigate and survey applications of GenAI in the IoEV. Specifically, we categorize GenAI for IoEV into four different layers namely, EV's battery layer, individual electric vehicle (EV) layer, smart grid with EV layer, and security layer. We first introduce various GenAI techniques used in each layer of IoEV applications. Subsequently, public datasets available for training the GenAI models are summarized. Finally, we provide recommendations for future directions. This survey not only categorizes the applications of GenAI in IoEV across different layers but also serves as a valuable resource for researchers and practitioners by highlighting the design and implementation challenges within each layer. Furthermore, it provides a roadmap for future research directions, enabling the development of more robust and efficient IoEV systems through the integration of advanced GenAI techniques.

Updated: 2024-09-24 05:12:10

Categories: cs.LG,cs.AI,cs.ET

Download: http://arxiv.org/abs/2409.15750v1

Automated Assessment of Multimodal Answer Sheets in the STEM domain

In the domain of education, the integration of technology has led to a transformative era, reshaping traditional learning paradigms. Central to this evolution is the automation of grading processes, particularly within the STEM domain encompassing Science, Technology, Engineering, and Mathematics. While efforts to automate grading have been made in subjects like Literature, the multifaceted nature of STEM assessments presents unique challenges, ranging from quantitative analysis to the interpretation of handwritten diagrams. To address these challenges, this research endeavors to develop efficient and reliable grading methods through the implementation of automated assessment techniques using Artificial Intelligence (AI). Our contributions lie in two key areas: firstly, the development of a robust system for evaluating textual answers in STEM, leveraging sample answers for precise comparison and grading, enabled by advanced algorithms and natural language processing techniques. Secondly, a focus on enhancing diagram evaluation, particularly flowcharts, within the STEM context, by transforming diagrams into textual representations for nuanced assessment using a Large Language Model (LLM). By bridging the gap between visual representation and semantic meaning, our approach ensures accurate evaluation while minimizing manual intervention. Through the integration of models such as CRAFT for text extraction and YoloV5 for object detection, coupled with LLMs like Mistral-7B for textual evaluation, our methodology facilitates comprehensive assessment of multimodal answer sheets. This paper provides a detailed account of our methodology, challenges encountered, results, and implications, emphasizing the potential of AI-driven approaches in revolutionizing grading practices in STEM education.

Updated: 2024-09-24 05:10:13

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15749v1

Archon: An Architecture Search Framework for Inference-Time Techniques

Inference-time techniques are emerging as highly effective tools to increase large language model (LLM) capabilities. However, there is still limited understanding of the best practices for developing systems that combine inference-time techniques with one or more LLMs, with challenges including: (1) effectively allocating inference compute budget, (2) understanding the interactions between different combinations of inference-time techniques and their impact on downstream performance, and (3) efficiently searching over the large space of model choices, inference-time techniques, and their compositions. To address these challenges, we introduce Archon, an automated framework for designing inference-time architectures. Archon defines an extensible design space, encompassing methods such as generation ensembling, multi-sampling, ranking, fusion, critiquing, verification, and unit testing. It then transforms the problem of selecting and combining LLMs and inference-time techniques into a hyperparameter optimization objective. To optimize this objective, we introduce automated Inference-Time Architecture Search (ITAS) algorithms. Given target benchmark(s), an inference compute budget, and available LLMs, ITAS outputs optimized architectures. We evaluate Archon architectures across a wide range of instruction-following and reasoning benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. We show that automatically designed inference-time architectures by Archon outperform strong models such as GPT-4o and Claude 3.5 Sonnet on these benchmarks, achieving an average increase of 14.1 and 10.3 percentage points with all-source models and open-source models, respectively. We make our code and datasets available publicly on Github: https://github.com/ScalingIntelligence/Archon.

Updated: 2024-09-24 05:08:18

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.15254v2

Training Neural Networks for Modularity aids Interpretability

An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more modular using an "enmeshment loss" function that encourages the formation of non-interacting clusters. Using automated interpretability measures, we show that our method finds clusters that learn different, disjoint, and smaller circuits for CIFAR-10 labels. Our approach provides a promising direction for making neural networks easier to interpret.
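The abstract does not spell out the enmeshment loss; one plausible minimal sketch (all names hypothetical, not the paper's implementation) penalizes the magnitude of every weight connecting units assigned to different clusters, so that minimizing it drives a layer toward non-interacting clusters:

```python
def cross_cluster_penalty(W, out_clusters, in_clusters):
    """Hypothetical sketch of a modularity-encouraging penalty: sum the
    magnitude of every weight W[i][j] whose output unit i and input unit j
    are assigned to different clusters. Driving this toward zero leaves
    disjoint sub-networks that can be studied independently."""
    return sum(
        abs(w)
        for i, row in enumerate(W)
        for j, w in enumerate(row)
        if out_clusters[i] != in_clusters[j]
    )
```

Added to the task loss during training, such a term rewards weight matrices that are (approximately) block-diagonal under the chosen cluster assignment.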

Updated: 2024-09-24 05:03:49

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15747v1

Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

The growing demand for AI training data has transformed data annotation into a global industry, but traditional approaches relying on human annotators are often time-consuming, labor-intensive, and prone to inconsistent quality. We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process. Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs). By employing LLMs as pre-annotation and real-time assistants, and judges on annotator responses, MILO enables effective interaction patterns between human annotators and LLMs. Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences. We also introduce quality rubrics for flexible evaluation and fine-grained feedback on open-ended annotations. The MILO framework has implications for accelerating AI/ML development, reducing reliance on human annotation alone, and promoting better alignment between human and machine values.

Updated: 2024-09-24 05:00:07

Categories: cs.HC,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2409.10702v2

Real-Time Pedestrian Detection on IoT Edge Devices: A Lightweight Deep Learning Approach

Artificial intelligence (AI) has become integral to our everyday lives. Computer vision has advanced to the point where it can play the safety critical role of detecting pedestrians at road intersections in intelligent transportation systems and alert vehicular traffic as to potential collisions. Centralized computing analyzes camera feeds and generates alerts for nearby vehicles. However, real-time applications face challenges such as latency, limited data transfer speeds, and the risk of life loss. Edge servers offer a potential solution for real-time applications, providing localized computing and storage resources and lower response times. Unfortunately, edge servers have limited processing power. Lightweight deep learning (DL) techniques enable edge servers to utilize compressed deep neural network (DNN) models. The research explores implementing a lightweight DL model on Artificial Intelligence of Things (AIoT) edge devices. An optimized You Only Look Once (YOLO) based DL model is deployed for real-time pedestrian detection, with detection events transmitted to the edge server using the Message Queuing Telemetry Transport (MQTT) protocol. The simulation results demonstrate that the optimized YOLO model can achieve real-time pedestrian detection, with a fast inference speed of 147 milliseconds, a frame rate of 2.3 frames per second, and an accuracy of 78%, representing significant improvements over baseline models.
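The detector-to-edge-server flow can be sketched as a small JSON payload published over MQTT (all field and topic names below are hypothetical, not from the paper; the publish call via e.g. paho-mqtt is shown only as a comment to keep the sketch dependency-free):

```python
import json
import time

def detection_event(label, confidence, bbox, camera_id="cam-01"):
    """Build the JSON payload a detector node might publish over MQTT
    after each positive detection (field names are illustrative)."""
    return json.dumps({
        "camera": camera_id,
        "timestamp": time.time(),      # epoch seconds at detection time
        "label": label,
        "confidence": round(confidence, 3),
        "bbox": bbox,                  # [x, y, width, height] in pixels
    })

# With paho-mqtt, a node would then do roughly:
# client.publish("intersection/42/pedestrian", detection_event(...), qos=1)
```

Keeping the payload small matters here, since the abstract's latency budget (about 147 ms per inference) leaves little headroom for transport overhead.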

Updated: 2024-09-24 04:48:41

Categories: cs.AI,cs.CV,cs.NI

Download: http://arxiv.org/abs/2409.15740v1

LSAST -- Enhancing Cybersecurity through LLM-supported Static Application Security Testing

In the fast-evolving landscape of cybersecurity, Large Language Models (LLMs) play a pivotal role, continually improving their ability to analyze software code. This paper introduces a novel approach to vulnerability scanning by integrating conservative SAST (Static Application Security Testing) scanners with LLM capabilities, resulting in the creation of LSAST (LLM-supported Static Application Security Testing). Our approach significantly enhances the performance of LLMs in vulnerability scanning, establishing a new standard in this field. We benchmark LSAST's efficiency and compare its results with a state-of-the-art LLM. Additionally, we address the inherent drawbacks of LLMs in vulnerability scanning: their reliance on static training datasets, which leads to the exclusion of the latest vulnerabilities, and the privacy concerns associated with sending code to third-party LLM providers. To mitigate these issues, we utilize an open-source LLM to ensure privacy and employ a novel approach to gather relevant vulnerability information, thereby equipping the LLM with up-to-date knowledge.

Updated: 2024-09-24 04:42:43

标题: LSAST — 通过LLM支持的静态应用安全测试增强网络安全

摘要: 在网络安全领域不断发展的背景下,大型语言模型(LLMs)在分析软件代码方面发挥着关键作用。本文介绍了一种将保守的SAST(静态应用程序安全测试)扫描仪与LLM能力集成的新方法,从而创建了LSAST(LLM支持的静态应用程序安全测试)。我们的方法显著提高了LLMs在漏洞扫描中的性能,为该领域建立了新的标准。我们对LSAST的效率进行了基准测试,并将其结果与最先进的LLM进行了比较。此外,我们解决了LLMs在漏洞扫描中固有的缺点:它们依赖于静态训练数据集,这导致最新漏洞被排除在外,并且存在将代码发送给第三方LLM提供商的隐私问题。为了减轻这些问题,我们利用开源LLM来确保隐私,并采用一种新颖的方法来收集相关的漏洞信息,从而为LLM提供最新的知识。

更新时间: 2024-09-24 04:42:43

领域: cs.CR

下载: http://arxiv.org/abs/2409.15735v1

Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding

The increasing volume and complexity of clinical documentation in Electronic Medical Record systems pose significant challenges for clinical coders, who must mentally process and summarise vast amounts of clinical text to extract the essential information needed for coding tasks. While large language models (LLMs) have been successfully applied to shorter summarisation tasks in recent years, the challenge of summarising a hospital course remains an open area for further research and development. In this study, we adapted three pre-trained LLMs (Llama 3, BioMistral, and Mistral Instruct v0.1) for the hospital course summarisation task, using Quantized Low-Rank Adaptation (QLoRA) fine-tuning. We created a free-text clinical dataset from MIMIC-III data by concatenating various clinical notes as the input clinical text, paired with ground-truth Brief Hospital Course sections extracted from the discharge summaries for model training. The fine-tuned models were evaluated using BERTScore and ROUGE metrics to assess the effectiveness of clinical-domain fine-tuning. Additionally, we validated their practical utility using a novel hospital course summary assessment metric specifically tailored for clinical coding. Our findings indicate that fine-tuning pre-trained LLMs for the clinical domain can significantly enhance their performance in hospital course summarisation and suggest their potential as assistive tools for clinical coding. Future work should focus on refining data curation methods to create higher-quality clinical datasets tailored for hospital course summary tasks, and on adapting more advanced open-source LLMs comparable to proprietary models to further advance this research.
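ROUGE, one of the two metrics used above, can be illustrated with its simplest member, unigram-overlap ROUGE-1 F1. Real evaluations use standard ROUGE tooling, but the core computation is just this:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a generated summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("patient discharged home in stable condition",
                  "the patient was discharged home in stable condition")
```

Here precision is 1.0 and recall 0.75, giving an F1 of 6/7. ROUGE-2 and ROUGE-L extend the same idea to bigrams and longest common subsequences.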

Updated: 2024-09-24 04:41:43

标题: 调和临床旋律:调整大型语言模型以用于临床编码中的医院病程总结

摘要: 随着电子病历系统中临床文档数量和复杂性的增加,临床编码人员面临着重大挑战,他们必须对大量临床文本进行心理加工和总结,以提取编码任务所需的关键信息。近年来,大型语言模型已成功应用于较短的摘要任务,但医院病程摘要的挑战仍是进一步研究和发展的一个开放领域。在本研究中,我们采用了三个预训练的LLMs(Llama 3、BioMistral、Mistral Instruct v0.1)来进行医院病程摘要任务,使用了量化低秩适应微调。我们从MIMIC III数据中创建了一个自由文本临床数据集,将各种临床笔记串联起来作为输入临床文本,并与从出院摘要中提取的真实简要医院病程部分配对,用于模型训练。使用BERTScore和ROUGE指标评估了微调模型的效果,以评估临床领域微调的有效性。此外,我们使用了一种专门针对临床编码定制的新型医院病程摘要评估指标来验证它们的实际效用。我们的研究结果表明,为临床领域微调预训练的LLMs可以显著提升其在医院病程摘要任务中的性能,并表明它们有潜力作为临床编码的辅助工具。未来的工作应侧重于改进数据筛选方法,创建更高质量的针对医院病程摘要任务的临床数据集,并将更先进的开源LLMs调整到专有模型的水平,以进一步推动这一研究。

更新时间: 2024-09-24 04:41:43

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2409.14638v2

Tarsier: Recipes for Training and Evaluating Large Video Description Models

Generating fine-grained video descriptions is a fundamental challenge in video understanding. In this work, we introduce Tarsier, a family of large-scale video-language models designed to generate high-quality video descriptions. Tarsier employs CLIP-ViT to encode frames separately and then uses an LLM to model temporal relationships. Despite its simple architecture, we demonstrate that with a meticulously designed two-stage training procedure, the Tarsier models exhibit substantially stronger video description capabilities than any existing open-source model, showing a $+51.4\%$ advantage in human side-by-side evaluation over the strongest model. Additionally, they are comparable to state-of-the-art proprietary models, with a $+12.3\%$ advantage against GPT-4V and a $-6.7\%$ disadvantage against Gemini 1.5 Pro. When upgraded to Tarsier2 by building upon SigLIP and Qwen2-7B, it further improves significantly with a $+4.8\%$ advantage against GPT-4o. Besides video description, Tarsier proves to be a versatile generalist model, achieving new state-of-the-art results across nine public benchmarks, including multi-choice VQA, open-ended VQA, and zero-shot video captioning. Our second contribution is the introduction of a new benchmark -- DREAM-1K (https://tarsier-vlm.github.io/) for evaluating video description models, consisting of a new challenging dataset featuring videos from diverse sources and varying complexity, along with an automatic method specifically designed to assess the quality of fine-grained video descriptions. We make our models and evaluation benchmark publicly available at https://github.com/bytedance/tarsier.
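The human side-by-side numbers quoted above (+51.4%, +12.3%, -6.7%) read as win-minus-loss margins over all pairwise comparisons; assuming that common definition (the paper's exact protocol may differ), the score is:

```python
def side_by_side_advantage(wins, ties, losses):
    """Advantage in a human side-by-side evaluation: the margin of
    wins over losses as a share of all comparisons. Positive means
    the candidate model is preferred on balance."""
    total = wins + ties + losses
    return (wins - losses) / total

# e.g. 60 wins, 20 ties, 20 losses over 100 video pairs
adv = side_by_side_advantage(60, 20, 20)
```

A negative value, like Tarsier's -6.7% against Gemini 1.5 Pro, means the comparison model was preferred more often.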

Updated: 2024-09-24 04:41:08

标题: Tarsier:用于训练和评估大型视频描述模型的配方

摘要: 生成精细的视频描述是视频理解中的一个基本挑战。在这项工作中,我们介绍了Tarsier,一组大规模视频语言模型,旨在生成高质量的视频描述。Tarsier采用CLIP-ViT分别编码帧,然后使用LLM模型来建模时间关系。尽管其架构简单,我们证明通过精心设计的两阶段训练过程,Tarsier模型表现出比任何现有开源模型更强大的视频描述能力,在人类并排评估中显示出+51.4%的优势。此外,它们与最先进的专有模型相当,对GPT-4V有12.3%的优势,对Gemini 1.5 Pro有6.7%的劣势。通过在SigLIP和Qwen2-7B基础上构建Tarsier2进行升级,它进一步显著提高,在GPT-4o上有4.8%的优势。除视频描述外,Tarsier被证明是一个多才多艺的通用模型,在九个公共基准测试中取得了新的最先进结果,包括多选VQA、开放式VQA和零样本视频字幕。我们的第二项贡献是引入一个新的基准测试--DREAM-1K(https://tarsier-vlm.github.io/)用于评估视频描述模型,包括一个新的具有挑战性的数据集,其中包含来自不同来源和不同复杂度的视频,以及一个专门设计用于评估精细视频描述质量的自动方法。我们将我们的模型和评估基准公开提供在https://github.com/bytedance/tarsier。

更新时间: 2024-09-24 04:41:08

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.00634v2

Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models

In this work, we consider solving optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points. Our method utilizes a random model to represent the objective function, which is constructed from stochastic observations of the objective and is designed to satisfy proper adaptive accuracy conditions with a high but fixed probability. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linear approximation of the problem constraints and a trust-region constraint. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix, as well as a second-order correction step to address the potential Maratos effect, which arises due to the nonlinearity of the problem constraints. Such an effect may impede the method from moving away from saddle points. Both gradient and eigen step computations leverage a novel parameter-free decomposition of the step and the trust-region radius, accounting for the proportions among the feasibility residual, optimality residual, and negative curvature. We establish global almost sure first- and second-order convergence guarantees for our method, and present computational results on CUTEst problems, regression problems, and saddle-point problems to demonstrate its superiority over existing line-search-based stochastic methods.
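The gradient step described above minimizes a quadratic model of the objective inside a trust region. A minimal unconstrained illustration of such a step is the Cauchy point, the model minimizer along the steepest-descent direction; the paper's actual step additionally handles the (relaxed) linearized constraints and a parameter-free step decomposition.

```python
import math

def cauchy_point(g, B, delta):
    """Cauchy point for the trust-region subproblem
        min_p  g^T p + 0.5 p^T B p   s.t.  ||p|| <= delta,
    i.e. the model minimizer along -g. g is a list, B a symmetric
    list-of-lists Hessian approximation."""
    n = len(g)
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    if gnorm == 0.0:
        return [0.0] * n
    # curvature of the quadratic model along the direction -g
    gBg = sum(g[i] * sum(B[i][j] * g[j] for j in range(n))
              for i in range(n))
    if gBg <= 0.0:
        tau = 1.0  # non-positive curvature: step to the boundary
    else:
        tau = min(1.0, gnorm ** 3 / (delta * gBg))
    return [-tau * delta * gi / gnorm for gi in g]

# interior case: the unconstrained minimizer -g lies inside the region
p = cauchy_point([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 10.0)
```

In the stochastic setting of the paper, g and B come from a random model satisfying adaptive accuracy conditions with fixed probability.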

Updated: 2024-09-24 04:39:47

标题: 信任域序列二次规划用于具有随机模型的随机优化

摘要: 在这项工作中,我们考虑解决具有随机目标和确定性等式约束的优化问题。我们提出了一种信任域顺序二次规划方法,用于找到一阶和二阶稳定点。我们的方法利用随机模型来表示目标函数,该模型由目标的随机观测构建,并设计为满足适当的自适应精度条件,并具有高但固定的概率。为了收敛到一阶稳定点,我们的方法在每次迭代中计算一个梯度步骤,该步骤由最小化目标的二次逼近定义,受约束于问题约束的(放松的)线性逼近和信任域约束。为了收敛到二阶稳定点,我们的方法另外计算一个特征步骤来探索减小的黑塞矩阵的负曲率,以及一个二阶校正步骤来解决潜在的马拉托斯效应,该效应由于问题约束的非线性而产生。这种效应可能妨碍该方法远离鞍点。梯度和特征步骤计算都利用了一个新颖的无参数分解步骤和信任域半径,考虑到可行性残差、最优性残差和负曲率之间的比例。我们为我们的方法建立了全局几乎确定的一阶和二阶收敛保证,并在CUTEst问题、回归问题和鞍点问题上提供了计算结果,以展示其优于现有基于线搜索的随机方法的优越性。

更新时间: 2024-09-24 04:39:47

领域: math.OC,cs.LG,cs.NA,math.NA,stat.CO,stat.ML

下载: http://arxiv.org/abs/2409.15734v1

EvoFA: Evolvable Fast Adaptation for EEG Emotion Recognition

Electroencephalography (EEG)-based emotion recognition has gained significant traction due to its accuracy and objectivity. However, the non-stationary nature of EEG signals leads to distribution drift over time, causing severe performance degradation when the model is reused. While numerous domain adaptation (DA) approaches have been proposed in recent years to address this issue, their reliance on large amounts of target data for calibration restricts them to offline scenarios, rendering them unsuitable for real-time applications. To address this challenge, this paper proposes Evolvable Fast Adaptation (EvoFA), an online adaptive framework tailored for EEG data. EvoFA organically integrates the rapid adaptation of Few-Shot Learning (FSL) and the distribution matching of Domain Adaptation (DA) through a two-stage generalization process. During the training phase, a robust base meta-learning model is constructed for strong generalization. In the testing phase, a designed evolvable meta-adaptation module iteratively aligns the marginal distribution of target (testing) data with the evolving source (training) data within a model-agnostic meta-learning framework, enabling the model to learn the evolving trends of testing data relative to training data and improving online testing performance. Experimental results demonstrate that EvoFA achieves significant improvements compared to the basic FSL method and previous online methods. The introduction of EvoFA paves the way for broader adoption of EEG-based emotion recognition in real-world applications. Our code will be released upon publication.

Updated: 2024-09-24 04:35:10

标题: EvoFA:脑电情绪识别的可演化快速适应

摘要: 基于脑电图(EEG)的情绪识别由于其准确性和客观性而受到广泛关注。然而,EEG信号的非平稳性导致随着时间的推移出现分布漂移,导致在模型重复使用时性能严重下降。近年来提出了许多领域自适应(DA)方法来解决这个问题,但它们依赖大量目标数据进行校准,限制了它们在离线场景中的使用,使它们不适用于实时应用。为解决这一挑战,本文提出了Evolvable Fast Adaptation(EvoFA),这是一个专门针对EEG数据的在线自适应框架。EvoFA通过两阶段泛化过程有机地融合了Few-Shot Learning(FSL)的快速适应和Domain Adaptation(DA)的分布匹配。在训练阶段,构建了一个强泛化的基础元学习模型。在测试阶段,设计了一个可进化的元适应模块,通过一个模型无关的元学习框架,迭代地将目标(测试)数据的边际分布与不断演化的源(训练)数据对齐,使模型能够学习测试数据相对于训练数据的演化趋势,并提高在线测试性能。实验结果表明,与基本FSL方法和先前的在线方法相比,EvoFA取得了显著的改进。EvoFA的引入为在实际应用中更广泛采用基于EEG的情绪识别铺平了道路。我们的代码将在发表后发布。

更新时间: 2024-09-24 04:35:10

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.15733v1

Efficient and Effective Model Extraction

Model extraction aims to create a functionally similar copy of a machine learning as a service (MLaaS) API with minimal overhead, typically for illicit profit or as a precursor to further attacks, posing a significant threat to the MLaaS ecosystem. However, recent studies have shown that model extraction is highly inefficient, particularly when the target task distribution is unavailable. In such cases, even substantially increasing the attack budget fails to produce a sufficiently similar replica, reducing the adversary's motivation to pursue extraction attacks. In this paper, we revisit the elementary design choices throughout the extraction lifecycle. We propose an embarrassingly simple yet dramatically effective algorithm, Efficient and Effective Model Extraction (E3), focusing on both query preparation and the training routine. E3 achieves superior generalization compared to state-of-the-art methods while minimizing computational costs. For instance, with only 0.005 times the query budget and less than 0.2 times the runtime, E3 outperforms classical generative-model-based data-free model extraction by an absolute accuracy improvement of over 50% on CIFAR-10. Our findings underscore the persistent threat posed by model extraction and suggest that E3 could serve as a valuable benchmarking algorithm for future security evaluations.

Updated: 2024-09-24 04:29:40

标题: 高效有效的模型抽取

摘要: 模型提取旨在从机器学习作为服务(MLaaS)API中创建一个功能相似的副本,而且开销很小,通常用于非法获利或作为进一步攻击的先导,对MLaaS生态系统构成重大威胁。然而,最近的研究表明,模型提取效率极低,特别是当目标任务分布不可用时。在这种情况下,即使大幅增加攻击预算也无法生成足够相似的复制品,减少了对手追求提取攻击的动机。在本文中,我们重新审视了提取生命周期中的基本设计选择。我们提出了一个简单但极其有效的算法,称为高效有效模型提取(E3),侧重于查询准备和训练例程。E3相比于最先进的方法实现了更好的泛化能力,同时最小化了计算成本。例如,只需0.005倍的查询预算和不到0.2倍的运行时间,E3在CIFAR-10上的绝对准确率改进超过50%,优于传统的基于生成模型的无数据模型提取。我们的研究结果强调了模型提取所构成的持续威胁,并建议它可能成为未来安全评估的有价值的基准算法。

更新时间: 2024-09-24 04:29:40

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2409.14122v2

Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving

The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses. We propose LatentDriver, a framework that models the environment's next states and the ego vehicle's possible actions as a mixture distribution, from which a deterministic control signal is then derived. By incorporating mixture modeling, the stochastic nature of decision-making is captured. Additionally, the self-delusion problem is mitigated by providing intermediate actions sampled from a distribution to the world model. Experimental results on the recently released closed-loop benchmark Waymax demonstrate that LatentDriver surpasses state-of-the-art reinforcement learning and imitation learning methods, achieving expert-level performance. The code and models will be made available at https://github.com/Sephirex-X/LatentDriver.
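One simple way to collapse a mixture over action hypotheses into a deterministic control signal, as the abstract describes, is the probability-weighted mean of the components; whether LatentDriver uses exactly this reduction is an assumption here, made only to illustrate the idea.

```python
def mixture_mean(weights, means):
    """Collapse a mixture over candidate action vectors into one
    deterministic control signal via the probability-weighted mean.
    weights: mixture weights; means: component mean vectors."""
    total = sum(weights)
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) / total
            for d in range(dim)]

# three hypotheses over a (steering, acceleration) action
action = mixture_mean([0.5, 0.3, 0.2],
                      [[0.1, 1.0], [0.0, 0.5], [-0.2, 0.0]])
```

Keeping the full mixture, rather than a point estimate, is what lets the framework capture the stochasticity of decision-making before committing to a control.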

Updated: 2024-09-24 04:26:24

标题: 在自动驾驶中从潜在世界模型中学习多个概率决策

摘要: 自回归世界模型在矢量化场景理解中表现出强大的泛化能力,但在推导行动时遇到困难,原因是不足的不确定性建模和自欺。在本文中,我们通过制定多个概率假设来探讨从自回归世界模型中推导决策的可行性,以解决这些挑战。我们提出了LatentDriver,一个框架,将环境的下一个状态和自车可能的行动建模为混合分布,然后从中推导出确定性控制信号。通过结合混合建模,捕捉了决策过程的随机性。此外,通过向世界模型提供从分布中抽样的中间行动,缓解了自欺问题。最近发布的闭环基准测试Waymax的实验结果表明,LatentDriver超越了最先进的强化学习和模仿学习方法,实现了专家级性能。代码和模型将在https://github.com/Sephirex-X/LatentDriver 上提供。

更新时间: 2024-09-24 04:26:24

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.15730v1

Sequential Learning in the Dense Associative Memory

Sequential learning involves learning tasks in a sequence, and proves challenging for most neural networks. Biological neural networks regularly conquer the sequential learning challenge and are even capable of transferring knowledge both forward and backward between tasks. Artificial neural networks often fail entirely to transfer performance between tasks, and regularly suffer degraded performance or catastrophic forgetting on previous tasks. Models of associative memory have been used to investigate the discrepancy between biological and artificial neural networks because of their biological ties and inspirations; of these, the Hopfield network is perhaps the most studied model. The Dense Associative Memory, or modern Hopfield network, generalizes the Hopfield network, allowing for greater capacities and prototype learning behaviors while still retaining the associative memory structure. We investigate the performance of the Dense Associative Memory in sequential learning problems, and benchmark various sequential learning techniques in the network. We give a substantial review of the sequential learning space, with particular attention to the Hopfield network and associative memories, and describe the techniques we implement in detail. We also draw parallels between the classical and Dense Associative Memory in the context of sequential learning, and discuss the departures from biological inspiration that may influence the utility of the Dense Associative Memory as a tool for studying biological neural networks. We present our findings, and show that existing sequential learning methods can be applied to the Dense Associative Memory to improve sequential learning performance.
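For readers unfamiliar with the baseline being generalized, a classical Hopfield recall step looks like this in pure Python; the Dense Associative Memory replaces the pairwise Hebbian energy used here with a sharper separation function over pattern similarities, which is what raises its capacity.

```python
def recall(patterns, probe, steps=10):
    """Synchronous recall in a classical Hopfield network with
    Hebbian weights built from +/-1 patterns. Iterates the sign
    update rule until a fixed point or the step limit."""
    n = len(probe)
    # Hebbian weight matrix with zero diagonal
    w = [[0 if i == j else sum(p[i] * p[j] for p in patterns)
          for j in range(n)] for i in range(n)]
    state = list(probe)
    for _ in range(steps):
        nxt = [1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
               for i in range(n)]
        if nxt == state:
            break
        state = nxt
    return state

stored = [[1, 1, -1, -1, 1, -1], [-1, 1, 1, -1, -1, 1]]
noisy = [1, 1, -1, -1, 1, 1]   # first pattern with its last bit flipped
recovered = recall(stored, noisy)
```

The corrupted probe settles back onto the stored pattern, which is exactly the associative-memory behavior whose forgetting under sequential training the paper studies.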

Updated: 2024-09-24 04:23:00

标题: 密集联想记忆中的序列学习

摘要: 顺序学习涉及按顺序学习任务,并且对大多数神经网络来说都是具有挑战性的。生物神经网络经常克服顺序学习的挑战,甚至能够在任务之间向前和向后传递知识。人工神经网络经常完全无法在任务之间转移性能,并经常在先前任务上遭受性能下降或灾难性遗忘。关联记忆模型因其与生物的联系和启发而被用来研究生物和人工神经网络之间的差异,其中Hopfield网络可能是研究最多的模型。密集关联记忆,即现代Hopfield网络,推广了Hopfield网络,允许更大的容量和原型学习行为,同时仍保留关联记忆结构。我们研究了密集关联记忆在顺序学习问题中的性能,并在该网络中对各种顺序学习技术进行基准测试。我们对顺序学习领域进行了全面回顾,特别关注Hopfield网络和关联记忆,并详细描述了我们实现的技术。我们还在顺序学习的背景下对经典关联记忆和密集关联记忆进行了类比,并讨论了偏离生物启发之处,这些偏离可能影响密集关联记忆作为研究生物神经网络工具的效用。我们展示了我们的研究结果,并表明现有的顺序学习方法可以应用于密集关联记忆,以改善顺序学习性能。

更新时间: 2024-09-24 04:23:00

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2409.15729v1

LLM-Cure: LLM-based Competitor User Review Analysis for Feature Enhancement

The exponential growth of the mobile app market underscores the importance of constant innovation and rapid response to user demands. As user satisfaction is paramount to the success of a mobile application (app), developers typically rely on user reviews, which represent user feedback including ratings and comments, to identify areas for improvement. However, the sheer volume of user reviews poses challenges for manual analysis, necessitating automated approaches. Existing automated approaches either analyze only the target app's reviews, neglecting the comparison of similar features to competitors, or fail to provide suggestions for feature enhancement. To address these gaps, we propose LLM-Cure, a Large Language Model (LLM)-based Competitive User Review Analysis for Feature Enhancement, an approach powered by LLMs to automatically generate suggestions for mobile app feature improvements. More specifically, LLM-Cure identifies and categorizes features within reviews by applying LLMs. When provided with a complaint in a user review, LLM-Cure curates highly rated (4- and 5-star) reviews in competing apps related to the complaint and proposes potential improvements tailored to the target application. We evaluate LLM-Cure on 1,056,739 reviews of 70 popular Android apps. Our evaluation demonstrates that LLM-Cure significantly outperforms state-of-the-art approaches in assigning features to reviews, by up to 13% in F1-score, up to 16% in recall, and up to 11% in precision. Additionally, LLM-Cure demonstrates its capability to provide suggestions for resolving user complaints. We verify the suggestions using the release notes that reflect feature changes in the target mobile app. On average, a promising 73% of the provided suggestions are implemented.
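The F1-score, recall, and precision gains quoted above follow the standard definitions, sketched here for reference:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard classification metrics from counts of true positives,
    false positives, and false negatives, as used to score feature
    assignment against a labeled ground truth."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# toy counts for one feature label (values are illustrative)
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

Precision penalizes spurious feature assignments, recall penalizes missed ones, and F1 balances the two.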

Updated: 2024-09-24 04:17:21

标题: LLM-Cure:基于LLM的竞争对手用户评论分析用于功能增强

摘要: 移动应用市场的指数增长突显了不断创新和快速响应用户需求的重要性。由于用户满意度对移动应用程序(app)的成功至关重要,开发人员通常依赖用户评论来识别改进的方向,用户评论包括评分和评论。然而,用户评论数量庞大,给人工分析带来挑战,因此需要自动化方法。现有的自动化方法要么只分析目标应用的评论,忽视与竞争对手类似功能的比较,要么未提供功能增强建议。为了解决这些问题,我们提出了一种基于大型语言模型(LLM)的竞争用户评论分析以实现特性增强(LLM-Cure)的方法,该方法由LLM驱动,自动生成移动应用程序特性改进的建议。具体来说,LLM-Cure通过应用LLM来识别和分类评论中的特性。当用户评论中提出投诉时,LLM-Cure在竞争应用中筛选出与投诉相关的高评级(4和5星)评论,并为目标应用提出潜在的改进建议。我们在70个热门Android应用的1,056,739条评论上评估了LLM-Cure。我们的评估表明,LLM-Cure在将特性分配给评论方面显著优于最先进的方法,F1分数最高提升13%,召回率最高提升16%,精确度最高提升11%。此外,LLM-Cure展示了提供解决用户投诉建议的能力。我们通过反映目标移动应用中特性变化的发布说明来验证建议。LLM-Cure提供的建议平均有73%得到实施,前景可观。

更新时间: 2024-09-24 04:17:21

领域: cs.SE,cs.AI,cs.IR

下载: http://arxiv.org/abs/2409.15724v1

Federated Large Language Models: Current Progress and Future Directions

Large language models (LLMs) are rapidly gaining popularity and have been widely adopted in real-world applications. While the quality of training data is essential, privacy concerns arise during data collection. Federated learning (FL) offers a solution by allowing multiple clients to collaboratively train LLMs without sharing local data. However, FL introduces new challenges, such as model-convergence issues due to heterogeneous data and high communication costs. A comprehensive study is required to address these challenges and guide future research. This paper surveys federated learning for LLMs (FedLLM), highlighting recent advances and future directions. We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and associated research challenges. We finally propose potential research directions for federated LLMs, including pre-training and how LLMs can further enhance federated learning.
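The collaborative training loop mentioned above is typically instantiated with FedAvg-style aggregation; a minimal sketch, with flat parameter lists standing in for the (often low-rank adapter) tensors that FedLLM systems actually exchange, is:

```python
def fedavg(client_weights, client_sizes):
    """One round of FedAvg: average each parameter across clients,
    weighted by local dataset size. Raw data never leaves a client;
    only the parameter updates are shared."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]

# two clients holding 1 and 3 examples respectively
merged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The data-heterogeneity and communication-cost challenges surveyed in the paper arise precisely because these per-client updates can diverge and are expensive to transmit at LLM scale.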

Updated: 2024-09-24 04:14:33

标题: 联邦式大型语言模型:当前进展与未来方向

摘要: 大型语言模型迅速赢得了广泛的关注,并在实际应用中被广泛采用。虽然训练数据的质量至关重要,但在数据收集过程中会出现隐私问题。联邦学习通过允许多个客户端共同训练LLMs而无需共享本地数据来提供解决方案。然而,联邦学习引入了新的挑战,例如由于异构数据和高通信成本而导致的模型收敛问题。需要进行全面的研究来解决这些挑战并指导未来的研究。本文调研了用于LLMs的联邦学习(FedLLM),突出了最新进展和未来方向。我们重点讨论了联邦设置中的微调和提示学习两个关键方面,讨论了现有工作和相关研究挑战。最后,我们提出了联邦LLMs的潜在研究方向,包括预训练以及LLMs如何进一步增强联邦学习。

更新时间: 2024-09-24 04:14:33

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2409.15723v1

Applying Incremental Learning in Binary-Addition-Tree Algorithm for Dynamic Binary-State Network Reliability

This paper presents a novel approach to enhance the Binary-Addition-Tree algorithm (BAT) by integrating incremental learning techniques. BAT, known for its simplicity in development, implementation, and application, is a powerful implicit enumeration method for solving network reliability and optimization problems. However, it traditionally struggles with dynamic and large-scale networks due to its static nature. By introducing incremental learning, we enable the BAT to adapt and improve its performance iteratively as it encounters new data or network changes. This integration allows for more efficient computation, reduces redundancy since no search for minimal paths or cuts is required, and improves overall performance in dynamic environments. Experimental results demonstrate the effectiveness of the proposed method, showing significant improvements in both computational efficiency and solution quality compared with the traditional BAT and indirect algorithms, such as MP-based and MC-based algorithms.
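The enumeration at the heart of BAT is repeated binary addition over component state vectors; a minimal sketch of that core step (without the reliability bookkeeping or the paper's incremental extensions) is:

```python
def binary_addition_tree(n):
    """Enumerate all n-component binary state vectors in the order
    produced by repeated binary addition: add 1 to the last component
    and carry leftward, which is BAT's implicit enumeration step."""
    vector = [0] * n
    yield list(vector)
    while True:
        i = n - 1
        while i >= 0 and vector[i] == 1:
            vector[i] = 0        # carry over a full component
            i -= 1
        if i < 0:
            return               # overflow: every vector enumerated
        vector[i] = 1
        yield list(vector)

states = list(binary_addition_tree(3))
```

An incremental variant along the lines of the paper would resume enumeration from a saved state and reuse prior evaluations when the network changes, rather than restarting from the all-zero vector.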

Updated: 2024-09-24 04:13:03

标题: 将增量学习应用于二进制加法树算法以求解动态二进制状态网络可靠性

摘要: 这篇论文提出了一种新颖的方法,通过整合增量学习技术来增强二进制加法树算法(BAT)。BAT以其在开发、实施和应用方面的简单性而闻名,是解决网络可靠性和优化问题的强大隐式枚举方法。然而,由于其静态特性,传统上BAT在动态和大规模网络中遇到困难。通过引入增量学习,我们使BAT能够在遇到新数据或网络变化时迭代地适应和改善其性能。这种整合使得计算更加高效,减少了在搜索最小路径和切割时的冗余,并提高了在动态环境中的整体性能。实验结果表明,与传统BAT和间接算法(如基于MP的算法和基于MC的算法)相比,所提出的方法在计算效率和解决方案质量上都有显著改善。

更新时间: 2024-09-24 04:13:03

领域: cs.LG

下载: http://arxiv.org/abs/2409.15721v1

Adversarial Federated Consensus Learning for Surface Defect Classification Under Data Heterogeneity in IIoT

The challenge of data scarcity hinders the application of deep learning to industrial surface defect classification (SDC), as it is difficult to collect and centralize sufficient training data from the various entities in the Industrial Internet of Things (IIoT) due to privacy concerns. Federated learning (FL) provides a solution by enabling collaborative global model training across clients while maintaining privacy. However, performance may suffer due to data heterogeneity, i.e., discrepancies in data distributions among clients. In this paper, we propose a novel personalized FL (PFL) approach, named Adversarial Federated Consensus Learning (AFedCL), to address the challenge of data heterogeneity across different clients in SDC. First, we develop a dynamic consensus construction strategy to mitigate the performance degradation caused by data heterogeneity. Through adversarial training, local models from different clients use the global model as a bridge to achieve distribution alignment, alleviating the problem of global knowledge forgetting. Complementing this strategy, we propose a consensus-aware aggregation mechanism, which assigns aggregation weights to different clients based on their efficacy in global knowledge learning, thereby enhancing the global model's generalization capabilities. Finally, we design an adaptive feature fusion module to further enhance global knowledge utilization efficiency. Personalized fusion weights are gradually adjusted for each client to optimally balance global and local features, tailored to each client's global knowledge learning efficacy. Compared with state-of-the-art FL methods such as FedALA, the proposed AFedCL method achieves an accuracy increase of up to 5.67% on three SDC datasets.

Updated: 2024-09-24 03:59:32

标题: 面向工业物联网中数据异构下表面缺陷分类的对抗式联邦共识学习

摘要: 数据稀缺的挑战阻碍了深度学习在工业表面缺陷分类(SDC)中的应用,因为由于隐私问题,从工业物联网(IIoT)中的各个实体收集和集中足够的训练数据变得困难。联邦学习(FL)通过在客户端之间进行协作全局模型训练来提供解决方案,同时保持隐私。然而,由于数据异质性--客户端之间数据分布的差异,性能可能会受到影响。在本文中,我们提出了一种新颖的个性化FL(PFL)方法,称为对抗式联邦一致性学习(AFedCL),用于解决SDC中不同客户端之间数据异质性的挑战。首先,我们开发了一种动态一致性构建策略,以减轻数据异质性导致的性能下降。通过对抗训练,来自不同客户端的本地模型利用全局模型作为桥梁实现分布对齐,缓解全局知识遗忘的问题。为了补充这一策略,我们提出了一种一致性感知聚合机制。它根据客户端在全局知识学习中的效力分配聚合权重,从而增强全局模型的泛化能力。最后,我们设计了一个自适应特征融合模块,以进一步增强全局知识利用效率。个性化融合权重逐渐调整为每个客户端优化平衡全局和本地特征,以适应其个体全局知识学习效力。与FedALA等最新FL方法相比,所提出的AFedCL方法在三个SDC数据集上实现了高达5.67%的准确度提高。

更新时间: 2024-09-24 03:59:32

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2409.15711v1

Autotuning Bipedal Locomotion MPC with GRFM-Net for Efficient Sim-to-Real Transfer

Bipedal locomotion control is essential for humanoid robots to navigate complex, human-centric environments. While optimization-based control designs are popular for integrating sophisticated models of humanoid robots, they often require labor-intensive manual tuning. In this work, we address the challenges of parameter selection in bipedal locomotion control using DiffTune, a model-based autotuning method that leverages differential programming for efficient parameter learning. A major difficulty lies in balancing model fidelity with differentiability. We address this difficulty by using a low-fidelity model for differentiability, enhanced by a Ground Reaction Force-and-Moment Network (GRFM-Net) to capture discrepancies between MPC commands and actual control effects. We validate the parameters learned by DiffTune with GRFM-Net in hardware experiments, which demonstrate the parameters' optimality in a multi-objective setting, reducing the total loss by up to 40.5$\%$ relative to expert-tuned baseline parameters. The results confirm the GRFM-Net's effectiveness in mitigating the sim-to-real gap, improving the transferability of simulation-learned parameters to real hardware.

Updated: 2024-09-24 03:58:18

标题: 使用GRFM-Net自动调谐的双足运动MPC,实现高效的仿真到实际转移

摘要: 双足运动控制对于人形机器人在复杂、以人为中心的环境中导航至关重要。虽然基于优化的控制设计在整合人形机器人复杂模型方面很受欢迎,但通常需要耗费大量人力进行手动调参。在这项工作中,我们利用基于模型的自动调参方法 DiffTune,通过微分编程实现高效参数学习,解决了双足运动控制中参数选择的挑战。主要困难在于平衡模型保真度和可微性。我们通过使用低保真度模型保证可微性,并结合地面反作用力和力矩网络(GRFM-Net)来捕捉 MPC 命令与实际控制效果之间的差异,从而解决这一困难。我们在硬件实验中验证了 DiffTune 结合 GRFM-Net 学习到的参数,证明了这些参数在多目标设置中相对于基线参数的最优性,与专家调参相比,总损失最多减少了40.5%。结果证实了 GRFM-Net 在减轻仿真与实际之间差距方面的有效性,提高了仿真学习参数到实际硬件的可转移性。

更新时间: 2024-09-24 03:58:18

领域: cs.RO,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2409.15710v1

Cookie Monster: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems

With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Cookie Monster, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Cookie Monster into Chrome and evaluate it on microbenchmarks and advertising datasets. Across workloads, Cookie Monster significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.
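On-device budgeting of the kind described can be illustrated with a bare-bones privacy-loss filter; Cookie Monster's individual-DP accounting is considerably more fine-grained than this, so treat the sketch only as the general shape of the mechanism.

```python
class DeviceBudget:
    """Toy on-device privacy-loss filter: each measurement query
    deducts its epsilon from the device's remaining budget and is
    refused once the budget would be exceeded."""

    def __init__(self, epsilon_total):
        self.remaining = epsilon_total

    def try_charge(self, epsilon_query):
        """Return True and deduct if the query fits in the budget."""
        if epsilon_query <= self.remaining:
            self.remaining -= epsilon_query
            return True
        return False

budget = DeviceBudget(1.0)
answers = [budget.try_charge(0.4), budget.try_charge(0.4),
           budget.try_charge(0.4)]
```

The efficiency gain claimed in the paper comes from charging budgets per individual device and per relevant query rather than deducting a worst-case global epsilon for every measurement.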

Updated: 2024-09-24 03:54:52

标题: Cookie Monster:差分隐私广告测量系统的高效设备端预算管理

摘要: 随着主要浏览器即将移除第三方cookie,并引入新的保护隐私的广告API,研究界有机会帮助行业在提高网络隐私方面取得质的改进。本文讨论了我们在W3C社区小组内的努力,以增强现有的保护隐私广告测量API。我们分析了来自Google、Apple、Meta和Mozilla的设计,并通过更严格、更高效的差分隐私(DP)预算组件进行了增强。我们的方法称为Cookie Monster,强制执行明确定义的DP保证,并使广告商能够更准确地进行更私密的测量查询。通过以一种个体形式的DP来表述隐私保证,我们可以使DP预算比目前使用传统DP定义的系统更高效。我们将Cookie Monster整合到Chrome中,并在微基准和广告数据集上进行评估。在各种工作负载中,Cookie Monster在提供更多广告测量数据的同时,具有相当的DP保护水平上显著优于基线。

更新时间: 2024-09-24 03:54:52

领域: cs.CR

下载: http://arxiv.org/abs/2405.16719v4

Improving Emotional Support Delivery in Text-Based Community Safety Reporting Using Large Language Models

Emotional support is a crucial aspect of communication between community members and police dispatchers during incident reporting. However, there is a lack of understanding about how emotional support is delivered through text-based systems, especially in various non-emergency contexts. In this study, we analyzed two years of chat logs comprising 57,114 messages across 8,239 incidents from 130 higher education institutions. Our empirical findings revealed significant variations in emotional support provided by dispatchers, influenced by the type of incident, service time, and a noticeable decline in support over time across multiple organizations. To improve the consistency and quality of emotional support, we developed and implemented a fine-tuned Large Language Model (LLM), named dispatcherLLM. We evaluated dispatcherLLM by comparing its generated responses to those of human dispatchers and other off-the-shelf models using real chat messages. Additionally, we conducted a human evaluation to assess the perceived effectiveness of the support provided by dispatcherLLM. This study not only contributes new empirical understandings of emotional support in text-based dispatch systems but also demonstrates the significant potential of generative AI in improving service delivery.

Updated: 2024-09-24 03:47:02

标题: 使用大型语言模型改进基于文本的社区安全报告中的情感支持交付

摘要: 情感支持是社区成员与警务调度员在报告事件过程中沟通的关键方面。然而,在基于文本的系统中如何提供情感支持,特别是在各种非紧急情况下,人们对此缺乏理解。在这项研究中,我们分析了来自130所高等教育机构的8,239起事件中,跨越57,114条消息的两年聊天记录。我们的实证研究结果揭示了调度员提供的情感支持存在显著差异,受到事件类型、服务时间的影响,并且在多个组织中随着时间的推移存在明显的支持下降。为了提高情感支持的一致性和质量,我们开发并实施了一个经过精调的大型语言模型(LLM),名为dispatcherLLM。我们通过将其生成的回复与人类调度员和其他现成模型在真实聊天消息中的回复进行比较来评估dispatcherLLM。此外,我们进行了人类评估,以评估dispatcherLLM提供的支持的感知效果。这项研究不仅为基于文本的调度系统中的情感支持提供了新的实证理解,还展示了生成式人工智能在改善服务交付方面的重要潜力。

更新时间: 2024-09-24 03:47:02

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2409.15706v1

Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series

Real-world time series data frequently contain significant amounts of missing values, posing challenges for advanced analysis. A common approach to address this issue is imputation, where the primary challenge lies in determining the appropriate values to fill in. While previous deep learning methods have proven effective for time series imputation, they often produce overconfident imputations, which can pose an easily overlooked risk to the reliability of the intelligent system. Diffusion methods are proficient at estimating probability distributions but struggle with high missing rates and, moreover, are computationally expensive due to the nature of the generative-model framework. In this paper, we propose Quantile Sub-Ensembles, a novel method to estimate uncertainty with an ensemble of quantile-regression-based task networks, and then incorporate Quantile Sub-Ensembles into a non-generative time series imputation method. Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model. We examine the performance of the proposed method on two real-world datasets, an air-quality dataset and a health-care dataset, and conduct extensive experiments showing that our method outperforms most of the baseline methods in making deterministic and probabilistic imputations. Compared with the diffusion method CSDI, our approach obtains comparable forecasting results, performs better when more data are missing, and incurs a much smaller computational overhead, yielding much faster training and testing.
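Each member of a quantile sub-ensemble is trained with the quantile (pinball) loss; a scalar version makes the asymmetry between the quantile heads explicit:

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for quantile level q in (0, 1).
    Minimizing its expectation drives y_pred toward the q-th
    conditional quantile of y_true."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1) * diff

# for the 0.9 quantile head, under-prediction costs 0.9 per unit of
# error while over-prediction costs only 0.1 per unit
hi = pinball_loss(10.0, 8.0, q=0.9)
lo = pinball_loss(10.0, 12.0, q=0.9)
```

Fitting several quantile levels per ensemble member, and aggregating across members, is what yields the calibrated uncertainty intervals around each imputed value.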

Updated: 2024-09-24 03:39:37

Title: Deep Ensembles Meet Quantile Regression: Uncertainty-aware Imputation for Time Series

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2312.01294v3

GraphGI: A GNN Explanation Method Using Game Interaction

Graph Neural Networks (GNNs) have garnered significant attention and have been extensively utilized across various domains. However, similar to other deep learning models, GNNs are often viewed as black-box models, making it challenging to interpret their prediction mechanisms. Current graph explanation techniques focus on identifying key nodes or edges, attributing model predictions to the critical data features that drive them. Nevertheless, these features do not influence the model's outcome independently; rather, they interact with one another to collectively affect predictions. In this work, we propose a novel explanatory method, GraphGI, which identifies the coalition with the highest interaction strength and presents it as an explanatory subgraph. Given a trained model and an input graph, our method explains predictions by gradually incorporating significant edges into the selected subgraph. We utilize game-theoretic interaction values to assess the interaction strength after each edge addition, ensuring that newly added edges confer maximum interaction strength on the explanatory subgraph. To enhance computational efficiency, we adopt effective approximation techniques for calculating Shapley values and game-theoretic interaction values. Empirical evaluations demonstrate that our method achieves superior fidelity and sparsity while keeping the results interpretable at a comprehensible level.
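
For orientation, here is a minimal Monte Carlo sketch of the two game-theoretic quantities such explainers build on, Shapley values and pairwise interaction values, over a toy value function; the value function below is invented for illustration (a real explainer would score GNN predictions on edge-induced subgraphs).

```python
import random

def shapley_mc(players, value_fn, n_samples=200, seed=0):
    """Monte Carlo Shapley estimate: average each player's marginal
    contribution over random orderings of the players."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        perm = players[:]
        rng.shuffle(perm)
        coalition, prev = [], value_fn(frozenset())
        for p in perm:
            coalition.append(p)
            cur = value_fn(frozenset(coalition))
            phi[p] += cur - prev
            prev = cur
    return {p: v / n_samples for p, v in phi.items()}

def interaction(i, j, value_fn):
    """Game-theoretic interaction of players i and j (empty context)."""
    return (value_fn(frozenset({i, j})) - value_fn(frozenset({i}))
            - value_fn(frozenset({j})) + value_fn(frozenset()))

# Hypothetical value function over edges A, B, C: each edge alone is worth
# 0.1, and edges A and B interact positively when both are present.
def value_fn(edges):
    base = 0.1 * len(edges)
    if "A" in edges and "B" in edges:
        base += 1.0
    return base

phi = shapley_mc(["A", "B", "C"], value_fn)
```

In this toy game, the A-B interaction value is 1.0 while A-C is 0, so a GraphGI-style search would grow its explanatory subgraph with the {A, B} coalition first.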

Updated: 2024-09-24 03:24:31

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2409.15698v1

Orthogonal Finetuning for Direct Preference Optimization

DPO is an effective preference optimization algorithm. However, DPO-tuned models tend to overfit on the dispreferred samples, which manifests as overly long generations lacking diversity. While recent regularization approaches have attempted to alleviate this issue by modifying the objective function, they do so at the cost of degraded alignment performance. In this paper, we incorporate regularization from the perspective of weight updating to curb alignment overfitting. Through a pilot experiment, we discovered a positive correlation between overfitting and fluctuation of the hyperspherical energy. Hence, we introduce orthogonal finetuning for DPO via a weight-Rotated Preference Optimization (RoPO) method, which conducts only rotational and magnitude-stretching updates on the weight parameters so as to keep the hyperspherical energy invariant, thereby preserving the knowledge encoded in the angles between neurons. Extensive experiments demonstrate that our model aligns perfectly with human preferences while retaining the original expressive capacity using only 0.0086% of the trainable parameters, suggesting an effective regularization against overfitting. Specifically, RoPO outperforms DPO by up to 10 points on MT-Bench and by up to 2.8 points on AlpacaEval 2, while enhancing generation diversity by an average of 6 points.
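
To make the invariance claim concrete, the snippet below computes one common definition of hyperspherical energy (the sum of inverse pairwise distances between unit-normalized neuron weight vectors) and sets up a pure rotation and a magnitude rescaling, both of which leave that energy unchanged; the matrices are random stand-ins for illustration, not actual DPO weights.

```python
import numpy as np

def hyperspherical_energy(W, eps=1e-12):
    """Hyperspherical energy: inverse pairwise distances between the rows
    of W after projection onto the unit sphere. Rotations and positive
    rescalings of the rows do not change it."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    n = Wn.shape[0]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            energy += 1.0 / (np.linalg.norm(Wn[i] - Wn[j]) + eps)
    return energy

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))            # 5 "neurons" with 4 inputs each
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
W_rot = W @ Q                          # rotational update
W_scaled = 3.0 * W                     # magnitude-stretching update
```

Under this toy definition, the RoPO-style update family (rotation plus magnitude stretch) preserves the energy exactly, whereas an unconstrained finetuning step generally does not.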

Updated: 2024-09-24 03:22:15

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.14836v2

dnaGrinder: a lightweight and high-capacity genomic foundation model

The task of understanding and interpreting the complex information encoded within genomic sequences remains a grand challenge in biological research and clinical applications. In this context, recent advancements in large language model research have led to the development of both encoder-only and decoder-only foundation models designed to decode intricate information in DNA sequences. However, several issues persist, particularly regarding the efficient management of long-range dependencies inherent in genomic sequences, the effective representation of nucleotide variations, and the considerable computational costs associated with large model architectures and extensive pretraining datasets. Current genomic foundation models often face a critical tradeoff: smaller models with mediocre performance versus large models with improved performance. To address these challenges, we introduce dnaGrinder, a unique and efficient genomic foundation model. dnaGrinder excels at managing long-range dependencies within genomic sequences while minimizing computational costs without compromising performance. It achieves results that are not just comparable but often superior to leading DNA models such as Nucleotide Transformer and DNABERT-2. Furthermore, dnaGrinder is designed for easy fine-tuning on workstation-grade GPUs, accommodating input lengths exceeding 17,000 tokens. On a single high-performance GPU, it supports sequences longer than 140,000 tokens, making it a highly efficient and accessible tool for both basic biological research and clinical applications.

Updated: 2024-09-24 03:20:07

Categories: q-bio.GN,cs.AI,cs.CE,cs.CL

Download: http://arxiv.org/abs/2409.15697v1

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

3D vision-language grounding, which focuses on aligning language with the 3D physical environment, stands as a cornerstone in the development of embodied agents. In comparison to recent advancements in the 2D domain, grounding language in 3D scenes faces several significant challenges: (i) the inherent complexity of 3D scenes due to the diverse object configurations, their rich attributes, and intricate relationships; (ii) the scarcity of paired 3D vision-language data to support grounded learning; and (iii) the absence of a unified learning framework to distill knowledge from grounded 3D data. In this work, we aim to address these three major challenges in 3D vision-language by examining the potential of systematically upscaling 3D vision-language learning in indoor environments. We introduce the first million-scale 3D vision-language dataset, SceneVerse, encompassing about 68K 3D indoor scenes and comprising 2.5M vision-language pairs derived from both human annotations and our scalable scene-graph-based generation approach. We demonstrate that this scaling allows for a unified pre-training framework, Grounded Pre-training for Scenes (GPS), for 3D vision-language learning. Through extensive experiments, we showcase the effectiveness of GPS by achieving state-of-the-art performance on all existing 3D visual grounding benchmarks. The vast potential of SceneVerse and GPS is unveiled through zero-shot transfer experiments in the challenging 3D vision-language tasks. Project website: https://scene-verse.github.io.

Updated: 2024-09-24 03:18:24

Categories: cs.CV,cs.AI,cs.CL,cs.LG,cs.RO

Download: http://arxiv.org/abs/2401.09340v3

Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks

Semantic Communication (SemCom) plays a pivotal role in 6G networks, offering a viable solution for future efficient communication. Deep Learning (DL)-based semantic codecs further enhance this efficiency. However, the vulnerability of DL models to security threats, such as adversarial attacks, poses significant challenges for practical applications of SemCom systems. These vulnerabilities enable attackers to tamper with messages and eavesdrop on private information, especially in wireless communication scenarios. Although existing defenses attempt to address specific threats, they often fail to simultaneously handle multiple heterogeneous attacks. To overcome this limitation, we introduce a novel Mixture-of-Experts (MoE)-based SemCom system. This system comprises a gating network and multiple experts, each specializing in different security challenges. The gating network adaptively selects suitable experts to counter heterogeneous attacks based on user-defined security requirements. Multiple experts collaborate to accomplish semantic communication tasks while meeting the security requirements of users. A case study in vehicular networks demonstrates the efficacy of the MoE-based SemCom system. Simulation results show that the proposed MoE-based SemCom system effectively mitigates concurrent heterogeneous attacks, with minimal impact on downstream task accuracy.
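
The gating idea can be sketched in a few lines: a learned gate scores the experts from the input (and, in the paper's setting, from user-defined security requirements), keeps the top-k, and mixes their outputs. Everything below, the expert functions, dimensions, and weights, is a hypothetical stand-in, not the paper's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gate(features, gate_weights, k=2):
    """Score all experts, keep the top-k, renormalize their weights."""
    scores = softmax(features @ gate_weights)
    top = np.argsort(scores)[::-1][:k]
    weights = scores[top] / scores[top].sum()
    return top, weights

def moe_output(features, gate_weights, experts, k=2):
    """Weighted mixture of the selected experts' outputs."""
    top, weights = gate(features, gate_weights, k)
    return sum(w * experts[i](features) for i, w in zip(top, weights))

# Three toy "experts", imagined as each hardened against a different attack.
experts = [lambda x: x * 0.5, lambda x: x + 1.0, lambda x: -x]
rng = np.random.default_rng(0)
gate_weights = rng.normal(size=(4, 3))   # 4 input features, 3 experts
features = rng.normal(size=4)
y = moe_output(features, gate_weights, experts)
```

In the paper's setting the gate input would also encode the user's security requirements, so the expert mixture adapts to the attacks currently in play.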

Updated: 2024-09-24 03:17:51

Categories: cs.NI,cs.AI,cs.CR

Download: http://arxiv.org/abs/2409.15695v1

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10

As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication has recently been enabled by high-bandwidth, low-latency interconnect links on the chip (e.g., the Graphcore IPU). These links allow each core to directly access the fast scratchpad memory in other cores, enabling new parallel computing paradigms. However, without proper support for scalable inter-core connections in current DL compilers, it is hard for developers to exploit the benefits of this new architecture. We present T10, the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips. To formulate the computation and communication patterns of tensor operators in this new architecture, T10 introduces a distributed tensor abstraction, rTensor. T10 maps a DNN model to execution plans with a generalized compute-shift pattern, partitioning DNN computation into sub-operators and mapping them to cores so that the cores can exchange data following predictable patterns. T10 makes globally optimized trade-offs between on-chip memory consumption and inter-core communication overhead, selects the best execution plan from a vast optimization space, and alleviates unnecessary inter-core communication. Our evaluation on a real inter-core connected AI chip, the Graphcore IPU, shows up to a 3.3× performance improvement and scalability support for larger models, compared with state-of-the-art DL compilers and vendor libraries.

Updated: 2024-09-24 03:17:47

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2408.04808v2

Will Large Language Models be a Panacea to Autonomous Driving?

Artificial intelligence (AI) plays a crucial role in autonomous driving (AD) research, propelling its development towards intelligence and efficiency. Currently, the development of AD technology follows two main technical paths: modularization and end-to-end. Modularization decomposes the driving task into modules such as perception, prediction, planning, and control, and trains them separately; because training objectives are inconsistent across modules, the integrated system suffers from bias. The end-to-end path attempts to address this issue with a single model that maps directly from sensor data to control signals, but it has limited capacity to learn a comprehensive set of features and struggles to handle unpredictable long-tail events and complex urban traffic scenarios. In light of the challenges faced by both paths, many researchers believe that large language models (LLMs), with their powerful reasoning abilities and extensive knowledge, could offer a solution, expecting LLMs to provide AD systems with deeper levels of understanding and decision-making capability. To understand whether LLMs could enhance AD, this paper conducts a thorough analysis of the potential applications of LLMs in AD systems, including their optimization strategies in both modular and end-to-end approaches, with a particular focus on how LLMs can tackle the problems and challenges present in current solutions. Furthermore, we discuss an important question: can LLM-based artificial general intelligence (AGI) be a key to achieving high-level AD? We further analyze the potential limitations and challenges that LLMs may encounter in promoting the development of AD technology.

Updated: 2024-09-24 03:12:12

Categories: cs.AI,cs.CL,cs.LG,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2409.14165v2

Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, we introduce an Enhanced Exploration Mechanism (EEM) to address the low exploration efficiency of standard PPO. Additionally, a reward-penalty adjustment (RPA) penalizes unsafe actions during initial interventions. Furthermore, Behavior Cloning Similarity (BCS) is included as an auxiliary objective to ensure the agent emulates expert actions. Comparative experiments on a simulated platform across various anatomical colon segments demonstrate that our model guides RDE effectively and safely.

Updated: 2024-09-24 03:01:30

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.15688v1

C-Pack: Packed Resources For General Chinese Embeddings

We introduce C-Pack, a package of resources that significantly advances the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperformed all prior Chinese text embeddings on C-MTEB by up to +10% at the time of release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on the MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at https://github.com/FlagOpen/FlagEmbedding.

Updated: 2024-09-24 03:01:25

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2309.07597v5

A Comprehensive Evaluation of Large Language Models on Mental Illnesses

Large language models have shown promise in various domains, including healthcare. In this study, we conduct a comprehensive evaluation of LLMs on mental health tasks using social media data. We explore the zero-shot (ZS) and few-shot (FS) capabilities of various LLMs, including GPT-4, Llama 3, Gemini, and others, on tasks such as binary disorder detection, disorder severity evaluation, and psychiatric knowledge assessment. Our evaluation involved 33 models testing 9 main prompt templates across the tasks. Key findings revealed that models such as GPT-4 and Llama 3 exhibited superior performance in binary disorder detection, with accuracies reaching up to 85% on certain datasets. Moreover, prompt engineering played a crucial role in enhancing model performance: the Mixtral 8x22b model showed an improvement of over 20%, and Gemma 7b experienced a similar boost. In the disorder severity evaluation task, we observed that FS learning significantly improved accuracy, highlighting the importance of contextual examples in complex assessments. Notably, the Phi-3-mini model exhibited a substantial increase in performance, with balanced accuracy improving by over 6.80% and mean average error dropping by nearly 1.3 when moving from ZS to FS learning. In the psychiatric knowledge task, recent models generally outperformed older, larger counterparts, with Llama 3.1 405b achieving an accuracy of 91.2%. Despite promising results, our analysis identified several challenges, including variability in performance across datasets and the need for careful prompt engineering. Furthermore, the ethical guardrails imposed by many LLM providers hamper accurate evaluation of their performance, due to a tendency not to respond to potentially sensitive queries.

Updated: 2024-09-24 02:58:52

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15687v1

Linear Contextual Bandits with Interference

Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in the linear CB (LinCB) setting, bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite-sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and synthetic data generated from the MovieLens dataset.
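
For readers new to linear contextual bandits, here is a minimal standard LinUCB agent, the no-interference baseline that frameworks like the one above extend; the interference-aware reward-model adjustment is not shown.

```python
import numpy as np

class LinUCB:
    """Standard LinUCB: ridge-regression reward estimate plus an
    optimism bonus proportional to the estimate's uncertainty."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)       # regularized Gram matrix
        self.b = np.zeros(d)     # reward-weighted feature sum
        self.alpha = alpha
    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# choose the arm with the highest upper confidence bound, then update
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
model = LinUCB(d=2)
chosen = max(range(len(arms)), key=lambda i: model.ucb(arms[i]))
model.update(arms[chosen], reward=1.0)
```

With interference, a unit's observed reward also depends on other units' chosen arms, so the feature vector `x` would be augmented with the other units' actions before the regression step.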

Updated: 2024-09-24 02:51:00

Categories: cs.LG,stat.ME

Download: http://arxiv.org/abs/2409.15682v1

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph. This problem can be viewed as a repeated game between a group of online players and an adversary. At each round, each player selects a decision from the constraint set, and the adversary then assigns an arbitrary, possibly nonconvex, loss function to that player. Only the loss value at the current round, rather than the entire loss function or any other information (e.g., the gradient), is privately revealed to the player. Players aim to minimize a sequence of global loss functions, each the sum of local losses. We observe that traditional multi-point bandit algorithms are unsuitable for online optimization, where the loss function is not fully known a priori, while one-point bandit algorithms suffer from poor regret guarantees. To address these issues, we propose a novel one-point residual feedback distributed online algorithm. This algorithm estimates the gradient using the residual between two successive queries, effectively reducing the regret bound while maintaining $\mathcal{O}(1)$ sampling complexity per iteration. We employ a rigorous metric, dynamic regret, to evaluate the algorithm's performance. By appropriately selecting the step size and smoothing parameters, we show that the expected dynamic regret of our algorithm is comparable to that of existing algorithms using two-point feedback, provided the deviation of the objective function sequence and the path length of the minimizers grow sublinearly. Finally, we validate the effectiveness of the proposed algorithm through numerical simulations.
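
A single-agent sketch of one-point residual feedback may help: each round makes one loss query, and the gradient estimate scales the residual between the current and previous query along the new random direction. The step sizes, test objective, and starting point below are arbitrary choices for illustration, not the paper's tuned parameters, and the distributed consensus step over the digraph is omitted.

```python
import numpy as np

def one_point_residual_sgd(f, x0, delta=0.05, eta=0.002, steps=5000, seed=0):
    """Bandit optimization with one query per round: the gradient is
    estimated as (d/delta) * (f_t - f_{t-1}) * u_t, where u_t is a random
    unit direction and f_{t-1} is the previous round's stored loss value."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    x = np.asarray(x0, dtype=float)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    f_prev = f(x + delta * u)            # first query
    for _ in range(steps):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        f_cur = f(x + delta * u)         # the only query this round
        g = (d / delta) * (f_cur - f_prev) * u   # residual feedback estimate
        x -= eta * g
        f_prev = f_cur
    return x

# smooth test objective with minimum at (1, -1)
f = lambda z: (z[0] - 1.0) ** 2 + (z[1] + 1.0) ** 2
x_star = one_point_residual_sgd(f, np.array([2.0, 0.0]))
```

Conditioning on the past, the estimate is unbiased for the gradient of the sphere-smoothed loss (the stored value acts only as a variance-reducing baseline), which is why one query per round suffices.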

Updated: 2024-09-24 02:37:33

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2409.15680v1

Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation

The quality of the dataset is crucial for ensuring optimal performance and reliability of downstream task models. However, datasets often contain noisy data inadvertently included during the construction process. Numerous attempts have been made to correct this issue through human annotators. However, hiring and managing human annotators is expensive and time-consuming. As an alternative, recent studies are exploring the use of large language models (LLMs) for data annotation. In this study, we present a case study that extends the application of LLM-based data annotation to enhance the quality of existing datasets through a cleansing strategy. Specifically, we leverage approaches such as chain-of-thought and majority voting to imitate human annotation and classify unrelated documents from the Multi-News dataset, which is widely used for the multi-document summarization task. Through our proposed cleansing method, we introduce an enhanced Multi-News+. By employing LLMs for data cleansing, we demonstrate an efficient and effective approach to improving dataset quality without relying on expensive human annotation efforts.
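
The majority-voting step can be sketched as follows; the label set and the tie-breaking policy (defaulting to keeping a document absent consensus) are illustrative assumptions, not the paper's exact protocol.

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate repeated LLM annotations for one document; ties fall back
    to 'keep' so cleansing never discards a document without consensus."""
    counts = Counter(labels)
    top, n = counts.most_common(1)[0]
    if list(counts.values()).count(n) > 1:   # tie among top labels
        return "keep"
    return top

# Hypothetical annotations: three chain-of-thought runs per document.
votes = {
    "doc-1": ["keep", "keep", "remove"],
    "doc-2": ["remove", "remove", "remove"],
}
decisions = {d: majority_vote(v) for d, v in votes.items()}
```

Running each document through several independent chain-of-thought prompts and voting is the cheap stand-in for multiple human annotators that the abstract describes.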

Updated: 2024-09-24 02:35:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.09682v3

Northeast Materials Database (NEMAD): Enabling Discovery of High Transition Temperature Magnetic Compounds

The discovery of novel magnetic materials with greater operating temperature ranges and optimized performance is essential for advanced applications. Current data-driven approaches are challenging and limited due to the lack of accurate, comprehensive, and feature-rich databases. This study aims to address this challenge by introducing a new approach that uses Large Language Models (LLMs) to create a comprehensive, experiment-based magnetic materials database named the Northeast Materials Database (NEMAD), which consists of 26,706 magnetic materials (www.nemad.org). The database incorporates chemical composition, magnetic phase transition temperatures, structural details, and magnetic properties. Enabled by NEMAD, machine learning models were developed to classify materials and predict transition temperatures. Our classification model achieved an accuracy of 90% in categorizing materials as ferromagnetic (FM), antiferromagnetic (AFM), and non-magnetic (NM). The regression models predict the Curie (Néel) temperature with a coefficient of determination (R²) of 0.86 (0.85) and a mean absolute error (MAE) of 62 K (32 K). These models identified 62 (19) FM (AFM) candidates with a predicted Curie (Néel) temperature above 500 K (100 K) from the Materials Project. This work shows the feasibility of combining LLM-based automated data extraction with machine learning models to accelerate the discovery of magnetic materials.

Updated: 2024-09-24 02:27:10

Categories: cond-mat.mtrl-sci,cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2409.15675v1

GUARD: A Safe Reinforcement Learning Benchmark

Due to their trial-and-error nature, it is typically challenging to apply RL algorithms to safety-critical real-world applications such as autonomous driving, human-robot interaction, and robot manipulation, where such errors are not tolerable. Recently, safe RL (i.e., constrained RL) has emerged rapidly in the literature, in which agents explore the environment while satisfying constraints. Due to the diversity of algorithms and tasks, it remains difficult to compare existing safe RL algorithms. To fill that gap, we introduce GUARD, a Generalized Unified SAfe Reinforcement Learning Development Benchmark. GUARD has several advantages over existing benchmarks. First, GUARD is a generalized benchmark with a wide variety of RL agents, tasks, and safety constraint specifications. Second, GUARD comprehensively covers state-of-the-art safe RL algorithms with self-contained implementations. Third, GUARD is highly customizable in both tasks and algorithms. We present a comparison of state-of-the-art safe RL algorithms in various task settings using GUARD and establish baselines that future work can build on.

Updated: 2024-09-24 02:23:04

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2305.13681v4

Data Poisoning-based Backdoor Attack Framework against Supervised Learning Rules of Spiking Neural Networks

Spiking Neural Networks (SNNs), the third generation of neural networks, are known for their low energy consumption and high robustness. SNNs are developing rapidly and can compete with Artificial Neural Networks (ANNs) in many fields. To ensure that the widespread use of SNNs does not cause serious security incidents, much research has explored the robustness of SNNs under adversarial sample attacks. However, many other unassessed security threats exist, such as highly stealthy backdoor attacks. Therefore, to fill this research gap and further explore the security vulnerabilities of SNNs, this paper examines the robustness of SNNs trained by supervised learning rules under backdoor attacks. Specifically, the work herein includes: i) We propose a generic backdoor attack framework that can be launched against the training process of existing supervised learning rules and covers all learnable dataset types of SNNs. ii) We analyze the robustness differences between different learning rules and between SNNs and ANNs, which suggests that SNNs no longer have inherent robustness under backdoor attacks. iii) We reveal the vulnerability of conversion-dependent learning rules caused by backdoor migration and further analyze the migration ability during the conversion process, finding that the backdoor migration rate can even exceed 99%. iv) Finally, we discuss potential countermeasures against this kind of backdoor attack, along with its technical challenges, and point out several promising research directions.

Updated: 2024-09-24 02:15:19

Domains: cs.CR,cs.NE

Download: http://arxiv.org/abs/2409.15670v1

A Survey on Recent Random Walk-based Methods for Embedding Knowledge Graphs

Machine learning, deep learning, and NLP methods on knowledge graphs are present in different fields and have important roles in various domains from self-driving cars to friend recommendations on social media platforms. However, to apply these methods to knowledge graphs, the data usually needs to be in an acceptable size and format. In fact, knowledge graphs normally have high dimensions and therefore we need to transform them to a low-dimensional vector space. An embedding is a low-dimensional space into which you can translate high dimensional vectors in a way that intrinsic features of the input data are preserved. In this review, we first explain knowledge graphs and their embedding and then review some of the random walk-based embedding methods that have been developed recently.
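
The walk-generation step shared by the surveyed DeepWalk-style methods can be sketched as follows; the toy graph and parameters are illustrative:

```python
import random

def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Generate truncated random walks over an adjacency dict -- the
    corpus-building step that random walk-based embedding methods feed
    into a skip-gram-style model."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop the walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
walks = random_walks(graph, walk_length=4)
```

Each walk is then treated like a "sentence" of node tokens when training the embedding.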

Updated: 2024-09-24 02:11:55

Domains: cs.LG

Download: http://arxiv.org/abs/2406.07402v2

Adversarial Attacks to Multi-Modal Models

Multi-modal models have gained significant attention due to their powerful capabilities. These models effectively align embeddings across diverse data modalities, showcasing superior performance in downstream tasks compared to their unimodal counterparts. Recent study showed that the attacker can manipulate an image or audio file by altering it in such a way that its embedding matches that of an attacker-chosen targeted input, thereby deceiving downstream models. However, this method often underperforms due to inherent disparities in data from different modalities. In this paper, we introduce CrossFire, an innovative approach to attack multi-modal models. CrossFire begins by transforming the targeted input chosen by the attacker into a format that matches the modality of the original image or audio file. We then formulate our attack as an optimization problem, aiming to minimize the angular deviation between the embeddings of the transformed input and the modified image or audio file. Solving this problem determines the perturbations to be added to the original media. Our extensive experiments on six real-world benchmark datasets reveal that CrossFire can significantly manipulate downstream tasks, surpassing existing attacks. Additionally, we evaluate six defensive strategies against CrossFire, finding that current defenses are insufficient to counteract our CrossFire.
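
The quantity CrossFire minimizes, the angular deviation between two embeddings, can be sketched as follows; the identity "embedder" and the single crude step toward the target are illustrative stand-ins for the paper's optimization:

```python
import numpy as np

def angular_deviation(u, v, eps=1e-9):
    """Angle (radians) between two embedding vectors -- the objective the
    attack drives down so the perturbed media matches the target embedding."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

# toy linear "embedder": perturbing x toward the target reduces the angle
embed = lambda x: x  # identity, purely for illustration
target = np.array([1.0, 0.0])
x = np.array([0.0, 1.0])
before = angular_deviation(embed(x), target)
x_adv = x + 0.5 * (target - x)  # one crude perturbation step
after = angular_deviation(embed(x_adv), target)
```

In the real attack the perturbation is found by solving this minimization under a perceptibility budget, with a learned multi-modal encoder in place of the identity map.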

Updated: 2024-09-24 02:09:10

Domains: cs.CR,cs.IR,cs.LG

Download: http://arxiv.org/abs/2409.06793v2

Mitigating Semantic Leakage in Cross-lingual Embeddings via Orthogonality Constraint

Accurately aligning contextual representations in cross-lingual sentence embeddings is key for effective parallel data mining. A common strategy for achieving this alignment involves disentangling semantics and language in sentence embeddings derived from multilingual pre-trained models. However, we discover that current disentangled representation learning methods suffer from semantic leakage - a term we introduce to describe when a substantial amount of language-specific information is unintentionally leaked into semantic representations. This hinders the effective disentanglement of semantic and language representations, making it difficult to retrieve embeddings that distinctively represent the meaning of the sentence. To address this challenge, we propose a novel training objective, ORthogonAlity Constraint LEarning (ORACLE), tailored to enforce orthogonality between semantic and language embeddings. ORACLE builds upon two components: intra-class clustering and inter-class separation. Through experiments on cross-lingual retrieval and semantic textual similarity tasks, we demonstrate that training with the ORACLE objective effectively reduces semantic leakage and enhances semantic alignment within the embedding space.
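
The orthogonality constraint at the heart of ORACLE can be sketched as a penalty on the cosine similarity between paired semantic and language embeddings; this is a minimal stand-in, not the paper's full intra-class clustering / inter-class separation objective:

```python
import numpy as np

def orthogonality_penalty(sem, lang):
    """Mean squared cosine similarity between paired semantic and language
    embeddings -- exactly zero when every pair is orthogonal."""
    sem = sem / np.linalg.norm(sem, axis=1, keepdims=True)
    lang = lang / np.linalg.norm(lang, axis=1, keepdims=True)
    cos = np.sum(sem * lang, axis=1)
    return float(np.mean(cos ** 2))

orthogonal = orthogonality_penalty(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
aligned = orthogonality_penalty(np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]]))
```

Adding such a penalty to the training loss pushes language-specific directions out of the semantic subspace, which is the mechanism for reducing semantic leakage.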

Updated: 2024-09-24 02:01:52

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.15664v1

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-quality speech that is nearly indistinguishable from real human speech and facilitating individuals to customize the speech content according to their own needs. Specifically, we first introduce Takin TTS, a neural codec language model that builds upon an enhanced neural speech codec and a multi-task training framework, capable of generating high-fidelity natural speech in a zero-shot way. For Takin VC, we advocate an effective content and timbre joint modeling approach to improve the speaker similarity, while advocating for a conditional flow matching based decoder to further enhance its naturalness and expressiveness. Last, we propose the Takin Morphing system with highly decoupled and advanced timbre and prosody modeling approaches, which enables individuals to customize speech production with their preferred timbre and prosody in a precise and controllable manner. Extensive experiments validate the effectiveness and robustness of our Takin AudioLLM series models. For detailed demos, please refer to https://everest-ai.github.io/takinaudiollm/.

Updated: 2024-09-24 02:00:54

Domains: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2409.12139v3

Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer for Stock Time Series Forecasting

Spatial-temporal graph neural networks (STGNNs) have achieved significant success in various time series forecasting tasks. However, due to the lack of explicit and fixed spatial relationships in stock prediction tasks, many STGNNs fail to perform effectively in this domain. While some STGNNs learn spatial relationships from time series, they often lack comprehensiveness. Research indicates that modeling time series using feature changes as tokens reveals entirely different information compared to using time steps as tokens. To more comprehensively extract dynamic spatial information from stock data, we propose a Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer (DPA-STIFormer). DPA-STIFormer models each node via continuous changes in features as tokens and introduces a Double Direction Self-adaptation Fusion mechanism. This mechanism decomposes node encoding into temporal and feature representations, simultaneously extracting different spatial correlations from a double path approach, and proposes a Double-path gating mechanism to fuse these two types of correlation information. Experiments conducted on four stock market datasets demonstrate state-of-the-art results, validating the model's superior capability in uncovering latent temporal-correlation patterns.
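
The "features as tokens" idea the paper builds on (inverted tokenization) amounts to transposing the usual (time, feature) layout so each feature's trajectory becomes one token; a minimal sketch, not DPA-STIFormer itself:

```python
import numpy as np

def invert_tokens(series):
    """Re-tokenize a multivariate series so each *feature's* full
    trajectory, rather than each time step, becomes one token
    (the inverted view used by iTransformer-style models)."""
    # series: (time_steps, n_features) -> tokens: (n_features, time_steps)
    return series.T

x = np.arange(12.0).reshape(4, 3)  # 4 time steps, 3 features
tok = invert_tokens(x)
```

Attention over `tok` then relates whole feature trajectories to each other, which is the complementary information the abstract contrasts with time-step tokens.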

Updated: 2024-09-24 01:53:22

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15662v1

ReLEP: A Novel Framework for Real-world Long-horizon Embodied Planning

Real-world long-horizon embodied planning underpins embodied AI. To accomplish long-horizon tasks, agents need to decompose abstract instructions into detailed steps. Prior works mostly rely on GPT-4V for task decomposition into predefined actions, which limits task diversity due to GPT-4V's finite understanding of larger skillsets. Therefore, we present ReLEP, a groundbreaking framework for Real world Long-horizon Embodied Planning, which can accomplish a wide range of daily tasks. At its core lies a fine-tuned large vision language model that formulates plans as sequences of skill functions according to input instruction and scene image. These functions are selected from a carefully designed skill library. ReLEP is also equipped with a Memory module for plan and status recall, and a Robot Configuration module for versatility across robot types. In addition, we propose a semi-automatic data generation pipeline to tackle dataset scarcity. Real-world off-line experiments across eight daily embodied tasks demonstrate that ReLEP is able to accomplish long-horizon embodied tasks and outperforms other state-of-the-art baseline methods.

Updated: 2024-09-24 01:47:23

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.15658v1

MMPT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks. As the scale of MLLMs continues to grow, parameter-efficient finetuning becomes increasingly critical. However, most existing parameter-efficient approaches focus only on single modalities and often overlook the multimodal characteristics during finetuning. In this work, we introduce a novel Multimodal Prompt Tuning (MMPT) approach for efficient instruction tuning of MLLMs. MMPT effectively integrates visual and textual prompts into the vision encoder and language processor respectively during finetuning, facilitating the extraction and alignment of features across modalities. Empirical results on various multimodal evaluation datasets demonstrate the superior performance of our approach compared to several state-of-the-art baselines. A comprehensive set of ablation studies validates the effectiveness of our prompt design and the efficiency of our approach.
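
The basic mechanics of prompt tuning, prepending learnable vectors to an embedding sequence, can be sketched as follows; the shapes are hypothetical and this is not MMPT's actual architecture:

```python
import numpy as np

def prepend_prompts(token_embeds, prompt_embeds):
    """Prepend learnable prompt vectors to a sequence of token embeddings.
    In MMPT-style tuning this happens in both the vision encoder and the
    language processor, with only the prompts receiving gradient updates."""
    # token_embeds: (seq_len, dim); prompt_embeds: (n_prompts, dim)
    return np.concatenate([prompt_embeds, token_embeds], axis=0)

tokens = np.zeros((5, 16))   # 5 frozen-model token embeddings, 16-dim
prompts = np.ones((3, 16))   # 3 learnable prompt vectors
extended = prepend_prompts(tokens, prompts)
```

Because only the small prompt matrices are trained, the approach stays parameter-efficient even as the backbone model grows.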

Updated: 2024-09-24 01:40:24

Domains: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2409.15657v1

Identified-and-Targeted: The First Early Evidence of the Privacy-Invasive Use of Browser Fingerprinting for Online Tracking

While advertising has become commonplace in today's online interactions, there is a notable dearth of research investigating the extent to which browser fingerprinting is harnessed for user tracking and targeted advertising. Prior studies only measured whether fingerprinting-related scripts are being run on the websites but that in itself does not necessarily mean that fingerprinting is being used for the privacy-invasive purpose of online tracking because fingerprinting might be deployed for the defensive purposes of bot/fraud detection and user authentication. It is imperative to address the mounting concerns regarding the utilization of browser fingerprinting in the realm of online advertising. To understand the privacy-invasive use of fingerprinting for user tracking, this paper introduces a new framework ``FPTrace'' (fingerprinting-based tracking assessment and comprehensive evaluation framework) designed to identify alterations in advertisements resulting from adjustments in browser fingerprinting settings. Our approach involves emulating genuine user interactions, capturing advertiser bid data, and closely monitoring HTTP information. Using FPTrace we conduct a large-scale measurement study to identify whether browser fingerprinting is being used for the purpose of user tracking and ad targeting. The results we have obtained provide robust evidence supporting the utilization of browser fingerprinting for the purposes of advertisement tracking and targeting. This is substantiated by significant disparities in bid values and a reduction in HTTP records subsequent to changes in fingerprinting. In conclusion, our research unveils the widespread employment of browser fingerprinting in online advertising, prompting critical considerations regarding user privacy and data security within the digital advertising landscape.

Updated: 2024-09-24 01:39:16

Domains: cs.CR

Download: http://arxiv.org/abs/2409.15656v1

MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models

This study explores the effectiveness of Large Language Models (LLMs) in creating personalized "mirror stories" that reflect and resonate with individual readers' identities, addressing the significant lack of diversity in literature. We present MirrorStories, a corpus of 1,500 personalized short stories generated by integrating elements such as name, gender, age, ethnicity, reader interest, and story moral. We demonstrate that LLMs can effectively incorporate diverse identity elements into narratives, with human evaluators identifying personalized elements in the stories with high accuracy. Through a comprehensive evaluation involving 26 diverse human judges, we compare the effectiveness of MirrorStories against generic narratives. We find that personalized LLM-generated stories not only outscore generic human-written and LLM-generated ones across all metrics of engagement (with average ratings of 4.22 versus 3.37 on a 5-point scale), but also achieve higher textual diversity while preserving the intended moral. We also provide analyses that include bias assessments and a study on the potential for integrating images into personalized stories.

Updated: 2024-09-24 01:30:14

Domains: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2409.13935v2

English offensive text detection using CNN based Bi-GRU model

Over the years, the number of users of social media has increased drastically. People frequently share their thoughts through social platforms, and this leads to an increase in hate content. In this virtual community, individuals share their views, express their feelings, and post photos, videos, blogs, and more. Social networking sites like Facebook and Twitter provide platforms to share vast amounts of content with a single click. However, these platforms do not impose restrictions on the uploaded content, which may include abusive language and explicit images unsuitable for social media. To resolve this issue, a new approach must be implemented to separate out the inappropriate content. Numerous studies have been done to automate the process. In this paper, we propose a new Bi-GRU-CNN model to classify whether the text is offensive or not. The combination of the Bi-GRU and CNN models outperforms the existing model.

Updated: 2024-09-24 01:29:24

Domains: cs.CL,cs.LG,cs.SI

Download: http://arxiv.org/abs/2409.15652v1

SurgIRL: Towards Life-Long Learning for Surgical Automation by Incremental Reinforcement Learning

Surgical automation holds immense potential to improve the outcome and accessibility of surgery. Recent studies use reinforcement learning to learn policies that automate different surgical tasks. However, these policies are developed independently and are limited in their reusability when the task changes, making it more time-consuming when robots learn to solve multiple tasks. Inspired by how human surgeons build their expertise, we train surgical automation policies through Surgical Incremental Reinforcement Learning (SurgIRL). SurgIRL aims to (1) acquire new skills by referring to external policies (knowledge) and (2) accumulate and reuse these skills to solve multiple unseen tasks incrementally (incremental learning). Our SurgIRL framework includes three major components. We first define an expandable knowledge set containing heterogeneous policies that can be helpful for surgical tasks. Then, we propose Knowledge Inclusive Attention Network with mAximum Coverage Exploration (KIAN-ACE), which improves learning efficiency by maximizing the coverage of the knowledge set during the exploration process. Finally, we develop incremental learning pipelines based on KIAN-ACE to accumulate and reuse learned knowledge and solve multiple surgical tasks sequentially. Our simulation experiments show that KIAN-ACE efficiently learns to automate ten surgical tasks separately or incrementally. We also evaluate our learned policies on the da Vinci Research Kit (dVRK) and demonstrate successful sim-to-real transfers.

Updated: 2024-09-24 01:27:46

Domains: cs.RO,cs.LG

Download: http://arxiv.org/abs/2409.15651v1

Looped Transformers for Length Generalization

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation - a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
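
The looped idea, reapplying one weight-tied block a number of times that scales with the input length, can be sketched with parity as the iterative task; `parity_step` stands in for the shared Transformer block, which in the paper implements one RASP-L operation:

```python
def looped_apply(step_fn, state, n_steps):
    """Apply the same weight-tied block n_steps times -- the adaptive-depth
    loop that lets one fixed set of weights handle arbitrary lengths."""
    for _ in range(n_steps):
        state = step_fn(state)
    return state

def parity_step(state):
    """One iteration of a length-generalizable parity routine:
    fold the leading bit into a running XOR accumulator."""
    bits, acc = state
    return (bits[1:], acc ^ bits[0]) if bits else (bits, acc)

def parity(bits):
    # one loop iteration per input bit, so longer inputs get more steps
    _, acc = looped_apply(parity_step, (list(bits), 0), len(bits))
    return acc
```

Because depth grows with the input rather than being fixed at training time, the same weights can process sequences longer than any seen during training.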

Updated: 2024-09-24 01:21:17

Domains: cs.LG

Download: http://arxiv.org/abs/2409.15647v1

CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration

We introduce Confidence-Controlled Exploration (CCE), a novel exploration scheme designed to enhance the training sample efficiency of reinforcement learning (RL) algorithms for sparse reward settings such as robot navigation. Sparse rewards are common in RL and convenient to design and implement, but typically hard to deal with due to the challenges of exploration. Existing methods deploy regularization-based methods to deal with the exploration challenges. However, it is hard to characterize the balance between exploration and exploitation because regularization modifies the reward function itself, hence changing the objective we are optimizing for. In contrast to regularization-based approaches in the existing literature, our approach, CCE, is based on a novel relationship we provide between gradient estimation and policy entropy. CCE dynamically adjusts the number of samples of the gradient update used during training to control exploration. Interestingly, CCE can be applied to both existing on-policy and off-policy RL methods, which we demonstrate by empirically validating its efficacy on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC) for goal-reaching robotic navigation tasks. We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization when constraining the sample budget. For a fixed sample budget, CCE achieves an 18\% increase in navigation success rate, a 20-38\% reduction in navigation path length, and a 9.32\% decrease in elevation costs. Furthermore, we showcase the versatility of CCE by integrating it with the Clearpath Husky robot, illustrating its applicability in complex outdoor environments.
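
A minimal sketch of the knob CCE turns, varying the number of samples used per gradient update as policy entropy changes; the direction and shape of this mapping are assumptions for illustration, not CCE's actual entropy-gradient relationship:

```python
def gradient_sample_count(entropy, max_entropy, n_min=64, n_max=1024):
    """Map current policy entropy to a per-update sample count.
    Hypothetical monotone schedule: as entropy falls (policy commits),
    use more samples for a lower-variance gradient estimate."""
    frac = max(0.0, min(1.0, entropy / max_entropy))  # clamp to [0, 1]
    return int(n_min + (1.0 - frac) * (n_max - n_min))
```

The point of the sketch is only that exploration is controlled through the sample budget of the gradient estimator, rather than by adding a regularization term to the reward.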

Updated: 2024-09-24 01:09:54

Domains: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2306.06192v8

On a measure of intelligence

The Fall 2024 Logic in Computer Science column of the Bulletin of EATCS is a little discussion on intelligence, measuring intelligence, and related issues, provoked by a fascinating must-read article ``On the measure of intelligence'' by François Chollet. The discussion includes a modicum of critique of the article.

Updated: 2024-09-24 01:03:48

Domains: cs.AI

Download: http://arxiv.org/abs/2409.14496v1

KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities

Linux kernel vulnerability reproduction is a critical task in system security. To reproduce a kernel vulnerability, the vulnerable environment and the Proof of Concept (PoC) program are needed. Most existing research focuses on the generation of PoC, while the construction of environment is overlooked. However, establishing an effective vulnerable environment to trigger a vulnerability is challenging. Firstly, it is hard to guarantee that the selected kernel version for reproduction is vulnerable, as the vulnerability version claims in online databases can occasionally be spurious. Secondly, many vulnerabilities can not be reproduced in kernels built with default configurations. Intricate non-default kernel configurations must be set to include and trigger a kernel vulnerability, but less information is available on how to recognize these configurations. To solve these challenges, we propose a patch-based approach to identify real vulnerable kernel versions and a graph-based approach to identify necessary configs for activating a specific vulnerability. We implement these approaches in a tool, KernJC, automating the generation of vulnerable environments for kernel vulnerabilities. To evaluate the efficacy of KernJC, we build a dataset containing 66 representative real-world vulnerabilities with PoCs from kernel vulnerability research in the past five years. The evaluation shows that KernJC builds vulnerable environments for all these vulnerabilities, 48.5% of which require non-default configs, and 4 have incorrect version claims in the National Vulnerability Database (NVD). Furthermore, we conduct large-scale spurious version detection on kernel vulnerabilities and identify 128 vulnerabilities which have spurious version claims in NVD. To foster future research, we release KernJC with the dataset in the community.
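
The graph-based config identification can be sketched as a transitive closure over a "config depends on configs" graph; the graph contents and function names here are hypothetical, not KernJC's implementation:

```python
def required_configs(dep_graph, vuln_configs):
    """Transitively collect every kernel config a vulnerability needs by
    walking a 'config -> configs it depends on' graph (illustrative)."""
    needed, stack = set(), list(vuln_configs)
    while stack:
        cfg = stack.pop()
        if cfg not in needed:
            needed.add(cfg)
            stack.extend(dep_graph.get(cfg, []))
    return needed

# toy dependency graph: enabling CONFIG_A requires CONFIG_B, which
# requires CONFIG_C; CONFIG_X stands alone
deps = {"CONFIG_A": ["CONFIG_B"], "CONFIG_B": ["CONFIG_C"], "CONFIG_X": []}
```

Enabling the closed set (rather than just the config guarding the vulnerable code) is what makes the non-default build actually include and trigger the vulnerability.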

Updated: 2024-09-24 00:56:36

Domains: cs.CR,cs.SE

Download: http://arxiv.org/abs/2404.11107v3

VulZoo: A Comprehensive Vulnerability Intelligence Dataset

Software vulnerabilities pose critical security and risk concerns for many software systems. Many techniques have been proposed to effectively assess and prioritize these vulnerabilities before they cause serious consequences. To evaluate their performance, these solutions often craft their own experimental datasets from limited information sources, such as MITRE CVE and NVD, lacking a global overview of broad vulnerability intelligence. The repetitive data preparation process further complicates the verification and comparison of new solutions. To resolve this issue, in this paper, we propose VulZoo, a comprehensive vulnerability intelligence dataset that covers 17 popular vulnerability information sources. We also construct connections among these sources, enabling more straightforward configuration and adaptation for different vulnerability assessment tasks (e.g., vulnerability type prediction). Additionally, VulZoo provides utility scripts for automatic data synchronization and cleaning, relationship mining, and statistics generation. We make VulZoo publicly available and maintain it with incremental updates to facilitate future research. We believe that VulZoo serves as a valuable input to vulnerability assessment and prioritization studies. The dataset with utility scripts is available at https://github.com/NUS-Curiosity/VulZoo.

Updated: 2024-09-24 00:54:30

Domains: cs.CR,cs.SE

Download: http://arxiv.org/abs/2406.16347v2

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstrations for digital tasks. Obtaining supervised data from humans is costly, and automatic data collection through exploration or reinforcement learning relies on complex environmental and content setup, resulting in datasets that lack comprehensive coverage of various scenarios. On the other hand, there is abundant knowledge that may indirectly assist task completion, such as online tutorials that were created for human consumption. In this work, we present Synatra, an approach that effectively transforms this indirect knowledge into direct supervision at scale. We define different types of indirect knowledge, and carefully study the available sources to obtain it, methods to encode the structure of direct demonstrations, and finally methods to transform indirect knowledge into direct demonstrations. We use 100k such synthetically-created demonstrations to finetune a 7B CodeLlama, and demonstrate that the resulting agent surpasses all comparably sized models on three web-based task benchmarks Mind2Web, MiniWoB++ and WebArena, as well as surpassing GPT-3.5 on WebArena and Mind2Web. In addition, while synthetic demonstrations prove to be only 3% the cost of human demonstrations (at $0.031 each), we show that the synthetic demonstrations can be more effective than an identical number of human demonstrations collected from limited domains.

Updated: 2024-09-24 00:51:45

Categories: cs.AI

Download: http://arxiv.org/abs/2409.15637v1

Personalized Federated Learning via Backbone Self-Distillation

In practical scenarios, federated learning frequently necessitates training personalized models for each client using heterogeneous data. This paper proposes a backbone self-distillation approach to facilitate personalized federated learning. In this approach, each client trains its local model and only sends the backbone weights to the server. These weights are then aggregated to create a global backbone, which is returned to each client for updating. However, the client's local backbone lacks personalization because of the common representation. To solve this problem, each client further performs backbone self-distillation by using the global backbone as a teacher and transferring knowledge to update the local backbone. This process involves learning two components: the shared backbone for common representation and the private head for local personalization, which enables effective global knowledge transfer. Extensive experiments and comparisons with 12 state-of-the-art approaches demonstrate the effectiveness of our approach.
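
The backbone/head split and self-distillation round described above can be sketched as follows. This is a minimal illustration, not the paper's method: the FedAvg-style averaging, the single interpolation step standing in for the feature-level distillation loss, and all shapes and names are assumptions.

```python
import numpy as np

def aggregate_backbones(backbones):
    """Server step: average the backbone weights received from all clients."""
    return {k: np.mean([b[k] for b in backbones], axis=0) for k in backbones[0]}

def self_distill(local_bb, global_bb, alpha=0.5):
    """Client step: pull the local backbone toward the global teacher backbone.
    A weight-space interpolation stands in for the feature-level distillation
    loss used in practice; the private head is never touched or communicated."""
    return {k: (1 - alpha) * v + alpha * global_bb[k] for k, v in local_bb.items()}

rng = np.random.default_rng(0)
# Each client: a shared backbone (sent to the server) and a private head (kept local).
clients = [{"backbone": {"w": rng.normal(size=(4, 2))},
            "head": {"w": rng.normal(size=(2, 3))}} for _ in range(3)]

global_bb = aggregate_backbones([c["backbone"] for c in clients])
for c in clients:
    c["backbone"] = self_distill(c["backbone"], global_bb)
```

Only backbone weights cross the network, so the head stays fully personalized while the backbone absorbs the globally aggregated representation.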

Updated: 2024-09-24 00:43:16

Categories: cs.LG,cs.AI,cs.CR,cs.CV

Download: http://arxiv.org/abs/2409.15636v1

Scaling Synthetic Data Creation with 1,000,000,000 Personas

We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub -- a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world's total population), acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios. By showcasing Persona Hub's use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs and tools (functions) at scale, we demonstrate persona-driven data synthesis is versatile, scalable, flexible, and easy to use, potentially driving a paradigm shift in synthetic data creation and applications in practice, which may have a profound impact on LLM research and development.
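The core idea of persona-driven synthesis can be illustrated with a toy sketch: the same seed task is paired with different personas to steer an LLM toward diverse outputs. The personas, the prompt template, and the implied LLM call are all illustrative assumptions, not the paper's pipeline.

```python
# Toy sketch of persona-driven data synthesis: one seed task, many personas.
personas = [
    "a pediatric nurse",
    "a medieval historian",
    "a competitive chess player",
]

def make_prompt(persona: str, task: str) -> str:
    """Pair a persona with a task seed; each persona steers the LLM toward a
    different context, yielding diverse synthetic data from a single seed."""
    return f"You are {persona}. {task}"

task = "Write a math word problem involving ratios."
prompts = [make_prompt(p, task) for p in personas]
# Each prompt would then be sent to an LLM; with ~1 billion personas, the same
# mechanism scales one seed task into a vast, diverse synthetic dataset.
```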

Updated: 2024-09-24 00:38:10

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.20094v2

Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition. While guaranteeing computational efficiency, our algorithm for linear MDPs achieves the best-known regret upper bound of $\widetilde{\mathcal{O}}(d^{3/2}\mathrm{sp}(v^*)\sqrt{T})$ over $T$ time steps where $\mathrm{sp}(v^*)$ is the span of the optimal bias function $v^*$ and $d$ is the dimension of the feature mapping. For linear mixture MDPs, our algorithm attains a regret bound of $\widetilde{\mathcal{O}}(d\cdot\mathrm{sp}(v^*)\sqrt{T})$. The algorithm applies novel techniques to control the covering number of the value function class and the span of optimistic estimators of the value function, which is of independent interest.
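For reference, the span seminorm appearing in both regret bounds above is the standard quantity

```latex
\mathrm{sp}(v) \;=\; \max_{s \in \mathcal{S}} v(s) \;-\; \min_{s \in \mathcal{S}} v(s),
```

so $\mathrm{sp}(v^*)$ measures how much the optimal bias function varies across states; in average-reward analyses it plays the role that horizon or diameter factors play in other settings.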

Updated: 2024-09-24 00:30:18

Categories: cs.LG,cs.DS,math.OC

Download: http://arxiv.org/abs/2409.10772v2

Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning, such as in intelligent tutoring systems (ITSs). Learning performance data tend to be highly sparse (80%-90% missing observations) in most real-world applications due to adaptive item selection. This data sparsity presents challenges to using learner models to effectively predict future performance and explore new hypotheses about learning. This article proposes a systematic framework for augmenting learner data to address data sparsity in learning performance data. First, learning performance is represented as a three-dimensional tensor of learners' questions, answers, and attempts, capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation on knowledge tracing tasks that predict missing performance values based on real observations. Third, a module for generating simulated learning patterns is used. This study contrasts two forms of generative Artificial Intelligence (AI), Generative Adversarial Networks (GANs) and Generative Pre-Trained Transformers (GPT), to generate data associated with different clusters of learner data. We tested this approach on an adult literacy dataset from AutoTutor lessons developed for Adult Reading Comprehension (ARC). We found that: (1) tensor factorization improved the performance in tracing and predicting knowledge mastery compared with other knowledge tracing techniques without data augmentation, showing higher relative fidelity for this imputation method, and (2) the GAN-based simulation showed greater overall stability and less statistical bias based on a divergence evaluation with varying simulation sample sizes compared to GPT.
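
The tensor-factorization imputation step can be sketched as below: fit a CP-style rank-R model to a sparse learner x question x attempt tensor using only the observed entries, then read imputed values off the reconstruction. The rank, the plain gradient-descent loop, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
L, Q, A, R = 6, 5, 4, 3                # learners x questions x attempts, rank R
true = rng.random((L, Q, A))           # toy "complete" performance tensor
mask = rng.random((L, Q, A)) < 0.2     # ~20% observed, ~80% missing (sparse)

# CP factors: true[l, q, a] is modeled as sum_r U[l,r] * V[q,r] * W[a,r].
U = rng.normal(0, 0.3, (L, R))
V = rng.normal(0, 0.3, (Q, R))
W = rng.normal(0, 0.3, (A, R))

lr = 0.05
for _ in range(1500):
    pred = np.einsum("lr,qr,ar->lqa", U, V, W)
    err = np.where(mask, pred - true, 0.0)   # loss computed on observed cells only
    U -= lr * np.einsum("lqa,qr,ar->lr", err, V, W)
    V -= lr * np.einsum("lqa,lr,ar->qr", err, U, W)
    W -= lr * np.einsum("lqa,lr,qr->ar", err, U, V)

imputed = np.einsum("lr,qr,ar->lqa", U, V, W)  # reconstruction fills the missing ~80%
```

Because the low-rank structure is fit only to real observations, predictions for the missing cells are grounded in observed learner behavior, which is the knowledge-tracing framing described above.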

Updated: 2024-09-24 00:25:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.15631v1

Universal Session Protocol: A General Solution to Remote Code Execution

Currently, the TCP/IP model enables exploitation of vulnerabilities anonymously by unconditionally fulfilling every request for a connection into an application; the model only incorporates authentication within applications themselves, rather than as a precondition for access into applications. I am proposing the Universal Session Protocol as a change to the architecture of the TCP/IP model to include a session layer featuring a structured generalized process for authentication negotiation and fulfillment. The Universal Session Protocol addresses an urgent and vital need to eliminate unauthenticated data processing on security critical systems. Previous work regarding TCP/IP security has focused on the application design and implementation and existing protocol layers, but has failed to posit the addition of a session layer as a mitigating control. Failing to implement a distinct authentication layer leaves every resource connected to the global Internet, including life and security critical infrastructure, vulnerable to attacks from anonymous and untraceable sources. The Universal Session Protocol provides a solution by establishing a TCP/IP Session Layer that explicitly provides authentication before a data stream is accessible within an application. After authentication, an identity is associated with the data stream so that all data may be related back to that identity for forensic purposes. If authentication fails, the application will never process user data, rendering the service safe from anonymous bad actors.
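
The ordering argument above ("authenticate first, then let the application see the stream, and tag all data with the authenticated identity") can be sketched as a gate in front of an application callback. The token check, identity table, and audit record are hypothetical stand-ins for whatever negotiation the protocol would actually define.

```python
import hashlib

VALID_TOKENS = {"s3cret": "alice"}  # toy pre-shared credentials (illustrative)

def session_layer(auth_token, payload, application):
    """Session-layer gate: the application callback runs only after
    authentication succeeds, and every processed payload is tagged with
    the authenticated identity for forensic traceability."""
    identity = VALID_TOKENS.get(auth_token)
    if identity is None:
        return None  # the application never touches unauthenticated data
    result = application(payload)
    audit = {"identity": identity,
             "digest": hashlib.sha256(payload).hexdigest()}
    return result, audit

def echo_app(data: bytes) -> bytes:
    """Stand-in application logic; it only ever sees authenticated data."""
    return data.upper()

ok = session_layer("s3cret", b"hello", echo_app)        # authenticated: processed
rejected = session_layer("wrong", b"exploit", echo_app)  # dropped before the app
```

The key property is structural: no code path delivers bytes to `echo_app` before `session_layer` has bound an identity to the stream.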

Updated: 2024-09-24 00:02:06

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2306.14339v2

By Xinhai (Sean) Zou.