FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that pulls the rendered features of the same semantic entities close together, following pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection while being 851× faster at inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. We plan to release the code on the project page.
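The pixel alignment loss described above can be sketched in a minimal form: normalize per-pixel rendered features and pull each pixel's feature toward the centroid of its semantic segment. This is a hypothetical sketch assuming a rendered feature map and a pixel-level segmentation; the paper's exact formulation may differ.

```python
import numpy as np

def pixel_alignment_loss(feat, seg, eps=1e-8):
    """Pull rendered features of pixels in the same semantic segment together.

    feat: (H, W, D) rendered feature map; seg: (H, W) integer segment ids.
    A hypothetical sketch; FMGS's exact loss may differ.
    """
    f = feat / (np.linalg.norm(feat, axis=-1, keepdims=True) + eps)
    loss, count = 0.0, 0
    for s in np.unique(seg):
        mask = seg == s
        centroid = f[mask].mean(axis=0)
        centroid /= np.linalg.norm(centroid) + eps
        # cosine distance of each pixel feature to its segment centroid
        loss += float((1.0 - f[mask] @ centroid).sum())
        count += int(mask.sum())
    return loss / count
```

A segment whose pixels already share one feature direction contributes zero loss, so minimizing it enforces the multi-view semantic consistency the abstract mentions.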
Updated: 2024-05-03 23:33:07
Categories: cs.CV,cs.AI
Zero-Knowledge Proof of Distinct Identity: a Standard-compatible Sybil-resistant Pseudonym Extension for C-ITS
Pseudonyms are widely used in Cooperative Intelligent Transport Systems (C-ITS) to protect the location privacy of vehicles. However, the unlinkability nature of pseudonyms also enables Sybil attacks, where a malicious vehicle can pretend to be multiple vehicles at the same time. In this paper, we propose a novel protocol called zero-knowledge Proof of Distinct Identity (zk-PoDI), which allows a vehicle to prove that it is not the owner of another pseudonym in the local area, without revealing its actual identity. Zk-PoDI is based on the Diophantine equation and zk-SNARK, and does not rely on any specific pseudonym design or infrastructure assistance. We show that zk-PoDI satisfies all the requirements for a practical Sybil-resistant pseudonym system, with low latency, adjustable difficulty, moderate computation overhead, and negligible communication cost. We also discuss the future work of implementing and evaluating zk-PoDI in a realistic city-scale simulation environment.
Updated: 2024-05-03 23:29:02
Categories: cs.CR,cs.NI
New contexts, old heuristics: How young people in India and the US trust online content in the age of generative AI
We conducted an in-person ethnography in India and the US to investigate how young people (18-24) trusted online content, with a focus on generative AI (GenAI). We had four key findings about how young people use GenAI and determine what to trust online. First, when online, we found participants fluidly shifted between mindsets and emotional states, which we term "information modes." Second, these information modes shaped how and why participants trust GenAI and how they applied literacy skills. In the modes where they spent most of their time, they eschewed literacy skills. Third, with the advent of GenAI, participants imported existing trust heuristics from familiar online contexts into their interactions with GenAI. Fourth, although study participants had reservations about GenAI, they saw it as a requisite tool to adopt to keep up with the times. Participants valued efficiency above all else, and used GenAI to further their goals quickly at the expense of accuracy. Our findings suggest that young people spend the majority of their time online not concerned with truth because they are seeking only to pass the time. As a result, literacy interventions should be designed to intervene at the right time, to match users' distinct information modes, and to work with their existing fact-checking practices.
Updated: 2024-05-03 23:27:11
Categories: cs.HC,cs.AI,cs.CY,cs.SI
Automating the Enterprise with Foundation Models
Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: https://github.com/HazyResearch/eclair-agents
Updated: 2024-05-03 23:25:15
Categories: cs.SE,cs.AI,cs.LG
Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms
We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in reinforcement learning either assume costly interactions with an environment or propose algorithms finding potentially low quality local maxima. Motivated by EXP-type methods that integrate multiple agents (experts) for exploration in bandits under the assumption that rewards are bounded, we propose new algorithms, namely EXP4.P and EXP4-RL, for exploration in the unbounded reward case, and demonstrate their effectiveness in these new settings. Unbounded rewards introduce challenges as the regret cannot be limited by the number of trials, and selecting suboptimal arms may lead to infinite regret. Specifically, we establish EXP4.P's regret upper bounds in both bounded and unbounded linear and stochastic contextual bandits. Surprisingly, we also find that by including one sufficiently competent expert, EXP4.P can achieve global optimality in the linear case. This unbounded reward result is also applicable to a revised version of EXP3.P in the Multi-armed Bandit scenario. In EXP4-RL, we extend EXP4.P from bandit scenarios to reinforcement learning to incentivize exploration by multiple agents, including one high-performing agent, for both efficiency and excellence. This algorithm has been tested on difficult-to-explore games and shows significant improvements in exploration compared to the state of the art.
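The EXP family the paper builds on maintains exponential weights with importance-weighted reward estimates. As a point of reference, here is a minimal sketch of classic EXP3 with rewards bounded in [0, 1]; the paper's EXP4.P and EXP4-RL variants extend this direction to experts, confidence bounds, and unbounded rewards, which this sketch does not cover.

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.1):
    """Classic EXP3 for bounded rewards in [0, 1]; a reference sketch only --
    the paper's EXP4.P / EXP4-RL handle experts and unbounded rewards."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for _ in range(horizon):
        s = sum(weights)
        # mix exponential weights with uniform exploration
        probs = [(1 - gamma) * w / s + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = reward_fn(arm)
        # importance-weighted (unbiased) reward estimate for the pulled arm
        weights[arm] *= math.exp(gamma * (r / probs[arm]) / n_arms)
        m = max(weights)  # renormalize to avoid float overflow
        weights = [w / m for w in weights]
        total_reward += r
    return total_reward, weights
```

The exploration floor gamma / n_arms is exactly what breaks down when rewards are unbounded, which motivates the modifications studied in the paper.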
Updated: 2024-05-03 23:19:46
Categories: cs.LG,cs.AI,stat.ML
Generating Probabilistic Scenario Programs from Natural Language
For cyber-physical systems (CPS), including robotics and autonomous vehicles, mass deployment has been hindered by fatal errors that occur when operating in rare events. To replicate rare events such as vehicle crashes, many companies have created logging systems and employed crash reconstruction experts to meticulously recreate these valuable events in simulation. However, in these methods, "what if" questions are not easily formulated and answered. We present ScenarioNL, an AI system for creating scenario programs from natural language. Specifically, we generate these programs from police crash reports. Reports normally contain uncertainty about the exact details of the incidents, which we represent through a Probabilistic Programming Language (PPL), Scenic. By using Scenic, we can clearly and concisely represent uncertainty and variation over CPS behaviors, properties, and interactions. We demonstrate how commonplace prompting techniques with the best Large Language Models (LLMs) are incapable of reasoning about probabilistic scenario programs and generating code for low-resource languages such as Scenic. Our system comprises several LLMs chained together with several kinds of prompting strategies, a compiler, and a simulator. We evaluate our system on publicly available autonomous vehicle crash reports in California from the last five years and share insights into how we generate code that is both semantically meaningful and syntactically correct.
Updated: 2024-05-03 23:06:31
Categories: cs.SE,cs.AI,cs.LG,cs.PL
Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DRNVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
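For concreteness, one standard way to write the robust objective sketched above: each agent i evaluates a joint policy π by its worst-case value over its prescribed uncertainty set. The notation here is generic (finite horizon H, uncertainty set U^σ around a nominal kernel P^0) and may not match the paper's exact formulation.

```latex
V_i^{\pi,\sigma}(s) \;=\; \inf_{P \in \mathcal{U}^{\sigma}(P^{0})}\;
\mathbb{E}_{\pi, P}\!\left[\, \sum_{h=1}^{H} r_{i,h}(s_h, a_h) \,\middle|\, s_1 = s \right]
```

A robust equilibrium then requires that no agent can improve its own worst-case value by unilaterally deviating, mirroring the classic game-theoretic notions the abstract refers to.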
Updated: 2024-05-03 23:02:53
Categories: cs.LG,cs.MA,stat.ML
Spatio-Temporal SwinMAE: A Swin Transformer based Multiscale Representation Learner for Temporal Satellite Imagery
Foundation models, exemplified by large language models, have recently made dramatic progress and are used in a very wide range of domains, including 2D and 3D vision. As one of the important application domains of foundation models, earth observation has attracted attention and various approaches have been developed. When earth observation is considered as a single image capture, the imagery can be processed as an image with three or more channels; when multiple captures with different timestamps exist at one location, the temporal observation can be considered as a set of continuous images resembling video frames or medical scan slices. This paper presents Spatio-Temporal SwinMAE (ST-SwinMAE), an architecture which particularly focuses on representation learning for spatio-temporal image processing. Specifically, it uses a hierarchical Masked Auto-encoder (MAE) with Video Swin Transformer blocks. With this architecture, we present a pretrained model named Degas 100M as a geospatial foundation model. We also propose an approach for transfer learning with Degas 100M in which both the pretrained encoder and decoder of the MAE are utilized, with skip connections added between them to achieve multi-scale information communication, forming an architecture named Spatio-Temporal SwinUNet (ST-SwinUNet). Our approach shows significant performance improvements over existing state-of-the-art foundation models. Specifically, for transfer learning on the land cover downstream task of the PhilEO Bench dataset, it shows 10.4% higher accuracy on average compared with other geospatial foundation models.
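The masked autoencoding above rests on a simple primitive: randomly partitioning patch (or spatio-temporal patch-cube) indices into masked and visible sets, with only the visible patches fed to the encoder. A generic sketch of that primitive, with the mask ratio as a free parameter; ST-SwinMAE's exact hierarchical patching is not reproduced here.

```python
import numpy as np

def random_patch_mask(n_patches, mask_ratio, seed=0):
    """Split patch indices into (masked, visible) sets, as in MAE-style
    pretraining. A generic sketch; ST-SwinMAE's patching details differ."""
    rng = np.random.default_rng(seed)
    n_masked = int(n_patches * mask_ratio)
    perm = rng.permutation(n_patches)
    return perm[:n_masked], perm[n_masked:]
```

The pretraining objective is then to reconstruct the masked patches from the visible ones, which is what gives the encoder its transferable representation.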
Updated: 2024-05-03 22:55:56
Categories: cs.CV,cs.AI
Implicit Neural Representations for Robust Joint Sparse-View CT Reconstruction
Computed Tomography (CT) is pivotal in industrial quality control and medical diagnostics. Sparse-view CT, offering reduced ionizing radiation, faces challenges due to its under-sampled nature, leading to ill-posed reconstruction problems. Recent advancements in Implicit Neural Representations (INRs) have shown promise in addressing sparse-view CT reconstruction. Recognizing that CT often involves scanning similar subjects, we propose a novel approach to improve reconstruction quality through joint reconstruction of multiple objects using INRs. This approach can potentially leverage both the strengths of INRs and the statistical regularities across multiple objects. While current INR joint reconstruction techniques primarily focus on accelerating convergence via meta-initialization, they are not specifically tailored to enhance reconstruction quality. To address this gap, we introduce a novel INR-based Bayesian framework integrating latent variables to capture the inter-object relationships. These variables serve as a dynamic reference throughout the optimization, thereby enhancing individual reconstruction fidelity. Our extensive experiments, which assess various key factors such as reconstruction quality, resistance to overfitting, and generalizability, demonstrate significant improvements over baselines in common numerical metrics. This underscores a notable advancement in CT reconstruction methods.
Updated: 2024-05-03 22:50:59
Categories: cs.CV,cs.AI
Mathematical Foundation and Corrections for Full Range Head Pose Estimation
Numerous works concerning head pose estimation (HPE) offer algorithms or neural network-based approaches for extracting Euler angles from either facial key points or directly from images of the head region. However, many works fail to provide clear definitions of the coordinate systems and the Euler or Tait-Bryan angle orders in use. It is a well-known fact that rotation matrices depend on coordinate systems, and that yaw, roll, and pitch angles are sensitive to their application order. Without precise definitions, it becomes challenging to validate the correctness of the output head pose and the drawing routines employed in prior works. In this paper, we thoroughly examine the Euler angles defined in the 300W-LP dataset, head pose estimation methods such as 3DDFA-v2, 6D-RepNet, and WHENet, and the validity of their Euler angle drawing routines. When necessary, we infer the coordinate system and the sequence of yaw, roll, and pitch from the provided code. This paper presents (1) code and algorithms for inferring the coordinate system from provided source code, for determining the Euler angle application order, and for extracting precise rotation matrices and Euler angles, (2) code and algorithms for converting poses from one rotation system to another, (3) novel formulae for 2D augmentations of the rotation matrices, and (4) derivations and code for correct drawing routines for rotation matrices and poses. This paper also addresses the feasibility of defining rotations with the right-handed coordinate system used in Wikipedia and SciPy, which makes Euler angle extraction much easier for full-range head pose research.
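The order sensitivity the paper stresses is easy to verify numerically: composing the same yaw, pitch, and roll rotations in two different orders yields two different (though individually valid) rotation matrices. A small sketch in a right-handed coordinate system; the angle values are arbitrary illustrations.

```python
import numpy as np

def Rx(a):  # rotation about x (roll), right-handed
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(a):  # rotation about y (pitch)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(a):  # rotation about z (yaw)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

yaw, pitch, roll = 0.3, -0.2, 0.5
R_zyx = Rz(yaw) @ Ry(pitch) @ Rx(roll)  # one application order
R_xyz = Rx(roll) @ Ry(pitch) @ Rz(yaw)  # the reverse order
```

Both products are orthonormal with determinant 1, yet they disagree, which is exactly why a paper must state its convention before Euler angles can be compared across works.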
Updated: 2024-05-03 22:50:41
Categories: cs.CV,cs.LG
LordNet: An Efficient Neural Network for Learning to Solve Parametric Partial Differential Equations without Simulated Data
Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved promising in accelerating the solution of partial differential equations (PDEs). However, they require a large amount of simulated data, which can be costly to collect. This can be avoided by learning physics from a physics-constrained loss, which we refer to as the mean squared residual (MSR) loss, constructed from the discretized PDE. We investigate the physical information in the MSR loss, which we call long-range entanglements, and identify the challenge that the neural network requires the capacity to model these long-range entanglements in the spatial domain of the PDE, whose patterns vary across different PDEs. To tackle this challenge, we propose LordNet, a tunable and efficient neural network for modeling various entanglements. Inspired by traditional solvers, LordNet models the long-range entanglements with a series of matrix multiplications, which can be seen as a low-rank approximation to general fully-connected layers and extracts the dominant pattern at reduced computational cost. Experiments on solving Poisson's equation and the (2D and 3D) Navier-Stokes equations demonstrate that the long-range entanglements from the MSR loss can be well modeled by LordNet, yielding better accuracy and generalization ability than other neural networks. The results show that LordNet can be 50× faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency with the smallest parameter size.
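The low-rank view of a fully-connected layer mentioned above can be made concrete: a dense d_in × d_out weight matrix is replaced by two thin factors of rank r, cutting parameters from d_in·d_out to r·(d_in + d_out) while exactly matching any dense layer whose weight happens to be U @ V. The dimensions below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 256, 256, 8

# low-rank factors standing in for a dense weight W = U @ V
U = rng.normal(size=(d_in, r))
V = rng.normal(size=(r, d_out))

def low_rank_layer(x):
    # two thin matmuls instead of one dense one
    return (x @ U) @ V

params_dense = d_in * d_out            # 65536
params_low_rank = r * (d_in + d_out)   # 4096
```

The compute saving has the same shape as the parameter saving: for a batch of n inputs, the dense layer costs O(n·d_in·d_out) while the factored form costs O(n·r·(d_in + d_out)).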
Updated: 2024-05-03 22:29:05
Categories: cs.LG
Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Large Language Models (LLMs) are trained on massive text corpora, which are encoded with diverse personality traits. This triggers an interesting goal of eliciting a desired personality trait from the LLM, and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At the core, PICLe introduces a new ICL example selection criterion based on likelihood ratio, which is designed to optimally guide the model in eliciting a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at https://github.com/deeplearning-wisc/picle.
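The selection criterion above can be sketched as ranking candidate ICL examples by the log-likelihood ratio between a persona-conditioned model and the base model. The scores below are placeholders for per-example log-probabilities obtained from the two models; this is a hypothetical sketch of a likelihood-ratio criterion, not PICLe's full Bayesian derivation.

```python
def select_icl_examples(candidates, k):
    """Rank candidates by log p(e | persona) - log p(e) and keep the top k.

    candidates: list of (example, logp_persona, logp_base) tuples, where the
    log-probabilities are assumed to come from the two models. A sketch only.
    """
    ranked = sorted(candidates, key=lambda c: c[1] - c[2], reverse=True)
    return [example for example, _, _ in ranked[:k]]
```

Examples that the persona-conditioned model finds much more likely than the base model are the ones most indicative of the target persona, which is the intuition behind a likelihood-ratio criterion.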
Updated: 2024-05-03 22:17:22
Categories: cs.CL,cs.AI
DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands
The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.
Updated: 2024-05-03 22:10:21
Categories: cs.CR,cs.AR
Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming
Student modeling is central to many educational technologies as it enables predicting future learning outcomes and designing targeted instructional strategies. However, open-ended learning domains pose challenges for accurately modeling students due to the diverse behaviors and a large space of possible misconceptions. To approach these challenges, we explore the application of large language models (LLMs) for in-context student modeling in open-ended learning domains. More concretely, given a particular student's attempt on a reference task as observation, the objective is to synthesize the student's attempt on a target task. We introduce a novel framework, LLM for Student Synthesis (LLM-SS), that leverages LLMs for synthesizing a student's behavior. Our framework can be combined with different LLMs; moreover, we fine-tune LLMs to boost their student modeling capabilities. We instantiate several methods based on LLM-SS framework and evaluate them using an existing benchmark, StudentSyn, for student attempt synthesis in a visual programming domain. Experimental results show that our methods perform significantly better than the baseline method NeurSS provided in the StudentSyn benchmark. Furthermore, our method using a fine-tuned version of the GPT-3.5 model is significantly better than using the base GPT-3.5 model and gets close to human tutors' performance.
Updated: 2024-05-03 22:03:43
Categories: cs.CL,cs.AI
Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation when the SCAR assumption does not hold
Positive and Unlabeled (PU) learning is a type of semi-supervised binary classification where the machine learning algorithm differentiates between a set of positive instances (labeled) and a set of both positive and negative instances (unlabeled). PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain, and there is value in discovering positives among the unlabeled (e.g., viable drugs among untested compounds). Most PU learning algorithms make the selected completely at random (SCAR) assumption, namely that positives are selected independently of their features. However, in many real-world applications, such as healthcare, positives are not SCAR (e.g., severe cases are more likely to be diagnosed), leading to a poor estimate of the proportion, α, of positives among unlabeled examples and poor model calibration, resulting in an uncertain decision threshold for selecting positives. PU learning algorithms vary; some estimate only the proportion, α, of positives in the unlabeled set, while others calculate the probability that each specific unlabeled instance is positive, and some can do both. We propose two PU learning algorithms to estimate α, calculate calibrated probabilities for PU instances, and improve classification metrics: i) PULSCAR (positive unlabeled learning selected completely at random), and ii) PULSNAR (positive unlabeled learning selected not at random). PULSNAR employs a divide-and-conquer approach to cluster SNAR positives into subtypes and estimates α for each subtype by applying PULSCAR to positives from each cluster and all unlabeled. In our experiments, PULSNAR outperformed state-of-the-art approaches on both synthetic and real-world benchmark datasets.
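For contrast with the SNAR setting the paper targets, the classic SCAR-based estimate of α (in the style of Elkan and Noto) is essentially one line: under SCAR the label frequency c equals the classifier's mean score on labeled positives, and α is the mean unlabeled score divided by c. This sketch is the baseline assumption the paper relaxes, not the PULSCAR/PULSNAR algorithms themselves.

```python
import numpy as np

def estimate_alpha_scar(scores_labeled_pos, scores_unlabeled):
    """Elkan-Noto style estimate: alpha ~= E[s(x) | unlabeled] / c,
    with c = E[s(x) | labeled positive]. Valid only under SCAR."""
    c = float(np.mean(scores_labeled_pos))
    return float(np.mean(scores_unlabeled)) / c
```

When positives are selected not at random, c is no longer a single constant across the positive class, which is exactly why this estimator breaks down and a subtype-wise approach like PULSNAR's is needed.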
Updated: 2024-05-03 21:41:20
Categories: cs.LG
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously tested the generalization ability of the model to unseen noise types and distortions. We have fortified our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitated exploring various architectural changes, specifically metric discriminator scores and masking techniques. It is essential to highlight that this is among the earliest works that attempted complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, in the denoising task using the Voice Bank+DEMAND dataset, CMGAN notably exceeded the performance of prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.
Updated: 2024-05-03 21:38:45
标题: CMGAN:基于Conformer的度量-GAN用于单声道语音增强
摘要: 在这项工作中,我们进一步发展了基于Conformer的度量生成对抗网络(CMGAN)模型,用于在时频(TF)域中进行语音增强(SE)。本文基于我们先前的工作,但通过对模型输入和架构设计选择进行大量消融研究,进行了更深入的探讨。我们严格测试了模型对未知噪声类型和失真的泛化能力,并通过DNS-MOS测量和听觉测试加强了我们的结论。我们不仅专注于语音降噪任务,还将这项工作扩展到解决去混响和超分辨率任务。这需要探索各种架构变化,特别是度量判别器分数和掩蔽技术。值得强调的是,这是最早尝试复杂TF域超分辨率的工作之一。我们的发现显示,CMGAN在三项主要语音增强任务(降噪、去混响和超分辨率)中优于现有的最先进方法。例如,在使用Voice Bank+DEMAND数据集的降噪任务中,CMGAN明显超过了先前模型的性能,达到了3.41的PESQ分数和11.10 dB的SSNR。音频样本和CMGAN实现可在线获取。
更新时间: 2024-05-03 21:38:45
领域: cs.SD,cs.AI,cs.LG,eess.AS
Physics-informed neural networks for operator equations with stochastic data
We consider the computation of statistical moments of solutions to operator equations with stochastic data. We remark that the application of PINNs -- referred to as TPINNs -- allows one to solve the induced tensor operator equations with minimal changes to existing PINNs code, while enabling the handling of non-linear and time-dependent operators. We propose two types of architectures, referred to as vanilla and multi-output TPINNs, and investigate their benefits and limitations. Exhaustive numerical experiments are performed, demonstrating applicability and performance and raising a variety of promising new research avenues.
Updated: 2024-05-03 21:35:02
标题: 物理信息神经网络用于具有随机数据的算子方程
摘要: 我们考虑计算具有随机数据的算子方程解的统计矩。我们指出,应用PINNs(称为TPINNs)可以在对现有PINNs代码进行最小更改的情况下求解诱导出的张量算子方程,并能够处理非线性和时间相关算子。我们提出了两种类型的架构,分别称为原始(vanilla)和多输出TPINNs,并研究它们的优势和局限性。我们进行了详尽的数值实验,展示了该方法的适用性和性能,并提出了一系列有前景的新研究方向。
更新时间: 2024-05-03 21:35:02
领域: cs.LG,cs.NA,math.NA
Modelling Sampling Distributions of Test Statistics with Autograd
Simulation-based inference methods that feature correct conditional coverage of confidence sets based on observations that have been compressed to a scalar test statistic require accurate modelling of either the p-value function or the cumulative distribution function (cdf) of the test statistic. If the model of the cdf, which is typically a deep neural network, is a function of the test statistic then the derivative of the neural network with respect to the test statistic furnishes an approximation of the sampling distribution of the test statistic. We explore whether this approach to modelling conditional 1-dimensional sampling distributions is a viable alternative to the probability density-ratio method, also known as the likelihood-ratio trick. Relatively simple, yet effective, neural network models are used whose predictive uncertainty is quantified through a variety of methods.
Updated: 2024-05-03 21:34:12
标题: 使用自动求导建模检验统计量的抽样分布
摘要: 对于基于已压缩为标量检验统计量的观测、并要求置信集具有正确条件覆盖率的模拟推断方法,需要准确建模p值函数或检验统计量的累积分布函数(cdf)。如果cdf的模型(通常是一个深度神经网络)是检验统计量的函数,那么该神经网络关于检验统计量的导数就提供了检验统计量抽样分布的近似。我们探讨这种建模条件一维抽样分布的方法是否是概率密度比方法(也称为似然比技巧)的可行替代方案。我们使用相对简单但有效的神经网络模型,并通过多种方法量化其预测不确定性。
更新时间: 2024-05-03 21:34:12
领域: stat.ML,cs.LG,hep-ex,stat.CO
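The core idea above, that differentiating a learned cdf model with respect to the test statistic yields a density estimate, can be illustrated with a toy forward-mode autodiff sketch. This is an illustrative stand-in, not the paper's networks: a logistic cdf plays the role of the trained model, and its derivative recovers the logistic density.

```python
import math

class Dual:
    """Minimal forward-mode autodiff scalar: value plus derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = float(val), float(dot)
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __neg__(self):
        return Dual(-self.val, -self.dot)
    def __truediv__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val / other.val,
                    (self.dot * other.val - self.val * other.dot) / other.val ** 2)
    def __rtruediv__(self, other):
        return Dual(other) / self

def dexp(x):
    # exp with derivative propagation
    e = math.exp(x.val)
    return Dual(e, e * x.dot)

def cdf_model(lam):
    """Stand-in for a trained cdf network: the logistic cdf of the statistic."""
    return 1.0 / (dexp(-lam) + 1.0)

lam = Dual(0.0, 1.0)        # seed derivative: d(lambda)/d(lambda) = 1
F = cdf_model(lam)
pdf_at_0 = F.dot            # derivative of the cdf = sampling-density estimate
```

At lambda = 0 the logistic cdf is 0.5 and its derivative (the density) is 0.25, which the dual-number pass recovers exactly; a real implementation would instead differentiate the trained network with an autodiff framework.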
Peer-to-Peer Deep Learning for Beyond-5G IoT
We present P2PL, a practical multi-device peer-to-peer deep learning algorithm that, unlike the federated learning paradigm, does not require coordination from edge servers or the cloud. This makes P2PL well-suited for the sheer scale of beyond-5G computing environments like smart cities that otherwise create range, latency, bandwidth, and single point of failure issues for federated approaches. P2PL introduces max norm synchronization to catalyze training, retains on-device deep model training to preserve privacy, and leverages local inter-device communication to implement distributed consensus. Each device iteratively alternates between two phases: 1) on-device learning and 2) peer-to-peer cooperation where they combine model parameters with nearby devices. We empirically show that all participating devices achieve the same test performance attained by federated and centralized training -- even with 100 devices and relaxed singly stochastic consensus weights. We extend these experimental results to settings with diverse network topologies, sparse and intermittent communication, and non-IID data distributions.
Updated: 2024-05-03 21:34:02
标题: 点对点深度学习用于超越5G物联网
摘要: 我们提出了P2PL,这是一种实用的多设备对等深度学习算法,与联邦学习范式不同,它不需要边缘服务器或云端的协调。这使得P2PL非常适合智能城市等超越5G的大规模计算环境,这类环境会给联邦方法带来通信距离、延迟、带宽和单点故障等问题。P2PL引入最大范数同步来促进训练,保留设备上的深度模型训练以保护隐私,并利用本地设备间通信来实现分布式共识。每个设备在两个阶段之间迭代交替:1)设备上学习;2)对等协作,即与附近设备组合模型参数。我们通过实验表明,即使在有100个设备和放宽的单随机(singly stochastic)共识权重的情况下,所有参与设备都能达到联邦训练和集中式训练所达到的测试性能。我们将这些实验结果扩展到具有多样网络拓扑、稀疏和间歇通信以及非独立同分布数据的设置。
更新时间: 2024-05-03 21:34:02
领域: cs.LG,cs.DC
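A minimal sketch of the two-phase loop the P2PL abstract describes, under toy assumptions that are illustrative choices rather than the paper's setup: scalar parameters, deterministic gradients on local squared losses, a fixed ring topology, and a decaying step size.

```python
N = 8
ring = [[(i - 1) % N, (i + 1) % N] for i in range(N)]  # sparse network topology
m = [float(i) for i in range(N)]            # non-IID local data means
w = [5.0 * (-1.0) ** i for i in range(N)]   # arbitrary initial parameters

lr = 0.05
for _ in range(600):
    # phase 1: on-device learning (gradient step on the local squared loss)
    w = [wi - lr * 2.0 * (wi - mi) for wi, mi in zip(w, m)]
    # phase 2: peer-to-peer cooperation (mix parameters with ring neighbours)
    w = [0.5 * w[i] + 0.25 * (w[ring[i][0]] + w[ring[i][1]]) for i in range(N)]
    lr *= 0.99                              # decaying step size aids consensus

consensus = sum(w) / N      # all devices end up near the global mean 3.5
spread = max(w) - min(w)    # and near one another
```

Despite each device only ever seeing its own data and its two neighbours, the combination of local descent and doubly stochastic mixing drives every parameter toward the minimizer of the average loss.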
A Survey of Few-Shot Learning for Biomedical Time Series
Advancements in wearable sensor technologies and the digitization of medical records have contributed to the unprecedented ubiquity of biomedical time series data. Data-driven models have tremendous potential to assist clinical diagnosis and improve patient care by improving long-term monitoring capabilities, facilitating early disease detection and intervention, as well as promoting personalized healthcare delivery. However, accessing extensively labeled datasets to train data-hungry deep learning models encounters many barriers, such as long-tail distribution of rare diseases, cost of annotation, privacy and security concerns, data-sharing regulations, and ethical considerations. An emerging approach to overcome the scarcity of labeled data is to augment AI methods with human-like capabilities to leverage past experiences to learn new tasks with limited examples, called few-shot learning. This survey provides a comprehensive review and comparison of few-shot learning methods for biomedical time series applications. The clinical benefits and limitations of such methods are discussed in relation to traditional data-driven approaches. This paper aims to provide insights into the current landscape of few-shot learning for biomedical time series and its implications for future research and applications.
Updated: 2024-05-03 21:22:27
标题: 《生物医学时间序列少样本学习调查》
摘要: 可穿戴传感器技术的进步和医疗记录的数字化促成了生物医学时间序列数据的空前普及。数据驱动模型具有巨大潜力,可以通过改善长期监测能力、促进早期疾病检测和干预,以及推动个性化医疗服务的提供,从而帮助临床诊断并改善患者护理。然而,获取大量标注数据集来训练数据需求量大的深度学习模型面临许多障碍,如罕见疾病的长尾分布、标注成本、隐私和安全问题、数据共享法规以及伦理考虑。一种克服标注数据稀缺性的新兴方法是为AI方法赋予类人能力,利用过去的经验以有限的示例学习新任务,这被称为少样本学习。本综述对生物医学时间序列应用中的少样本学习方法进行了全面回顾和比较,并结合传统数据驱动方法讨论了这些方法的临床益处和局限性。本文旨在提供对生物医学时间序列少样本学习当前格局及其对未来研究和应用的影响的见解。
更新时间: 2024-05-03 21:22:27
领域: cs.LG,cs.AI
LLM Security Guard for Code
Many developers rely on Large Language Models (LLMs) to facilitate software development. Nevertheless, these models have exhibited limited capabilities in the security domain. We introduce LLMSecGuard, a framework to offer enhanced code security through the synergy between static code analyzers and LLMs. LLMSecGuard is open source and aims to equip developers with code solutions that are more secure than the code initially generated by LLMs. This framework also has a benchmarking feature, aimed at providing insights into the evolving security attributes of these models.
Updated: 2024-05-03 21:09:25
标题: 面向代码的LLM安全卫士
摘要: 许多开发人员依赖大型语言模型(LLMs)来促进软件开发。然而,这些模型在安全领域表现出有限的能力。我们引入了LLMSecGuard,一个通过静态代码分析器和LLMs之间的协同作用来提供增强代码安全性的框架。LLMSecGuard是开源的,旨在为开发人员提供比LLMs最初生成的代码更安全的代码解决方案。该框架还具有基准测试功能,旨在提供有关这些模型不断发展的安全属性的见解。
更新时间: 2024-05-03 21:09:25
领域: cs.SE,cs.CR
Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
Curriculum design for reinforcement learning (RL) can speed up an agent's learning process and help it learn to perform well on complex tasks. However, existing techniques typically require domain-specific hyperparameter tuning, involve expensive optimization procedures for task selection, or are suitable only for specific learning objectives. In this work, we consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks. We base our curriculum design on the Zone of Proximal Development concept, which has proven to be effective in accelerating the learning process of RL agents for uniform distribution over all tasks. We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's learning toward the target distribution via leveraging task correlations. We theoretically justify the task selection strategy of ProCuRL-Target by analyzing a simple learning setting with REINFORCE learner model. Our experimental results across various domains with challenging target task distributions affirm the effectiveness of our curriculum strategy over state-of-the-art baselines in accelerating the training process of deep RL agents.
Updated: 2024-05-03 21:07:54
标题: 深度强化学习中具有任务相关性的近端课程
摘要: 强化学习(RL)的课程设计可以加快智能体的学习过程,并帮助其学会在复杂任务上表现良好。然而,现有技术通常需要特定于领域的超参数调整,涉及昂贵的任务选择优化程序,或仅适用于特定学习目标。在这项工作中,我们考虑情境多任务设置中的课程设计,其中智能体的最终表现是根据复杂任务上的目标分布来衡量的。我们的课程设计基于最近发展区(Zone of Proximal Development)概念,该概念已被证明可以在所有任务均匀分布的情况下加速RL智能体的学习过程。我们提出了一种新颖的课程,ProCuRL-Target,它通过利用任务相关性,有效地平衡了选择对智能体而言不太困难的任务与将智能体的学习推进至目标分布这两方面的需要。我们通过分析一个使用REINFORCE学习者模型的简单学习设置,从理论上论证了ProCuRL-Target的任务选择策略。我们在具有挑战性目标任务分布的多个领域中的实验结果证实,相比最先进的基线方法,我们的课程策略能够有效加快深度RL智能体的训练过程。
更新时间: 2024-05-03 21:07:54
领域: cs.LG,cs.AI
A Network Simulation of OTC Markets with Multiple Agents
We present a novel agent-based approach to simulating an over-the-counter (OTC) financial market in which trades are intermediated solely by market makers and agent visibility is constrained to a network topology. Dynamics, such as changes in price, result from agent-level interactions that ubiquitously occur via market maker agents acting as liquidity providers. Two additional agents are considered: trend investors use a deep convolutional neural network paired with a deep Q-learning framework to inform trading decisions by analysing price history; and value investors use a static price-target to determine their trade directions and sizes. We demonstrate that our novel inclusion of a network topology with market makers facilitates explorations into various market structures. First, we present the model and an overview of its mechanics. Second, we validate our findings via comparison to the real-world: we demonstrate a fat-tailed distribution of price changes, auto-correlated volatility, a skew negatively correlated to market maker positioning, predictable price-history patterns and more. Finally, we demonstrate that our network-based model can lend insights into the effect of market-structure on price-action. For example, we show that markets with sparsely connected intermediaries can have a critical point of fragmentation, beyond which the market forms distinct clusters and arbitrage becomes rapidly possible between the prices of different market makers. A discussion is provided on future work that would be beneficial.
Updated: 2024-05-03 20:45:00
标题: 一个具有多个代理的场外交易市场的网络模拟
摘要: 我们提出了一种新颖的基于代理的方法来模拟场外交易(OTC)金融市场,其中交易仅由做市商中介,且代理的可见性受网络拓扑的限制。价格变化等动态源于代理层面的互动,这些互动普遍通过充当流动性提供者的做市商代理发生。我们还考虑了另外两种代理:趋势投资者使用深度卷积神经网络配合深度Q学习框架,通过分析价格历史来做出交易决策;价值投资者使用静态价格目标来确定他们的交易方向和规模。我们证明,将网络拓扑与做市商相结合这一新颖做法有助于探索各种市场结构。首先,我们介绍模型及其机制概述。其次,我们通过与现实世界的比较验证我们的发现:我们展示了价格变化的肥尾分布、自相关波动率、与做市商持仓负相关的偏斜、可预测的价格历史模式等。最后,我们证明基于网络的模型可以揭示市场结构对价格行为的影响。例如,我们展示了中介连接稀疏的市场可能存在一个碎片化临界点,超过该临界点后,市场会形成不同的集群,不同做市商的价格之间会迅速出现套利机会。最后我们讨论了未来值得开展的工作。
更新时间: 2024-05-03 20:45:00
领域: econ.EM,cs.AI,cs.MA
Continuous Learned Primal Dual
Neural ordinary differential equations (Neural ODEs) propose the idea that a sequence of layers in a neural network is just a discretisation of an ODE, and thus can instead be directly modelled by a parameterised ODE. This idea has had resounding success in the deep learning literature, with direct or indirect influence on many state-of-the-art ideas, such as diffusion models or time-dependent models. Recently, a continuous version of the U-net architecture has been proposed, showing increased performance over its discrete counterpart in many imaging applications, along with theoretical guarantees on its performance and robustness. In this work, we explore the use of Neural ODEs for learned inverse problems, in particular with the well-known Learned Primal Dual algorithm, and apply it to computed tomography (CT) reconstruction.
Updated: 2024-05-03 20:40:14
标题: Continuous Learned Primal Dual
摘要: 神经常微分方程(Neural ODEs)提出了这样的想法:神经网络中的一系列层只是ODE的离散化,因此可以直接用参数化ODE来建模。这一想法在深度学习文献中取得了巨大成功,直接或间接地影响了许多最先进的方法,如扩散模型或时间相关模型。最近,有人提出了U-net架构的连续版本,在许多成像应用中表现出比其离散对应物更高的性能,并附有关于其性能和稳健性的理论保证。在这项工作中,我们探讨将神经ODE用于学习型逆问题,特别是与著名的学习原始-对偶(Learned Primal Dual)算法相结合,并将其应用于计算机断层扫描(CT)重建。
更新时间: 2024-05-03 20:40:14
领域: cs.LG,eess.IV
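The core observation behind Neural ODEs, residual layers as an Euler discretisation of an ODE, can be sketched with a toy scalar vector field (an illustrative example, not the paper's U-net or Learned Primal Dual model):

```python
import math

def f(x, theta):
    """Toy parameterised vector field: drive the state toward theta."""
    return theta - x

def euler_ode(x0, theta, T=1.0, steps=1000):
    """'Continuous layer': integrate dx/dt = f(x, theta) from 0 to T."""
    h = T / steps
    x = x0
    for _ in range(steps):
        x = x + h * f(x, theta)   # one Euler step == one residual layer
    return x

x0, theta = 0.0, 2.0
approx = euler_ode(x0, theta)                  # many small residual layers
exact = theta + (x0 - theta) * math.exp(-1.0)  # closed-form ODE solution
```

As the number of steps grows, the stack of residual updates converges to the exact ODE flow; a Neural ODE replaces the fixed Euler stack with a learned, continuously parameterised field.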
Increasing Trust in Language Models through the Reuse of Verified Circuits
Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a transformer model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify a model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
Updated: 2024-05-03 20:32:15
标题: 通过重复使用经过验证的电路增加对语言模型的信任
摘要: 语言模型(LMs)越来越被广泛用于各种预测任务,但它们的训练往往会忽视罕见的边缘情况,降低它们的可靠性。在这里,我们定义了一个严格的可信度标准,要求任务算法和电路实现必须经过验证,考虑到边缘情况,且没有已知的故障模式。我们展示了如果使用数学和逻辑指定的框架构建,transformer模型可以被训练以满足这一标准。在本文中,我们完全验证了一个n位整数加法模型。为了展示经过验证模块的可重复使用性,我们将经过训练的整数加法模型插入到一个未经训练的模型中,并训练组合模型执行加法和减法。我们发现加法电路在两个任务中都有广泛的重用,减轻了更复杂减法模型的验证工作。我们讨论了将经过验证的任务模块插入LMs中如何利用模型重用来提高使用它们构建的语言模型的可验证性和可信度。经过验证电路的重复使用减少了验证更复杂的复合模型的工作量,我们认为这是语言模型安全的重要一步。
更新时间: 2024-05-03 20:32:15
领域: cs.LG,cs.CL
SSI4IoT: Unlocking the Potential of IoT Tailored Self-Sovereign Identity
The emerging Self-Sovereign Identity (SSI) techniques, such as Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), move control of digital identity from conventional identity providers to individuals and lay down the foundation for people, organizations, and things to establish rich digital relationships. The existing applications of SSI mainly focus on creating person-to-person and person-to-service relationships, whereas person-to-device and device-to-device interactions have been largely overlooked. In this paper, we close this gap by identifying a number of key challenges of applying SSI to the Internet of Things (IoT) and providing a comprehensive taxonomy and usage of VCs in the IoT context with respect to their validity period, trust and interoperability level, and scope of usage. The life-cycle management of VCs as well as various optimization techniques for realizing SSI in IoT environments are also addressed in great detail. This work is a noteworthy step towards massive adoption of SSI for securing existing and future IoT applications in practice.
Updated: 2024-05-03 20:31:52
标题: SSI4IoT:释放量身定制的自主身份识别技术在物联网中的潜力
摘要: 新兴的自主身份(SSI)技术,如分散式标识符(DIDs)和可验证凭证(VCs),将数字身份的控制权从传统身份提供者转移到个人,并为人们、组织和事物建立丰富的数字关系奠定了基础。SSI的现有应用主要集中在创建人与人、人与服务之间的关系,而人与设备以及设备之间的交互则大多被忽视。本文通过确定将SSI应用于物联网(IoT)的一些关键挑战,提供了关于VC在IoT环境中的有效期、信任和互操作水平以及使用范围的全面分类和用法。VC的生命周期管理以及在实现IoT环境中SSI的各种优化技术也被详细讨论。这项工作是朝着在实践中为保护现有和未来的IoT应用程序大规模采用SSI迈出的重要一步。
更新时间: 2024-05-03 20:31:52
领域: cs.ET,cs.CR,cs.DC
Generalizing Orthogonalization for Models with Non-linearities
The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms' application. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the "orthogonalization" or "normalization" of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.
Updated: 2024-05-03 20:25:57
标题: 将正交化推广到非线性模型
摘要: 黑盒算法的复杂性可能导致各种挑战,包括引入偏见。这些偏见在算法应用中带来了即时风险。例如,已经显示出神经网络可以仅仅通过患者的X射线扫描推断出种族信息,这是医学专家无法做到的任务。如果医学专家不知道这一事实,基于该算法的自动决策可能会导致基于种族信息的治疗处方。虽然当前的方法允许对神经网络进行关于这种信息的“正交化”或“归一化”,但现有方法基于线性模型。我们的论文通过引入ReLU激活等非线性的校正,推进了这一讨论。我们的方法还包括标量和张量值预测,便于将其集成到神经网络架构中。通过大量实验,我们验证了我们的方法在保护广义线性模型中的敏感数据、规范化卷积神经网络的元数据以及纠正现有嵌入中不需要的属性方面的有效性。
更新时间: 2024-05-03 20:25:57
领域: cs.LG,cs.AI,stat.CO,stat.ME
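The linear baseline that the orthogonalization paper generalizes can be sketched in a few lines: residualizing predictions on a protected attribute via an OLS projection, after which the two are exactly uncorrelated. The data here are toy values; the paper's contribution is extending this idea to non-linearities such as ReLU activations.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def orthogonalize(preds, protected):
    """Remove the linear component of `protected` from `preds` (OLS residual)."""
    p, z = center(preds), center(protected)
    coef = dot(z, p) / dot(z, z)   # OLS slope of predictions on the attribute
    return [pi - coef * zi for pi, zi in zip(p, z)]

preds = [1.0, 2.0, 3.0, 4.0, 5.0]
protected = [0.0, 1.0, 0.0, 1.0, 1.0]   # e.g. a binary sensitive attribute
adjusted = orthogonalize(preds, protected)
# the adjusted predictions are exactly uncorrelated with the attribute
```

The projection guarantees zero sample covariance with the protected variable, but only in the linear sense; removing non-linear dependence is precisely what requires the corrections the paper introduces.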
FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models
Text-to-image generative models, especially those based on latent diffusion models (LDMs), have demonstrated outstanding ability in generating high-quality and high-resolution images from textual prompts. With this advancement, various fine-tuning methods have been developed to personalize text-to-image models for specific applications such as artistic style adaptation and human face transfer. However, such advancements have raised copyright concerns, especially when the data are used for personalization without authorization. For example, a malicious user can employ fine-tuning techniques to replicate the style of an artist without consent. In light of this concern, we propose FT-Shield, a watermarking solution tailored for the fine-tuning of text-to-image diffusion models. FT-Shield addresses copyright protection challenges by designing new watermark generation and detection strategies. In particular, it introduces an innovative algorithm for watermark generation. It ensures the seamless transfer of watermarks from training images to generated outputs, facilitating the identification of copyrighted material use. To tackle the variability in fine-tuning methods and their impact on watermark detection, FT-Shield integrates a Mixture of Experts (MoE) approach for watermark detection. Comprehensive experiments validate the effectiveness of our proposed FT-Shield.
Updated: 2024-05-03 20:06:17
标题: FT-Shield:一种针对文本到图像扩散模型未经授权微调的水印
摘要: 文本到图像生成模型,特别是基于潜在扩散模型(LDMs)的模型,已经展现出在从文本提示生成高质量和高分辨率图像方面的出色能力。随着这一进展,各种微调方法已经被开发出来,用于个性化文本到图像模型,以适用于特定应用,如艺术风格适应和人脸转移。然而,这样的进展引起了版权问题,特别是当数据被未经授权地用于个性化时。例如,恶意用户可以使用微调技术来复制艺术家的风格而未经同意。鉴于这一问题,我们提出了FT-Shield,一种专为文本到图像扩散模型微调而设计的水印解决方案。FT-Shield通过设计新的水印生成和检测策略来解决版权保护挑战。特别是,它引入了一种创新的水印生成算法。它确保了水印从训练图像到生成的输出的无缝传输,便于识别版权材料的使用。为了应对微调方法的可变性及其对水印检测的影响,FT-Shield集成了一种专家混合(MoE)方法用于水印检测。全面的实验验证了我们提出的FT-Shield的有效性。
更新时间: 2024-05-03 20:06:17
领域: cs.CV,cs.CR
ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models
Large language models (LLMs) have attracted significant attention in recent years. Due to their "Large" nature, training LLMs from scratch consumes immense computational resources. Since several major players in the artificial intelligence (AI) field have open-sourced their original LLMs, an increasing number of individual researchers and smaller companies are able to build derivative LLMs based on these open-sourced models at much lower costs. However, this practice opens up possibilities for unauthorized use or reproduction that may not comply with licensing agreements, and deriving a model can change its behavior, complicating the determination of model ownership. Current copyright protection schemes for LLMs are either designed for white-box settings or require additional modifications to the original model, which restricts their use in real-world settings. In this paper, we propose ProFLingo, a black-box fingerprinting-based copyright protection scheme for LLMs. ProFLingo generates adversarial examples (AEs) that can represent the unique decision boundary characteristics of an original model, thereby establishing unique fingerprints. Our scheme checks the effectiveness of these adversarial examples on a suspect model to determine whether it has been derived from the original model. ProFLingo offers a non-invasive approach, which neither requires knowledge of the suspect model nor modifications to the base model or its training process. To the best of our knowledge, our method represents the first black-box fingerprinting technique for copyright protection for LLMs. Our source code and generated AEs are available at: https://github.com/hengvt/ProFLingo_arXiv.
Updated: 2024-05-03 20:00:40
标题: ProFLingo:基于指纹识别的大型语言模型版权保护方案
摘要: 大型语言模型(LLMs)近年来吸引了大量关注。由于其“大”的特性,从头开始训练LLMs需要消耗巨大的计算资源。由于人工智能(AI)领域的一些主要参与者已经开源了他们的原始LLMs,越来越多的个人研究者和小公司能够基于这些开源模型以低得多的成本构建衍生LLMs。然而,这种做法带来了可能违反许可协议的未经授权使用或复制的风险,并且模型的衍生可能改变模型的行为,从而使模型所有权的确定变得更加复杂。目前针对LLMs的版权保护方案要么是为白盒设置而设计,要么需要对原始模型进行额外修改,这限制了它们在实际环境中的使用。在本文中,我们提出了ProFLingo,一种基于黑盒指纹的LLMs版权保护方案。ProFLingo生成能够表征原始模型独特决策边界特征的对抗样本(AEs),从而建立独特的指纹。我们的方案通过检查这些对抗样本在可疑模型上的有效性,来确定该模型是否由原始模型衍生而来。ProFLingo提供了一种非侵入式方法,既不需要了解可疑模型,也不需要修改基础模型或其训练过程。据我们所知,我们的方法是第一个用于LLMs版权保护的黑盒指纹技术。我们的源代码和生成的AEs可在https://github.com/hengvt/ProFLingo_arXiv获取。
更新时间: 2024-05-03 20:00:40
领域: cs.CR,cs.LG
mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture
The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI for IT operations (AIOps) domain, where multiple agents based on the powerful large language models (LLMs) perform blockchain-inspired voting to reach a final agreement following a standardized process for processing tasks and queries provided by Agent Workflow. Specifically, seven specialized agents derived from Agent Workflow each provide valuable insights towards root cause analysis based on their expertise and the intrinsic software knowledge of LLMs collaborating within a decentralized chain. To avoid potential instability issues in LLMs and fully leverage the transparent and egalitarian advantages inherent in a decentralized structure, mABC adopts a decision-making process inspired by blockchain governance principles while considering the contribution index and expertise index of each agent. Experimental results on the public benchmark AIOps challenge dataset and our created train-ticket dataset demonstrate superior performance in accurately identifying root causes and formulating effective solutions, compared to previous strong baselines. The ablation study further highlights the significance of each component within mABC, with Agent Workflow, multi-agent, and blockchain-inspired voting being crucial for achieving optimal performance. mABC offers a comprehensive automated root cause analysis and resolution in micro-services architecture and achieves a significant improvement in the AIOps domain compared to existing baselines.
Updated: 2024-05-03 19:58:01
标题: mABC:基于区块链启发的多代理协作在微服务架构中的根本原因分析
摘要: 云原生技术中微服务架构日益复杂,对系统稳定性和效率的维护提出了重大挑战。为了进行根本原因分析(RCA)和解决警报事件,我们提出了一个开创性框架,即基于多智能体区块链启发的微服务架构根本原因分析协作(mABC),以革新AI运维(AIOps)领域,其中基于强大的大语言模型(LLMs)的多智能体通过区块链启发式投票,在遵循由智能体工作流提供的任务和查询的标准化过程后达成最终一致意见。具体而言,从智能体工作流中衍生出的七个专业智能体根据其专业知识和LLMs之间协作的内在软件知识为根本原因分析提供了有价值的见解。为避免LLMs中潜在的不稳定问题,并充分利用分散结构中透明和平等的优势,mABC采用了受区块链治理原则启发的决策过程,同时考虑了每个智能体的贡献指数和专业指数。在公开基准AIOps挑战数据集和我们创建的火车票数据集上的实验结果显示,与先前的强基线相比,mABC在准确识别根本原因和制定有效解决方案方面表现出卓越性能。消融研究进一步突显了mABC中每个组件的重要性,智能体工作流、多智能体和区块链启发式投票对实现最佳性能至关重要。与现有基线相比,mABC为微服务架构提供了全面自动化的根本原因分析和解决方案,并在AIOps领域取得了显著进步。
更新时间: 2024-05-03 19:58:01
领域: cs.MA,cs.CR,cs.DC
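A toy sketch of the blockchain-inspired voting step described above: each agent's root-cause vote is weighted by the product of its contribution and expertise indices. The agent names, candidate root causes, and product weighting here are illustrative assumptions; the paper's exact agents and aggregation rule may differ.

```python
def weighted_vote(proposals, contribution, expertise):
    """Tally root-cause votes, weighting each agent by contribution * expertise."""
    tally = {}
    for agent, vote in proposals.items():
        tally[vote] = tally.get(vote, 0.0) + contribution[agent] * expertise[agent]
    winner = max(tally, key=tally.get)
    return winner, tally

# hypothetical agent names and candidate root causes (illustrative only)
proposals = {
    "alert_receiver": "db_latency",
    "process_inspector": "db_latency",
    "dependency_explorer": "net_loss",
    "probability_oracle": "db_latency",
    "fault_mapper": "net_loss",
    "solution_engineer": "db_latency",
    "score_evaluator": "net_loss",
}
contribution = {agent: 1.0 for agent in proposals}
expertise = {agent: 1.0 for agent in proposals}
expertise["dependency_explorer"] = 1.5   # a more specialised voter weighs more

winner, tally = weighted_vote(proposals, contribution, expertise)
```

Weighting votes this way lets agents with a stronger track record or closer domain fit sway the final agreement without giving any single agent a veto.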
Knowledge Graph Extension by Entity Type Recognition
Knowledge graphs have emerged as a sophisticated advancement and refinement of semantic networks, and their deployment is one of the critical methodologies in contemporary artificial intelligence. The construction of knowledge graphs is a multifaceted process involving various techniques, where researchers aim to extract the knowledge from existing resources for the construction since building from scratch entails significant labor and time costs. However, due to the pervasive issue of heterogeneity, the description diversity across different knowledge graphs can lead to mismatches between concepts, thereby impacting the efficacy of knowledge extraction. This Ph.D. study focuses on automatic knowledge graph extension, i.e., properly extending the reference knowledge graph by extracting and integrating concepts from one or more candidate knowledge graphs. We propose a novel knowledge graph extension framework based on entity type recognition. The framework aims to achieve high-quality knowledge extraction by aligning the schemas and entities across different knowledge graphs, thereby enhancing the performance of the extension. This paper elucidates three major contributions: (i) we propose an entity type recognition method exploiting machine learning and property-based similarities to enhance knowledge extraction; (ii) we introduce a set of assessment metrics to validate the quality of the extended knowledge graphs; (iii) we develop a platform for knowledge graph acquisition, management, and extension to benefit knowledge engineers practically. Our evaluation comprehensively demonstrated the feasibility and effectiveness of the proposed extension framework and its functionalities through quantitative experiments and case studies.
Updated: 2024-05-03 19:55:03
标题: 通过实体类型识别扩展知识图谱
摘要: 知识图谱已经成为语义网络的一个复杂先进和精炼的发展,并且它们的部署是当代人工智能中关键方法之一。知识图谱的构建是一个多方面的过程,涉及各种技术,研究人员的目标是从现有资源中提取知识进行构建,因为从头开始构建涉及显着的劳动力和时间成本。然而,由于异质性的普遍问题,不同知识图谱之间的描述多样性可能导致概念之间的不匹配,从而影响知识提取的效果。这篇博士论文关注自动知识图谱扩展,即通过从一个或多个候选知识图谱中提取和集成概念,适当扩展参考知识图谱。我们提出了一种基于实体类型识别的新颖知识图谱扩展框架。该框架旨在通过对不同知识图谱之间的模式和实体进行对齐,从而提高扩展的性能,实现高质量的知识提取。本文阐明了三个主要贡献:(i)我们提出了一种利用机器学习和基于属性的相似性来增强知识提取的实体类型识别方法;(ii)我们引入了一组评估指标来验证扩展知识图谱的质量;(iii)我们开发了一个平台,用于知识图谱的获取、管理和扩展,以使知识工程师能够实际受益。我们的评估通过定量实验和案例研究全面展示了所提出的扩展框架及其功能的可行性和有效性。
更新时间: 2024-05-03 19:55:03
领域: cs.AI
Controlled Query Evaluation through Epistemic Dependencies
In this paper, we propose the use of epistemic dependencies to express data protection policies in Controlled Query Evaluation (CQE), which is a form of confidentiality-preserving query answering over ontologies and databases. The resulting policy language goes significantly beyond those proposed in the literature on CQE so far, allowing for very rich and practically interesting forms of data protection rules. We show the expressive abilities of our framework and study the data complexity of CQE for (unions of) conjunctive queries when ontologies are specified in the Description Logic DL-Lite_R. Interestingly, while we show that the problem is in general intractable, we prove tractability for the case of acyclic epistemic dependencies by providing a suitable query rewriting algorithm. The latter result paves the way towards the implementation and practical application of this new approach to CQE.
Updated: 2024-05-03 19:48:07
标题: 通过认知依赖关系进行受控查询评估
摘要: 在本文中,我们提出使用认知依赖关系来表达受控查询评估(CQE)中的数据保护策略,CQE是一种在本体和数据库上进行保密性保护查询回答的形式。由此产生的策略语言显著超越了迄今为止CQE文献中提出的策略语言,允许表达非常丰富且具有实际意义的数据保护规则。我们展示了我们框架的表达能力,并研究了当本体用描述逻辑DL-Lite_R指定时,CQE对合取查询(及其并)的数据复杂性。有趣的是,虽然我们表明该问题在一般情况下是难解的,但我们通过给出合适的查询重写算法,证明了无环认知依赖情况下的可解性。后一结果为这种新的CQE方法的实现和实际应用铺平了道路。
更新时间: 2024-05-03 19:48:07
领域: cs.AI
Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where information for all tasks is accessible to a single server, and in the decentralized setting, where a network of agents, each given one task and observing local information, cooperate to find the solution of the globally constrained objective using local communication. We first propose a primal-dual algorithm that provably converges to the globally optimal solution of this constrained formulation under exact gradient evaluations. When the gradient is unknown, we further develop a sampled-based actor-critic algorithm that finds the optimal policy using online samples of state, action, and reward. Finally, we study the extension of the algorithm to the linear function approximation setting.
Updated: 2024-05-03 19:43:30
标题: 用于受约束多任务强化学习的自然策略梯度和演员-评论家方法
摘要: 多任务强化学习(RL)旨在找到一个能同时有效解决多个任务的单一策略。本文提出了多任务RL的一种受约束形式化,其目标是在满足每个任务性能下界的约束下,最大化策略在各任务上的平均性能。我们考虑在两种设置下求解该问题:一是中心化设置,其中所有任务的信息可由单个服务器访问;二是去中心化设置,其中每个智能体负责一个任务并观察本地信息,通过本地通信合作求解全局受约束目标。我们首先提出一种原始-对偶算法,证明其在精确梯度评估下收敛到该受约束形式化的全局最优解。当梯度未知时,我们进一步开发了一种基于采样的演员-评论家(actor-critic)算法,使用状态、动作和奖励的在线样本找到最优策略。最后,我们研究了该算法在线性函数逼近设置下的扩展。
更新时间: 2024-05-03 19:43:30
领域: math.OC,cs.LG
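The primal-dual scheme from the abstract above can be illustrated on a toy deterministic problem with exact gradients. Two quadratic "task performances" and one scalar "policy parameter" are illustrative stand-ins, not the paper's RL setting: maximize the average performance subject to a lower bound on task 2.

```python
def J1(x):
    return -(x - 1.0) ** 2      # task-1 performance

def J2(x):
    return -(x + 1.0) ** 2      # task-2 performance

b = -0.25                       # required lower bound on task-2 performance

x, lam = 2.0, 0.0               # primal variable and dual multiplier
eta_x, eta_lam = 0.05, 0.05
for _ in range(4000):
    # primal ascent on the Lagrangian: average performance + lam * (J2 - b)
    grad_x = 0.5 * (-2.0 * (x - 1.0) - 2.0 * (x + 1.0)) + lam * (-2.0 * (x + 1.0))
    x += eta_x * grad_x
    # dual descent: lam grows while the task-2 constraint is violated
    lam = max(0.0, lam - eta_lam * (J2(x) - b))
# converges to the constrained optimum x* = -0.5 with multiplier lam* = 1
```

The unconstrained maximizer of the average (x = 0) violates the task-2 bound, so the multiplier rises until the iterates settle at the nearest feasible point; the paper replaces these exact gradients with sampled policy-gradient estimates.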
On the Optimization Landscape of Maximum Mean Discrepancy
Generative models have been successfully used for generating realistic signals. Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for the case of Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low rank covariance (where likelihood is inapplicable) and a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient based methods will globally minimize the MMD objective.
Updated: 2024-05-03 19:41:06
标题: 最大均值差异的优化景观
摘要: 生成模型已成功用于生成逼真的信号。由于在大多数这些模型中,似然函数通常难以处理,因此常见做法是使用可以避免似然计算的“隐式”模型。然而,对于这些模型很难获得理论保证。特别是,人们不清楚它们何时可以全局优化它们的非凸目标。在这里,我们为生成模型的最大均值差异(MMD)学习提供了这样的分析。我们证明了几个最优性结果,包括对于具有低秩协方差的高斯分布(在这种情况下似然无法应用)和高斯混合分布。我们的分析表明,在这些情况下,MMD优化景观是良性的,因此基于梯度的方法将全局最小化MMD目标。
更新时间: 2024-05-03 19:41:06
领域: cs.LG,stat.ML
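For reference, the objective the abstract above analyzes can be computed directly: a minimal squared-MMD estimate with a Gaussian kernel (a generic biased V-statistic on 1-D samples; illustrative, not the paper's analysis):

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y."""
    xx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / len(X) ** 2
    yy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / len(Y) ** 2
    xy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return xx + yy - 2.0 * xy

random.seed(1)
P = [random.gauss(0.0, 1.0) for _ in range(200)]
Q_same = [random.gauss(0.0, 1.0) for _ in range(200)]  # same distribution
Q_far = [random.gauss(3.0, 1.0) for _ in range(200)]   # shifted distribution

mmd_same = mmd2(P, Q_same)   # close to zero
mmd_far = mmd2(P, Q_far)     # clearly positive
```

Training an implicit generative model by MMD amounts to gradient descent on this quantity with respect to the generator's parameters; the paper's result is that for the stated model families this landscape has no bad local minima.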
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Reinforcement Learning (RL) for constrained MDPs (CMDPs) is an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average-CMDPs (ACMDPs) remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by trust region-based policy optimization algorithms. We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym environments, show its superior empirical performance when compared to other state-of-the-art algorithms adapted for the ACMDPs.
Updated: 2024-05-03 19:40:10
Subjects: cs.LG,cs.AI
What is Sentiment Meant to Mean to Language Models?
Sentiment analysis is one of the most widely used techniques in text analysis. Recent advancements with Large Language Models have made it more accurate and accessible than ever, allowing researchers to classify text with only a plain English prompt. However, "sentiment" entails a wide variety of concepts depending on the domain and tools used. It has been used to mean emotion, opinions, market movements, or simply a general "good-bad" dimension. This raises a question: What exactly are language models doing when prompted to label documents by sentiment? This paper first overviews how sentiment is defined across different contexts, highlighting that it is a confounded measurement construct in that it entails multiple variables, such as emotional valence and opinion, without disentangling them. I then test three language models across two data sets with prompts requesting sentiment, valence, and stance classification. I find that sentiment labels most strongly correlate with valence labels. I further find that classification improves when researchers more precisely specify their dimension of interest rather than using the less well-defined concept of sentiment. I conclude by encouraging researchers to move beyond "sentiment" when feasible and use a more precise measurement construct.
Updated: 2024-05-03 19:37:37
Subjects: cs.CL,cs.AI
Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.
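For intuition, the Vendi score is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix K/n, i.e. an "effective number of distinct items". The two-item case below has closed-form eigenvalues, so no eigensolver is needed; the quality weighting shown (average quality times diversity) is one simple instantiation assumed here for illustration, not necessarily the paper's exact definition.

```python
import math

def vendi_2(s):
    """Vendi score of two items with similarity s in [0, 1]:
    exp(entropy of eigenvalues of K/n) with K = [[1, s], [s, 1]].
    For n = 2 the eigenvalues of K/2 are (1 +/- s) / 2 in closed form."""
    lams = [(1 + s) / 2.0, (1 - s) / 2.0]
    ent = -sum(l * math.log(l) for l in lams if l > 0)
    return math.exp(ent)

def quality_weighted_vendi_2(s, q1, q2):
    """One simple quality weighting (an assumption for illustration):
    average quality times the diversity score."""
    return 0.5 * (q1 + q2) * vendi_2(s)

# Two identical items count as one effective item; two fully
# dissimilar items count as two.
dup = vendi_2(1.0)       # effective count ~1
distinct = vendi_2(0.0)  # effective count ~2
```

A design policy can then trade off picking the highest-quality candidate against picking the candidate that most increases this score.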
Updated: 2024-05-03 19:33:44
Subjects: stat.ML,cond-mat.mtrl-sci,cs.LG,q-bio.BM
Semantic Objective Functions: A distribution-aware method for adding logical constraints in deep learning
Issues of safety, explainability, and efficiency are of increasing concern in learning systems deployed with hard and soft constraints. Symbolic Constrained Learning and Knowledge Distillation techniques have shown promising results in this area, by embedding and extracting knowledge, as well as providing logical constraints during neural network training. Although many frameworks exist to date, through an integration of logic and information geometry, we provide a construction and theoretical framework for these tasks that generalizes many approaches. We propose a loss-based method that embeds knowledge, enforcing logical constraints, into a machine learning model that outputs probability distributions. This is done by constructing a distribution from the external knowledge/logic formula and defining the loss as a linear combination of the original loss function with the Fisher-Rao distance or Kullback-Leibler divergence to the constraint distribution. This construction includes logical constraints in the form of propositional formulas (Boolean variables), formulas of a first-order language with finite variables over a model with compact domain (categorical and continuous variables), and in general, is likely applicable to any statistical model that was pretrained with semantic information. We evaluate our method on a variety of learning tasks, including classification tasks with logic constraints, transferring knowledge from logic formulas, and knowledge distillation from general distributions.
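The loss construction is easy to state in miniature: combine the task loss with a divergence to the distribution built from the constraint. A minimal sketch, assuming discrete output distributions and the KL variant (the paper also allows Fisher-Rao); all names here are illustrative:

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete
    distributions given as probability lists (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def semantic_loss(task_loss, pred, constraint_dist, lam=1.0):
    """Loss in the spirit of the paper: the original task loss plus a
    weighted divergence pulling predictions toward the distribution
    constructed from the external knowledge / logic formula."""
    return task_loss + lam * kl(pred, constraint_dist)

# A constraint "class 2 is impossible" becomes a distribution with
# zero mass on class 2; predictions violating it pay a large penalty.
consistent = semantic_loss(0.3, [0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
violating = semantic_loss(0.3, [0.2, 0.2, 0.6], [0.5, 0.5, 0.0])
```

Gradient descent on this combined loss then trades off data fit against constraint satisfaction via the weight `lam`.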
Updated: 2024-05-03 19:21:47
Subjects: cs.AI,cs.IT,cs.LG,cs.LO,math.IT
Learning minimal volume uncertainty ellipsoids
We consider the problem of learning uncertainty regions for parameter estimation problems. The regions are ellipsoids that minimize the average volume subject to a prescribed coverage probability. As expected, under the assumption of jointly Gaussian data, we prove that the optimal ellipsoid is centered at the conditional mean and shaped by the conditional covariance matrix. In more practical cases, we propose a differentiable optimization approach for approximately computing the optimal ellipsoids using a neural network with proper calibration. Compared to existing methods, our network requires less storage and less computation at inference time, leading to accurate yet smaller ellipsoids. We demonstrate these advantages on four real-world localization datasets.
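The Gaussian case the abstract proves optimal is fully explicit: the ellipsoid is a Mahalanobis ball around the conditional mean, with its radius set by a chi-square quantile. The 2D sketch below uses dimension two because the chi-square CDF with 2 degrees of freedom has the closed form 1 - exp(-t/2), so no statistics library is needed; names are illustrative.

```python
import math

def ellipsoid_radius_sq_2d(coverage):
    """Squared Mahalanobis radius of the minimal-volume ellipsoid for
    a 2D Gaussian: invert P(chi2_2 <= t) = 1 - exp(-t / 2)."""
    return -2.0 * math.log(1.0 - coverage)

def inside(x, mean, cov_inv, coverage):
    """Is point x inside the optimal ellipsoid, i.e. centered at the
    conditional mean and shaped by the conditional covariance?"""
    d = [x[0] - mean[0], x[1] - mean[1]]
    # Squared Mahalanobis distance: d^T Sigma^{-1} d
    m2 = (d[0] * (cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1])
          + d[1] * (cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]))
    return m2 <= ellipsoid_radius_sq_2d(coverage)

# 90% coverage for an isotropic unit Gaussian: radius^2 = -2 ln 0.1
r2 = ellipsoid_radius_sq_2d(0.90)
```

The paper's neural network is needed precisely when the conditional mean and covariance are not available in closed form.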
Updated: 2024-05-03 19:11:35
Subjects: cs.LG,stat.ML
Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)
This work targets what we consider to be the foundational step for urban airborne robots, a safe landing. Our attention is directed toward what we deem the most crucial aspect of the safe landing perception stack: segmentation. We present a streamlined reactive UAV system that employs visual servoing by harnessing the capabilities of open vocabulary image segmentation. This approach can adapt to various scenarios with minimal adjustments, bypassing the necessity for extensive data accumulation for refining internal models, thanks to its open vocabulary methodology. Given the limitations imposed by local authorities, our primary focus centers on operations originating from altitudes of 100 meters. This choice is deliberate, as numerous preceding works have dealt with altitudes up to 30 meters, aligning with the capabilities of small stereo cameras. Consequently, we leave the remaining 20m to be navigated using conventional 3D path planning methods. Utilizing monocular cameras and image segmentation, our findings demonstrate the system's capability to successfully execute landing maneuvers at altitudes as low as 20 meters. However, this approach is vulnerable to intermittent and occasionally abrupt fluctuations in the segmentation between frames in a video stream. To address this challenge, we enhance the image segmentation output by introducing what we call a dynamic focus: a masking mechanism that self-adjusts according to the current landing stage. This dynamic focus guides the control system to avoid regions beyond the drone's safety radius projected onto the ground, thus mitigating the problems with fluctuations. Through the implementation of this supplementary layer, our experiments achieved an almost tenfold improvement in landing success rate compared to global segmentation. All the source code is open source and available online (github.com/MISTLab/DOVESEI).
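The geometry behind a "safety radius projected onto the ground" is simple pinhole-camera trigonometry, sketched below. All parameter names and values are assumptions for illustration, not the system's actual configuration: a downward-facing camera at altitude h sees a ground half-width of h * tan(fov/2), which converts meters to pixels.

```python
import math

def focus_radius_px(altitude_m, safety_radius_m, fov_deg, image_width_px):
    """Radius, in image pixels, of the drone's safety radius projected
    onto the ground: a toy sketch of the dynamic-focus idea."""
    ground_half_width = altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    px_per_meter = (image_width_px / 2.0) / ground_half_width
    return safety_radius_m * px_per_meter

# The mask self-adjusts with the landing stage: descending from
# 100 m to 20 m grows the focus region fivefold in the image.
high = focus_radius_px(100.0, 1.5, 90.0, 640)
low = focus_radius_px(20.0, 1.5, 90.0, 640)
```

Restricting the segmentation decision to this growing disk is what damps the frame-to-frame fluctuations described above.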
Updated: 2024-05-03 19:05:18
Subjects: cs.RO,cs.AI,cs.CV
FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy
We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms assume a trusted central curator and do not extend to federated settings. Naively combining the secure and DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves four orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by taking advantage of constrained clustering techniques.
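The DP half of such a design reduces, per Lloyd iteration, to noising the per-cluster sums and counts before the centroid division. The toy 1D sketch below shows that pattern only (Laplace noise, names illustrative); the secure-aggregation half, which would compute these noisy sums across parties, is out of scope here.

```python
import math
import random

def laplace(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_lloyd_step(points, centers, epsilon):
    """One Lloyd iteration (1D, toy) with Laplace noise added to the
    per-cluster sums and counts before the division."""
    sums = [0.0] * len(centers)
    counts = [0.0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        sums[j] += p
        counts[j] += 1
    scale = 2.0 / epsilon  # noise shrinks as the privacy budget grows
    return [(s + laplace(scale)) / max(c + laplace(scale), 1.0)
            for s, c in zip(sums, counts)]

# With a generous budget the noisy update is close to plain Lloyd.
new_centers = dp_lloyd_step([0.0, 0.1, 10.0, 10.1], [0.0, 10.0],
                            epsilon=1000.0)
```

The paper's constrained-clustering improvements further bound how far a noisy centroid can drift between iterations.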
Updated: 2024-05-03 19:04:37
Subjects: cs.CR,cs.LG
Bridging the Gap: A Study of AI-based Vulnerability Management between Industry and Academia
Recent research advances in Artificial Intelligence (AI) have yielded promising results for automated software vulnerability management. AI-based models are reported to greatly outperform traditional static analysis tools, indicating a substantial workload relief for security engineers. However, the industry remains very cautious and selective about integrating AI-based techniques into their security vulnerability management workflow. To understand the reasons, we conducted a discussion-based study, anchored in the authors' extensive industrial experience and keen observations, to uncover the gap between research and practice in this field. We empirically identified three main barriers preventing the industry from adopting academic models, namely, complicated requirements of scalability and prioritization, limited customization flexibility, and unclear financial implications. Meanwhile, research works are significantly impacted by the lack of extensive real-world security data and expertise. We proposed a set of future directions to help better understand industry expectations, improve the practical usability of AI-based security vulnerability research, and drive a synergistic relationship between industry and academia.
Updated: 2024-05-03 19:00:50
Subjects: cs.CR,cs.SE
Delphi: Efficient Asynchronous Approximate Agreement for Distributed Oracles
Agreement protocols are crucial in various emerging applications, spanning from distributed (blockchain) oracles to fault-tolerant cyber-physical systems. In scenarios where sensor/oracle nodes measure a common source, maintaining output within the convex range of correct inputs, known as convex validity, is imperative. Present asynchronous convex agreement protocols employ either randomization, incurring substantial computation overhead, or approximate agreement techniques, leading to high $\mathcal{\tilde{O}}(n^3)$ communication for an $n$-node system. This paper introduces Delphi, a deterministic protocol with $\mathcal{\tilde{O}}(n^2)$ communication and minimal computation overhead. Delphi assumes that honest inputs are bounded, except with negligible probability, and integrates agreement primitives from literature with a novel weighted averaging technique. Experimental results highlight Delphi's superior performance, showcasing a significantly lower latency compared to state-of-the-art protocols. Specifically, for an $n=160$-node system, Delphi achieves an 8x and 3x improvement in latency within CPS and AWS environments, respectively.
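Convex validity under faults is the key intuition: the averaging rule must never leave the range of honest inputs even when up to f values are arbitrary. The sketch below uses a plain trimmed mean as a stand-in; Delphi's actual rule is a novel weighted average, so treat this as a simplification for illustration only.

```python
def approx_agree_round(received, f):
    """One averaging round: discard the f lowest and f highest
    received values (bounding the damage from up to f faulty nodes),
    then average the remainder. A simplified stand-in for Delphi's
    weighted averaging technique."""
    vals = sorted(received)
    kept = vals[f:len(vals) - f]
    return sum(kept) / len(kept)

# A wildly faulty value is trimmed away, so the output stays inside
# the range of honest inputs (convex validity).
out = approx_agree_round([10.0, 10.2, 9.9, 10.1, 1000.0], f=1)
```

Iterating such rounds shrinks the spread of honest values toward approximate agreement.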
Updated: 2024-05-03 18:57:46
Subjects: cs.DC,cs.CR
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
Updated: 2024-05-03 18:56:39
Subjects: cs.LG,cs.AI,cs.CL,cs.CY
CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation
Traditional recommender systems such as matrix factorization methods rely on learning a shared dense embedding space to represent both items and user preferences. Sequence models such as RNN, GRUs, and, recently, Transformers have also excelled in the task of sequential recommendation. This task requires understanding the sequential structure present in users' historical interactions to predict the next item they may like. Building upon the success of Large Language Models (LLMs) in a variety of tasks, researchers have recently explored using LLMs that are pretrained on vast corpora of text for sequential recommendation. To use LLMs in sequential recommendations, both the history of user interactions and the model's prediction of the next item are expressed in text form. We propose CALRec, a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss: the LLM is first finetuned on a data mixture from multiple domains followed by another round of target domain finetuning. Our model significantly outperforms many state-of-the-art baselines (+37% in Recall@1 and +24% in NDCG@10) and systematic ablation studies reveal that (i) both stages of finetuning are crucial, and, when combined, we achieve improved performance, and (ii) contrastive alignment is effective among the target domains explored in our experiments.
Updated: 2024-05-03 18:51:19
Subjects: cs.IR,cs.AI,cs.CL,cs.LG
Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach
Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and coarse geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. We show that predictions of our model are within 12% accuracy and that independent measurements performed by other authors agree well with our model. Finally, we present a geothermal gradient map for Colombia that highlights regions where further exploration and data collection should be performed.
Updated: 2024-05-03 18:46:57
Subjects: physics.geo-ph,cs.LG
Navigating Explanatory Multiverse Through Counterfactual Path Geometry
Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to algorithmic and domain-specific constraints -- such as density-based feasibility, and attribute (im)mutability or directionality of change -- that aim to maximise their real-life utility. In addition to desiderata with respect to the counterfactual instance itself, existence of a viable path connecting it with the factual data point, known as algorithmic recourse, has become an important technical consideration. While both of these requirements ensure that the steps of the journey as well as its destination are admissible, current literature neglects the multiplicity of such counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys. We then show how to navigate, reason about and compare the geometry of these trajectories with two methods: vector spaces and graphs. To this end, we overview their spatial properties -- such as affinity, branching, divergence and possible future convergence -- and propose an all-in-one metric, called opportunity potential, to quantify them. Implementing this (possibly interactive) explanatory process grants explainees agency by allowing them to select counterfactuals based on the properties of the journey leading to them in addition to their absolute differences. We show the flexibility, benefit and efficacy of such an approach through examples and quantitative evaluation on the German Credit and MNIST data sets.
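In the graph view, a counterfactual journey is a path from the factual point, and branching is just out-degree along that path. The toy below counts onward options at each step as a crude proxy for the paper's opportunity potential (the real metric also folds in affinity and divergence); the graph and node names are invented for illustration.

```python
def branching(graph, path):
    """For each node on a counterfactual path through a directed
    graph (adjacency dict), count the onward options still open:
    a toy proxy for 'opportunity potential'."""
    return [len(graph.get(node, [])) for node in path]

# Factual point F; two counterfactual paths reach X, but the step
# via B keeps more future options open than the step via C.
graph = {"F": ["B", "C"], "B": ["X", "Y"], "C": ["X"]}
via_b = branching(graph, ["F", "B", "X"])
via_c = branching(graph, ["F", "C", "X"])
```

Comparing such per-step profiles lets an explainee prefer the journey that preserves more recourse, even when both end at the same counterfactual.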
Updated: 2024-05-03 18:42:14
Subjects: cs.LG,cs.AI
Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning
We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision which can be successfully transferred to physical robots using low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors including object tracking and ball seeking that emerge when simply optimizing perception-agnostic soccer play. The agents display equivalent levels of performance and agility as policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the game-play and analyses can be seen on our website https://sites.google.com/view/vision-soccer .
Updated: 2024-05-03 18:41:13
Subjects: cs.RO,cs.AI
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.
Updated: 2024-05-03 18:34:35
Subjects: cs.HC,cs.AI,H.5.2; I.2
A Unified Framework for Human-Allied Learning of Probabilistic Circuits
Probabilistic Circuits (PCs) have emerged as an efficient framework for representing and learning complex probability distributions. Nevertheless, the existing body of research on PCs predominantly concentrates on data-driven parameter learning, often neglecting the potential of knowledge-intensive learning, a particular issue in data-scarce/knowledge-rich domains such as healthcare. To bridge this gap, we propose a novel unified framework that can systematically integrate diverse domain knowledge into the parameter learning process of PCs. Experiments on several benchmarks as well as real world datasets show that our proposed framework can both effectively and efficiently leverage domain knowledge to achieve superior performance compared to purely data-driven learning approaches.
Updated: 2024-05-03 18:14:29
Subjects: cs.LG,cs.AI
Deep Learning and Transfer Learning Architectures for English Premier League Player Performance Forecasting
This paper presents a groundbreaking model for forecasting English Premier League (EPL) player performance using convolutional neural networks (CNNs). We evaluate Ridge regression, LightGBM and CNNs on the task of predicting upcoming player FPL score based on historical FPL data over the previous weeks. Our baseline models, Ridge regression and LightGBM, achieve solid performance and emphasize the importance of recent FPL points, influence, creativity, threat, and playtime in predicting EPL player performances. Our optimal CNN architecture achieves better performance with fewer input features and even outperforms the best previous EPL player performance forecasting models in the literature. The optimal CNN architecture also achieves very strong Spearman correlation with player rankings, indicating its strong implications for supporting the development of FPL artificial intelligence (AI) Agents and providing analysis for FPL managers. We additionally perform transfer learning experiments on soccer news data collected from The Guardian, for the same task of predicting upcoming player score, but do not identify a strong predictive signal in natural language news texts, achieving worse performance compared to both the CNN and baseline models. Overall, our CNN-based approach marks a significant advancement in EPL player performance forecasting and lays the foundation for transfer learning to other EPL prediction tasks such as win-loss odds for sports betting and the development of cutting-edge FPL AI Agents.
Updated: 2024-05-03 18:13:52
Subjects: cs.LG,68T07
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, our hard set contains >50% questions that all frontier models answer incorrectly. We explore the nuances of designing, evaluating, and ranking models on ultra challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates to human judgment. We offer free API access for the purpose of lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on the Vibe-Eval's automatic scores. We release the evaluation code and data, see https://github.com/reka-ai/reka-vibe-eval
Updated: 2024-05-03 17:59:55
Categories: cs.CL,cs.AI,cs.CV
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical reasoning. We ensure that the two benchmarks are comparable across important metrics such as human solve rates, number of steps in solution, answer magnitude, and more. When evaluating leading open- and closed-source LLMs on GSM1k, we observe accuracy drops of up to 13%, with several families of models (e.g., Phi and Mistral) showing evidence of systematic overfitting across almost all model sizes. At the same time, many models, especially those on the frontier (e.g., Gemini/GPT/Claude), show minimal signs of overfitting. Further analysis suggests a positive relationship (Spearman's r^2=0.32) between a model's probability of generating an example from GSM8k and its performance gap between GSM8k and GSM1k, suggesting that many models may have partially memorized GSM8k.
Updated: 2024-05-03 17:53:26
Categories: cs.CL,cs.AI,cs.LG
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite this great success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory at solving mathematical problems due to the complex reasoning procedures involved. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting each question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. Then we fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks for mathematical reasoning (i.e., GSM8K and MATH) demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. In particular, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the MetaMath models of different sizes, and the training code for public use.
Updated: 2024-05-03 17:36:07
Categories: cs.CL,cs.AI
InceptionXML: A Lightweight Framework with Synchronized Negative Sampling for Short Text Extreme Classification
Automatic annotation of short-text data with a large number of target labels, referred to as Short Text Extreme Classification, has found numerous applications including prediction of related searches and product recommendation. In this paper, we propose InceptionXML, a convolutional architecture that is lightweight yet powerful, and robust to the inherent lack of word order in the short-text queries encountered in search and recommendation. We demonstrate the efficacy of applying convolutions by recasting the operation along the embedding dimension instead of the word dimension, as in conventional CNNs for text classification. Towards scaling our model to datasets with millions of labels, we also propose the SyncXML pipeline, which improves upon the shortcomings of the recently proposed dynamic hard-negative mining technique for label shortlisting by synchronizing the label shortlister and the extreme classifier. SyncXML not only halves the inference time but is also an order of magnitude smaller than the state-of-the-art Astec in terms of model size. Through a comprehensive empirical comparison, we show that InceptionXML not only outperforms existing approaches on benchmark datasets but also beats transformer baselines while requiring only 2% of their FLOPs. The code for InceptionXML is available at https://github.com/xmc-aalto/inceptionxml.
Updated: 2024-05-03 17:35:02
Categories: cs.CL,cs.AI,cs.LG
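The core architectural idea in the InceptionXML abstract above — convolving along the embedding dimension rather than the word dimension — can be sketched in a few lines. This is an illustrative NumPy mock under our own assumptions, not the authors' implementation: a single filter slides across each token's embedding vector, so shuffling the word order only permutes rows of the feature map, and max-pooling over tokens becomes word-order invariant.

```python
import numpy as np

def conv_over_embedding(x, kernel):
    """Convolve along the embedding axis instead of the word axis.

    x      : (num_words, embed_dim) matrix of token embeddings
    kernel : (k,) 1-D filter slid across each token's embedding vector
    Returns a (num_words, embed_dim - k + 1) feature map ('valid' padding).
    """
    return np.stack([np.correlate(row, kernel, mode="valid") for row in x])

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))            # 5 tokens, 8-dim embeddings
kernel = np.array([0.25, 0.5, 0.25])
feat = conv_over_embedding(x, kernel)  # shape (5, 6)

# Shuffling word order only permutes rows, so max-pooled features are
# unchanged -- the robustness to missing word order the abstract targets.
shuffled = conv_over_embedding(x[[2, 0, 4, 1, 3]], kernel)
assert np.allclose(feat.max(axis=0), shuffled.max(axis=0))
```

A conventional text CNN would instead slide the filter across tokens, which bakes word-order sensitivity into the features.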
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Pre-trained language models (PLMs), for example BERT or RoBERTa, mark the state-of-the-art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency, for example in terms of model size or latency, against generalization performance. We also show how recently developed two-stage weight-sharing NAS approaches can be utilized in this setting to accelerate the search process. Unlike traditional pruning methods with fixed thresholds, we propose to adopt a multi-objective approach that identifies the Pareto-optimal set of sub-networks, allowing for a more flexible and automated compression process.
Updated: 2024-05-03 17:34:57
Categories: cs.LG,cs.CL
Constrained Reinforcement Learning Under Model Mismatch
Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a good policy that optimizes the reward and at the same time satisfy the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We demonstrate the effectiveness of our algorithm on a set of RL tasks with constraints.
Updated: 2024-05-03 17:24:11
Categories: cs.LG
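The constrained-RL objective in the abstract above — maximize reward while satisfying a constraint — is commonly handled with a primal-dual (Lagrangian) scheme. The sketch below is generic background on that scheme, not the paper's RCPO algorithm or its robustness machinery: a toy deterministic problem where the multiplier grows while the constraint is violated and the primal variable follows the Lagrangian gradient.

```python
def primal_dual(steps=2000, lr=0.05):
    """Maximize r(a) = -(a - 3)^2 subject to c(a) = a <= 2, via gradient
    ascent on a and dual ascent on the multiplier lam (projected to lam >= 0).
    The unconstrained optimum a = 3 violates the constraint; the constrained
    optimum is a = 2, with multiplier lam = -r'(2) = 2."""
    a, lam = 0.0, 0.0
    for _ in range(steps):
        grad_a = -2.0 * (a - 3.0) - lam       # d/da [r(a) - lam * (c(a) - 2)]
        a += lr * grad_a
        lam = max(0.0, lam + lr * (a - 2.0))  # raise lam while a > 2
    return a, lam

a, lam = primal_dual()
assert abs(a - 2.0) < 1e-2 and abs(lam - 2.0) < 1e-2
```

In actual constrained RL the two gradients are estimated from rollouts rather than computed in closed form; the coupled ascent/descent structure is the same.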
Efficient Deep Learning with Decorrelated Backpropagation
The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, to date, this has not yet translated into substantial improvements in training efficiency in large-scale DNNs. This is mainly caused by the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this goal we made use of a novel algorithm which induces network-wide input decorrelation using minimal computational overhead. By combining this algorithm with careful optimizations, we obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
Updated: 2024-05-03 17:21:13
Categories: cs.LG
Hysteresis Compensation of Flexible Continuum Manipulator using RGBD Sensing and Temporal Convolutional Network
Flexible continuum manipulators are valued for minimally invasive surgery, offering access to confined spaces through nonlinear paths. However, cable-driven manipulators face control difficulties due to hysteresis from cabling effects such as friction, elongation, and coupling. These effects are difficult to model due to their nonlinearity, and the difficulties become even more evident when dealing with long, coupled, multi-segmented manipulators. This paper proposes a data-driven approach based on deep neural networks (DNNs) to capture these nonlinear, history-dependent characteristics of cable actuation. We collect physical joint configurations corresponding to commanded joint configurations using RGBD sensing and 7 fiducial markers to model the hysteresis of the proposed manipulator. Results of a study comparing the estimation performance of four DNN models show that the Temporal Convolutional Network (TCN) demonstrates the highest predictive capability. Leveraging the trained TCNs, we build a control algorithm to compensate for hysteresis. Tracking tests in task space using unseen trajectories show that the proposed control algorithm reduces the average position and orientation error by 61.39% (from 13.7 mm to 5.29 mm) and 64.04% (from 31.17° to 11.21°), respectively. This result implies that the proposed calibrated controller effectively reaches the desired configurations by estimating the hysteresis of the manipulator. Applying this method in real surgical scenarios has the potential to enhance control precision and improve surgical performance.
Updated: 2024-05-03 17:19:31
Categories: cs.RO,cs.AI
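A Temporal Convolutional Network, named above as the best-performing estimator, is built from causal dilated 1-D convolutions, where each output depends only on current and past inputs — the property that lets it capture history-dependent effects such as hysteresis. A minimal sketch of that building block (our own illustration, not the paper's architecture):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """y[t] = sum_i w[i] * x[t - i*dilation], with x[t'] = 0 for t' < 0.
    Causality: y[t] never looks at future samples, so stacking such layers
    with growing dilation yields a long, strictly past-facing receptive field.
    x : (T,) input sequence; w : (k,) filter taps, w[0] on the newest sample.
    """
    T, k = len(x), len(w)
    xp = np.concatenate([np.zeros((k - 1) * dilation), x])  # left-pad only
    return np.array([sum(w[i] * xp[t + (k - 1 - i) * dilation] for i in range(k))
                     for t in range(T)])

x = np.arange(6.0)  # a toy command signal
y = causal_dilated_conv(x, np.array([1.0, 0.5]), dilation=2)
# y[t] = x[t] + 0.5 * x[t-2]
assert np.allclose(y, [0.0, 1.0, 2.0, 3.5, 5.0, 6.5])
```

Changing a future input leaves all earlier outputs untouched, which is exactly what makes the layer suitable for modeling state-history-dependent actuation.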
BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity
Understanding the functional organization of higher visual cortex is a central focus in neuroscience. Past studies have primarily mapped the visual and semantic selectivity of neural populations using hand-selected stimuli, which may potentially bias results towards pre-existing hypotheses of visual cortex functionality. Moving beyond conventional approaches, we introduce a data-driven method that generates natural language descriptions for images predicted to maximally activate individual voxels of interest. Our method -- Semantic Captioning Using Brain Alignments ("BrainSCUBA") -- builds upon the rich embedding space learned by a contrastive vision-language model and utilizes a pre-trained large language model to generate interpretable captions. We validate our method through fine-grained voxel-level captioning across higher-order visual regions. We further perform text-conditioned image synthesis with the captions, and show that our images are semantically coherent and yield high predicted activations. Finally, to demonstrate how our method enables scientific discovery, we perform exploratory investigations on the distribution of "person" representations in the brain, and discover fine-grained semantic selectivity in body-selective areas. Unlike earlier studies that decode text, our method derives voxel-wise captions of semantic selectivity. Our results show that BrainSCUBA is a promising means for understanding functional preferences in the brain, and provides motivation for further hypothesis-driven investigation of visual cortex.
Updated: 2024-05-03 17:19:02
Categories: cs.LG,q-bio.NC
Learning of Sea Surface Height Interpolation from Multi-variate Simulated Satellite Observations
Satellite-based remote sensing missions have revolutionized our understanding of the Ocean state and dynamics. Among them, space-borne altimetry provides valuable measurements of Sea Surface Height (SSH), which is used to estimate surface geostrophic currents. Due to the sensor technology employed, important gaps occur in SSH observations. Complete SSH maps are produced using linear Optimal Interpolations (OI) such as the widely-used Data Unification and Altimeter Combination System (duacs). On the other hand, Sea Surface Temperature (SST) products have much higher data coverage and SST is physically linked to geostrophic currents through advection. We propose a new multi-variate Observing System Simulation Experiment (OSSE) emulating 20 years of SSH and SST satellite observations. We train an Attention-Based Encoder-Decoder deep learning network (abed) on this data, comparing two settings: one with access to ground truth during training and one without. On our OSSE, we compare abed reconstructions when trained using either supervised or unsupervised loss functions, with or without SST information. We evaluate the SSH interpolations in terms of eddy detection. We also introduce a new way to transfer the learning from simulation to observations by doing a supervised pre-training on our OSSE followed by an unsupervised fine-tuning on satellite data. On real SSH observations from the Ocean Data Challenge 2021, we find that this learning strategy combined with the use of SST leads to a decrease of 24% of the root mean squared error compared to duacs.
Updated: 2024-05-03 17:12:12
Categories: cs.LG
What matters when building vision-language models?
The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. To address this issue, we conduct extensive experiments around pre-trained models, architecture choice, data, and training methods. Our consolidation of findings includes the development of Idefics2, an efficient foundational VLM of 8 billion parameters. Idefics2 achieves state-of-the-art performance within its size category across various multimodal benchmarks, and is often on par with models four times its size. We release the model (base, instructed, and chat) along with the datasets created for its training.
Updated: 2024-05-03 17:00:00
Categories: cs.CV,cs.AI
Subgraph2vec: A random walk-based algorithm for embedding knowledge graphs
Graphs are an important data representation that occurs naturally in real-world applications [goyal2018graph]. Therefore, analyzing graphs provides users with better insights in different areas such as anomaly detection [ma2021comprehensive], decision making [fan2023graph], clustering [tsitsulin2023graph], classification [wang2021mixup], etc. However, most of these methods require high levels of computational time and space. We can use other approaches, such as embedding, to reduce these costs. Knowledge graph (KG) embedding is a technique that aims to achieve a vector representation of a KG. It represents the entities and relations of a KG in a low-dimensional space while maintaining their semantic meanings. There are different methods for embedding graphs, including random walk-based methods such as node2vec, metapath2vec, and regpattern2vec. However, most of these methods bias the walks based on a rigid pattern, usually hard-coded in the algorithm. In this work, we introduce subgraph2vec for embedding KGs, where walks are run inside a user-defined subgraph. We use this embedding for link prediction and show that our method has better performance in most cases in comparison with previous ones.
Updated: 2024-05-03 16:51:18
Categories: cs.LG
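The walk-generation stage described above — random walks confined to a user-defined subgraph — can be sketched as follows. This is an illustrative mock (uniform walks over an undirected edge list); the resulting walk corpus would then be fed to a skip-gram model such as word2vec to obtain the embeddings, and the paper's KG setting additionally carries relation labels:

```python
import random

def subgraph_walks(edges, subgraph_nodes, walk_len=5, walks_per_node=10, seed=0):
    """Uniform random walks that never leave a user-defined subgraph."""
    rng = random.Random(seed)
    allowed = set(subgraph_nodes)
    adj = {}
    for u, v in edges:
        if u in allowed and v in allowed:   # drop edges leaving the subgraph
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a"), ("c", "x")]
walks = subgraph_walks(edges, subgraph_nodes={"a", "b", "c", "d", "e"})
assert all(node != "x" for walk in walks for node in walk)  # "x" is outside
```

Restricting the walks this way is what replaces the hard-coded walk patterns of metapath2vec-style methods with a user-chosen region of the graph.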
Secure and Efficient General Matrix Multiplication On Cloud Using Homomorphic Encryption
Despite the cloud's enormous technical and financial advantages, security and privacy have always been the primary concerns in adopting cloud computing facilities, especially for government agencies and commercial sectors with high security requirements. Homomorphic encryption (HE) has recently emerged as an effective tool for assuring privacy and security in sensitive applications by allowing computation on encrypted data. One major obstacle to employing HE-based computation, however, is its excessive computational cost, which is multiple orders of magnitude higher than its counterpart on plaintext. In this paper, we study how to reduce the HE-based computational cost of general matrix multiplication (MM), a fundamental building block of numerous practical applications, by taking advantage of the Single Instruction Multiple Data (SIMD) operations supported by HE schemes. Specifically, we develop a novel element-wise algorithm for general matrix multiplication, based on which we propose two HE-based General Matrix Multiplication (HEGMM) algorithms to reduce the HE computation cost. Our experimental results show that our algorithms can significantly outperform state-of-the-art approaches to HE-based matrix multiplication.
Updated: 2024-05-03 16:50:02
Categories: cs.CR
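For background on how SIMD slots are exploited, the classic Halevi–Shoup diagonal method for a matrix–vector product uses only the operations HE schemes expose: slot-wise addition/multiplication and cyclic slot rotation. The plaintext mock below illustrates that access pattern; it is not the paper's element-wise HEGMM algorithm, and real code would operate on ciphertexts (e.g., in a library such as Microsoft SEAL):

```python
import numpy as np

def rotate(slots, k):
    """Cyclic slot rotation -- the only data movement SIMD HE schemes offer."""
    return np.roll(slots, -k)

def simd_matvec(A, v):
    """Matrix-vector product built from slot-wise multiply/add plus rotations
    (the Halevi-Shoup diagonal method), mocked here on plaintext arrays."""
    n = len(v)
    acc = np.zeros(n)
    for i in range(n):
        # i-th generalized diagonal of A: entries A[j, (j+i) mod n]
        diag_i = np.array([A[j, (j + i) % n] for j in range(n)])
        acc = acc + diag_i * rotate(v, i)   # one SIMD multiply + add per diagonal
    return acc

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
v = np.array([1.0, 0.0, 2.0])
assert np.allclose(simd_matvec(A, v), A @ v)
```

Packing n values per ciphertext this way amortizes the enormous per-operation cost of HE, which is the lever the HEGMM algorithms push further for full matrix–matrix products.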
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss terms can lead to information loss. In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models. First, we develop techniques to derive prosodic information from the audio signal and SSL representations to train predictive submodules in the synthesis model. Next, we propose a training strategy to iteratively improve the synthesis model for voice conversion, by creating a challenging training objective using self-synthesized examples. We demonstrate that incorporating such self-synthesized examples during training improves the speaker similarity of generated speech as compared to a baseline voice conversion model trained solely on heuristically perturbed inputs. Our framework is trained without any text and achieves state-of-the-art results in zero-shot voice conversion on metrics evaluating naturalness, speaker similarity, and intelligibility of synthesized audio.
Updated: 2024-05-03 16:45:39
Categories: cs.SD,cs.AI,eess.AS
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters. Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability. In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version. In this paper, we make a step towards the theoretical understanding of this practice. After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions. Then, we illustrate how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy. Finally, we quantitatively compare action-based and parameter-based exploration, giving a formal guise to intuitive results.
Updated: 2024-05-03 16:45:15
Categories: cs.LG
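The practice the abstract analyzes — learning a stochastic policy but deploying only its deterministic version — can be seen in a one-dimensional bandit sketch (our own toy, not the paper's algorithm): a Gaussian policy explores in action space with REINFORCE-style score-function updates, and only the (averaged) mean is deployed.

```python
import numpy as np

def train_and_deploy(steps=5000, lr=0.05, sigma=1.0, seed=0):
    """Stochastic policy N(mu, sigma^2) trained on reward r(a) = -(a - 2)^2;
    the deployed policy is the deterministic action a = mu, averaged over the
    second half of training to smooth out exploration noise."""
    rng = np.random.default_rng(seed)
    mu, avg, count = 0.0, 0.0, 0
    for t in range(steps):
        a = rng.normal(mu, sigma)               # exploration in action space
        r = -(a - 2.0) ** 2
        mu += lr * r * (a - mu) / sigma**2      # REINFORCE / score-function update
        if t >= steps // 2:
            avg, count = avg + mu, count + 1
    return avg / count                          # deterministic action to deploy

a_det = train_and_deploy()
assert abs(a_det - 2.0) < 0.6                   # near the optimum a* = 2
```

The exploration level sigma is exactly the knob the paper studies: larger sigma explores faster but adds gradient variance, and the deployed deterministic action discards that noise.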
FocusLearn: Fully-Interpretable, High-Performance Modular Neural Networks for Time Series
Multivariate time series have many applications, from healthcare and meteorology to life science. Although deep learning models have shown excellent predictive performance for time series, they have been criticised for being "black-boxes" or non-interpretable. This paper proposes a novel modular neural network model for multivariate time series prediction that is interpretable by construction. A recurrent neural network learns the temporal dependencies in the data while an attention-based feature selection component selects the most relevant features and suppresses redundant features used in the learning of the temporal dependencies. A modular deep network is trained from the selected features independently to show the users how features influence outcomes, making the model interpretable. Experimental results show that this approach can outperform state-of-the-art interpretable Neural Additive Models (NAM) and variations thereof in both regression and classification of time series tasks, achieving a predictive performance that is comparable to the top non-interpretable methods for time series, LSTM and XGBoost.
Updated: 2024-05-03 16:44:31
Categories: cs.LG,cs.AI
REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, LLMs are asked to provide author names of the given research article, and (b) Indirect Queries, LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLM stands in this task, we introduce a large dataset called REASONS comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art, often called anthropomorphic GPT-4 and GPT-3.5, suffers from high pass percentage (PP) to minimize the hallucination rate (HR). When tested with Perplexity.ai (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matched performance to GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93% and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study contributes valuable insights into the reliability of RAG for automated citation generation tasks.
Updated: 2024-05-03 16:38:51
Categories: cs.CL,cs.AI,cs.IR
Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks
This paper introduces a framework for post-processing machine learning models so that their predictions satisfy multi-group fairness guarantees. Based on the celebrated notion of multicalibration, we introduce $(\mathbf{s},\mathcal{G}, \alpha)-$GMC (Generalized Multi-Dimensional Multicalibration) for multi-dimensional mappings $\mathbf{s}$, constraint set $\mathcal{G}$, and a pre-specified threshold level $\alpha$. We propose associated algorithms to achieve this notion in general settings. This framework is then applied to diverse scenarios encompassing different fairness concerns, including false negative rate control in image segmentation, prediction set conditional uncertainty quantification in hierarchical classification, and de-biased text generation in language models. We conduct numerical studies on several datasets and tasks.
Updated: 2024-05-03 16:32:09
Categories: stat.ML,cs.AI,cs.CY,cs.LG,stat.ME
Privately Aligning Language Models with Reinforcement Learning
Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction following-models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections.
Updated: 2024-05-03 16:30:01
Categories: cs.LG,cs.CR
Discretization Error of Fourier Neural Operators
Operator learning is a variant of machine learning that is designed to approximate maps between function spaces from data. The Fourier Neural Operator (FNO) is a common model architecture used for operator learning. The FNO combines pointwise linear and nonlinear operations in physical space with pointwise linear operations in Fourier space, leading to a parameterized map acting between function spaces. Although FNOs formally involve convolutions of functions on a continuum, in practice the computations are performed on a discretized grid, allowing efficient implementation via the FFT. In this paper, the aliasing error that results from such a discretization is quantified and algebraic rates of convergence in terms of the grid resolution are obtained as a function of the regularity of the input. Numerical experiments that validate the theory and describe model stability are performed.
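The aliasing phenomenon the paper quantifies can be seen directly with NumPy: a Fourier mode that a fine grid resolves exactly is misidentified on a coarse grid, so the same mode-truncation operator used inside FNO layers incurs a grid-dependent error. A minimal illustration (not the paper's error estimator):

```python
import numpy as np

def spectral_truncate(u, k_max):
    """Keep only Fourier modes |k| <= k_max of a periodic signal `u`,
    mimicking the mode truncation inside an FNO layer."""
    U = np.fft.fft(u)
    k = np.fft.fftfreq(len(u), d=1.0 / len(u))  # integer wavenumbers
    U[np.abs(k) > k_max] = 0.0
    return np.real(np.fft.ifft(U))

def grid_fn(n):
    x = 2 * np.pi * np.arange(n) / n
    return np.sin(x) + 0.5 * np.sin(10 * x)

# On a fine grid, mode 10 is resolved, so truncating above it is lossless.
fine = grid_fn(64)
err_fine = np.max(np.abs(spectral_truncate(fine, 10) - fine))

# On a coarse grid (n=16), mode 10 aliases onto mode -6; truncating at
# k_max=5 then discards it entirely, leaving an O(1) discretization error.
coarse = grid_fn(16)
err_coarse = np.max(np.abs(spectral_truncate(coarse, 5) - coarse))
```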
Updated: 2024-05-03 16:28:05
Categories: math.NA,cs.LG,cs.NA,41A35 (Primary) 65T50, 68T07 (Secondary)
Designed Dithering Sign Activation for Binary Neural Networks
Binary Neural Networks emerged as a cost-effective and energy-efficient solution for computer vision tasks by binarizing either network weights or activations. However, common binary activations, such as the Sign activation function, abruptly binarize the values with a single threshold, losing fine-grained details in the feature outputs. This work proposes DeSign, an activation that applies multiple thresholds following dithering principles, shifting the Sign activation function for each pixel according to a spatially periodic threshold kernel. Unlike literature methods, the shifting is defined jointly for a set of adjacent pixels, taking advantage of spatial correlations. Experiments on the classification task demonstrate the effectiveness of the designed dithering Sign activation function as an alternative activation for binary neural networks, without increasing the computational cost. Further, DeSign balances the preservation of details with the efficiency of binary operations.
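A minimal sketch of a dithered sign activation: tile a periodic threshold kernel over the feature map and binarize each pixel against its own threshold. The hand-picked 2x2 kernel below is only illustrative; the paper designs its kernel.

```python
import numpy as np

def dithered_sign(x, kernel=None):
    """Sign activation with a spatially periodic threshold kernel: the
    kernel is tiled over the 2D feature map and each pixel is binarized
    against its own threshold, instead of a single global zero."""
    if kernel is None:
        kernel = np.array([[-0.5, 0.25], [0.5, -0.25]])  # illustrative choice
    h, w = x.shape
    kh, kw = kernel.shape
    reps = (int(np.ceil(h / kh)), int(np.ceil(w / kw)))
    thresholds = np.tile(kernel, reps)[:h, :w]
    return np.where(x >= thresholds, 1.0, -1.0)
```

On a constant input, a plain sign activation outputs a constant map, while the dithered version encodes where the input sits relative to each local threshold, retaining finer detail at the same binary cost.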
Updated: 2024-05-03 16:27:39
Categories: cs.CV,cs.LG
GReAT: A Graph Regularized Adversarial Training Method
This paper presents GReAT (Graph Regularized Adversarial Training), a novel regularization method designed to enhance the robust classification performance of deep learning models. Adversarial examples, characterized by subtle perturbations that can mislead models, pose a significant challenge in machine learning. Although adversarial training is effective in defending against such attacks, it often overlooks the underlying data structure. In response, GReAT integrates graph-based regularization into the adversarial training process, leveraging the data's inherent structure to enhance model robustness. By incorporating graph information during training, GReAT defends against adversarial attacks and improves generalization to unseen data. Extensive evaluations on benchmark datasets demonstrate that GReAT outperforms state-of-the-art methods in robustness, achieving notable improvements in classification accuracy. Specifically, compared to the second-best methods, GReAT achieves performance increases of approximately 4.87% on CIFAR10 and 10.57% on SVHN against the FGSM attack, and of approximately 11.05% on CIFAR10 and 5.54% on SVHN against the PGD attack. This paper provides detailed insights into the proposed methodology, including numerical results and comparisons with existing approaches, highlighting the significant impact of GReAT in advancing the performance of deep learning models.
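The graph-regularization ingredient can be sketched as a Laplacian smoothness penalty added to the adversarial classification loss. How GReAT couples the two terms exactly is not specified in the abstract, so treat this as a generic sketch of the idea rather than the paper's objective.

```python
import numpy as np

def graph_regularizer(embeddings, adj):
    """Laplacian smoothness penalty tr(Z^T L Z), which equals
    0.5 * sum_ij A_ij * ||z_i - z_j||^2 for symmetric adjacency A:
    connected samples are pushed toward similar embeddings."""
    lap = np.diag(adj.sum(axis=1)) - adj  # unnormalized graph Laplacian
    return float(np.trace(embeddings.T @ lap @ embeddings))

# Sketch of a combined objective (coupling assumed, not from the paper):
# total_loss = ce_on_adversarial_examples + lam * graph_regularizer(z, adj)
```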
Updated: 2024-05-03 16:23:58
Categories: cs.LG,cs.CV
From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?
Deep learning (DL) has substantially enhanced natural language processing (NLP) in healthcare research. However, the increasing complexity of DL-based NLP necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review of explainable and interpretable DL in healthcare NLP. The term "eXplainable and Interpretable Artificial Intelligence" (XIAI) is introduced to distinguish XAI from IAI. Different models are further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms are the most prevalent emerging IAI technique. The use of IAI is growing, distinguishing it from XAI. The major challenges identified are that most XIAI does not explore "global" modelling processes, the lack of best practices, and the lack of systematic evaluation and benchmarks. One important opportunity is to use attention mechanisms to enhance multi-modal XIAI for personalized medicine. Additionally, combining DL with causal logic holds promise. Our discussion encourages the integration of XIAI in Large Language Models (LLMs) and domain-specific smaller models. In conclusion, XIAI adoption in healthcare requires dedicated in-house expertise. Collaboration with domain experts, end-users, and policymakers can lead to ready-to-use XIAI methods across NLP and medical tasks. While challenges exist, XIAI techniques offer a valuable foundation for interpretable NLP algorithms in healthcare.
Updated: 2024-05-03 16:20:02
Categories: cs.CL,cs.AI,cs.LG
Automatic Programming: Large Language Models and Beyond
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and examine the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs can help produce higher-assurance code, along with evidence of assurance.
Updated: 2024-05-03 16:19:24
Categories: cs.SE,cs.AI,cs.LG
LangProp: A code optimization framework using Large Language Models applied to driving
We propose LangProp, a framework for iteratively optimizing code generated by large language models (LLMs), in both supervised and reinforcement learning settings. While LLMs can generate sensible coding solutions zero-shot, they are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code performance on a dataset of input-output pairs, catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one could easily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We show LangProp's applicability to general domains such as Sudoku and CartPole, as well as demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA. We show that LangProp can generate interpretable and transparent policies that can be verified and improved in a metric- and data-driven way. Our code is available at https://github.com/shuishida/LangProp.
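The training loop described above, stripped to its essentials with a stubbed-out LLM, looks roughly like this; the real framework additionally supports imitation-learning and RL variants, and `propose` here is only a stand-in for an actual LLM call.

```python
def langprop_style_loop(propose, dataset, max_rounds=5):
    """Metric-driven refinement in the spirit of LangProp: run a candidate
    program on input-output pairs, collect mismatches and exceptions, and
    feed them back to `propose` (a stand-in for the real LLM call)."""
    feedback, program, failures = None, None, []
    for _ in range(max_rounds):
        program = propose(feedback)
        failures = []
        for x, y in dataset:
            try:
                out = program(x)
                if out != y:
                    failures.append((x, y, out))
            except Exception as exc:  # exceptions become training signal too
                failures.append((x, y, repr(exc)))
        if not failures:
            break
        feedback = failures
    return program, failures
```

For example, a stub that returns the identity function on the first call and a corrected program once it sees failures converges in two rounds.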
Updated: 2024-05-03 16:15:45
Categories: cs.SE,cs.AI,cs.LG,cs.RO
A separability-based approach to quantifying generalization: which layer is best?
Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is the following: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations in that domain. Generalization power is quantified as a function of the latent embeddings of unseen data from intermediate layers for both unsupervised and supervised settings. Working throughout all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.
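One concrete way to score a layer, in the spirit of the title's "separability", is a between-class to within-class scatter ratio computed on that layer's embeddings of held-out data; the authors' exact measure may differ, so this is only an illustrative operationalization.

```python
import numpy as np

def separability(embeddings, labels):
    """Fisher-style separability score for one layer: ratio of
    between-class scatter to within-class scatter. Higher means the
    layer's embedding space separates the classes more cleanly."""
    mu = embeddings.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        cls = embeddings[labels == c]
        between += len(cls) * np.sum((cls.mean(axis=0) - mu) ** 2)
        within += np.sum((cls - cls.mean(axis=0)) ** 2)
    return between / max(within, 1e-12)
```

Evaluating this score layer by layer on unseen-domain data gives a per-layer generalization profile of the kind the paper compares across networks.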
Updated: 2024-05-03 16:03:57
Categories: cs.LG,cs.AI,cs.CV,I.2.6; I.5.1; I.4.10
Regularized Q-learning through Robust Averaging
We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.
Updated: 2024-05-03 15:57:26
Categories: math.OC,cs.LG
Position Paper: Rethinking Empirical Research in Machine Learning: Addressing Epistemic and Methodological Challenges of Experimentation
We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical ML research is fashioned as confirmatory research while it should rather be considered exploratory.
Updated: 2024-05-03 15:57:22
Categories: cs.LG,stat.ML
Public-private funding models in open source software development: A case study on scikit-learn
Governments are increasingly funding open source software (OSS) development to support software security, digital sovereignty, and national competitiveness in science and innovation, amongst others. However, little is known about how OSS developers evaluate the relative benefits and drawbacks of governmental funding for OSS. This study explores this question through a case study on scikit-learn, a Python library for machine learning, funded by public research grants, commercial sponsorship, micro-donations, and a €32 million grant announced in France's artificial intelligence strategy. Through 25 interviews with scikit-learn's maintainers and funders, this study makes two key contributions. First, it contributes empirical findings about the benefits and drawbacks of public and private funding in an impactful OSS project, and the governance protocols employed by the maintainers to balance the diverse interests of their community and funders. Second, it offers practical lessons on funding for OSS developers, governments, and companies based on the experience of scikit-learn. The paper concludes with key recommendations for practitioners and future research directions.
Updated: 2024-05-03 15:57:04
Categories: cs.SE,cs.AI,cs.CY,cs.LG,K.4.1
CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding
Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show improved prediction skills on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and weights the guidance with precision weights estimated by the inherent property of diffusion models. We experimentally show that the precision weights effectively estimate the data predictability. We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and ERA surface wind datasets. Our results demonstrate that CogDPM outperforms both existing domain-specific operational models and general deep prediction models by providing more proficient forecasting.
Updated: 2024-05-03 15:54:50
Categories: cs.NE,cs.AI,cs.LG
Impact of emoji exclusion on the performance of Arabic sarcasm detection models
The complex challenge of detecting sarcasm in Arabic speech on social media is compounded by the language's diversity and the nature of sarcastic expressions. There is a significant gap in the capability of existing models to effectively interpret sarcasm in Arabic, which necessitates more sophisticated and precise detection methods. In this paper, we investigate the impact of a fundamental preprocessing component on sarcasm speech detection. While emojis play a crucial role in mitigating the absence of body language and facial expressions in modern communication, their impact on automated text analysis, particularly in sarcasm detection, remains underexplored. We investigate the impact of emoji exclusion from datasets on the performance of sarcasm detection models in social media content for Arabic, an exceptionally vocabulary-rich language. This investigation includes the adaptation and enhancement of AraBERT pre-training models, specifically by excluding emojis, to improve sarcasm detection capabilities. We use AraBERT pre-training to refine the specified models, demonstrating that the removal of emojis can significantly boost the accuracy of sarcasm detection. This approach facilitates a more refined interpretation of language, eliminating the potential confusion introduced by non-textual elements. The evaluated AraBERT models, through the focused strategy of emoji removal, adeptly navigate the complexities of Arabic sarcasm. This study establishes new benchmarks in Arabic natural language processing and presents valuable insights for social media platforms.
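The preprocessing step under study reduces, at its simplest, to stripping emoji codepoints before tokenization. A rough sketch with an illustrative and deliberately incomplete codepoint set; the paper's exact pipeline around AraBERT is not specified in the abstract.

```python
import re

# Illustrative (incomplete) emoji codepoint ranges -- a real pipeline would
# use a vetted emoji inventory rather than this hand-rolled subset.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\ufe0f]+"
)

def strip_emojis(text):
    """Drop emoji codepoints and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", EMOJI_PATTERN.sub("", text)).strip()
```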
Updated: 2024-05-03 15:51:02
Categories: cs.CL,cs.LG
Improving Interpretation Faithfulness for Vision Transformers
Vision Transformers (ViTs) have achieved state-of-the-art performance for various vision tasks. One reason behind the success lies in their ability to provide plausible innate explanations for the behavior of neural architectures. However, ViTs suffer from issues with explanation faithfulness, as their focal points are fragile to adversarial attacks and can be easily changed with even slight perturbations on the input image. In this paper, we propose a rigorous approach to mitigate these issues by introducing Faithful ViTs (FViTs). Briefly speaking, an FViT should have the following two properties: (1) The top-$k$ indices of its self-attention vector should remain mostly unchanged under input perturbation, indicating stable explanations; (2) The prediction distribution should be robust to perturbations. To achieve this, we propose a new method called Denoised Diffusion Smoothing (DDS), which adopts randomized smoothing and diffusion-based denoising. We theoretically prove that processing ViTs directly with DDS can turn them into FViTs. We also show that Gaussian noise is nearly optimal for both $\ell_2$ and $\ell_\infty$-norm cases. Finally, we demonstrate the effectiveness of our approach through comprehensive experiments and evaluations. Results show that FViTs are more robust against adversarial attacks while maintaining the explainability of attention, indicating higher faithfulness.
Updated: 2024-05-03 15:49:16
Categories: cs.CV,cs.AI,cs.LG
A Fresh Look at Sanity Checks for Saliency Maps
The Model Parameter Randomisation Test (MPRT) is highly recognised in the eXplainable Artificial Intelligence (XAI) community due to its fundamental evaluative criterion: explanations should be sensitive to the parameters of the model they seek to explain. However, recent studies have raised several methodological concerns for the empirical interpretation of MPRT. In response, we propose two modifications to the original test: Smooth MPRT and Efficient MPRT. The former reduces the impact of noise on evaluation outcomes via sampling, while the latter avoids the need for biased similarity measurements by re-interpreting the test through the increase in explanation complexity after full model randomisation. Our experiments show that these modifications enhance the metric reliability, facilitating a more trustworthy deployment of explanation methods.
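As we read it, Smooth MPRT's sampling step averages the explanation over Gaussian-perturbed inputs before the pre- vs post-randomization comparison. A hypothetical sketch of just that smoothing step, with `saliency_fn` standing in for any attribution method:

```python
import numpy as np

def smooth_saliency(saliency_fn, x, n=25, sigma=0.1, rng=None):
    """Average an explanation over Gaussian-perturbed copies of the input,
    reducing the influence of noise on the downstream MPRT comparison."""
    if rng is None:
        rng = np.random.default_rng(0)
    maps = [saliency_fn(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(n)]
    return np.mean(maps, axis=0)
```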
Updated: 2024-05-03 15:47:32
Categories: stat.ML,cs.AI,cs.CV,cs.LG
Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning
Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destructive analysis using hyperspectral imaging. Results show that short-wave infrared (SWIR) data is more effective for analyzing peat samples and predicting total phenol levels, with accuracies up to 99.81%.
Updated: 2024-05-03 15:47:07
Categories: cs.CV,cs.LG,eess.IV
Policy design in experiments with unknown interference
This paper studies experimental designs for estimation and inference on policies with spillover effects. Units are organized into a finite number of large clusters and interact in unknown ways within each cluster. First, we introduce a single-wave experiment that, by varying the randomization across cluster pairs, estimates the marginal effect of a change in treatment probabilities, taking spillover effects into account. Using the marginal effect, we propose a test for policy optimality. Second, we design a multiple-wave experiment to estimate welfare-maximizing treatment rules. We provide strong theoretical guarantees and an implementation in a large-scale field experiment.
Updated: 2024-05-03 15:45:42
Categories: econ.EM,cs.LG,stat.ME
Weisfeiler-Lehman goes Dynamic: An Analysis of the Expressive Power of Graph Neural Networks for Attributed and Dynamic Graphs
Graph Neural Networks (GNNs) are a large class of relational models for graph processing. Recent theoretical studies on the expressive power of GNNs have focused on two issues. On the one hand, it has been proven that GNNs are as powerful as the Weisfeiler-Lehman test (1-WL) in their ability to distinguish graphs. Moreover, it has been shown that the equivalence enforced by 1-WL equals unfolding equivalence. On the other hand, GNNs turned out to be universal approximators on graphs modulo the constraints enforced by 1-WL/unfolding equivalence. However, these results only apply to Static Attributed Undirected Homogeneous Graphs (SAUHG) with node attributes. In contrast, real-life applications often involve a much larger variety of graph types. In this paper, we conduct a theoretical analysis of the expressive power of GNNs for two other graph domains that are particularly interesting in practical applications, namely dynamic graphs and SAUHGs with edge attributes. Dynamic graphs are widely used in modern applications; hence, the study of the expressive capability of GNNs in this domain is essential for practical reasons and, in addition, it requires a new analysis approach due to the difference in the architecture of dynamic GNNs compared to static ones. On the other hand, the examination of SAUHGs is of particular relevance since they act as a standard form for all graph types: it has been shown that all graph types can be transformed without loss of information to SAUHGs with both attributes on nodes and edges. This paper considers generic GNN models and appropriate 1-WL tests for those domains. Then, the known results on the expressive power of GNNs are extended to the mentioned domains: it is proven that GNNs have the same capability as the 1-WL test, the 1-WL equivalence equals unfolding equivalence and that GNNs are universal approximators modulo 1-WL/unfolding equivalence.
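The 1-WL test referenced throughout is plain colour refinement: iteratively rehash each node's colour together with the multiset of its neighbours' colours, and compare the resulting colour histograms of two graphs. The sketch below also exhibits the classic pair (two triangles vs a 6-cycle) that 1-WL, and hence a standard message-passing GNN, cannot distinguish:

```python
def wl_colors(adj, rounds=3):
    """1-WL colour refinement on a graph given as {node: neighbour list}.
    Returns the sorted colour list; differing lists prove non-isomorphism,
    while equal lists mean 1-WL cannot tell the graphs apart."""
    n = len(adj)
    colors = [0] * n
    for _ in range(rounds):
        # each node's new colour = its colour + multiset of neighbour colours
        signatures = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in range(n)
        ]
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        colors = [palette[sig] for sig in signatures]
    return sorted(colors)

# 1-WL distinguishes a triangle from a 3-node path...
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
```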
Updated: 2024-05-03 15:44:52
Categories: cs.LG,cs.AI
Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes
The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is its pessimistic regret analysis: although the cost function can change from one episode to the next, its evolution in many settings is not adversarial. To address this, we introduce and study a new variant of AMDP, which aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is, a regret bound which gracefully degrades with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost estimators cannot establish optimistic bounds, and (ii) the feedback model of AMDP is different (and more realistic) than the existing optimistic online learning works. Our result, in particular, hinges upon developing a novel optimistically biased cost estimator that leverages cost predictors and enables a high-probability regret analysis without imposing restrictive assumptions. We further discuss practical extensions of the proposed scheme and demonstrate its efficacy numerically.
Updated: 2024-05-03 15:44:31
Categories: stat.ML,cs.AI,cs.LG
Accelerating Convergence in Bayesian Few-Shot Classification
Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at https://github.com/keanson/MD-BSFC.
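Mirror descent with the negative-entropy mirror map, the geometric mechanism the paper builds on, reduces to multiplicative (exponentiated-gradient) updates that keep iterates on the probability simplex. The sketch below applies it to a generic linear objective rather than the paper's Gaussian-process variational objective:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.5, steps=100):
    """Mirror descent under the negative-entropy mirror map: multiplicative
    updates followed by renormalization, so iterates never leave the
    probability simplex (no Euclidean projection needed)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))  # exponentiated-gradient step
        x = x / x.sum()                # Bregman projection onto the simplex
    return x

# Minimize <c, x> over the simplex: mass concentrates on argmin(c).
c = np.array([3.0, 1.0, 2.0])
x_star = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```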
Updated: 2024-05-03 15:43:57
Categories: cs.LG,stat.ML
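The geometry-aware update at the heart of mirror descent can be illustrated with the classic entropy mirror map (exponentiated gradient). This is a minimal sketch unrelated to the paper's Gaussian-process classifier: updates take the steepest-descent direction in KL geometry, so iterates stay on the probability simplex without any projection step.

```python
import math

# Mirror descent with the entropy mirror map (exponentiated gradient):
# minimizing a linear cost <p, c> over the probability simplex.
def exponentiated_gradient(costs, steps=200, eta=0.5):
    n = len(costs)
    p = [1.0 / n] * n  # uniform start, on the simplex
    for _ in range(steps):
        # multiplicative update = steepest descent in KL geometry
        w = [pi * math.exp(-eta * c) for pi, c in zip(p, costs)]
        z = sum(w)
        p = [wi / z for wi in w]  # renormalize; stays on the simplex
    return p

p = exponentiated_gradient([0.9, 0.1, 0.5])
# mass concentrates on the lowest-cost coordinate
```

Note that no projection back onto the simplex is ever needed, which is one practical payoff of choosing a mirror map matched to the constraint geometry.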
A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.
Updated: 2024-05-03 15:33:36
Categories: cs.CV,cs.AI
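The class-specific-query idea can be sketched in a few lines of numpy. The shapes, the single attention head, and the logit readout below are illustrative assumptions, not INTR's actual architecture: each class owns a learned query that attends over patch features, and its attended feature is scored against the query itself.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_patches, d = 3, 16, 8

queries = rng.normal(size=(num_classes, d))   # one learned query per class
patches = rng.normal(size=(num_patches, d))   # encoder output (patch features)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cross-attention: each class query attends over the image patches,
# localizing where "its" evidence sits in the image.
attn = softmax(queries @ patches.T / np.sqrt(d))  # (classes, patches)
class_feats = attn @ patches                      # (classes, d)
logits = (class_feats * queries).sum(axis=1)      # one score per class
pred = int(np.argmax(logits))
```

The per-class rows of `attn` are exactly the weights the paper reads off as an interpretation of the prediction.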
Metalearners for Ranking Treatment Effects
Efficiently allocating treatments with a budget constraint constitutes an important challenge across various domains. In marketing, for example, the use of promotions to target potential customers and boost conversions is limited by the available budget. While much research focuses on estimating causal effects, there is relatively limited work on learning to allocate treatments while considering the operational context. Existing methods for uplift modeling or causal inference primarily estimate treatment effects, without considering how this relates to a profit maximizing allocation policy that respects budget constraints. The potential downside of using these methods is that the resulting predictive model is not aligned with the operational context. Therefore, prediction errors are propagated to the optimization of the budget allocation problem, subsequently leading to a suboptimal allocation policy. We propose an alternative approach based on learning to rank. Our proposed methodology directly learns an allocation policy by prioritizing instances in terms of their incremental profit. We propose an efficient sampling procedure for the optimization of the ranking model to scale our methodology to large-scale data sets. Theoretically, we show how learning to rank can maximize the area under a policy's incremental profit curve. Empirically, we validate our methodology and show its effectiveness in practice through a series of experiments on both synthetic and real-world data.
Updated: 2024-05-03 15:31:18
Categories: cs.LG,stat.ML
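The ranking objective can be made concrete with a toy scorer. The function below ranks instances by predicted incremental profit and measures the (trapezoidal) area under the cumulative incremental-profit curve of the induced policy; it is a simplified stand-in for the paper's metric, and the numbers are invented.

```python
# Rank instances by predicted incremental profit, then measure the area
# under the cumulative incremental-profit curve of the induced policy.
def incremental_profit_auc(pred_uplift, true_uplift):
    order = sorted(range(len(pred_uplift)),
                   key=lambda i: pred_uplift[i], reverse=True)
    curve, total = [0.0], 0.0
    for i in order:               # treat instances in ranked order
        total += true_uplift[i]
        curve.append(total)
    # trapezoidal area under the curve, averaged over instances
    area = sum((a + b) / 2 for a, b in zip(curve, curve[1:])) / len(order)
    return area, curve

# A model that ranks the high-uplift instances first scores higher
# than one that ranks them last, even with identical true uplifts.
auc_good, _ = incremental_profit_auc([0.9, 0.1, 0.8, 0.2], [5.0, -1.0, 3.0, 0.0])
auc_bad, _  = incremental_profit_auc([0.1, 0.9, 0.2, 0.8], [5.0, -1.0, 3.0, 0.0])
```

Because only the ordering enters the metric, calibration errors that preserve the ranking cost nothing, which is the motivation for learning the ranking directly.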
Imitation Learning in Discounted Linear MDPs without exploration assumptions
We present a new algorithm for imitation learning in infinite horizon linear MDPs, dubbed ILARL, which greatly improves the bound on the number of trajectories that the learner needs to sample from the environment. In particular, we remove the exploration assumptions required in previous works and improve the dependence on the desired accuracy $\epsilon$ from $\mathcal{O}(\epsilon^{-5})$ to $\mathcal{O}(\epsilon^{-4})$. Our result relies on a connection between imitation learning and online learning in MDPs with adversarial losses. For the latter setting, we present the first result for infinite horizon linear MDPs, which may be of independent interest. Moreover, we provide a strengthened result for the finite horizon case, where we achieve $\mathcal{O}(\epsilon^{-2})$. Numerical experiments with linear function approximation show that ILARL outperforms other commonly used algorithms.
Updated: 2024-05-03 15:28:44
Categories: cs.LG
A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, particularly as diverse low-carbon technologies are increasingly integrated. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption; 2) it scales better across different datasets than traditional statistical models; and 3) it captures the complex correlations of RLPs better than existing deep generative models.
Updated: 2024-05-03 15:27:51
Categories: cs.LG,cs.SY,eess.SY
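A minimal sketch of an invertible linear layer in the normalizing-flow sense (not FCPFlow's exact layer, whose internals the abstract does not specify): a forward map, its exact inverse, and the log-determinant term that the change-of-variables density formula requires.

```python
import numpy as np

# Invertible linear layer for a normalizing flow: y = x @ W,
# exact inverse via a linear solve, and log|det W| for the density term.
class InvertibleLinear:
    def __init__(self, dim, rng):
        # near-identity init keeps W well-conditioned and invertible
        self.W = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))

    def forward(self, x):
        _, logdet = np.linalg.slogdet(self.W)
        return x @ self.W, logdet

    def inverse(self, y):
        # solve W^T z = y^T, i.e. x = y @ W^{-1}, without forming W^{-1}
        return np.linalg.solve(self.W.T, y.T).T

rng = np.random.default_rng(1)
layer = InvertibleLinear(4, rng)
x = rng.normal(size=(5, 4))
y, logdet = layer.forward(x)
x_rec = layer.inverse(y)   # recovers x up to floating-point error
```

Keeping both directions cheap and the Jacobian log-determinant tractable is what lets such layers be stacked into a model usable for both generation and exact likelihood evaluation.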
Assessing and Verifying Task Utility in LLM-Powered Applications
The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks. However, a significant gap remains in assessing to what extent LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the need to verify the utility of LLM-powered applications, particularly by ensuring alignment between an application's functionality and end-user needs. We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment that quantifies the utility of an application against the suggested criteria. We present a comprehensive analysis of the effectiveness and robustness of AgentEval on two open-source datasets, covering math problem solving and ALFWorld household tasks. For reproducibility purposes, we make the data, code and all the logs publicly available at https://bit.ly/3w3yKcS .
Updated: 2024-05-03 15:26:27
Categories: cs.CL,cs.AI
Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset
Hoaxes are a recognised form of disinformation created deliberately, with potentially serious implications for the credibility of reference knowledge resources such as Wikipedia. What makes Wikipedia hoaxes hard to detect is that they are often written according to the official style guidelines. In this work, we first provide a systematic analysis of the similarities and discrepancies between legitimate and hoax Wikipedia articles, and introduce Hoaxpedia, a collection of 311 hoax articles (from existing literature as well as official Wikipedia lists) alongside semantically similar real articles. We report results of binary classification experiments on the task of predicting whether a Wikipedia article is real or a hoax, and analyze several settings as well as a range of language models. Our results suggest that detecting deceitful content in Wikipedia based on content alone, despite not having been explored much in the past, is a promising direction.
Updated: 2024-05-03 15:25:48
Categories: cs.CL,cs.AI,cs.LG
Zero-shot generalization across architectures for visual classification
Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth.
Updated: 2024-05-03 15:25:09
Categories: cs.CV,cs.AI,cs.LG,I.2.6; I.5.1; I.4.10
Visual Enumeration is Challenging for Large-scale Generative AI
Humans can readily judge the number of objects in a visual scene, even without counting, and such a skill has been documented in many animal species and babies prior to language development and formal schooling. Numerical judgments are error-free for small sets, while for larger collections responses become approximate, with variability increasing proportionally to the target number. This response pattern is observed for items of all kinds, despite variation in object features (such as color or shape), suggesting that our visual number sense relies on abstract representations of numerosity. Here, we investigate whether large-scale generative Artificial Intelligence (AI) systems have a human-like number sense, which should allow them to reliably name the number of objects in simple visual stimuli or generate images containing a target number of items in the 1-10 range. Surprisingly, most of the foundation models considered have a poor number sense: They make striking errors even with small numbers, the response variability does not increase in a systematic way, and the pattern of errors depends on object category. Only the most recent proprietary systems exhibit signatures of a visual number sense. Our findings demonstrate that having an intuitive visual understanding of number remains challenging for foundation models, which in turn might be detrimental to the perceptual grounding of numeracy that in humans is crucial for mathematical learning.
Updated: 2024-05-03 15:24:20
Categories: cs.CV,cs.AI,cs.NE
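The scalar-variability signature described above (estimate variability growing in proportion to the target number, so the coefficient of variation stays flat) is easy to simulate; the Weber fraction of 0.15 used here is an illustrative assumption, not a value from the paper.

```python
import random

random.seed(0)

# Scalar variability: numerosity estimates are noisy, with a standard
# deviation proportional to the target number (Weber fraction w).
def estimate(n, w=0.15, trials=10000):
    samples = [random.gauss(n, w * n) for _ in range(trials)]
    mean = sum(samples) / trials
    sd = (sum((s - mean) ** 2 for s in samples) / trials) ** 0.5
    return mean, sd

m4, sd4 = estimate(4)
m8, sd8 = estimate(8)
# sd roughly doubles from 4 to 8 items, so sd / mean stays ~constant
```

Checking whether a generative model's counting errors show this constant-ratio pattern, or no systematic pattern at all, is essentially the diagnostic the abstract describes.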
Forensic License Plate Recognition with Compression-Informed Transformers
Forensic license plate recognition (FLPR) remains an open challenge in legal contexts such as criminal investigations, where unreadable license plates (LPs) need to be deciphered from highly compressed and/or low resolution footage, e.g., from surveillance cameras. In this work, we propose a side-informed Transformer architecture that embeds knowledge on the input compression level to improve recognition under strong compression. We show the effectiveness of Transformers for license plate recognition (LPR) on a low-quality real-world dataset. We also provide a synthetic dataset that includes strongly degraded, illegible LP images and analyze the impact of knowledge embedding on it. The network outperforms existing FLPR methods and standard state-of-the-art image recognition models while requiring fewer parameters. For the most severely degraded images, we improve recognition by up to 8.9 percentage points.
Updated: 2024-05-03 15:15:27
Categories: cs.CV,cs.AI
From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks
In this paper, we review recent approaches for explaining concepts in neural networks. Concepts can act as a natural link between learning and reasoning: once the concepts are identified that a neural learning system uses, one can integrate those concepts with a reasoning system for inference or use a reasoning system to act upon them to improve or enhance the learning system. On the other hand, knowledge can not only be extracted from neural networks but concept knowledge can also be inserted into neural network architectures. Since integrating learning and reasoning is at the core of neuro-symbolic AI, the insights gained from this survey can serve as an important step towards realizing neuro-symbolic AI based on explainable concepts.
Updated: 2024-05-03 15:15:17
Categories: cs.AI,cs.CL,cs.CV,cs.LG,cs.NE
EEG2TEXT: Open Vocabulary EEG-to-Text Decoding with EEG Pre-Training and Multi-View Transformer
Deciphering the intricacies of the human brain has captivated curiosity for centuries. Recent strides in Brain-Computer Interface (BCI) technology, particularly using motor imagery, have restored motor functions such as reaching, grasping, and walking in paralyzed individuals. However, unraveling natural language from brain signals remains a formidable challenge. Electroencephalography (EEG) is a non-invasive technique used to record electrical activity in the brain by placing electrodes on the scalp. Previous studies of EEG-to-text decoding have achieved high accuracy on small closed vocabularies, but still fall short of high accuracy when dealing with large open vocabularies. We propose a novel method, EEG2TEXT, to improve the accuracy of open vocabulary EEG-to-text decoding. Specifically, EEG2TEXT leverages EEG pre-training to enhance the learning of semantics from EEG signals and proposes a multi-view transformer to model the EEG signal processing by different spatial regions of the brain. Experiments show that EEG2TEXT has superior performance, outperforming the state-of-the-art baseline methods by a large margin of up to 5% in absolute BLEU and ROUGE scores. EEG2TEXT shows great potential for a high-performance open-vocabulary brain-to-text system to facilitate communication.
Updated: 2024-05-03 15:14:19
Categories: cs.CL,cs.AI
Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
In the field of robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge due to the growing demand for intelligent machines that can comprehend and interact with complex environments. Conventional panoptic mapping methods, however, are limited by predefined semantic classes, thus making them ineffective for handling novel or unforeseen objects. In response to this limitation, we introduce the Unified Promptable Panoptic Mapping (UPPM) method. UPPM utilizes recent advances in foundation models to enable real-time, on-demand label generation using natural language prompts. By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM provides significant improvements in adaptability and versatility while maintaining high performance levels in map reconstruction. We demonstrate our approach on real-world and simulated datasets. Results show that UPPM can accurately reconstruct scenes and segment objects while generating rich semantic labels through natural language interactions. A series of ablation experiments validated the advantages of foundation model-based labeling over fixed label sets.
Updated: 2024-05-03 15:08:39
Categories: cs.CV,cs.AI,cs.RO
Simulating the economic impact of rationality through reinforcement learning and agent-based modelling
Agent-based models (ABMs) are simulation models used in economics to overcome some of the limitations of traditional frameworks based on general equilibrium assumptions. However, agents within an ABM follow predetermined, not fully rational, behavioural rules which can be cumbersome to design and difficult to justify. Here we leverage multi-agent reinforcement learning (RL) to expand the capabilities of ABMs with the introduction of fully rational agents that learn their policy by interacting with the environment and maximising a reward function. Specifically, we propose a 'Rational macro ABM' (R-MABM) framework by extending a paradigmatic macro ABM from the economic literature. We show that gradually substituting ABM firms in the model with RL agents, trained to maximise profits, allows for a thorough study of the impact of rationality on the economy. We find that RL agents spontaneously learn three distinct strategies for maximising profits, with the optimal strategy depending on the level of market competition and rationality. We also find that RL agents with independent policies, and without the ability to communicate with each other, spontaneously learn to segregate into different strategic groups, thus increasing market power and overall profits. Finally, we find that a higher degree of rationality in the economy always improves the macroeconomic environment as measured by total output; depending on the specific rational policy, this can come at the cost of higher instability. Our R-MABM framework is general, it allows for stable multi-agent learning, and represents a principled and robust direction to extend existing economic simulators.
Updated: 2024-05-03 15:08:25
Categories: cs.LG,cs.AI,cs.CE,cs.MA,econ.GN,q-fin.EC
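A toy stand-in for one "rational" firm, assuming a linear demand curve and a simple epsilon-greedy bandit rather than the paper's actual macro ABM and RL setup: instead of following a hand-coded behavioural rule, the agent learns which price maximizes its profit from reward feedback alone.

```python
import random

random.seed(0)

# One firm as an epsilon-greedy bandit over a discrete price grid,
# facing (hypothetical) linear demand: d(p) = 10 - 2p, unit cost 1.
prices = [1.0, 2.0, 3.0, 4.0]
q = {p: 0.0 for p in prices}        # running average profit per price
counts = {p: 0 for p in prices}

def profit(price, cost=1.0, noise=0.1):
    demand = max(0.0, 10.0 - 2.0 * price + random.gauss(0, noise))
    return (price - cost) * demand

for t in range(5000):
    if random.random() < 0.1:                       # explore
        p = random.choice(prices)
    else:                                           # exploit
        p = max(prices, key=lambda x: q[x])
    r = profit(p)
    counts[p] += 1
    q[p] += (r - q[p]) / counts[p]   # incremental mean update

best = max(prices, key=lambda x: q[x])
# expected profits are 0, 6, 8, 6, so the learned best price is 3.0
```

Replacing hand-designed pricing rules with such reward-maximizing agents, one firm at a time, is the substitution experiment the abstract describes at the scale of a full macro model.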
Deep Learning Forecasts Caldera Collapse Events at Kilauea Volcano
During the three-month-long eruption of Kilauea volcano, Hawaii in 2018, the pre-existing summit caldera collapsed in over 60 quasi-periodic failure events. The last 40 of these events, which generated Mw >5 very long period (VLP) earthquakes, had inter-event times between 0.8 and 2.2 days. These failure events offer a unique dataset for testing methods for predicting earthquake recurrence based on locally recorded GPS, tilt, and seismicity data. In this work, we train a deep learning graph neural network (GNN) to predict the time-to-failure of the caldera collapse events using only a fraction of the data recorded at the start of each cycle. We find that the GNN generalizes to unseen data and can predict the time-to-failure to within a few hours using only 0.5 days of data, substantially improving upon a null model based only on inter-event statistics. Predictions improve with increasing input data length, and are most accurate when using high-SNR tilt-meter data. Applying the trained GNN to synthetic data with different magma pressure decay times predicts failure at a nearly constant stress threshold, revealing that the GNN is sensing the underlying physics of caldera collapse. These findings demonstrate the predictability of caldera collapse sequences under well monitored conditions, and highlight the potential of machine learning methods for forecasting real world catastrophic events with limited training data.
Updated: 2024-05-03 15:05:01
Categories: physics.geo-ph,cs.LG
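The inter-event-statistics null model that the GNN is compared against can be sketched directly: predict the remaining time to failure as the historical mean cycle length minus the time already elapsed in the current cycle. The event history below is hypothetical, chosen only to fall in the 0.8-2.2 day range the abstract reports.

```python
# Null model from inter-event statistics: remaining time to failure is
# (mean inter-event time) minus time elapsed in the current cycle.
inter_event_days = [0.8, 1.0, 1.4, 2.2, 1.2, 1.6]  # hypothetical history
mean_cycle = sum(inter_event_days) / len(inter_event_days)

def null_time_to_failure(elapsed_days):
    return max(0.0, mean_cycle - elapsed_days)

ttf = null_time_to_failure(0.5)  # prediction 0.5 days into a cycle
```

Any learned model has to beat this baseline to demonstrate that the GPS, tilt, and seismicity signals carry predictive information beyond the recurrence statistics themselves.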
Neural Context Flows for Learning Generalizable Dynamical Systems
Neural Ordinary Differential Equations typically struggle to generalize to new dynamical behaviors created by parameter changes in the underlying system, even when the dynamics are close to previously seen behaviors. The issue gets worse when the changing parameters are unobserved, i.e., their value or influence is not directly measurable when collecting data. We introduce Neural Context Flow (NCF), a framework that encodes said unobserved parameters in a latent context vector as input to a vector field. NCFs leverage differentiability of the vector field with respect to the parameters, along with first-order Taylor expansion to allow any context vector to influence trajectories from other parameters. We validate our method and compare it to established Multi-Task and Meta-Learning alternatives, showing competitive performance in mean squared error for in-domain and out-of-distribution evaluation on the Lotka-Volterra, Glycolytic Oscillator, and Gray-Scott problems. This study holds practical implications for foundational models in science and related areas that benefit from conditional neural ODEs. Our code is openly available at https://github.com/ddrous/ncflow.
Updated: 2024-05-03 15:02:21
Categories: cs.LG,math.DS
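The first-order Taylor expansion in the context variable can be sketched with a finite-difference Jacobian. The toy Lotka-Volterra-style vector field below is an assumption standing in for the paper's learned model; the point is only that one context's field plus its context-Jacobian can approximate the field at a nearby context.

```python
import numpy as np

# Context-conditioned vector field f(x, c); here c holds two (assumed)
# interaction parameters of a toy predator-prey system.
def f(x, c):
    return np.array([c[0] * x[0] - c[1] * x[0] * x[1],
                     c[1] * x[0] * x[1] - c[0] * x[1]])

# First-order Taylor expansion in the context:
# f(x, c) ≈ f(x, c0) + J_c(x, c0) @ (c - c0)
def taylor_f(x, c0, c, eps=1e-6):
    f0 = f(x, c0)
    # finite-difference Jacobian with respect to the context variables
    J = np.stack([(f(x, c0 + eps * e) - f0) / eps
                  for e in np.eye(len(c0))], axis=1)
    return f0 + J @ (c - c0)

x = np.array([1.0, 2.0])
c0 = np.array([0.5, 0.3])
c = np.array([0.6, 0.35])
approx = taylor_f(x, c0, c)
exact = f(x, c)
```

In the NCF setting the Jacobian comes from differentiating the network rather than finite differences, but the mechanism is the same: one context vector can "lend" its field to trajectories governed by other contexts.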
Deep Reinforcement Learning in Parameterized Action Space
Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge no previous work has succeeded at using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables. The best learned agent can score goals more reliably than the 2012 RoboCup champion agent. As such, this paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs.
Updated: 2024-05-03 15:00:50
Categories: cs.AI,cs.LG,cs.MA,cs.NE
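A parameterized action space pairs each discrete action type with its own set of continuous parameters. The action names below loosely follow the RoboCup soccer domain and are illustrative, not the exact action set used in the paper; a random policy over such a space looks like this:

```python
import random

random.seed(0)

# Parameterized action space: a discrete action type, each carrying its
# own continuous parameters (names here are illustrative assumptions).
ACTION_SPACE = {
    "dash": ["power", "direction"],
    "turn": ["direction"],
    "kick": ["power", "direction"],
}

def sample_action():
    a_type = random.choice(list(ACTION_SPACE))
    # continuous parameters, normalized to [-1, 1]
    params = {p: random.uniform(-1.0, 1.0) for p in ACTION_SPACE[a_type]}
    return a_type, params

a_type, params = sample_action()
```

A learned policy must output both the discrete choice and the continuous parameters jointly, which is what makes this class of MDPs awkward for methods built for purely discrete or purely continuous actions.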
GMP-ATL: Gender-augmented Multi-scale Pseudo-label Enhanced Adaptive Transfer Learning for Speech Emotion Recognition via HuBERT
The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, there is still potential for enhancement in the performance of these methods. In this paper, we present GMP-ATL (Gender-augmented Multi-scale Pseudo-label Adaptive Transfer Learning), a novel HuBERT-based adaptive transfer learning framework for SER. Specifically, GMP-ATL initially employs the pre-trained HuBERT, implementing multi-task learning and multi-scale k-means clustering to acquire frame-level gender-augmented multi-scale pseudo-labels. Then, to fully leverage both obtained frame-level and utterance-level emotion labels, we incorporate model retraining and fine-tuning methods to further optimize GMP-ATL. Experiments on IEMOCAP show that our GMP-ATL achieves superior recognition performance, with a WAR of 80.0% and a UAR of 82.0%, surpassing state-of-the-art unimodal SER methods, while also yielding comparable results with multimodal SER approaches.
Updated: 2024-05-03 14:58:46
Categories: cs.SD,cs.AI,eess.AS
The Impact of Differential Feature Under-reporting on Algorithmic Fairness
Predictive risk models in the public sector are commonly developed using administrative data that is more complete for subpopulations that rely more heavily on public services. In the United States, for instance, information on health care utilization is routinely available to government agencies for individuals supported by Medicaid and Medicare, but not for the privately insured. Critiques of public sector algorithms have identified such differential feature under-reporting as a driver of disparities in algorithmic decision-making. Yet this form of data bias remains understudied from a technical viewpoint. While prior work has examined the fairness impacts of additive feature noise and features that are clearly marked as missing, the setting of data missingness without indicators (i.e., differential feature under-reporting) has received little research attention. In this work, we present an analytically tractable model of differential feature under-reporting which we then use to characterize the impact of this kind of data bias on algorithmic fairness. We demonstrate how standard missing data methods typically fail to mitigate bias in this setting, and propose a new set of methods specifically tailored to differential feature under-reporting. Our results show that, in real world data settings, under-reporting typically leads to increasing disparities. The proposed solution methods show success in mitigating increases in unfairness.
Updated: 2024-05-03 14:58:33
Categories: cs.LG,cs.CY,stat.ML
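Differential under-reporting without a missingness indicator can be simulated directly: the feature is silently recorded as zero at different rates for different groups, shifting the scores that a fixed risk model assigns. All quantities below (the risk model, the utilization distribution, the reporting rates) are hypothetical.

```python
import random

random.seed(0)

# Hypothetical risk model: score rises with recorded utilization.
def risk_score(x):
    return 2.0 * x

# Under-reporting without an indicator: with probability (1 - rate) the
# true value is silently recorded as 0, indistinguishable from a true 0.
def simulate(report_rate, n=10000):
    total = 0.0
    for _ in range(n):
        true_x = random.expovariate(1.0)              # true utilization
        observed = true_x if random.random() < report_rate else 0.0
        total += risk_score(observed)
    return total / n

fully_reported = simulate(1.0)   # group with complete records
under_reported = simulate(0.5)   # group whose feature is half-missing
# identical true distributions, yet the under-reported group scores lower
```

Because the zeros are not flagged, standard missing-data tools never see anything to impute, which is exactly why the abstract argues this setting needs its own methods.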
Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness
Formalizing creativity-related concepts has been a long-term goal of Computational Creativity. To this end, we explore Formal Learning Theory in the context of creativity. We provide an introduction to the main concepts of this framework and a re-interpretation of terms commonly found in creativity discussions, proposing formal definitions for novelty and transformational creativity. This formalisation marks the beginning of a research branch we call Formal Creativity Theory, exploring how learning can be included as preparation for exploratory behaviour and how learning is a key part of transformational creative behaviour. By employing these definitions, we argue that, while novelty is neither necessary nor sufficient for transformational creativity in general, when using an inspiring set, rather than a sequence of experiences, an agent actually requires novelty for transformational creativity to occur.
Updated: 2024-05-03 14:53:46
Categories: cs.AI,cs.LG
Payout Races and Congested Channels: A Formal Analysis of Security in the Lightning Network
The Lightning Network, a payment channel network with a market cap of over 192M USD, is designed to resolve Bitcoin's scalability issues through fast off-chain transactions. There are multiple Lightning Network client implementations, all of which conform to the same textual specifications known as BOLTs. Several vulnerabilities have been manually discovered, but to date there have been few works systematically analyzing the security of the Lightning Network. In this work, we take a foundational approach to analyzing the security of the Lightning Network with the help of formal methods. Based on the BOLTs' specifications, we build a detailed formal model of the Lightning Network's single-hop payment protocol and verify it using the Spin model checker. Our model captures both concurrency and error semantics of the payment protocol. We then define several security properties which capture the correct intermediate operation of the protocol, ensuring that the outcome is always certain to both channel peers, and using them we re-discover a known attack previously reported in the literature along with a novel attack, referred to as a Payout Race. A Payout Race consists of a particular sequence of events that can lead to an ambiguity in the protocol in which innocent users can unwittingly lose funds. We confirm the practicality of this attack by reproducing it in a local testbed environment.
Updated: 2024-05-03 14:52:47
Categories: cs.CR
Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks
Freely available and easy-to-use audio editing tools make it straightforward to perform audio splicing. Convincing forgeries can be created by combining various speech samples from the same person. Detection of such splices is important both in the public sector when considering misinformation, and in a legal context to verify the integrity of evidence. Unfortunately, most existing detection algorithms for audio splicing use handcrafted features and make specific assumptions. However, criminal investigators are often faced with audio samples from unconstrained sources with unknown characteristics, which raises the need for more generally applicable methods. With this work, we aim to take a first step towards unconstrained audio splicing detection to address this need. We simulate various attack scenarios in the form of post-processing operations that may disguise splicing. We propose a Transformer sequence-to-sequence (seq2seq) network for splicing detection and localization. Our extensive evaluation shows that the proposed method outperforms existing dedicated approaches for splicing detection [3, 10] as well as the general-purpose networks EfficientNet [28] and RegNet [25].
Updated: 2024-05-03 14:52:21
Categories: cs.SD,cs.AI,cs.CV,eess.AS
Exposing and Explaining Fake News On-the-Fly
Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, this crowdsourcing model is exposed to manipulation. This work contributes an explainable, online classification method to recognize fake news in real-time. The proposed method combines both unsupervised and supervised Machine Learning approaches with online created lexica. The profiling is built from creator-, content- and context-based features using Natural Language Processing techniques. The explainable classification mechanism displays in a dashboard the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter and the results attain 80% accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increasing the quality and trustworthiness of social media contents.
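For reference, the macro F-measure reported above simply averages per-class F1 scores, so minority classes weigh as much as majority ones. A minimal sketch with hypothetical fake/real labels (not the authors' pipeline or data):

```python
def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores over all classes (macro F-measure)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical predictions on five posts.
y_true = ["fake", "real", "real", "fake", "real"]
y_pred = ["fake", "real", "fake", "fake", "real"]
print(round(macro_f1(y_true, y_pred), 3))  # -> 0.8
```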
Updated: 2024-05-03 14:49:04
Categories: cs.CL,cs.AI,cs.SI
Multi-Objective Recommendation via Multivariate Policy Learning
Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform. We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.
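The scalarisation step described above reduces to a weighted average of per-objective rewards that produces one ranking score per item. A toy sketch with hypothetical items, objective estimates, and weights (the weight vector plays the role of the continuous multivariate action the policy chooses):

```python
def scalarised_score(rewards, weights):
    """Weighted average of per-objective reward signals -> final ranking score."""
    return sum(weights[k] * rewards[k] for k in weights)

# Hypothetical per-item objective estimates.
items = {
    "item_a": {"click": 0.30, "share": 0.05, "dwell": 0.60, "diversity": 0.20},
    "item_b": {"click": 0.25, "share": 0.10, "dwell": 0.40, "diversity": 0.90},
}
# The weights are the continuous multivariate "action" selected by the policy.
weights = {"click": 0.5, "share": 0.2, "dwell": 0.2, "diversity": 0.1}

ranking = sorted(items, key=lambda i: scalarised_score(items[i], weights), reverse=True)
print(ranking)  # -> ['item_b', 'item_a']
```

Changing the weights reorders the ranking, which is exactly why learning them well matters.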
Updated: 2024-05-03 14:44:04
Categories: cs.IR,cs.LG
An Information Theoretic Perspective on Conformal Prediction
Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information theoretical inequalities. Moreover, we demonstrate two direct and useful applications of such connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
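The split-conformal recipe behind such prediction sets fits in a few lines: compute nonconformity scores on a calibration set, take a finite-sample-corrected quantile, and include every label whose score falls below it. The scores and probabilities below are hypothetical; this is a generic illustration of CP, not the paper's method:

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank that yields the coverage guarantee
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity 1 - p(label) is within the threshold."""
    return {label for label, p in probs.items() if 1 - p <= qhat}

# Hypothetical calibration scores: 1 - model probability of the true class.
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
qhat = conformal_quantile(cal_scores, alpha=0.2)
test_probs = {"cat": 0.55, "dog": 0.35, "bird": 0.10}
print(qhat, sorted(prediction_set(test_probs, qhat)))  # -> 0.7 ['cat', 'dog']
```

Larger sets signal higher uncertainty, which is the notion the paper connects to conditional entropy.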
Updated: 2024-05-03 14:43:07
Categories: cs.LG,cs.IT,math.IT,stat.ML
Fairness Without Demographics in Human-Centered Federated Learning
Federated learning (FL) enables collaborative model training while preserving data privacy, making it suitable for decentralized human-centered AI applications. However, a significant research gap remains in ensuring fairness in these systems. Current fairness strategies in FL require knowledge of bias-creating/sensitive attributes, clashing with FL's privacy principles. Moreover, in human-centered datasets, sensitive attributes may remain latent. To tackle these challenges, we present a novel bias mitigation approach inspired by "Fairness without Demographics" in machine learning. The presented approach achieves fairness without needing knowledge of sensitive attributes by minimizing the top eigenvalue of the Hessian matrix during training, ensuring equitable loss landscapes across FL participants. Notably, we introduce a novel FL aggregation scheme that promotes participating models based on error rates and loss landscape curvature attributes, fostering fairness across the FL system. This work represents the first approach to attaining "Fairness without Demographics" in human-centered FL. Through comprehensive evaluation, our approach demonstrates effectiveness in balancing fairness and efficacy across various real-world applications, FL setups, and scenarios involving single and multiple bias-inducing factors, representing a significant advancement in human-centered FL.
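The fairness mechanism above hinges on the top eigenvalue of the Hessian, which is typically estimated iteratively rather than by forming the full matrix. A toy power-iteration sketch on a small symmetric matrix (in training this would be driven by Hessian-vector products; the matrix here is purely illustrative):

```python
import numpy as np

def top_eigenvalue(hessian, iters=200, seed=0):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(hessian.shape[0])
    for _ in range(iters):
        v = hessian @ v
        v /= np.linalg.norm(v)
    return float(v @ hessian @ v)  # Rayleigh quotient at convergence

# Toy symmetric "Hessian" with known spectrum {3, 1}.
H = np.array([[2.0, 1.0], [1.0, 2.0]])
print(round(top_eigenvalue(H), 4))  # -> 3.0
```

Penalising this quantity during training flattens the sharpest direction of the loss landscape, which is the property the paper equalises across FL participants.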
Updated: 2024-05-03 14:38:56
Categories: cs.LG,cs.AI,cs.DC
Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming, which prevents their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have been recently proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model which is trained to reproduce the effects of human motions on the EM field and incorporate EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.
Updated: 2024-05-03 14:35:02
Categories: eess.SP,cs.AI,cs.SY,eess.SY
TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer
In this paper, we present a novel approach for text independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (wav2vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model, and a frame-level phoneme classifier trained on forced-alignment labels (obtained with the Montreal Forced Aligner) to produce multi-lingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work, but the design of the system makes it easily adaptable to other languages.
Updated: 2024-05-03 14:25:21
Categories: eess.AS,cs.AI,cs.CL,cs.LG
Can We Identify Unknown Audio Recording Environments in Forensic Scenarios?
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of the recorded audio to the recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works provide tools for closed-set recording environment classification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, closed-set tools are not applicable without retraining on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining. Instead, it is the first tool for robust few-shot classification of unseen environment locations. We demonstrate that EnvId can handle forensically challenging material. It provides good quality predictions even under unseen signal degradations, environment characteristics or recording position mismatches. Our code and datasets will be made publicly available upon acceptance.
Updated: 2024-05-03 14:19:40
Categories: cs.SD,cs.LG,eess.AS
Practical Performance Guarantees for Pipelined DNN Inference
We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner's dilemma of deciding when a solution is good enough. To this end, we design novel mixed-integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse testbed of 369 production models, for $k \in \{2, 4, 8, 16, 32, 64\}$, we empirically show that these lower bounds are strong enough to be useful in practice. Our lower bounds are substantially stronger than standard combinatorial bounds. For example, evaluated via geometric means across our production testbed with $k = 16$ pipeline stages, our MIP formulations raised the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds closed the optimality gap by a factor of 9.855x.
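For intuition, the simplest instance of this objective, a linear chain of per-layer costs with communication ignored, can be solved exactly by binary search over the bottleneck capacity with a greedy feasibility check. This is a simplified sketch of the partitioning problem, not the paper's MIP formulation:

```python
def min_bottleneck_partition(costs, k):
    """Split a chain of per-layer costs into at most k contiguous stages,
    minimising the cost of the heaviest stage.
    Simplified: assumes a linear model graph and ignores communication."""
    def feasible(cap):
        stages, current = 1, 0
        for c in costs:
            if c > cap:
                return False
            if current + c > cap:
                stages += 1       # open a new stage
                current = c
            else:
                current += c
        return stages <= k

    lo, hi = max(costs), sum(costs)  # bottleneck lies between these bounds
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_bottleneck_partition([4, 2, 7, 1, 5, 6], k=3))  # -> 11
```

The paper's MIP relaxations then certify how close such a partition is to optimal on general graphs.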
Updated: 2024-05-03 14:05:17
Categories: cs.LG,cs.DC
Got Root? A Linux Priv-Esc Benchmark
Linux systems are integral to the infrastructure of modern computing environments, necessitating robust security measures to prevent unauthorized access. Privilege escalation attacks represent a significant threat, typically allowing attackers to elevate their privileges from an initial low-privilege account to the all-powerful root account. A benchmark set of vulnerable systems is of high importance to evaluate the effectiveness of privilege-escalation techniques performed by both humans and automated tooling. Analyzing their behavior allows defenders to better fortify their entrusted Linux systems and thus protect their infrastructure from potentially devastating attacks. To address this gap, we developed a comprehensive benchmark for Linux privilege escalation. It provides a standardized platform to evaluate and compare the performance of human and synthetic actors, e.g., hacking scripts or automated tooling.
Updated: 2024-05-03 14:04:51
Categories: cs.CR
Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph
Structured science summaries or research contributions using properties or dimensions beyond traditional keywords enhance science findability. Current methods, such as those used by the Open Research Knowledge Graph (ORKG), involve manually curating properties to describe research papers' contributions in a structured manner, but this is labor-intensive and inconsistent among domain-expert human curators. We propose using Large Language Models (LLMs) to automatically suggest these properties. However, it is essential to assess the readiness of LLMs like GPT-3.5, Llama 2, and Mistral for this task before application. Our study performs a comprehensive comparative analysis between ORKG's manually curated properties and those generated by the aforementioned state-of-the-art LLMs. We evaluate LLM performance through four unique perspectives: semantic alignment and deviation with ORKG properties, fine-grained properties mapping accuracy, SciNCL embeddings-based cosine similarity, and expert surveys comparing manual annotations with LLM outputs. These evaluations occur within a multidisciplinary science setting. Overall, LLMs show potential as recommendation systems for structuring science, but further finetuning is recommended to improve their alignment with scientific tasks and mimicry of human expertise.
Updated: 2024-05-03 14:03:04
Categories: cs.AI,cs.CL,cs.IT,math.IT
Reconstructions of Jupiter's magnetic field using physics informed neural networks
Magnetic sounding using data collected from the Juno mission can be used to provide constraints on Jupiter's interior. However, inwards continuation of reconstructions assuming zero electrical conductivity and a representation in spherical harmonics are limited by the enhancement of noise at small scales. Here we describe new reconstructions of Jupiter's internal magnetic field based on physics-informed neural networks and either the first 33 (PINN33) or the first 50 (PINN50) of Juno's orbits. The method can resolve local structures, and allows for weak ambient electrical currents. Our models are not hampered by noise amplification at depth, and offer a much clearer picture of the interior structure. We estimate that the dynamo boundary is at a fractional radius of 0.8. At this depth, the magnetic field is arranged into longitudinal bands, and strong local features such as the great blue spot appear to be rooted in neighbouring structures of oppositely signed flux.
Updated: 2024-05-03 14:00:58
Categories: astro-ph.EP,cs.LG
Compressing neural network by tensor network with exponentially fewer variational parameters
A neural network (NN) designed for challenging machine learning tasks is in general a highly nonlinear mapping that contains massive variational parameters. High complexity of NN, if unbounded or unconstrained, might unpredictably cause severe issues including over-fitting, loss of generalization power, and unbearable cost of hardware. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NN by encoding them to a deep automatically-differentiable tensor network (ADTN) that contains exponentially fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely-recognized NNs (FC-2, LeNet-5, AlexNet, ZFNet and VGG-16) and datasets (MNIST, CIFAR-10 and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters to two ADTNs with just 424 parameters, where the testing accuracy on CIFAR-10 is improved from $90.17 \%$ to $91.74\%$. Our work suggests TN as an exceptionally efficient mathematical structure for representing the variational parameters of NNs, which exhibits superior compressibility over the commonly-used matrices and multi-way arrays.
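To illustrate the parameter-reduction idea behind such encodings, here is a toy sketch using a plain truncated SVD as a stand-in for the paper's ADTN (the ADTN is a far more expressive deep tensor network; only the counting argument carries over):

```python
import numpy as np

def low_rank_compress(weight, rank):
    """Replace a dense layer weight by a rank-r factorisation U @ V.
    Stand-in for a structured encoding: it shows how factorising a weight
    matrix cuts the number of variational parameters."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    U = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    V = vt[:rank, :]
    return U, V

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))   # hypothetical dense layer weight
U, V = low_rank_compress(W, rank=8)
print(W.size, U.size + V.size)  # -> 65536 4096
```

Here 65536 parameters shrink to 4096; the ADTN construction pushes this much further, to exponentially fewer free parameters.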
Updated: 2024-05-03 13:59:46
Categories: cs.LG,cs.AI
Discrete Aware Matrix Completion via Convexized $\ell_0$-Norm Approximation
We consider a novel algorithm for the completion of partially observed low-rank matrices in a structured setting where each entry can be chosen from a finite discrete alphabet set, such as in common recommender systems. The proposed low-rank matrix completion (MC) method is an improved variation of the state-of-the-art (SotA) discrete-aware matrix completion method we previously proposed, in which discreteness is enforced by an $\ell_0$-norm regularizer that is not replaced with the $\ell_1$-norm, but instead approximated by a continuous and differentiable function normalized via fractional programming (FP) under a proximal gradient (PG) framework. Simulation results demonstrate the superior performance of the new method compared to the SotA techniques as well as the earlier $\ell_1$-norm-based discrete-aware matrix completion approach.
Updated: 2024-05-03 13:54:59
Categories: eess.SP,cs.LG
Forecasting Ferry Passenger Flow Using Long-Short Term Memory Neural Networks
Following recent studies applying neural networks to various forecasting and time-series investigations, this study aims to extend these contexts to ferry passenger traffic. The primary objective of the study is to investigate and evaluate an LSTM-based Neural Network's capability to forecast ferry passengers of two ports in the Philippines. The proposed model is fitted and evaluated on monthly passenger traffic from 2016 to 2022 acquired from the Philippine Ports Authority (PPA). This work uses Mean Absolute Percentage Error (MAPE) as its primary metric to evaluate the model's forecasting capability. The proposed LSTM-based Neural Network model achieved 72% forecasting accuracy on the Batangas port ferry passenger data and 74% forecasting accuracy on the Mindoro port ferry passenger data. Using the Keras and Scikit-learn Python libraries, this work demonstrates reasonable forecasting performance of the presented LSTM model. Aside from these notable findings, this study also recommends further investigation into employing other statistical, machine learning, and deep learning methods for forecasting ferry passenger flows.
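For reference, the MAPE metric used as the study's primary evaluation measure can be computed as follows; the monthly passenger counts below are hypothetical, not the PPA data:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error: mean of |error| relative to the actual value."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly passenger counts (actual vs. model forecast).
actual = [12000, 15000, 11000, 14000]
forecast = [13000, 14000, 12000, 13500]
error = mape(actual, forecast)
print(round(error, 2), "% MAPE ->", round(100 - error, 2), "% accuracy")
```

A "72% forecasting accuracy" in the abstract's sense corresponds to a MAPE of roughly 28%.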
Updated: 2024-05-03 13:48:05
Categories: cs.LG
Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures
The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces a novel ensemble learning approach for code similarity assessment, combining the strengths of multiple unsupervised similarity measures. The key idea is that the strengths of a diverse set of similarity measures can complement each other and mitigate individual weaknesses, leading to improved performance. Preliminary results show that while Transformers-based CodeBERT and its variant GraphCodeBERT are undoubtedly the best option in the presence of abundant training data, in the case of specific small datasets (up to 500 samples), our ensemble achieves similar results, without prejudice to the interpretability of the resulting solution, and with a much lower associated carbon footprint due to training. The source code of this novel approach can be downloaded from https://github.com/jorge-martinez-gil/ensemble-codesim.
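The ensemble idea above can be sketched by averaging a few cheap unsupervised measures. The two members below (token Jaccard and a character-level sequence ratio) are illustrative stand-ins, not the paper's exact member set:

```python
import difflib

def jaccard_tokens(a, b):
    """Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def ensemble_similarity(a, b):
    """Average several unsupervised similarity measures, so the strengths of
    one member can offset the weaknesses of another."""
    measures = [
        jaccard_tokens(a, b),
        difflib.SequenceMatcher(None, a, b).ratio(),  # character-level similarity
    ]
    return sum(measures) / len(measures)

code1 = "def add ( a , b ) : return a + b"
code2 = "def add ( x , y ) : return x + y"
score = ensemble_similarity(code1, code2)
print(round(score, 3))
```

A renamed-variable clone scores high on structure-sensitive members even when exact token overlap drops, which is the complementarity the ensemble exploits.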
Updated: 2024-05-03 13:42:49
Categories: cs.SE,cs.AI
Human-AI Coevolution
Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often ``unintended'' social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.
Updated: 2024-05-03 13:38:55
Categories: cs.AI
Process Mining Embeddings: Learning Vector Representations for Petri Nets
Process mining offers powerful techniques for discovering, analyzing, and enhancing real-world business processes. In this context, Petri nets provide an expressive means of modeling process behavior. However, directly analyzing and comparing intricate Petri nets presents challenges. This study introduces PetriNet2Vec, a novel unsupervised methodology based on Natural Language Processing concepts inspired by Doc2Vec and designed to facilitate the effective comparison, clustering, and classification of process models represented as embedding vectors. These embedding vectors allow us to quantify similarities and relationships between different process models. Our methodology was experimentally validated using the PDC Dataset, featuring 96 diverse Petri net models. We performed cluster analysis, created UMAP visualizations, and trained a decision tree to provide compelling evidence for the capability of PetriNet2Vec to discern meaningful patterns and relationships among process models and their constituent tasks. Through a series of experiments, we demonstrated that PetriNet2Vec was capable of learning the structure of Petri nets, as well as the main properties used to simulate the process models of our dataset. Furthermore, our results showcase the utility of the learned embeddings in two crucial downstream tasks within process mining enhancement: process classification and process retrieval.
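As a much-simplified stand-in for the Doc2Vec-style embedding, one can already compare nets by vectorising their transition (task) labels and measuring cosine similarity. The nets below are hypothetical, and count vectors replace the learned embeddings of PetriNet2Vec:

```python
from collections import Counter
import math

def bag_of_tasks(net_tasks, vocab):
    """Count-vector stand-in for a net embedding, built from its task labels."""
    counts = Counter(net_tasks)
    return [counts[t] for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical task labels extracted from three small Petri nets.
nets = {
    "order_v1": ["receive", "check", "ship", "invoice"],
    "order_v2": ["receive", "check", "invoice", "ship"],
    "complaint": ["receive", "investigate", "reply"],
}
vocab = sorted({t for tasks in nets.values() for t in tasks})
emb = {name: bag_of_tasks(tasks, vocab) for name, tasks in nets.items()}
print(round(cosine(emb["order_v1"], emb["order_v2"]), 3),
      round(cosine(emb["order_v1"], emb["complaint"]), 3))  # -> 1.0 0.289
```

Unlike these count vectors, the learned embeddings also capture structural (flow) information, which is what lets PetriNet2Vec distinguish nets that share tasks but differ in behavior.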
Updated: 2024-05-03 13:33:59
Categories: cs.AI
On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents \methodAbr~(gradient-estimation-based explanation), an approach that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
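The query-level-access constraint is commonly met by replacing true gradients with finite-difference estimates built only from model outputs. A generic sketch of that idea (not necessarily the paper's exact estimator, and on a toy function rather than an image model):

```python
def estimate_gradient(f, x, eps=1e-5):
    """Query-only (black-box) gradient estimate via central finite differences."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))  # two queries per coordinate
    return grad

# Black-box model: we may only query outputs, never touch internals.
f = lambda x: x[0] ** 2 + 3 * x[1]
g = estimate_gradient(f, [2.0, 1.0])
print([round(v, 3) for v in g])  # -> [4.0, 3.0]
```

Such estimates recover gradient-like attributions at the cost of extra queries, which is the trade-off the abstract describes.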
Updated: 2024-05-03 13:21:53
Categories: cs.LG
Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
The $\ell_{1,\infty}$ norm is an efficient structured projection, but the complexity of the best known algorithm is unfortunately $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m \big)$ with full parallel power. We generalize our method to tensors and propose a new multi-level projection whose induced decomposition yields a linear parallel speedup, up to an exponential speedup factor, resulting in a time complexity lower-bounded by the sum of the dimensions. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the fastest existing algorithm, by Chu et al., while providing the same accuracy and better sparsity in neural network applications.
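For reference, under the common convention the $\ell_{1,\infty}$ norm of a matrix is the sum over rows of each row's largest absolute entry, computable in a single $\mathcal{O}(nm)$ pass (the projection algorithm itself is more involved and not reproduced here):

```python
def l1_inf_norm(matrix):
    """ℓ_{1,∞} norm: sum over rows of the max absolute entry, in O(n·m)."""
    return sum(max(abs(v) for v in row) for row in matrix)

M = [[1.0, -3.0, 2.0],
     [0.5,  0.5, -4.0]]
norm = l1_inf_norm(M)  # max|row 1| = 3, max|row 2| = 4, so norm = 7
```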
Updated: 2024-05-03 13:21:49
Categories: cs.LG
A semantic loss for ontology classification
Deep learning models are often unaware of the inherent constraints of the task they are applied to. However, many downstream tasks require logical consistency. For ontology classification tasks, such constraints include subsumption and disjointness relations between classes. In order to increase the consistency of deep learning models, we propose a semantic loss that combines label-based loss with terms penalising subsumption- or disjointness-violations. Our evaluation on the ChEBI ontology shows that the semantic loss is able to decrease the number of consistency violations by several orders of magnitude without decreasing the classification performance. In addition, we use the semantic loss for unsupervised learning. We show that this can further improve consistency on data from a distribution outside the scope of the supervised training.
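A minimal sketch of the idea, assuming hinge-style penalties for the two constraint types (the paper's exact loss terms may differ; the class indices and probabilities below are invented): a subsumption pair (child, parent) is violated when p_child > p_parent, and a disjointness pair when p_a + p_b > 1.

```python
from math import log

def bce(p, y):
    """Binary cross-entropy for one class probability p and label y."""
    return -(y * log(p) + (1 - y) * log(1 - p))

def semantic_loss(probs, labels, subsumptions, disjoint, weight=1.0):
    """Label-based loss plus penalties for subsumption/disjointness violations.

    subsumptions: (child, parent) index pairs, probs[child] <= probs[parent].
    disjoint: (a, b) index pairs, probs[a] + probs[b] <= 1.
    """
    label_term = sum(bce(p, y) for p, y in zip(probs, labels))
    sub_term = sum(max(0.0, probs[c] - probs[p]) for c, p in subsumptions)
    dis_term = sum(max(0.0, probs[a] + probs[b] - 1.0) for a, b in disjoint)
    return label_term + weight * (sub_term + dis_term)

# Class 0 subsumes class 1; classes 1 and 2 are disjoint.
loss = semantic_loss([0.2, 0.9, 0.4], [0, 1, 0],
                     subsumptions=[(1, 0)], disjoint=[(1, 2)])
```

Because the penalty terms need no labels, the same construction can also be applied to unlabelled data, as in the paper's unsupervised setting.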
Updated: 2024-05-03 13:20:37
Categories: cs.AI,cs.LO
A comparative study of conformal prediction methods for valid uncertainty quantification in machine learning
In the past decades, most work in the area of data analysis and machine learning was focused on optimizing predictive models and getting better results than what was possible with existing models. To what extent the metrics with which such improvements were measured were accurately capturing the intended goal, whether the numerical differences in the resulting values were significant, or whether uncertainty played a role in this study and if it should have been taken into account, was of secondary importance. Whereas probability theory, be it frequentist or Bayesian, used to be the gold standard in science before the advent of the supercomputer, it was quickly replaced in favor of black box models and sheer computing power because of their ability to handle large data sets. This evolution sadly happened at the expense of interpretability and trustworthiness. However, while people are still trying to improve the predictive power of their models, the community is starting to realize that for many applications it is not so much the exact prediction that is of importance, but rather the variability or uncertainty. The work in this dissertation tries to further the quest for a world where everyone is aware of uncertainty, of how important it is and how to embrace it instead of fearing it. A specific, though general, framework that allows anyone to obtain accurate uncertainty estimates is singled out and analysed. Certain aspects and applications of the framework -- dubbed `conformal prediction' -- are studied in detail. Whereas many approaches to uncertainty quantification make strong assumptions about the data, conformal prediction is, at the time of writing, the only framework that deserves the title `distribution-free'. No parametric assumptions have to be made and the nonparametric results also hold without having to resort to the law of large numbers in the asymptotic regime.
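As a concrete example of the framework discussed, split conformal prediction turns held-out absolute residuals into a distribution-free prediction interval with a one-line quantile rule (the residuals below are toy numbers):

```python
import math

def split_conformal_interval(cal_residuals, alpha=0.1):
    """Half-width q such that [ŷ − q, ŷ + q] covers with prob ≥ 1 − alpha,
    assuming exchangeability of calibration and test points."""
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # conformal quantile rank
    return sorted(cal_residuals)[min(k, n) - 1]

# Absolute residuals |y − ŷ| from a held-out calibration set (toy data).
residuals = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05]
q = split_conformal_interval(residuals, alpha=0.2)
# For a new prediction y_hat, the interval is [y_hat - q, y_hat + q].
```

The coverage guarantee requires no parametric assumptions, which is exactly the "distribution-free" property the dissertation highlights.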
Updated: 2024-05-03 13:19:33
Categories: stat.ML,cs.AI,cs.LG,math.ST,stat.TH
A Mutual Information Perspective on Federated Contrastive Learning
We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification; by adding a user verification loss to each client's local SimCLR loss we recover a lower bound to the global multi-view mutual information. To accommodate for the case of when some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label and b) we require an additional auxiliary head that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness can impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions in various tasks to validate our claims and furthermore demonstrate that our proposed modifications generalize to other pretraining methods.
Updated: 2024-05-03 13:15:29
Categories: cs.LG
Fisher Mask Nodes for Language Model Merging
Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of task-specific fine-tuned models. As these models typically only perform one task well, additional training or ensembling is required in multi-task scenarios. The growing field of model merging provides a solution, dealing with the challenge of combining multiple task-specific models into a single multi-task model. In this study, we introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning. Utilizing the Fisher information of mask nodes within the Transformer architecture, we devise a computationally efficient weighted-averaging scheme. Our method exhibits a consistent and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging at a fraction of the computational cost, with baseline performance improvements of up to +6.5 and a speedup between 57.4x and 321.7x across models. Our results demonstrate the potential of our method in current multi-task learning environments and suggest its scalability and adaptability to new model architectures and learning scenarios.
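The Fisher-weighted averaging that the method builds on can be sketched as follows, assuming per-parameter Fisher estimates (the paper restricts Fisher information to the Transformer's mask nodes, which this toy version does not model; all numbers are invented):

```python
def fisher_weighted_merge(params, fishers, eps=1e-8):
    """Merge per-model parameter vectors, weighting each model's value for a
    parameter by that model's Fisher information estimate for it."""
    merged = []
    for i in range(len(params[0])):
        num = sum(f[i] * p[i] for p, f in zip(params, fishers))
        den = sum(f[i] for f in fishers) + eps
        merged.append(num / den)
    return merged

# Two hypothetical fine-tuned models; model b is more certain about θ₁.
theta_a, fisher_a = [1.0, 0.0], [1.0, 1.0]
theta_b, fisher_b = [3.0, 2.0], [1.0, 3.0]
merged = fisher_weighted_merge([theta_a, theta_b], [fisher_a, fisher_b])
# θ₀: plain average 2.0; θ₁: pulled toward model b, (0·1 + 2·3)/4 = 1.5
```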
Updated: 2024-05-03 13:12:40
Categories: cs.CL,cs.AI,cs.LG
Argumentative Large Language Models for Explainable and Contestable Decision-Making
The diversity of knowledge encoded in large language models (LLMs) and their ability to apply this knowledge zero-shot in a range of settings makes them a promising candidate for use in decision-making. However, they are currently limited by their inability to reliably provide outputs which are explainable and contestable. In this paper, we attempt to reconcile these strengths and weaknesses by introducing a method for supplementing LLMs with argumentative reasoning. Concretely, we introduce argumentative LLMs, a method utilising LLMs to construct argumentation frameworks, which then serve as the basis for formal reasoning in decision-making. The interpretable nature of these argumentation frameworks and formal reasoning means that any decision made by the supplemented LLM may be naturally explained to, and contested by, humans. We demonstrate the effectiveness of argumentative LLMs experimentally in the decision-making task of claim verification. We obtain results that are competitive with, and in some cases surpass, comparable state-of-the-art techniques.
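The formal-reasoning side can be illustrated with a tiny abstract argumentation example: computing the grounded extension by iterating the characteristic function (the construction of frameworks from LLM outputs, the paper's main contribution, is not modeled here; arguments and attacks are invented):

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an abstract argumentation framework.

    Repeatedly admits any argument all of whose attackers are themselves
    attacked by the current extension, until a fixed point is reached.
    """
    extension = set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in extension:
                continue
            attackers = {x for x, y in attacks if y == a}
            if all(any((d, x) in attacks for d in extension) for x in attackers):
                extension.add(a)
                changed = True
    return extension

# a attacks b, b attacks c: a is unattacked, and a defends c against b.
ext = grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})
```

Because acceptance is derived from explicit attack relations, every decision can be traced back to, and contested through, individual arguments.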
Updated: 2024-05-03 13:12:28
Categories: cs.CL,cs.AI,I.2.7
A Federated Learning Benchmark on Tabular Data: Comparing Tree-Based Models and Neural Networks
Federated Learning (FL) has lately gained traction as it addresses how machine learning models train on distributed datasets. FL was designed for parametric models, namely Deep Neural Networks (DNNs), and has thus shown promise on image and text tasks. However, FL for tabular data has received little attention. Tree-Based Models (TBMs) have been considered to perform better on tabular data and are starting to see FL integrations. In this study, we benchmark federated TBMs and DNNs for horizontal FL, with varying data partitions, on 10 well-known tabular datasets. Our novel benchmark results indicate that current federated boosted TBMs perform better than federated DNNs across different data partitions. Furthermore, a federated XGBoost outperforms all other models. Lastly, we find that federated TBMs perform better than federated parametric models, even when increasing the number of clients significantly.
Updated: 2024-05-03 13:08:56
Categories: cs.LG
Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers
Optimal transport (OT) and the related Wasserstein metric (W) are powerful and ubiquitous tools for comparing distributions. However, computing pairwise Wasserstein distances rapidly becomes intractable as cohort size grows. An attractive alternative would be to find an embedding space in which pairwise Euclidean distances map to OT distances, akin to standard multidimensional scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder that embeds empirical distributions into a latent space wherein Euclidean distances approximate OT distances. Extending MDS theory, we show that our objective function implies a bound on the error incurred when embedding non-Euclidean distances. Empirically, distances between Wormhole embeddings closely match Wasserstein distances, enabling linear time computation of OT distances. Along with an encoder that maps distributions to embeddings, Wasserstein Wormhole includes a decoder that maps embeddings back to distributions, allowing for operations in the embedding space to generalize to OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By lending scalability and interpretability to OT approaches, Wasserstein Wormhole unlocks new avenues for data analysis in the fields of computational geometry and single-cell biology.
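For intuition about the distances the embedding space is trained to mimic: in one dimension, the Wasserstein-1 distance between equal-size empirical distributions has a closed form via sorting, shown below on invented point clouds.

```python
def wasserstein_1d(xs, ys):
    """W1 between two equal-size empirical 1-D distributions: sort both
    samples and average the gaps between matched order statistics."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Shifting a point cloud by a constant shifts W1 by exactly that constant.
base = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 0.5 for x in base]
d = wasserstein_1d(base, shifted)  # 0.5
```

In higher dimensions no such shortcut exists, which is why an embedding whose Euclidean distances approximate OT distances is attractive.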
Updated: 2024-05-03 13:07:49
Categories: cs.LG,cs.CG,q-bio.GN
Utilizing Deep Learning to Optimize Software Development Processes
This study explores the application of deep learning technologies in software development processes, particularly in automating code reviews, error prediction, and test generation to enhance code quality and development efficiency. Through a series of empirical studies, experimental groups using deep learning tools and control groups using traditional methods were compared in terms of code error rates and project completion times. The results demonstrated significant improvements in the experimental group, validating the effectiveness of deep learning technologies. The research also discusses potential optimization points, methodologies, and technical challenges of deep learning in software development, as well as how to integrate these technologies into existing software development workflows.
Updated: 2024-05-03 13:07:18
Categories: cs.SE,cs.AI,cs.CL,cs.LG
Strategies for Intrusion Monitoring in Cloud Services
Effective activity and event monitoring is an essential aspect of digital forensic readiness. Techniques for capturing log and other event data are familiar from conventional networked hosts and transfer directly to the Cloud context. In both contexts, a major concern is the risk that monitoring systems may be targeted and impaired by intruders seeking to conceal their illicit presence and activities. We outline an approach to intrusion monitoring that aims (i) to ensure the credibility of log data and (ii) to provide a means of data sharing that supports log reconstruction in the event that one or more logging systems is maliciously impaired.
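One standard technique for aim (i), log credibility, is hash-chaining entries so that tampering with any entry invalidates all later digests. The sketch below (log entries invented) illustrates the idea; the paper's actual scheme may differ.

```python
import hashlib

def chain_logs(entries, seed=b"genesis"):
    """Hash-chain log entries: each digest covers the entry plus the previous
    digest, so altering any entry breaks every digest after it."""
    digests, prev = [], seed
    for entry in entries:
        h = hashlib.sha256(prev + entry.encode()).hexdigest()
        digests.append(h)
        prev = h.encode()
    return digests

def verify(entries, digests, seed=b"genesis"):
    """Recompute the chain and compare against the stored digests."""
    return chain_logs(entries, seed) == digests

log = ["login admin", "open /etc/shadow", "logout"]
digests = chain_logs(log)
tampered = ["login admin", "opened nothing", "logout"]
ok, bad = verify(log, digests), verify(tampered, digests)
```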
Updated: 2024-05-03 13:00:36
Categories: cs.CR,cs.DC
Histogram-Based Federated XGBoost using Minimal Variance Sampling for Federated Tabular Data
Federated Learning (FL) has gained considerable traction, yet, for tabular data, FL has received less attention. Most FL research has focused on Neural Networks while Tree-Based Models (TBMs) such as XGBoost have historically performed better on tabular data. It has been shown that subsampling of training data when building trees can improve performance but it is an open problem whether such subsampling can improve performance in FL. In this paper, we evaluate a histogram-based federated XGBoost that uses Minimal Variance Sampling (MVS). We demonstrate the underlying algorithm and show that our model using MVS can improve performance in terms of accuracy and regression error in a federated setting. In our evaluation, our model using MVS performs better than uniform (random) sampling and no sampling at all. It achieves both outstanding local and global performance on a new set of federated tabular datasets. Federated XGBoost using MVS also outperforms centralized XGBoost in half of the studied cases.
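The deterministic core of Minimal Variance Sampling can be sketched as scoring each training row by its regularized gradient magnitude $\sqrt{g^2 + \lambda h^2}$ and keeping the highest-scoring fraction; the full method also samples the remaining rows probabilistically with importance weights, which this toy version omits (all numbers invented):

```python
def mvs_select(grads, hessians, sample_rate=0.5, lam=1.0):
    """Pick the sample_rate fraction of rows with the largest regularized
    gradient scores sqrt(g² + λ·h²) — the high-information examples."""
    n = len(grads)
    scores = [(g * g + lam * h * h) ** 0.5 for g, h in zip(grads, hessians)]
    k = max(1, round(sample_rate * n))
    return sorted(range(n), key=lambda i: -scores[i])[:k]

# Rows 0 and 3 carry most of the gradient signal, so they are selected.
idx = mvs_select([0.9, 0.1, 0.05, 0.8], [0.5, 0.2, 0.1, 0.6], sample_rate=0.5)
```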
Updated: 2024-05-03 12:58:57
Categories: cs.LG
Few-sample Variational Inference of Bayesian Neural Networks with Arbitrary Nonlinearities
Bayesian Neural Networks (BNNs) extend traditional neural networks to provide uncertainties associated with their outputs. On the forward pass through a BNN, predictions (and their uncertainties) are made either by Monte Carlo sampling network weights from the learned posterior or by analytically propagating statistical moments through the network. Though flexible, Monte Carlo sampling is computationally expensive and can be infeasible or impractical under resource constraints or for large networks. While moment propagation can ameliorate the computational costs of BNN inference, it can be difficult or impossible for networks with arbitrary nonlinearities, thereby restricting the possible set of network layers permitted with such a scheme. In this work, we demonstrate a simple yet effective approach for propagating statistical moments through arbitrary nonlinearities with only 3 deterministic samples, enabling few-sample variational inference of BNNs without restricting the set of network layers used. Furthermore, we leverage this approach to demonstrate a novel nonlinear activation function that we use to inject physics-informed prior information into output nodes of a BNN.
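One way to propagate moments through an arbitrary scalar nonlinearity with exactly three deterministic samples is three-point Gauss–Hermite quadrature (nodes $\mu$, $\mu \pm \sqrt{3}\sigma$ with weights 2/3, 1/6, 1/6 for a Gaussian input); whether this matches the paper's specific scheme is an assumption:

```python
import math

def propagate_moments(f, mu, var):
    """Estimate the output mean/variance of f applied to N(mu, var) using
    three deterministic samples (Gauss–Hermite nodes for a Gaussian)."""
    s = math.sqrt(var)
    nodes = [mu, mu - math.sqrt(3) * s, mu + math.sqrt(3) * s]
    weights = [2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0]
    vals = [f(x) for x in nodes]
    mean = sum(w * v for w, v in zip(weights, vals))
    second = sum(w * v * v for w, v in zip(weights, vals))
    return mean, second - mean * mean

# For affine f the rule is exact: mean -> a·mu + b, variance -> a²·var.
mean, var = propagate_moments(lambda x: 2 * x + 1, mu=0.5, var=4.0)
```

The same three evaluations work for any nonlinearity, e.g. `math.tanh`, which is what removes the restriction on permissible layers.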
Updated: 2024-05-03 12:48:21
Categories: cs.LG
Dyna-Style Learning with A Macroscopic Model for Vehicle Platooning in Mixed-Autonomy Traffic
Platooning of connected and autonomous vehicles (CAVs) plays a vital role in modernizing highways, ushering in enhanced efficiency and safety. This paper explores the significance of platooning in smart highways, employing a coupled partial differential equation (PDE) and ordinary differential equation (ODE) model to elucidate the complex interaction between bulk traffic flow and CAV platoons. Our study focuses on developing a Dyna-style planning and learning framework tailored for platoon control, with a specific goal of reducing fuel consumption. By harnessing the coupled PDE-ODE model, we improve data efficiency in Dyna-style learning through virtual experiences. Simulation results validate the effectiveness of our macroscopic model in modeling platoons within mixed-autonomy settings, demonstrating a notable $10.11\%$ reduction in vehicular fuel consumption compared to conventional approaches.
Updated: 2024-05-03 12:44:52
Categories: cs.LG
Single-Task Continual Offline Reinforcement Learning
In this paper, we study the continual learning problem of single-task offline reinforcement learning. In the past, continual reinforcement learning usually dealt only with multitasking: learning multiple related or unrelated tasks in a row, where each task, once learned, was not relearned but only used in subsequent processes. However, offline reinforcement learning tasks require continual learning of multiple different datasets for the same task. Existing algorithms strive for the best results on each offline dataset they encounter, so after training on a subsequent poor dataset, the network overwrites the skills acquired from earlier high-quality datasets. On the other hand, if too much emphasis is placed on stability, the network fails to learn from a better dataset that follows a poor one, and problems of insufficient plasticity and non-learning occur. How to design a strategy that always preserves the best performance for each state in the data already learned is a new challenge and the focus of this study. We therefore propose a new algorithm, called Ensemble Offline Reinforcement Learning Based on Experience Replay, which introduces multiple value networks to learn the same dataset and judges whether a policy has been learned from the dispersion of the value networks' estimates, improving performance in single-task offline reinforcement learning.
Updated: 2024-05-03 12:43:37
Categories: cs.LG,I.2.6
Federated Learning for Tabular Data using TabNet: A Vehicular Use-Case
In this paper, we show how Federated Learning (FL) can be applied to vehicular use-cases in which we seek to classify obstacles, irregularities and pavement types on roads. Our proposed framework utilizes FL and TabNet, a state-of-the-art neural network for tabular data. We are the first to demonstrate how TabNet can be integrated with FL. Moreover, we achieve a maximum test accuracy of 93.6%. Finally, we reason why FL is a suitable concept for this data set.
Updated: 2024-05-03 12:42:40
Categories: cs.LG
Causal Discovery Under Local Privacy
Differential privacy is a widely adopted framework designed to safeguard the sensitive information of data providers within a data set. It is based on the application of controlled noise at the interface between the server that stores and processes the data, and the data consumers. Local differential privacy is a variant that allows data providers to apply the privatization mechanism themselves on their data individually. Therefore it provides protection also in contexts in which the server, or even the data collector, cannot be trusted. The introduction of noise, however, inevitably affects the utility of the data, particularly by distorting the correlations between individual data components. This distortion can prove detrimental to tasks such as causal discovery. In this paper, we consider various well-known locally differentially private mechanisms and compare the trade-off between the privacy they provide, and the accuracy of the causal structure produced by algorithms for causal learning when applied to data obfuscated by these mechanisms. Our analysis yields valuable insights for selecting appropriate local differentially private protocols for causal discovery tasks. We foresee that our findings will aid researchers and practitioners in conducting locally private causal discovery.
Updated: 2024-05-03 12:40:49
Categories: cs.CR,cs.AI,cs.LG,stat.ME
Metric Temporal Equilibrium Logic over Timed Traces
In temporal extensions of Answer Set Programming (ASP) based on linear-time, the behavior of dynamic systems is captured by sequences of states. While this representation reflects their relative order, it abstracts away the specific times associated with each state. However, timing constraints are important in many applications like, for instance, when planning and scheduling go hand in hand. We address this by developing a metric extension of linear-time temporal equilibrium logic, in which temporal operators are constrained by intervals over natural numbers. The resulting Metric Equilibrium Logic provides the foundation of an ASP-based approach for specifying qualitative and quantitative dynamic constraints. To this end, we define a translation of metric formulas into monadic first-order formulas and give a correspondence between their models in Metric Equilibrium Logic and Monadic Quantified Equilibrium Logic, respectively. Interestingly, our translation provides a blue print for implementation in terms of ASP modulo difference constraints.
Updated: 2024-05-03 12:40:35
Categories: cs.AI
Simplicity in Complexity : Explaining Visual Complexity using Deep Segmentation Models
The complexity of visual stimuli plays an important role in many cognitive phenomena, including attention, engagement, memorability, time perception and aesthetic evaluation. Despite its importance, complexity is poorly understood and ironically, previous models of image complexity have been quite complex. There have been many attempts to find handcrafted features that explain complexity, but these features are usually dataset specific, and hence fail to generalise. On the other hand, more recent work has employed deep neural networks to predict complexity, but these models remain difficult to interpret, and do not guide a theoretical understanding of the problem. Here we propose to model complexity using segment-based representations of images. We use state-of-the-art segmentation models, SAM and FC-CLIP, to quantify the number of segments at multiple granularities, and the number of classes in an image respectively. We find that complexity is well-explained by a simple linear model with these two features across six diverse image-sets of naturalistic scene and art images. This suggests that the complexity of images can be surprisingly simple.
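The two-feature linear model can be sketched end to end with ordinary least squares; here the segment and class counts and the target complexities are invented, generated from known weights so the fit is checkable (a real pipeline would obtain the counts from SAM and FC-CLIP):

```python
def fit_linear(features, targets):
    """Least-squares fit y ≈ w0 + w1·x1 + w2·x2 via the 3×3 normal equations."""
    rows = [[1.0, a, b] for a, b in features]
    # Build XᵀX and Xᵀy.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    # Solve by Gauss–Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(3):
            if r != col:
                f = xtx[r][col] / xtx[col][col]
                xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[col])]
                xty[r] -= f * xty[col]
    return [xty[i] / xtx[i][i] for i in range(3)]

# (num_segments, num_classes) pairs and complexities from known weights.
feats = [(4, 1), (10, 2), (25, 3), (7, 1), (40, 5), (15, 2)]
ys = [1 + 0.5 * s + 2 * c for s, c in feats]
w = fit_linear(feats, ys)  # recovers [1.0, 0.5, 2.0]
```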
Updated: 2024-05-03 12:31:52
Categories: cs.CV,cs.AI,q-bio.NC
Comparative Analysis of Retrieval Systems in the Real World
This research paper presents a comprehensive analysis of integrating advanced language models with search and retrieval systems in the fields of information retrieval and natural language processing. The objective is to evaluate and compare various state-of-the-art methods based on their performance in terms of accuracy and efficiency. The analysis explores different combinations of technologies, including Azure Cognitive Search Retriever with GPT-4, Pinecone's Canopy framework, Langchain with Pinecone and different language models (OpenAI, Cohere), LlamaIndex with Weaviate Vector Store's hybrid search, Google's RAG implementation on Cloud VertexAI-Search, Amazon SageMaker's RAG, and a novel approach called KG-FID Retrieval. The motivation for this analysis arises from the increasing demand for robust and responsive question-answering systems in various domains. The RobustQA metric is used to evaluate the performance of these systems under diverse paraphrasing of questions. The report aims to provide insights into the strengths and weaknesses of each method, facilitating informed decisions in the deployment and development of AI-driven search and retrieval systems.
Updated: 2024-05-03 12:30:01
Categories: cs.IR,cs.AI
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models robust to uncertainty or disturbances, making them more efficient for real-world applications. Following this paradigm, uncertainty or disturbances are interpreted as actions of a second, adversarial agent, and thus the problem is reduced to seeking agent policies robust to any opponent's actions. This paper is the first to propose considering RRL problems within positional differential game theory, which helps us obtain theoretically justified intuition to develop a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
Updated: 2024-05-03 12:21:43
Categories: cs.LG,cs.AI,cs.GT,cs.SY,eess.SY,math.OC,68T07, 49N70
On human-centred security: A new systems model based on modes and mode transitions
We propose an abstract conceptual framework for analysing complex security systems using a new notion of modes and mode transitions. A mode is an independent component of a system with its own objectives, monitoring data, algorithms, and scope and limits. The behaviour of a mode, including its transitions to other modes, is determined by interpretations of the mode's monitoring data in the light of its objectives and capabilities; these interpretations we call beliefs. We formalise the conceptual framework mathematically and, by quantifying and visualising beliefs in higher-dimensional geometric spaces, we argue our models may help design, analyse and explain systems. The mathematical models are based on simplicial complexes.
Updated: 2024-05-03 12:21:38
Categories: cs.CR,cs.CY
Stabilizing Backpropagation Through Time to Learn Complex Physics
Of all the vector fields surrounding the minima of recurrent learning setups, the gradient field with its exploding and vanishing updates appears a poor choice for optimization, offering little beyond efficient computability. We seek to improve this suboptimal practice in the context of physics simulations, where backpropagating feedback through many unrolled time steps is considered crucial to acquiring temporally coherent behavior. The alternative vector field we propose follows from two principles: physics simulators, unlike neural networks, have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged. As any modification of backpropagation decouples forward and backward pass, the rotation-free character of the gradient field is lost. Therefore, we discuss the negative implications of using such a rotational vector field for optimization and how to counteract them. Our final procedure is easily implementable via a sequence of gradient stopping and component-wise comparison operations, which do not negatively affect scalability. Our experiments on three control problems show that especially as we increase the complexity of each task, the unbalanced updates from the gradient can no longer provide the precise control signals necessary while our method still solves the tasks. Our code can be found at https://github.com/tum-pbs/StableBPTT.
Updated: 2024-05-03 12:20:08
Categories: cs.LG,physics.comp-ph
Robustness of Decentralised Learning to Nodes and Data Disruption
In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally where they are generated and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes due to privacy or sovereignty reasons or real-time constraints imposing proximity of models to locations where inference has to be carried out. The distributed nature of decentralised learning implies significant new research challenges with respect to centralised learning. Among them, in this paper, we focus on robustness issues. Specifically, we study the effect of nodes' disruption on the collective learning process. Assuming a given percentage of "central" nodes disappear from the network, we focus on different cases, characterised by (i) different distributions of data across nodes and (ii) different times when disruption occurs with respect to the start of the collaborative learning task. Through these configurations, we are able to show the non-trivial interplay between the properties of the network connecting nodes, the persistence of knowledge acquired collectively before disruption or lack thereof, and the effect of data availability pre- and post-disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimum amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. This clearly varies depending on the remaining connectivity after disruption, but we show that even nodes that remain completely isolated can retain significant knowledge acquired before the disruption.
Updated: 2024-05-03 12:14:48
Categories: cs.LG
Analyzing Narrative Processing in Large Language Models (LLMs): Using GPT4 to test BERT
The ability to transmit and receive complex information via language is unique to humans and is the basis of traditions, culture and versatile social interactions. Through the disruptive introduction of transformer based large language models (LLMs) humans are not the only entity to "understand" and produce language any more. In the present study, we have performed the first steps to use LLMs as a model to understand fundamental mechanisms of language processing in neural networks, in order to make predictions and generate hypotheses on how the human brain does language processing. Thus, we have used ChatGPT to generate seven different stylistic variations of ten different narratives (Aesop's fables). We used these stories as input for the open source LLM BERT and have analyzed the activation patterns of the hidden units of BERT using multi-dimensional scaling and cluster analysis. We found that the activation vectors of the hidden units cluster according to stylistic variations in earlier layers of BERT (1) than narrative content (4-5). Despite the fact that BERT consists of 12 identical building blocks that are stacked and trained on large text corpora, the different layers perform different tasks. This is a very useful model of the human brain, where self-similar structures, i.e. different areas of the cerebral cortex, can have different functions and are therefore well suited to processing language in a very efficient way. The proposed approach has the potential to open the black box of LLMs on the one hand, and might be a further step to unravel the neural processes underlying human language processing and cognition in general.
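The analysis pipeline (multi-dimensional scaling of hidden-unit activation vectors, followed by clustering) can be sketched on synthetic data. Classical Torgerson MDS stands in for whatever MDS variant the study used, and the activation vectors below are simulated rather than taken from BERT:

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between row vectors."""
    D2 = np.square(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigh returns ascending eigenvalues
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Synthetic stand-in for one layer's activations of two stylistic variants
rng = np.random.default_rng(0)
style_a = rng.normal(0.0, 0.1, size=(5, 16))
style_b = rng.normal(1.0, 0.1, size=(5, 16))
emb = classical_mds(np.vstack([style_a, style_b]))
# Points from the same style should sit closer together in the embedding
```

In the study, such embeddings are computed per layer, and the layer at which clusters separate by style versus by narrative content is what distinguishes early from later layers.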
Updated: 2024-05-03 11:56:13
Categories: cs.CL,cs.AI
Adversarial Botometer: Adversarial Analysis for Social Bot Detection
Social bots play a significant role in many online social networks (OSN) as they imitate human behavior. This fact raises difficult questions about their capabilities and potential risks. Given the recent advances in Generative AI (GenAI), social bots are capable of producing highly realistic and complex content that mimics human creativity. As malicious social bots emerge to deceive people with their fabricated content, identifying them and distinguishing the content they produce has become a real challenge for numerous social platforms. Several approaches to this problem have been proposed in the literature, but the proposed solutions have not been widely evaluated. To address this issue, we evaluate the behavior of a text-based bot detector in a competitive environment under several scenarios. First, the tug-of-war between a bot and a bot detector is examined: it is interesting to analyze which party is more likely to prevail and which circumstances influence these expectations. In this regard, we model the problem as a synthetic adversarial game in which a conversational bot and a bot detector engage in strategic online interactions. Second, the bot detection model is evaluated under attack examples generated by a social bot; to this end, we poison the dataset with attack examples and evaluate the model's performance under this condition. Finally, to investigate the impact of the dataset, a cross-domain analysis is performed. Through our comprehensive evaluation of different categories of social bots using two benchmark datasets, we demonstrate results that can be built upon in future work.
Updated: 2024-05-03 11:28:21
Categories: cs.SI,cs.AI,cs.HC
Automating Computational Design with Generative AI
AI image generators based on diffusion models have recently garnered attention for their capability to create images from simple text prompts. However, for practical use in civil engineering, they need to be able to create specific construction plans for given constraints. This paper investigates the potential of current AI generators in addressing such challenges, specifically for the creation of simple floor plans. We explain how the underlying diffusion models work and propose novel refinement approaches to improve semantic encoding and generation quality. In several experiments, we show that we can improve the validity of generated floor plans from 6% to 90%. Based on these results, we derive future research challenges in the context of building information modelling. With this we provide: (i) an evaluation of current generative AIs; (ii) improved refinement approaches; (iii) an evaluation of them on various examples; (iv) future directions for diffusion models in civil engineering.
Updated: 2024-05-03 11:08:17
Categories: cs.LG,cs.AI
The Sparse Tsetlin Machine: Sparse Representation with Active Literals
This paper introduces the Sparse Tsetlin Machine (STM), a novel Tsetlin Machine (TM) that processes sparse data efficiently. Traditionally, the TM does not consider data characteristics such as sparsity, commonly seen in NLP applications and other bag-of-words-based representations. Consequently, a TM must initialize, store, and process a significant number of zero values, resulting in excessive memory usage and computational time. Previous attempts at creating a sparse TM have predominantly been unsuccessful, primarily due to their inability to identify which literals are sufficient for TM training. By introducing Active Literals (AL), the STM can focus exclusively on literals that actively contribute to the current data representation, significantly decreasing the memory footprint and computational time while demonstrating competitive classification performance.
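The Active Literals idea can be illustrated with a minimal sketch (our simplification, not the STM implementation): only literal indices that actually occur in the data are tracked, so clause updates never need to touch the zero entries.

```python
def active_literals(samples):
    """Collect Active Literals: feature indices nonzero in at least one sample.

    A sparse TM can restrict clause evaluation and updates to these
    indices instead of iterating over every (mostly zero) literal,
    which is the source of the memory and runtime savings.
    """
    active = set()
    for sample in samples:
        active.update(i for i, value in enumerate(sample) if value)
    return sorted(active)
```

For a bag-of-words corpus with a large vocabulary, the active set per clause is typically orders of magnitude smaller than the full literal set.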
Updated: 2024-05-03 11:06:10
Categories: cs.LG,cs.AI,cs.FL
An operator preconditioning perspective on training in physics-informed machine learning
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated with the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, training becomes slow or infeasible. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator and consequently improve training.
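Schematically, and in our own notation (the abstract gives no formulas), the connection between the residual loss and the Hermitian square can be written for a linear PDE \(\mathcal{A}u = f\):

```latex
% PINN residual loss and its functional gradient (our schematic notation):
\mathcal{L}(\theta) = \tfrac{1}{2}\,\lVert \mathcal{A} u_\theta - f \rVert_{L^2}^2 ,
\qquad
\nabla_u \mathcal{L} = \mathcal{A}^{*}\!\left(\mathcal{A} u_\theta - f\right),
% so the (continuous) gradient flow is driven by the Hermitian square
% \mathcal{A}^{*}\mathcal{A}; when this operator is ill-conditioned,
% descent is slow, motivating a preconditioner \mathcal{P} applied to
% the residual before taking the norm:
\tilde{\mathcal{L}}(\theta) = \tfrac{1}{2}\,\lVert \mathcal{P}\!\left(\mathcal{A} u_\theta - f\right) \rVert_{L^2}^2 .
```

This is a sketch of the mechanism the abstract describes, not the paper's derivation; the paper's analysis also covers how network parametrization enters.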
Updated: 2024-05-03 10:59:26
Categories: cs.LG
Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo
Large Language Models (LLMs) are deep learning models designed to generate text based on textual input. Although researchers have been developing these models for more complex tasks such as code generation and general reasoning, few efforts have explored how LLMs can be applied to combinatorial problems. In this research, we investigate the potential of LLMs to solve the Travelling Salesman Problem (TSP). Utilizing GPT-3.5 Turbo, we conducted experiments employing various approaches, including zero-shot in-context learning, few-shot in-context learning, and chain-of-thought (CoT) prompting. We then fine-tuned GPT-3.5 Turbo to solve a specific problem size and tested it on a set of instances of various sizes. The fine-tuned models demonstrated promising performance on problems identical in size to the training instances and generalized well to larger problems. Furthermore, to improve the performance of the fine-tuned model without incurring additional training costs, we adopted a self-ensemble approach to improve the quality of the solutions.
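A minimal sketch of the self-ensemble step, under the assumption that it selects the best valid tour among several sampled model outputs (the paper's exact selection rule may differ):

```python
import math

def tour_length(tour, coords):
    """Total cycle length of a tour over city coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def self_ensemble(candidate_tours, coords):
    """Keep valid permutations only, then return the shortest tour.

    A toy stand-in for self-ensembling over multiple sampled
    GPT-3.5 Turbo outputs: invalid tours (repeated or missing
    cities) are discarded, and the cheapest survivor wins.
    """
    n = len(coords)
    valid = [t for t in candidate_tours if sorted(t) == list(range(n))]
    return min(valid, key=lambda t: tour_length(t, coords))
```

Because tour length is cheap to verify, sampling several candidate solutions and keeping the best one improves quality at zero training cost.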
Updated: 2024-05-03 10:54:14
Categories: cs.CL,cs.AI
Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery
This thesis aims to study some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patient follow-up. Stochastic bandits (multi-armed, contextual) model the learning of a sequence of actions (a policy) by an agent in an uncertain environment in order to maximise observed rewards. To learn optimal policies, bandit algorithms have to balance the exploitation of current knowledge and the exploration of uncertain actions. Such algorithms have largely been studied and deployed in industrial applications with large datasets, low-risk decisions and clear modelling assumptions, such as click-through rate maximisation in online advertising. By contrast, digital health recommendations call for a whole new paradigm of small samples, risk-averse agents and complex, nonparametric modelling. To this end, we developed new safe, anytime-valid concentration bounds (Bregman, empirical Chernoff), introduced a new framework for risk-aware contextual bandits (with elicitable risk measures) and analysed a novel class of nonparametric bandit algorithms under weak assumptions (Dirichlet sampling). In addition to the theoretical guarantees, these results are supported by in-depth empirical evidence. Finally, as a first step towards personalised postoperative follow-up recommendations, we developed with medical doctors and surgeons an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
Updated: 2024-05-03 10:50:30
Categories: stat.ML,cs.LG,math.ST,stat.TH
Cooperation and Federation in Distributed Radar Point Cloud Processing
The paper considers the problem of human-scale RF sensing utilizing a network of resource-constrained MIMO radars with low range-azimuth resolution. The radars operate in the mmWave band and obtain time-varying 3D point cloud (PC) information that is sensitive to body movements. They also observe the same scene from different views and cooperate while sensing the environment using a sidelink communication channel. Conventional cooperation setups allow the radars to mutually exchange raw PC information to improve ego sensing. The paper proposes a federation mechanism in which the radars exchange the parameters of a Bayesian posterior measure of the observed PCs, rather than raw data. The radars act as distributed parameter servers to reconstruct a global posterior (i.e., a federated posterior) using Bayesian tools. The paper quantifies and compares the benefits of radar federation with respect to cooperation mechanisms. Both approaches are validated by experiments with a real-time demonstration platform. Federation makes minimal use of the sidelink communication channel (20 to 25 times lower bandwidth use) and is less sensitive to unresolved targets. On the other hand, cooperation reduces the mean absolute target estimation error by about 20%.
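The federation idea (exchanging posterior parameters instead of raw point clouds) can be sketched for Gaussian posteriors, where the fused posterior follows from summing natural parameters. A flat prior is assumed for simplicity; the paper's Bayesian machinery over point cloud measures is richer than this:

```python
import numpy as np

def fuse_gaussian_posteriors(means, covs):
    """Fuse independent Gaussian posteriors by multiplying their densities.

    Each radar ships only (mean, covariance) of its local posterior
    over, e.g., a target position, not raw point cloud data. The
    federated posterior is recovered from the summed precision
    matrices and precision-weighted means (flat-prior assumption).
    """
    precisions = [np.linalg.inv(c) for c in covs]
    fused_cov = np.linalg.inv(sum(precisions))
    fused_mean = fused_cov @ sum(P @ m for P, m in zip(precisions, means))
    return fused_mean, fused_cov
```

Transmitting a mean vector and covariance matrix per target is what makes the sidelink bandwidth use so much lower than exchanging raw point clouds.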
Updated: 2024-05-03 10:50:30
Categories: cs.LG,cs.CV,cs.IT,math.IT
A Conditional Independence Test in the Presence of Discretization
Testing conditional independence has many applications, such as in Bayesian network learning and causal discovery. Different test methods have been proposed. However, existing methods generally cannot work when only discretized observations are available. Specifically, suppose $X_1$, $\tilde{X}_2$ and $X_3$ are observed variables, where $\tilde{X}_2$ is a discretization of the latent variable $X_2$. Applying existing test methods to the observations of $X_1$, $\tilde{X}_2$ and $X_3$ can lead to a false conclusion about the underlying conditional independence of the variables $X_1$, $X_2$ and $X_3$. Motivated by this, we propose a conditional independence test specifically designed to accommodate the presence of such discretization. To achieve this, we design bridge equations to recover the parameter reflecting the statistical information of the underlying latent continuous variables. An appropriate test statistic and its asymptotic distribution under the null hypothesis of conditional independence are also derived. Both theoretical results and empirical validation are provided, demonstrating the effectiveness of our test method.
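The failure mode motivating the test can be reproduced in a few lines: in a linear Gaussian chain $X_1 \to X_2 \to X_3$, a partial-correlation check correctly finds $X_1$ and $X_3$ independent given the latent $X_2$, but not given its discretization. This demo illustrates the problem only; it is not the authors' bridge-equation test:

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson correlation of x and y after linearly regressing out z."""
    Z = np.column_stack([z, np.ones_like(z)])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(1)
x1 = rng.normal(size=20000)
x2 = x1 + rng.normal(size=20000)      # chain X1 -> X2 -> X3, so X1 ⟂ X3 | X2
x3 = x2 + rng.normal(size=20000)
x2_disc = (x2 > 0).astype(float)      # observed binary discretization of X2
# partial_corr(x1, x3, x2) is near zero, but
# partial_corr(x1, x3, x2_disc) is clearly nonzero:
# conditioning on the discretization does not screen off the dependence.
```

Any test that conditions only on $\tilde{X}_2$ therefore rejects the true conditional independence, which is exactly the discretization artifact the paper's test is designed to handle.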
Updated: 2024-05-03 10:48:56
Categories: stat.ML,cs.AI,cs.LG
Soft Label PU Learning
PU learning refers to the classification problem in which only a portion of the positive samples is labeled. Existing PU learning methods treat unlabeled samples equally. However, in many real tasks, from common sense or domain knowledge, some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels according to their probabilities of being positive. Considering that the ground-truth TPR, FPR, and AUC are unknown, we design PU counterparts of these metrics to evaluate the performance of soft label PU learning methods on validation data. We show that these newly designed PU metrics are good substitutes for the real metrics. We then propose a method that optimizes such metrics. Experiments on public datasets and on real datasets for anti-cheat services from Tencent games demonstrate the effectiveness of our proposed method.
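One way to realize PU surrogates of TPR and FPR from soft labels, sketched with an assumed weighting scheme (the paper's exact metric definitions may differ):

```python
import numpy as np

def soft_pu_rates(scores, soft_labels, threshold=0.5):
    """Soft-label PU surrogates for TPR and FPR.

    soft_labels[i] is the assumed probability that sample i is positive
    (labeled positives get weight 1.0). Each sample contributes to the
    positive rate in proportion to that weight, and to the negative
    rate in proportion to its complement.
    """
    pred = (scores >= threshold).astype(float)
    w = np.asarray(soft_labels, dtype=float)
    tpr = float((w * pred).sum() / w.sum())          # weighted recall
    fpr = float(((1 - w) * pred).sum() / (1 - w).sum())
    return tpr, fpr
```

With ground-truth labels the weights collapse to 0/1 and these reduce to the ordinary TPR and FPR, which is the sense in which such PU metrics can substitute for the real ones.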
Updated: 2024-05-03 10:46:19
Categories: cs.LG
Joint sentiment analysis of lyrics and audio in music
Sentiment or mood can be expressed at various levels in music. In automatic analysis, the actual audio data is usually analyzed, but the lyrics can also play a crucial role in the perception of moods. We first evaluate various models for sentiment analysis based on lyrics and audio separately. The corresponding approaches already show satisfactory results, but they also exhibit weaknesses, the causes of which we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (sometimes intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.
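The simplest combination strategy of the kind evaluated here is late fusion, a weighted average of the per-modality sentiment scores. The weight below is an assumption for illustration; the paper compares several combination approaches:

```python
def fuse_sentiment(audio_score, lyrics_score, w_audio=0.5):
    """Late fusion of per-modality sentiment scores in [0, 1].

    A minimal baseline: the fused score is a convex combination of
    the audio-based and lyrics-based predictions. Modality weights
    would normally be tuned on validation data.
    """
    return w_audio * audio_score + (1 - w_audio) * lyrics_score
```

Such a fusion can only help when the modalities agree; the contradictory audio/lyrics cases discussed above are exactly where a fixed weighting breaks down.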
Updated: 2024-05-03 10:42:17
Categories: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS
Properties of the geometry of solutions and capacity of multi-layer neural networks with Rectified Linear Units activations
Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice has been originally suggested as a way to compensate for the so called vanishing gradient problem which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: while the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.
Updated: 2024-05-03 10:37:51
Categories: cond-mat.dis-nn,cs.LG,stat.ML
A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints
Traditional mathematical programming solvers require long computational times to solve constrained minimization problems of complex and large-scale physical systems. Therefore, these problems are often transformed into unconstrained ones and solved with computationally efficient optimization approaches based on first-order information, such as the gradient descent method. However, for unconstrained problems, balancing the minimization of the objective function with the reduction of constraint violations is challenging. We consider the class of time-dependent minimization problems with an increasing (possibly nonlinear and non-convex) objective function and non-decreasing (possibly nonlinear and non-convex) inequality constraints. To solve them efficiently, we propose a penalty-based guardrail algorithm (PGA). This algorithm adapts a standard penalty-based method by dynamically updating the right-hand side of the constraints with a guardrail variable, which adds a margin to prevent violations. We evaluate PGA on two novel application domains: a simplified model of a district heating system and an optimization model derived from learned deep neural networks. Our method significantly outperforms mathematical programming solvers and the standard penalty-based method, and achieves better performance and faster convergence than a state-of-the-art algorithm (IPDD) within a specified time limit.
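The core of the method can be sketched as a penalty step in which a guardrail margin tightens the constraint's right-hand side. The fixed margin below is a simplification of the paper's dynamically updated guardrail variable:

```python
def guardrail_step(x, f_grad, g, g_grad, rhs, guard, rho=10.0, lr=0.01):
    """One gradient step on f(x) + rho * max(0, g(x) - (rhs - guard))^2.

    The guardrail `guard` shifts the right-hand side of the constraint
    g(x) <= rhs inward by a margin, so the penalized iterate is pushed
    toward strict feasibility. PGA additionally adapts this margin over
    time, which this fixed-margin sketch omits.
    """
    viol = max(0.0, g(x) - (rhs - guard))
    grad = f_grad(x) + 2.0 * rho * viol * g_grad(x)
    return x - lr * grad
```

For example, minimizing f(x) = x^2 subject to g(x) = -x <= -1 (i.e., x >= 1) with a margin of 0.1 drives the iterate to the constraint boundary while the margin absorbs the penalty method's usual slight constraint violation.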
Updated: 2024-05-03 10:37:34
Categories: math.OC,cs.AI
Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy
Human forecasting accuracy in practice relies on the 'wisdom of the crowd' effect, in which predictions about future events are significantly improved by aggregating across a crowd of individual forecasters. Past work on the forecasting ability of large language models (LLMs) suggests that frontier LLMs, as individual forecasters, underperform compared to the gold standard of a human crowd forecasting tournament aggregate. In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of twelve LLMs. We compare the aggregated LLM predictions on 31 binary questions to those of a crowd of 925 human forecasters from a three-month forecasting tournament. Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark and is not statistically different from the human crowd. In exploratory analyses, we find that these two approaches are equivalent with respect to medium-effect-size equivalence bounds. We also observe an acquiescence effect, with mean model predictions being significantly above 50%, despite an almost even split of positive and negative resolutions. Moreover, in Study 2, we test whether LLM predictions (of GPT-4 and Claude 2) can be improved by drawing on human cognitive output. We find that both models' forecasting accuracy benefits from exposure to the median human prediction as information, improving accuracy by between 17% and 28%, though this still yields less accurate predictions than simply averaging human and machine forecasts. Our results suggest that LLMs can achieve forecasting accuracy rivaling that of human crowd forecasting tournaments via the simple, practically applicable method of forecast aggregation. This replicates the 'wisdom of the crowd' effect for LLMs and opens up their use for a variety of applications throughout society.
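The aggregation step can be sketched as a median over individual probability forecasts, scored with the Brier score. The median rule and the toy numbers below are our assumptions, not the study's preregistered pipeline:

```python
import statistics

def brier(p, outcome):
    """Brier score of a probability forecast against a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2

def crowd_forecast(forecasts):
    """Aggregate individual probability forecasts by taking the median."""
    return statistics.median(forecasts)

# Toy illustration on one question that resolved positively (outcome = 1):
forecasts = [0.9, 0.7, 0.65, 0.3, 0.8]
crowd = crowd_forecast(forecasts)
mean_individual = statistics.mean(brier(p, 1) for p in forecasts)
# The aggregate's Brier score beats the average individual's score,
# the basic mechanism behind the 'wisdom of the crowd' effect.
```

The same aggregation applied to twelve LLM forecasters is what brings the ensemble up to human-crowd-level accuracy in Study 1.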
Updated: 2024-05-03 10:37:24
Categories: cs.CY,cs.AI,cs.CL,cs.LG
Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL
Protein-protein interactions (PPIs) play a crucial role in numerous biological processes. Developing methods that predict binding affinity changes under substitution mutations is fundamental for modelling and re-engineering biological systems. Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations. With this contribution, we propose eGRAL, a novel SE(3) equivariant graph neural network (eGNN) architecture designed for predicting binding affinity changes from multiple amino acid substitutions in protein complexes. eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models. To address the limited availability of large-scale affinity assays with structural information, we generate a simulated dataset comprising approximately 500,000 data points. Our model is pre-trained on this dataset, then fine-tuned and tested on experimental data.
Updated: 2024-05-03 10:33:19
Categories: q-bio.QM,cs.AI,cs.LG
Model-based reinforcement learning for protein backbone design
Designing protein nanomaterials of predefined shape and characteristics has the potential to dramatically impact the medical industry. Machine learning (ML) has proven successful in protein design, reducing the need for expensive wet-lab experiment rounds. However, challenges persist in efficiently exploring protein fitness landscapes to identify optimal protein designs. In response, we propose the use of AlphaZero to generate protein backbones that meet shape and structural scoring requirements. We extend an existing Monte Carlo tree search (MCTS) framework by incorporating a novel threshold-based reward and secondary objectives to improve design precision. This innovation considerably outperforms existing approaches, leading to protein backbones that better respect structural scores. The application of AlphaZero is novel in the context of protein backbone design and demonstrates promising performance. AlphaZero consistently surpasses baseline MCTS by more than 100% in top-down protein design tasks. Additionally, our application of AlphaZero with secondary objectives uncovers further promising outcomes, indicating the potential of model-based reinforcement learning (RL) in navigating the intricate and nuanced aspects of protein design.
Updated: 2024-05-03 10:24:33
Categories: cs.AI,q-bio.BM
Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints
This paper studies an online optimal resource reservation problem in communication networks with job transfers, where the goal is to minimize the reservation cost while keeping the blocking cost under a certain budget limit. To tackle this problem, we propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints. We then analyze the performance of our algorithm by establishing an upper bound on the associated regret and the cumulative constraint violations. Finally, we present numerical experiments in which we compare the performance of our algorithm with that of reinforcement learning, showing that ours surpasses it.
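As a rough illustration of the core update behind a randomized exponentially weighted method: each candidate action's weight decays exponentially in its observed loss. The long-term constraint handling in the paper is more involved; here a Lagrangian-style penalty folded into the loss stands in for it, and all numbers are hypothetical:

```python
import math

def exp_weights_update(weights, losses, eta):
    """One round of the exponentially weighted (Hedge) update:
    w_i <- w_i * exp(-eta * loss_i), renormalized to a distribution."""
    new_w = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Three candidate reservation levels, starting from a uniform prior.
weights = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical per-round losses: reservation cost plus a penalty charging
# blocking-cost (long-term constraint) violations, Lagrangian style.
losses = [0.2, 0.9, 0.5]
weights = exp_weights_update(weights, losses, eta=1.0)
```

After the update, probability mass concentrates on the low-loss reservation level while the distribution stays normalized.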
Updated: 2024-05-03 10:12:40
Categories: math.OC,cs.LG,stat.ML
Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation
Image segmentation holds a vital position in diagnosis and treatment within the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advancements in this realm, but they still encounter challenges due to limited receptive fields or high computational complexity. Recently, State Space Models (SSMs), particularly Mamba and its variants, have demonstrated notable performance in the field of vision. However, their feature extraction methods may not be sufficiently effective and retain some redundant structures, leaving room for parameter reduction. Motivated by previous spatial and channel attention methods, we propose Triplet Mamba-UNet. The method leverages residual VSS Blocks to extract intensive contextual features, while a Triplet SSM is employed to fuse features across spatial and channel dimensions. We conducted experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets, demonstrating the superior segmentation performance of our proposed TM-UNet. Additionally, compared to the previous VM-UNet, our model achieves a one-third reduction in parameters.
Updated: 2024-05-03 10:12:09
Categories: eess.IV,cs.CV,cs.LG
Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence
The primary goal of online change detection (OCD) is to promptly identify changes in a data stream. The OCD problem finds a wide variety of applications in diverse areas, e.g., security detection in smart grids and intrusion detection in communication networks. Prior research usually assumes precise knowledge of the parameters linked to the data stream. Nevertheless, this assumption often proves unattainable in practical scenarios due to factors such as estimation errors, system updates, etc. This paper makes a first attempt to develop a triadic-OCD framework with certifiable robustness, provable optimality, and guaranteed convergence. In addition, the proposed triadic-OCD algorithm can be realized in a fully asynchronous distributed manner, obviating the need to transmit the data to a single server. This asynchronous mechanism can also mitigate the straggler issue faced by traditional synchronous algorithms. We then analyze the non-asymptotic convergence property of triadic-OCD and derive its iteration complexity to achieve an $\epsilon$-optimal point. Finally, extensive experiments have been conducted to elucidate the effectiveness of the proposed method.
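The paper's triadic-OCD algorithm is considerably more elaborate, but the underlying OCD problem can be illustrated with a minimal one-sided CUSUM detector (a classical baseline, not the proposed method; the stream below is made up):

```python
def cusum(stream, target_mean, threshold, drift=0.0):
    """One-sided CUSUM: return the first index where the cumulative positive
    deviation from target_mean exceeds threshold, or None if no change."""
    s = 0.0
    for i, x in enumerate(stream):
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i
    return None

# Stream whose mean shifts from ~0 to ~2 at index 5.
data = [0.1, -0.2, 0.0, 0.1, -0.1, 2.1, 1.9, 2.2, 2.0, 2.1]
change_at = cusum(data, target_mean=0.0, threshold=3.0)
```

Note the inherent detection delay: the statistic must accumulate evidence past the threshold, so the flag fires an index or two after the true change point.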
Updated: 2024-05-03 10:10:11
Categories: stat.ML,cs.AI,cs.LG
Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications
Distribution shifts, where statistical properties differ between training and test datasets, present a significant challenge in real-world machine learning applications, where they directly impact model generalization and robustness. In this study, we explore model adaptation and generalization by utilizing synthetic data to systematically address distributional disparities. Our investigation aims to identify the prerequisites for successful model adaptation across diverse data distributions, while quantifying the associated uncertainties. Specifically, we generate synthetic data using the Van der Waals equation for gases and employ quantitative measures such as Kullback-Leibler divergence, Jensen-Shannon distance, and Mahalanobis distance to assess data similarity. These metrics enable us both to evaluate model accuracy and to quantify the uncertainty in predictions arising from data distribution shifts. Our findings suggest that utilizing statistical measures, such as the Mahalanobis distance, to determine whether model predictions fall within the low-error "interpolation regime" or the high-error "extrapolation regime" provides a complementary method for assessing distribution shift and model uncertainty. These insights hold significant value for enhancing model robustness and generalization, essential for the successful deployment of machine learning applications in real-world scenarios.
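The three similarity measures named above are straightforward to compute; a minimal sketch for discrete distributions plus a diagonal-covariance Mahalanobis distance (toy numbers, not the paper's Van der Waals data):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base e)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt(0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m))

def mahalanobis_diag(x, mean, variances):
    """Mahalanobis distance from a distribution with diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, variances)))

p = [0.5, 0.5]          # toy "training" distribution
q = [0.9, 0.1]          # toy "test" distribution
d_js = js_distance(p, q)
d_m = mahalanobis_diag([3.0, 0.0], mean=[0.0, 0.0], variances=[1.0, 1.0])
```

Unlike raw KL divergence, the JS distance is symmetric and bounded, which makes it convenient for thresholding shift severity; the Mahalanobis distance measures how many "standard deviations" a test point lies from the training distribution.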
Updated: 2024-05-03 10:05:31
Categories: cs.LG,stat.ML
Conformal Prediction for Natural Language Processing: A Survey
The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications. Conformal prediction is emerging as a theoretically sound and practically useful framework, combining flexibility with strong statistical guarantees. Its model-agnostic and distribution-free nature makes it particularly promising to address the current shortcomings of NLP systems that stem from the absence of uncertainty quantification. This paper provides a comprehensive survey of conformal prediction techniques, their guarantees, and existing applications in NLP, pointing to directions for future research and open challenges.
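As a concrete anchor for readers new to the framework the survey covers, split conformal prediction for regression reduces to taking a quantile of calibration residuals (a standard construction; the residuals below are illustrative):

```python
import math

def split_conformal_radius(residuals, alpha):
    """Split conformal prediction: the radius q such that intervals
    yhat +/- q cover a fresh point with probability >= 1 - alpha,
    assuming exchangeability of calibration and test data."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # conformal quantile rank
    return sorted(residuals)[min(k, n) - 1]

# Absolute calibration residuals |y - yhat| from some fitted model
# (illustrative numbers).
residuals = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05]
q = split_conformal_radius(residuals, alpha=0.1)
prediction_interval = (3.0 - q, 3.0 + q)  # around a point prediction of 3.0
```

The coverage guarantee is model-agnostic and distribution-free, which is exactly the property the survey highlights as promising for NLP uncertainty quantification.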
Updated: 2024-05-03 10:00:45
Categories: cs.CL,cs.LG
Introducing a microstructure-embedded autoencoder approach for reconstructing high-resolution solution field from reduced parametric space
In this study, we develop a novel multi-fidelity deep learning approach that transforms low-fidelity solution maps into high-fidelity ones by incorporating parametric space information into a standard autoencoder architecture. Due to the integration of parametric space data, this method requires significantly less training data to achieve effective performance in predicting the high-fidelity solution from the low-fidelity one. Our focus is a 2D steady-state heat transfer analysis in a highly heterogeneous materials microstructure, where the spatial distribution of heat conductivity coefficients for two distinct materials is condensed. The boundary value problem is first solved on the coarsest grid using a pre-trained physics-informed neural operator network, and the calculated low-fidelity result is then upscaled using the newly designed enhanced autoencoder. The novelty of the enhanced autoencoder lies in concatenating heat conductivity maps of different resolutions to the decoder segment in distinct steps. We then compare the outcomes of the developed algorithm with the corresponding finite element results, a standard U-Net architecture, and other upscaling approaches such as interpolation functions of varying orders and feedforward neural networks (FFNN). The analysis shows that the new approach outperforms these alternatives in terms of computational cost and error on the test cases. Therefore, as a potential supplement to neural operator networks, our architecture upscales low-fidelity solutions to high-fidelity ones while preserving critical details that are often lost in conventional upscaling methods, especially at sharp interfaces such as those encountered with interpolation methods.
Updated: 2024-05-03 10:00:36
Categories: cs.CE,cs.LG
Multitask Extension of Geometrically Aligned Transfer Encoder
Molecular datasets often suffer from a lack of data. It is well-known that gathering data is difficult due to the complexity of experimentation or simulation involved. Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transfer Encoder (GATE), to a multi-task setup. Thus, we connect multiple molecular tasks by aligning the curved coordinates onto locally flat coordinates, ensuring the flow of information from source tasks to support performance on target data.
Updated: 2024-05-03 09:57:44
Categories: cs.LG,cs.AI,q-bio.QM
It's About Time: Temporal References in Emergent Communication
Emergent communication studies the development of language between autonomous agents, aiming to improve understanding of natural language evolution and increase communication efficiency. While temporal aspects of language have been considered in computational linguistics, there has been no research on temporal references in emergent communication. This paper addresses this gap by exploring how agents communicate about temporal relationships. We analyse three potential influences on the emergence of temporal references: environmental, external, and architectural changes. Our experiments demonstrate that altering the loss function is insufficient for temporal references to emerge; rather, architectural changes are necessary. However, a minimal change in agent architecture, using a different batching method, allows the emergence of temporal references. This modified design is compared with the standard architecture in a temporal referential games environment, which emphasises temporal relationships. The analysis indicates that over 95% of the agents with the modified batching method develop temporal references, without changes to their loss function. We consider temporal referencing necessary for future improvements to the agents' communication efficiency, yielding a closer-to-optimal coding compared to purely compositional languages. Our readily transferable architectural insights provide the basis for their incorporation into other emergent communication settings.
Updated: 2024-05-03 09:44:44
Categories: cs.CL,cs.AI,cs.LG,cs.MA
Understanding LLMs Requires More Than Statistical Generalization
The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models that are zero or near-zero KL divergence apart -- and thus have equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.
Updated: 2024-05-03 09:41:39
Categories: stat.ML,cs.LG
From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings
Deep Learning (DL) is rapidly maturing to the point that it can be used in safety- and security-critical applications. However, adversarial samples, which are undetectable to the human eye, pose a serious threat that can cause the model to misbehave and compromise the performance of such applications. Addressing the robustness of DL models has become crucial to understanding and defending against adversarial attacks. In this study, we perform comprehensive experiments to examine the effect of adversarial attacks and defenses on various model architectures across well-known datasets. Our research focuses on black-box attacks such as SimBA, HopSkipJump, MGAAttack, and boundary attacks, as well as preprocessor-based defensive mechanisms, including bits squeezing, median smoothing, and JPEG filtering. Experimenting with various models, our results demonstrate that the level of noise needed for the attack increases as the number of layers increases. Moreover, the attack success rate decreases as the number of layers increases. This indicates that model complexity and robustness have a significant relationship. Investigating the diversity-robustness relationship, our experiments with diverse models show that having a large number of parameters does not imply higher robustness. Our experiments further show the effects of the training dataset on model robustness. Various datasets, such as ImageNet-1000, CIFAR-100, and CIFAR-10, are used to evaluate the black-box attacks. Considering the multiple dimensions of our analysis, e.g., model complexity and training dataset, we examined the behavior of black-box attacks when models apply defenses. Our results show that applying defense strategies can significantly reduce attack effectiveness. This research provides in-depth analysis and insight into the robustness of DL models against various attacks and defenses.
Updated: 2024-05-03 09:40:47
Categories: cs.CR,cs.AI,cs.CV,cs.LG
Architecture of a Cortex Inspired Hierarchical Event Recaller
This paper proposes a new approach to Machine Learning (ML) that focuses on unsupervised continuous context-dependent learning of complex patterns. Although the proposal is partly inspired by some of the current knowledge about the structural and functional properties of the mammalian brain, we do not claim that biological systems work in an analogous way (nor the opposite). Based on some properties of the cerebellar cortex and adjacent structures, a proposal suitable for practical problems is presented. A synthetic structure capable of identifying and predicting complex temporal series will be defined and experimentally tested. The system relies heavily on prediction to help identify and learn patterns based on previously acquired contextual knowledge. As a proof of concept, the proposed system is shown to be able to learn, identify and predict a remarkably complex temporal series such as human speech, with no prior knowledge. From raw data, without any adaptation in the core algorithm, the system is able to identify certain speech structures from a set of Spanish sentences. Unlike conventional ML, the proposal can learn with a reduced training set. Although the idea can be applied to a constrained problem, such as the detection of unknown vocabulary in a speech, it could be used in more applications, such as vision, or (by incorporating the missing biological periphery) fit into other ML techniques. Given the trivial computational primitives used, a potential hardware implementation will be remarkably frugal. Coincidentally, the proposed model not only conforms to a plausible functional framework for biological systems but may also explain many elusive cognitive phenomena.
Updated: 2024-05-03 09:36:16
Categories: cs.NE,cs.AI,cs.AR,cs.LG
Neuromorphic Correlates of Artificial Consciousness
The concept of neural correlates of consciousness (NCC), which suggests that specific neural activities are linked to conscious experiences, has gained widespread acceptance. This acceptance is based on a wealth of evidence from experimental studies, brain imaging techniques such as fMRI and EEG, and theoretical frameworks like integrated information theory (IIT) within neuroscience and the philosophy of mind. This paper explores the potential for artificial consciousness by merging neuromorphic design and architecture with brain simulations. It proposes the Neuromorphic Correlates of Artificial Consciousness (NCAC) as a theoretical framework. While the debate on artificial consciousness remains contentious due to our incomplete grasp of consciousness, this work may raise eyebrows and invite criticism. Nevertheless, this optimistic and forward-thinking approach is fueled by insights from the Human Brain Project, advancements in brain imaging like EEG and fMRI, and recent strides in AI and computing, including quantum and neuromorphic designs. Additionally, this paper outlines how machine learning can play a role in crafting artificial consciousness, aiming to realise machine consciousness and awareness in the future.
Updated: 2024-05-03 09:27:51
Categories: cs.AI,eess.SP
Three Quantization Regimes for ReLU Networks
We establish the fundamental limits in the approximation of Lipschitz functions by deep ReLU neural networks with finite-precision weights. Specifically, three regimes, namely under-, over-, and proper quantization, in terms of minimax approximation error behavior as a function of network weight precision, are identified. This is accomplished by deriving nonasymptotic tight lower and upper bounds on the minimax approximation error. Notably, in the proper-quantization regime, neural networks exhibit memory-optimality in the approximation of Lipschitz functions. Deep networks have an inherent advantage over shallow networks in achieving memory-optimality. We also develop the notion of depth-precision tradeoff, showing that networks with high-precision weights can be converted into functionally equivalent deeper networks with low-precision weights, while preserving memory-optimality. This idea is reminiscent of sigma-delta analog-to-digital conversion, where oversampling rate is traded for resolution in the quantization of signal samples. We improve upon the best-known ReLU network approximation results for Lipschitz functions and describe a refinement of the bit extraction technique which could be of independent general interest.
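The notion of finite weight precision can be made concrete with a uniform symmetric quantizer; this is a generic sketch (not the paper's construction) showing how fewer bits coarsen the representable weights, the mechanism behind the under-, over-, and proper-quantization regimes:

```python
def quantize(w, bits, w_max=1.0):
    """Uniform symmetric quantization of w to the nearest of 2**bits
    evenly spaced levels spanning [-w_max, w_max]."""
    step = 2 * w_max / (2 ** bits - 1)
    return round((w + w_max) / step) * step - w_max

weights = [0.337, -0.82, 0.5]
coarse = [quantize(w, bits=2) for w in weights]  # 4 levels: -1, -1/3, 1/3, 1
fine = [quantize(w, bits=8) for w in weights]    # 256 levels
```

At 2 bits every weight snaps to one of four levels, while at 8 bits the per-weight error is bounded by half the step size; trading depth for precision, as in the paper's depth-precision tradeoff, lets low-precision levels emulate high-precision ones.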
Updated: 2024-05-03 09:27:31
Categories: stat.ML,cs.AI,cs.IT,cs.LG,math.IT
Training robust and generalizable quantum models
Adversarial robustness and generalization are both crucial properties of reliable machine learning models. In this letter, we study these properties in the context of quantum machine learning based on Lipschitz bounds. We derive parameter-dependent Lipschitz bounds for quantum models with trainable encoding, showing that the norm of the data encoding has a crucial impact on the robustness against data perturbations. Further, we derive a bound on the generalization error which explicitly involves the parameters of the data encoding. Our theoretical findings give rise to a practical strategy for training robust and generalizable quantum models by regularizing the Lipschitz bound in the cost. Further, we show that, for fixed and non-trainable encodings, as those frequently employed in quantum machine learning, the Lipschitz bound cannot be influenced by tuning the parameters. Thus, trainable encodings are crucial for systematically adapting robustness and generalization during training. The practical implications of our theoretical findings are illustrated with numerical results.
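The training strategy of regularizing a Lipschitz bound in the cost can be sketched generically; here the bound is simplified to the norm of the trainable encoding weights (a stand-in for the paper's parameter-dependent bound, with made-up numbers):

```python
import math

def lipschitz_penalty(encoding_weights, lam):
    """Penalty proportional to a (simplified) Lipschitz bound: here the
    bound is taken to scale with the norm of the trainable data-encoding
    weights, standing in for the paper's parameter-dependent bound."""
    bound = math.sqrt(sum(w * w for w in encoding_weights))
    return lam * bound

def regularized_cost(task_loss, encoding_weights, lam=0.1):
    """Training cost = task loss + Lipschitz-bound regularizer."""
    return task_loss + lipschitz_penalty(encoding_weights, lam)

cost = regularized_cost(0.5, [3.0, 4.0], lam=0.1)  # norm 5.0, penalty 0.5
```

With a fixed, non-trainable encoding the penalty term is a constant, which mirrors the paper's observation that robustness then cannot be tuned through the parameters.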
Updated: 2024-05-03 09:14:21
Categories: quant-ph,cs.LG,math.OC
Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models
The rapid advancement in Large Language Models (LLMs) has markedly enhanced the capabilities of language understanding and generation. However, the substantial model size poses hardware challenges, affecting both memory size for serving and inference latency for token generation. To address those challenges, we propose Dependency-aware Semi-structured Sparsity (DaSS), a novel method for the recent prevalent SwiGLU-based LLMs pruning. Our approach incorporates structural dependency into the weight magnitude-based unstructured pruning. We introduce an MLP-specific pruning metric that evaluates the importance of each weight by jointly considering its magnitude and its corresponding MLP intermediate activation norms. DaSS facilitates a balance between the adaptability offered by unstructured pruning and the structural consistency inherent in dependency-based structured pruning. Empirical evaluations on Mistral and LLaMA2 model families demonstrate that DaSS not only outperforms both SparseGPT and Wanda in achieving hardware-friendly N:M sparsity patterns but also maintains the computational efficiency of Wanda.
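The flavor of magnitude-times-activation pruning scores combined with hardware-friendly N:M sparsity can be sketched as follows. The score |w| * ||x|| is a Wanda-style proxy; DaSS's actual metric additionally encodes structural dependency, and all numbers here are hypothetical:

```python
def nm_prune_mask(scores, n=2, m=4):
    """Semi-structured N:M sparsity: within every group of m consecutive
    scores, keep the n largest (mask 1) and prune the rest (mask 0)."""
    mask = [0] * len(scores)
    for start in range(0, len(scores), m):
        group = range(start, min(start + m, len(scores)))
        for idx in sorted(group, key=lambda i: scores[i], reverse=True)[:n]:
            mask[idx] = 1
    return mask

# Hypothetical weights and per-input-channel activation norms.
weights = [0.9, -0.1, 0.4, 0.05, -0.8, 0.2, 0.3, 0.7]
act_norms = [1.0, 2.0, 0.5, 1.0, 1.0, 1.0, 2.0, 0.1]
scores = [abs(w) * a for w, a in zip(weights, act_norms)]
mask = nm_prune_mask(scores, n=2, m=4)  # 2:4 pattern, 50% sparsity
```

The 2:4 pattern is the one accelerated by recent GPU sparse tensor cores, which is why N:M sparsity is called hardware-friendly.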
Updated: 2024-05-03 09:13:13
Categories: cs.CL,cs.AI,cs.LG
No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks
Biologically, the brain does not rely on a single type of neuron that functions universally in all contexts. Instead, it acts as a sophisticated designer of task-based neurons. In this study, we address the following question: since the human brain is a task-based neuron user, can artificial network design go from task-based architecture design to task-based neuron design? Since methodologically there are no one-size-fits-all neurons, given the same structure, task-based neurons can enhance feature representation ability relative to existing universal neurons thanks to the intrinsic inductive bias for the task. Specifically, we propose a two-step framework for prototyping task-based neurons. First, symbolic regression is used to identify optimal formulas that fit input data by utilizing base functions such as logarithmic, trigonometric, and exponential functions. We introduce vectorized symbolic regression, which stacks all variables in a vector and regularizes each input variable to perform the same computation; this expedites regression, facilitates parallel computation, and avoids overfitting. Second, we parameterize the acquired elementary formula to make its parameters learnable, and it serves as the aggregation function of the neuron. Activation functions such as ReLU and the sigmoidal functions remain the same because they have proven to be effective. Empirically, experimental results on synthetic data, classic benchmarks, and real-world applications show that the proposed task-based neuron design is not only feasible but also delivers competitive performance over other state-of-the-art models.
Updated: 2024-05-03 09:12:46
Categories: cs.NE,cs.AI,cs.LG
3D-based RNA function prediction tools in rnaglib
Understanding the connection between complex structural features of RNA and biological function is a fundamental challenge in evolutionary studies and in RNA design. However, building datasets of RNA 3D structures and making appropriate modeling choices remains time-consuming and lacks standardization. In this chapter, we describe the use of rnaglib to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.
Updated: 2024-05-03 09:01:17
Categories: q-bio.BM,cs.LG
Impact of Architectural Modifications on Deep Learning Adversarial Robustness
Rapid advancements of deep learning are accelerating adoption in a wide variety of applications, including safety-critical applications such as self-driving vehicles, drones, robots, and surveillance systems. These advancements include applying variations of sophisticated techniques that improve the performance of models. However, such models are not immune to adversarial manipulations, which can cause the system to misbehave and remain unnoticed by experts. The frequency of modifications to existing deep learning models necessitates thorough analysis to determine the impact on models' robustness. In this work, we present an experimental evaluation of the effects of model modifications on deep learning model robustness using adversarial attacks. Our methodology involves examining the robustness of variations of models against various adversarial attacks. By conducting our experiments, we aim to shed light on the critical issue of maintaining the reliability and safety of deep learning models in safety- and security-critical applications. Our results indicate the pressing demand for an in-depth assessment of the effects of model changes on the robustness of models.
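A minimal sketch of the kind of robustness probe described here, using the classic FGSM attack on a fixed logistic-regression "model" (a stand-in for the paper's deep models, chosen so the loss gradient has a closed form):

```python
import numpy as np

# FGSM: perturb the input along the sign of the loss gradient and check how
# the model's confidence in the true class drops. For logistic regression
# with binary cross-entropy and label y in {0, 1}, d(loss)/dx has the closed
# form (sigmoid(w.x) - y) * w, so no autodiff is needed in this sketch.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, eps):
    grad = (sigmoid(w @ x) - y) * w       # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad)

w = np.array([2.0, -1.0])                  # fixed "model" weights
x = np.array([1.0, 0.5])                   # clean input, true label 1
clean_conf = sigmoid(w @ x)
x_adv = fgsm_perturb(x, 1.0, w, eps=0.3)
adv_conf = sigmoid(w @ x_adv)
print(adv_conf < clean_conf)  # True: the attack lowers confidence in the true class
```

Evaluating a family of model variants amounts to repeating this measurement per variant and comparing the confidence (or accuracy) drop at a fixed perturbation budget.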
Updated: 2024-05-03 08:58:38
Categories: cs.CV,cs.AI,cs.CR,cs.LG
SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network
Heterogeneous graphs are ubiquitous in modeling complex data, and there is an urgent need for powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes: the representations of the neighbors of a node $v$ are forced to be transformed into the feature space of $v$ for aggregation, even though the neighbors are of different types. That is, the semantics of different node types are entangled together in node $v$'s representation. To address the issue, we propose SlotGAT, with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.
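The slot idea can be sketched as follows (a simplified mean-aggregation version; the actual SlotGAT uses attention and learned per-type transformations):

```python
import numpy as np

# Sketch: node v keeps one slot per node type, and messages from type-t
# neighbors are aggregated only into slot t, so the semantics of different
# node types are never mixed into a single representation.

def slot_aggregate(neighbor_feats, neighbor_types, num_types, dim):
    slots = np.zeros((num_types, dim))
    counts = np.zeros(num_types)
    for h, t in zip(neighbor_feats, neighbor_types):
        slots[t] += h
        counts[t] += 1
    nonzero = counts > 0
    slots[nonzero] /= counts[nonzero, None]   # mean aggregation per slot
    return slots

feats = [np.array([1.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 2.0])]
types = [0, 0, 1]                             # e.g. two "author" nodes, one "paper"
slots = slot_aggregate(feats, types, num_types=2, dim=2)
print(slots[0])  # mean of type-0 neighbors: [2. 0.]
print(slots[1])  # the lone type-1 neighbor: [0. 2.]
```

A downstream slot-attention layer would then weight `slots[0]` and `slots[1]` per task, rather than forcing both types into one shared vector.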
Updated: 2024-05-03 08:44:04
Categories: cs.LG
Semi-Parametric Retrieval via Binary Token Index
The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored. To address these demands, we introduce Semi-parametric Vocabulary Disentangled Retrieval (SVDR). SVDR is a novel semi-parametric retrieval framework that supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. In our evaluation on three open-domain question answering benchmarks with the entire Wikipedia as the retrieval corpus, SVDR consistently demonstrates superiority. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index. Specifically, the adoption of a binary token index reduces index preparation time from 30 GPU hours to just 2 CPU hours and storage size from 31 GB to 2 GB, achieving a 90% reduction compared to an embedding-based index.
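A rough sketch of what a binary token index looks like in practice (our illustrative reading, not the SVDR implementation): each document becomes a 0/1 vector over the vocabulary, so building the index needs only tokenization, no GPU encoding pass, and scoring is token overlap.

```python
import numpy as np

# Toy binary token index: index[d, i] = 1 iff vocabulary token i occurs in
# document d. Queries are encoded the same way; relevance is the dot product,
# i.e. the number of query tokens a document shares.

def build_index(docs, vocab):
    index = np.zeros((len(docs), len(vocab)), dtype=np.uint8)
    tok2id = {t: i for i, t in enumerate(vocab)}
    for d, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in tok2id:
                index[d, tok2id[tok]] = 1
    return index, tok2id

def search(query, index, tok2id):
    q = np.zeros(index.shape[1], dtype=np.uint8)
    for tok in query.lower().split():
        if tok in tok2id:
            q[tok2id[tok]] = 1
    scores = index @ q            # shared-token count per document
    return int(np.argmax(scores))

vocab = ["paris", "capital", "france", "river", "seine"]
docs = ["Paris is the capital of France", "The Seine is a river"]
index, tok2id = build_index(docs, vocab)
print(search("capital of france", index, tok2id))  # 0
```

The cheap setup is the point: swapping in an embedding-based index would require encoding every document with a neural model, which is where the reported 30 GPU hours go.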
Updated: 2024-05-03 08:34:13
Categories: cs.CL,cs.AI,cs.IR
DiffECG: A Versatile Probabilistic Diffusion Model for ECG Signals Synthesis
In cardiovascular disease detection using deep learning applied to ECG signals, the complexities of handling physiological signals have sparked growing interest in leveraging deep generative models for effective data augmentation. In this paper, we introduce a novel, versatile approach based on denoising diffusion probabilistic models for ECG synthesis, addressing three scenarios: (i) heartbeat generation, (ii) partial signal imputation, and (iii) full heartbeat forecasting. Our approach presents the first generalized conditional approach for ECG synthesis, and our experimental results demonstrate its effectiveness for various ECG-related tasks. Moreover, we show that our approach outperforms other state-of-the-art ECG generative models and can enhance the performance of state-of-the-art classifiers.
Updated: 2024-05-03 08:25:54
Categories: cs.CV,cs.LG
AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback
Large Language Models (LLMs) have demonstrated significant success across various domains. However, their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning, leading to challenges in unseen downstream tasks and heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) has been recognized as effective in decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter Language Model (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter have demonstrated its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work makes contributions to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
Updated: 2024-05-03 08:24:12
Categories: cs.AI,cs.CL
The Privacy Power of Correlated Noise in Decentralized Learning
Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.
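The pairwise-canceling noise construction can be sketched directly (a simplified version; variable names and the averaging view are ours): every connected pair shares a seed, both endpoints derive the same Gaussian vector from it, one adds it and the other subtracts it.

```python
import numpy as np

# Each edge (i, j) has a shared seed; both users regenerate the same z_ij
# locally, user i adds +z_ij and user j adds -z_ij to their model update.
# Every local model is masked by noise, yet the noise sums to exactly zero
# across the network, so the averaged model is unaffected.

def pairwise_noise(num_users, edges, dim, sigma, seeds):
    noise = np.zeros((num_users, dim))
    for (i, j), seed in zip(edges, seeds):
        rng = np.random.default_rng(seed)   # both endpoints derive z from the shared seed
        z = sigma * rng.normal(size=dim)
        noise[i] += z
        noise[j] -= z
    return noise

edges = [(0, 1), (1, 2), (0, 2)]            # a fully connected 3-user graph
noise = pairwise_noise(3, edges, dim=4, sigma=1.0, seeds=[7, 8, 9])
print(np.allclose(noise.sum(axis=0), 0.0))  # True: noise cancels in the average
print(np.abs(noise).max() > 0)              # True: each local model is still masked
```

On sparse graphs the cancellation is only partial at intermediate averaging steps, which is exactly the accumulation the paper's analysis has to control.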
Updated: 2024-05-03 08:14:22
Categories: cs.LG,cs.CR,cs.DC,math.OC,stat.ML
MRI Scan Synthesis Methods based on Clustering and Pix2Pix
We consider a missing data problem in the context of automatic segmentation methods for Magnetic Resonance Imaging (MRI) brain scans. Usually, automated MRI scan segmentation is based on multiple scans (e.g., T1-weighted, T2-weighted, T1CE, FLAIR). However, quite often a scan is blurry, missing or otherwise unusable. We investigate the question whether a missing scan can be synthesized. We exemplify that this is in principle possible by synthesizing a T2-weighted scan from a given T1-weighted scan. Our first aim is to compute a picture that resembles the missing scan closely, measured by average mean squared error (MSE). We develop/use several methods for this, including a random baseline approach, a clustering-based method and pixel-to-pixel translation method by Isola et al. (Pix2Pix) which is based on conditional GANs. The lowest MSE is achieved by our clustering-based method. Our second aim is to compare the methods with respect to the effect that using the synthesized scan has on the segmentation process. For this, we use a DeepMedic model trained with the four input scan modalities named above. We replace the T2-weighted scan by the synthesized picture and evaluate the segmentations with respect to the tumor identification, using Dice scores as numerical evaluation. The evaluation shows that the segmentation works well with synthesized scans (in particular, with Pix2Pix methods) in many cases.
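A caricature of the clustering-based synthesis on toy 1D data (the paper clusters real scan content; here simple intensity binning stands in for clustering, and the paired modality is synthetic):

```python
import numpy as np

# Idea: group T1 intensities into clusters, then predict, for each cluster,
# the mean co-registered T2 intensity observed in training. MSE against a
# constant-intensity baseline shows even this crude map captures structure.

def fit_cluster_map(t1, t2, edges):
    labels = np.digitize(t1, edges)               # crude 1D "clustering" by binning
    return np.array([t2[labels == k].mean() if np.any(labels == k) else 0.0
                     for k in range(len(edges) + 1)])

def synthesize_t2(t1, edges, cluster_means):
    return cluster_means[np.digitize(t1, edges)]

rng = np.random.default_rng(0)
t1 = rng.uniform(0, 1, size=1000)                 # toy "T1" voxel intensities
t2 = 2.0 * t1 + rng.normal(scale=0.05, size=1000) # toy paired "T2" modality
edges = np.linspace(0, 1, 11)[1:-1]               # 10 intensity clusters
means = fit_cluster_map(t1, t2, edges)
pred = synthesize_t2(t1, edges, means)
mse = float(np.mean((pred - t2) ** 2))
baseline = float(np.mean((t2.mean() - t2) ** 2))
print(mse < baseline)  # True: the cluster map beats a constant-intensity baseline
```

The real pipeline additionally evaluates whether a downstream segmenter (DeepMedic) tolerates the synthesized scan, which is a stricter test than raw MSE.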
Updated: 2024-05-03 08:12:53
Categories: eess.IV,cs.CV,cs.LG
Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization
The neural combinatorial optimization (NCO) approach has shown great potential for solving routing problems without the requirement of expert knowledge. However, existing constructive NCO methods cannot directly solve large-scale instances, which significantly limits their application prospects. To address these crucial shortcomings, this work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization. In particular, we design a powerful yet lightweight instance-conditioned adaptation module for the NCO model to generate better solutions for instances across different scales. In addition, we develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution. Experimental results show that our proposed method is capable of obtaining excellent results with a very fast inference time in solving Traveling Salesman Problems (TSPs) and Capacitated Vehicle Routing Problems (CVRPs) across different scales. To the best of our knowledge, our model achieves state-of-the-art performance among all RL-based constructive methods for TSP and CVRP with up to 1,000 nodes.
Updated: 2024-05-03 08:00:19
Categories: cs.AI,cs.LG
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve $\widetilde{O}\left(\Delta^{1/4}T^{3/4}\right)$ regret when the degree of nonstationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}\left(\Delta^{1/5}T^{4/5}\right)$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on number of agents from the oracles. As a side contribution that may be independent of interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, which includes Nash equilibria, correlated equilibria, and coarse correlated equilibria.
Updated: 2024-05-03 07:38:21
Categories: cs.LG,cs.AI,cs.GT,cs.MA,stat.ML
Enhancing Social Media Post Popularity Prediction with Visual Content
Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a wide range of prediction models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models that are capable of capturing the underlying nonlinear interactions between covariates outperform other methods.
Updated: 2024-05-03 07:37:50
Categories: cs.LG,cs.CV
Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Deep Neural Networks (DNN) have shown great promise in many classification applications, yet are widely known to have poorly calibrated predictions when they are over-parametrized. Improving DNN calibration without compromising model accuracy is of extreme importance and interest in safety critical applications such as in the health-care sector. In this work, we show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures such as Wide Residual Networks (WRN) and Visual Transformers (ViT) significantly improves model calibration whilst retaining accuracy, and at a low training cost. In addition, we show that placing a Gaussian prior on the last hidden layer outputs of a DNN, and training the model variationally in the classification training stage, even further improves calibration. We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
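Calibration in this setting is typically scored with the Expected Calibration Error (ECE): bin predictions by confidence and average the per-bin gap between accuracy and confidence. A minimal implementation of this standard metric (not the paper's code) is:

```python
import numpy as np

# ECE = sum over confidence bins of  (bin weight) * |accuracy - confidence|.
# A perfectly calibrated model that says "0.8" is right 80% of the time.

def expected_calibration_error(conf, correct, num_bins=10):
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap          # bins weighted by sample share
    return ece

# Overconfident model: predicts 0.95 everywhere but is right only 60% of the time.
conf = np.full(100, 0.95)
correct = np.array([1] * 60 + [0] * 40, dtype=float)
print(round(expected_calibration_error(conf, correct), 2))  # 0.35
```

Decoupled training aims to drive this number down without touching top-1 accuracy, which is why it is reported alongside accuracy rather than instead of it.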
Updated: 2024-05-03 07:36:26
Categories: cs.LG,stat.ML
(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice
Large language models (LLMs) are increasingly capable of providing users with advice in a wide range of professional domains, including legal advice. However, relying on LLMs for legal queries raises concerns due to the significant expertise required and the potential real-world consequences of the advice. To explore when and why LLMs should or should not provide advice to users, we conducted workshops with 20 legal experts using methods inspired by case-based reasoning. The provided realistic queries ("cases") allowed experts to examine granular, situation-specific concerns and overarching technical and legal constraints, producing a concrete set of contextual considerations for LLM developers. By synthesizing the factors that impacted LLM response appropriateness, we present a 4-dimension framework: (1) User attributes and behaviors, (2) Nature of queries, (3) AI capabilities, and (4) Social impacts. We share experts' recommendations for LLM response strategies, which center around helping users identify "right questions to ask" and relevant information rather than providing definitive legal judgments. Our findings reveal novel legal considerations, such as unauthorized practice of law, confidentiality, and liability for inaccurate advice, that have been overlooked in the literature. The case-based deliberation method enabled us to elicit fine-grained, practice-informed insights that surpass those from de-contextualized surveys or speculative principles. These findings underscore the applicability of our method for translating domain-specific professional knowledge and practices into policies that can guide LLM behavior in a more responsible direction.
Updated: 2024-05-03 07:32:34
Categories: cs.CY,cs.AI
Recommendation aided Caching using Combinatorial Multi-armed Bandits
We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users, which encourages them to request these contents. Recommendation can thus be used to increase cache hits. We formulate the cache hit optimization problem as a combinatorial multi-armed bandit (CMAB). We propose a UCB-based algorithm to decide which contents to cache and recommend. We provide an upper bound on the regret of our algorithm. We numerically demonstrate the performance of our algorithm and compare it to state-of-the-art algorithms.
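A simplified sketch of the CMAB caching loop (without the recommendation component; the UCB index and all parameters below are illustrative, not the paper's algorithm): each content is an arm with an unknown request probability, and each round the cache holds the top-k contents by UCB score.

```python
import numpy as np

# Combinatorial UCB for caching: maintain hit counts and pull counts per
# content, cache the k contents with the highest optimism-adjusted estimate,
# observe which cached contents were requested, and update.

def ucb_cache(true_probs, cache_size, rounds, seed=0):
    rng = np.random.default_rng(seed)
    n = len(true_probs)
    pulls = np.ones(n)                            # one smoothing observation per arm
    hits = rng.binomial(1, true_probs).astype(float)
    total_hits = 0
    for t in range(1, rounds + 1):
        ucb = hits / pulls + np.sqrt(2 * np.log(t + n) / pulls)
        cached = np.argsort(ucb)[-cache_size:]        # cache the top-k UCB contents
        reward = rng.binomial(1, true_probs[cached])  # requests served from cache
        hits[cached] += reward
        pulls[cached] += 1
        total_hits += reward.sum()
    return total_hits / (rounds * cache_size)

hit_rate = ucb_cache(np.array([0.9, 0.8, 0.1, 0.05]), cache_size=2, rounds=2000)
print(hit_rate > 0.6)  # the learner concentrates on the two popular contents
```

Adding recommendation changes the arms' reward distributions (recommended contents get requested more often), which is what couples the two decisions in the paper's formulation.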
Updated: 2024-05-03 07:29:24
Categories: cs.LG,cs.IR,cs.NI
Exploring the Privacy-Energy Consumption Tradeoff for Split Federated Learning
Split Federated Learning (SFL) has recently emerged as a promising distributed learning technology, leveraging the strengths of both federated and split learning. It emphasizes the advantages of rapid convergence while addressing privacy concerns. As a result, this innovation has received significant attention from both industry and academia. However, since the model is split at a specific layer, known as a cut layer, into both client-side and server-side models for the SFL, the choice of the cut layer in SFL can have a substantial impact on the energy consumption of clients and their privacy, as it influences the training burden and the output of the client-side models. In this article, we provide a comprehensive overview of the SFL process and thoroughly analyze energy consumption and privacy. This analysis considers the influence of various system parameters on the cut layer selection strategy. Additionally, we provide an illustrative example of the cut layer selection, aiming to minimize clients' risk of reconstructing the raw data at the server while sustaining energy consumption within the required energy budget, which involves trade-offs. Finally, we address open challenges in this field. These directions represent promising avenues for future research and development.
Updated: 2024-05-03 07:27:18
Categories: cs.LG,cs.AI,cs.CR
Compositional Learning of Visually-Grounded Concepts Using Reinforcement
Children can rapidly generalize compositionally-constructed rules to unseen test sets. On the other hand, deep reinforcement learning (RL) agents need to be trained over millions of episodes, and their ability to generalize to unseen combinations remains unclear. Hence, we investigate the compositional abilities of RL agents, using the task of navigating to specified color-shape targets in synthetic 3D environments. First, we show that when RL agents are naively trained to navigate to target color-shape combinations, they implicitly learn to decompose the combinations, allowing them to (re-)compose these and succeed at held-out test combinations ("compositional learning"). Second, when agents are pretrained to learn invariant shape and color concepts ("concept learning"), the number of episodes subsequently needed for compositional learning decreased by 20 times. Furthermore, only agents trained on both concept and compositional learning could solve a more complex, out-of-distribution environment in zero-shot fashion. Finally, we verified that only text encoders pretrained on image-text datasets (e.g. CLIP) reduced the number of training episodes needed for our agents to demonstrate compositional learning, and also generalized to 5 unseen colors in zero-shot fashion. Overall, our results are the first to demonstrate that RL agents can be trained to implicitly learn concepts and compositionality, to solve more complex environments in zero-shot fashion.
Updated: 2024-05-03 07:21:37
Categories: cs.LG,cs.AI
Securing the Open RAN Infrastructure: Exploring Vulnerabilities in Kubernetes Deployments
In this paper, we investigate the security implications of virtualized and software-based Open Radio Access Network (RAN) systems, specifically focusing on the architecture proposed by the O-RAN ALLIANCE and O-Cloud deployments based on the O-RAN Software Community (OSC) stack and infrastructure. Our key findings are based on a thorough security assessment and static scanning of the OSC Near Real-Time RAN Intelligent Controller (RIC) cluster. We highlight the presence of potential vulnerabilities and misconfigurations in the Kubernetes infrastructure supporting the RIC, also due to the usage of outdated versions of software packages, and provide an estimation of their criticality using various deployment auditing frameworks (e.g., MITRE ATT&CK and the NSA CISA). In addition, we propose methodologies to minimize these issues and harden the Open RAN virtualization infrastructure. These encompass the integration of security evaluation methods into the deployment process, implementing deployment hardening measures, and employing policy-based control for RAN components. We emphasize the need to address the problems found in order to improve the overall security of virtualized Open RAN systems.
Updated: 2024-05-03 07:18:45
Categories: cs.CR
Aloe: A Family of Fine-tuned Open Healthcare LLMs
As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging, alignment, red teaming and advanced inference schemes, as means to improve current open models. To that end, we introduce the Aloe family, a set of open medical LLMs highly competitive within its scale range. Aloe models are trained on the current best base models (Mistral, LLaMA 3), using a new custom dataset which combines public data sources improved with synthetic Chain of Thought (CoT). Aloe models undergo an alignment phase, becoming one of the first few policy-aligned open healthcare LLMs using Direct Preference Optimization, setting a new standard for ethical performance in healthcare LLMs. Model evaluation expands to include various bias and toxicity datasets, a dedicated red teaming effort, and a much-needed risk assessment for healthcare LLMs. Finally, to explore the limits of current LLMs in inference, we study several advanced prompt engineering strategies to boost performance across benchmarks, yielding state-of-the-art results for open healthcare 7B LLMs, unprecedented at this scale.
Updated: 2024-05-03 07:14:07
Categories: cs.CL,cs.AI
DALLMi: Domain Adaption for LLM-based Multi-label Classifier
Large language models (LLMs) increasingly serve as the backbone for classifying text associated with distinct domains and simultaneously several labels (classes). When encountering domain shifts, e.g., a classifier of movie reviews moving from IMDb to Rotten Tomatoes, adapting such an LLM-based multi-label classifier is challenging due to incomplete label sets at the target domain and daunting training overhead. Existing domain adaptation methods address either image multi-label classifiers or text binary classifiers. In this paper, we design DALLMi, Domain Adaptation Large Language Model interpolator, a first-of-its-kind semi-supervised domain adaptation method for text data models based on LLMs, specifically BERT. The core of DALLMi is the novel variation loss and MixUp regularization, which jointly leverage the limited positively labeled text and the large quantity of unlabeled text and, importantly, their interpolation from the BERT word embeddings. DALLMi also introduces a label-balanced sampling strategy to overcome the imbalance between labeled and unlabeled data. We evaluate DALLMi against partially-supervised and unsupervised approaches on three datasets under different scenarios of label availability for the target domain. Our results show that DALLMi achieves higher mAP than the unsupervised and partially-supervised approaches by 19.9% and 52.2%, respectively.
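The MixUp ingredient can be sketched generically (standard MixUp interpolation on embedding vectors; DALLMi's exact variation loss and sampling strategy differ):

```python
import numpy as np

# MixUp: interpolate pairs of embedding vectors and their (multi-)label
# vectors with a Beta-distributed coefficient, turning a few labeled points
# into a continuum of virtual training examples.

def mixup(x1, y1, x2, y2, lam):
    x = lam * x1 + (1.0 - lam) * x2   # interpolated embedding
    y = lam * y1 + (1.0 - lam) * y2   # interpolated multi-label target
    return x, y

rng = np.random.default_rng(0)
e1, y1 = rng.normal(size=8), np.array([1.0, 0.0, 1.0])   # labeled example
e2, y2 = rng.normal(size=8), np.array([0.0, 1.0, 0.0])   # unlabeled/pseudo-labeled
lam = rng.beta(0.4, 0.4)
x_mix, y_mix = mixup(e1, y1, e2, y2, lam)
print(x_mix.shape == (8,) and np.isclose(y_mix.sum(), lam + 1.0))
```

Interpolating in embedding space rather than raw token space is what makes the trick applicable to text, where convex combinations of token ids are meaningless.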
Updated: 2024-05-03 07:04:26
Fields: cs.CL,cs.LG
High-fidelity Person-centric Subject-to-Image Synthesis
Current subject-driven image generation methods encounter significant challenges in person-centric image generation. The reason is that they learn the semantic scene and person generation by fine-tuning a common pre-trained diffusion model, which involves an irreconcilable training imbalance. Precisely, to generate realistic persons, they need to sufficiently tune the pre-trained model, which inevitably causes the model to forget the rich semantic scene prior and makes scene generation over-fit to the training data. Moreover, even with sufficient fine-tuning, these methods still cannot generate high-fidelity persons, since joint learning of scene and person generation also leads to quality compromise. In this paper, we propose Face-diffuser, an effective collaborative generation pipeline that eliminates the above training imbalance and quality compromise. Specifically, we first develop two specialized pre-trained diffusion models, i.e., the Text-driven Diffusion Model (TDM) and the Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively. The sampling process is divided into three sequential stages, i.e., semantic scene construction, subject-scene fusion, and subject enhancement. The first and last stages are performed by TDM and SDM, respectively. In the subject-scene fusion stage, the two models collaborate through a novel and highly effective mechanism, Saliency-adaptive Noise Fusion (SNF). Specifically, it is based on our key observation that there exists a robust link between classifier-free guidance responses and the saliency of generated images. In each time step, SNF leverages the unique strengths of each model and allows for the spatial blending of predicted noises from both models automatically in a saliency-aware manner. Extensive experiments confirm the impressive effectiveness and robustness of Face-diffuser.
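The SNF blending step can be sketched as a per-pixel convex combination of the two models' noise predictions; the mask below is a toy stand-in for the saliency map the paper derives from classifier-free guidance responses:

```python
import numpy as np

def saliency_noise_fusion(noise_person, noise_scene, saliency):
    """Blend two diffusion noise predictions pixel-wise.

    saliency: values in [0, 1]; high saliency favours the person model."""
    saliency = np.clip(saliency, 0.0, 1.0)
    return saliency * noise_person + (1.0 - saliency) * noise_scene

# Where the mask is 1 the person model's noise is used; where 0, the scene model's.
n_p = np.full((2, 2), 1.0)    # person-model noise prediction (toy)
n_s = np.full((2, 2), -1.0)   # scene-model noise prediction (toy)
mask = np.array([[1.0, 0.0], [0.5, 0.5]])
fused = saliency_noise_fusion(n_p, n_s, mask)
```

In the actual sampler this fusion would be applied at every denoising step, letting each model dominate the regions it handles best.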
Updated: 2024-05-03 07:02:56
Fields: cs.CV,cs.AI
Predictive change point detection for heterogeneous data
A change point detection (CPD) framework assisted by a predictive machine learning model called "Predict and Compare" is introduced and characterised in relation to other state-of-the-art online CPD routines which it outperforms in terms of false positive rate and out-of-control average run length. The method's focus is on improving standard methods from sequential analysis such as the CUSUM rule in terms of these quality measures. This is achieved by replacing typically used trend estimation functionals such as the running mean with more sophisticated predictive models (Predict step), and comparing their prognosis with actual data (Compare step). The two models used in the Predict step are the ARIMA model and the LSTM recursive neural network. However, the framework is formulated in general terms, so as to allow the use of other prediction or comparison methods than those tested here. The power of the method is demonstrated in a tribological case study in which change points separating the run-in, steady-state, and divergent wear phases are detected in the regime of very few false positives.
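A minimal Predict-and-Compare loop can be sketched as below, with a running mean standing in for the ARIMA/LSTM predictor and a one-sided CUSUM as the Compare step; the threshold and drift values are illustrative:

```python
import numpy as np

def predict_and_compare(series, window=5, threshold=4.0, drift=0.5):
    """Toy Predict-and-Compare change point detector.

    Predict step: forecast the next value with a running mean over `window`
    (a placeholder for ARIMA/LSTM). Compare step: accumulate absolute
    residuals in a one-sided CUSUM and flag when it crosses `threshold`."""
    cusum, alarms = 0.0, []
    for t in range(window, len(series)):
        forecast = np.mean(series[t - window:t])   # Predict
        residual = series[t] - forecast            # Compare
        cusum = max(0.0, cusum + abs(residual) - drift)
        if cusum > threshold:
            alarms.append(t)
            cusum = 0.0                            # restart after an alarm
    return alarms

# A level shift at index 30 should trigger an alarm at or shortly after it.
data = np.concatenate([np.zeros(30), np.full(30, 5.0)])
alarms = predict_and_compare(data)
```

Swapping the running-mean forecast for a fitted ARIMA or LSTM prediction is what moves this from the classical CUSUM rule toward the framework above.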
Updated: 2024-05-03 07:02:09
Fields: cs.LG
Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot
Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating the Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of the continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms the previous studies in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.
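The transition-optimisation idea, penalising implausible class switches between consecutive frames, can be sketched with a plain Viterbi pass; this is a simplified stand-in for the paper's HMM+CTC combination:

```python
import numpy as np

def viterbi_smooth(frame_logp, switch_penalty=2.0):
    """Smooth per-frame activity log-probabilities with a Viterbi pass:
    staying in the same class is free, switching costs `switch_penalty`."""
    T, C = frame_logp.shape
    score = frame_logp[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        # trans[i, j]: score of arriving in class i from class j
        trans = score[None, :] - switch_penalty * (1 - np.eye(C))
        back[t] = np.argmax(trans, axis=1)
        score = frame_logp[t] + np.max(trans, axis=1)
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# A one-frame classifier glitch (frame 2) is smoothed away.
logp = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]))
labels = viterbi_smooth(logp)
```

Per-frame argmax would flip to class 1 at frame 2; the transition penalty keeps the continuous prediction stable.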
Updated: 2024-05-03 06:57:59
Fields: cs.RO,cs.AI,eess.SP
Explainable Risk Classification in Financial Reports
Every publicly traded company in the US is required to file an annual 10-K financial report, which contains a wealth of information about the company. In this paper, we propose an explainable deep-learning model, called FinBERT-XRC, that takes a 10-K report as input, and automatically assesses the post-event return volatility risk of its associated company. In contrast to previous systems, our proposed model simultaneously offers explanations of its classification decision at three different levels: the word, sentence, and corpus levels. By doing so, our model provides a comprehensive interpretation of its prediction to end users. This is particularly important in financial domains, where the transparency and accountability of algorithmic predictions play a vital role in their application to decision-making processes. Aside from its novel interpretability, our model surpasses the state of the art in predictive accuracy in experiments on a large real-world dataset of 10-K reports spanning six years.
Updated: 2024-05-03 06:56:47
Fields: q-fin.RM,cs.LG
Bayesian and Convolutional Networks for Hierarchical Morphological Classification of Galaxies
This work is focused on the morphological classification of galaxies following the Hubble sequence, in which the different classes are arranged in a hierarchy. The proposed method, BCNN, is composed of two main modules. First, a convolutional neural network (CNN) is trained with images of the different classes of galaxies (image augmentation is carried out to balance some classes); the CNN outputs the probability for each class of the hierarchy, and its outputs/predictions feed the second module. The second module consists of a Bayesian network that represents the hierarchy and helps to improve the prediction accuracy by combining the predictions of the first phase while maintaining the hierarchical constraint (in a hierarchy, an instance associated with a node must be associated with all its ancestors), through probabilistic inference over the Bayesian network, so that a consistent prediction is obtained. Images from the Hubble telescope have been collected and labeled by experts, and these are used to perform the experiments. The results show that BCNN performed better than several CNNs on multiple evaluation measures, reaching the following scores: 67% in exact match, 78% in accuracy, and 83% in hierarchical F-measure.
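The hierarchical constraint can be illustrated with a toy consistency pass that caps each node's probability by its ancestors'; the class names are hypothetical, and the actual BCNN performs probabilistic inference over a Bayesian network rather than this simple clipping:

```python
def enforce_hierarchy(probs, parent):
    """Clip each class probability so no node exceeds its ancestors.

    probs:  dict class -> raw CNN probability
    parent: dict class -> parent class (None for the root)"""
    consistent = {}
    def score(c):
        if c in consistent:
            return consistent[c]
        p = probs[c]
        if parent[c] is not None:
            p = min(p, score(parent[c]))   # hierarchical constraint
        consistent[c] = p
        return p
    for c in probs:
        score(c)
    return consistent

# Hubble-like toy chain: galaxy -> spiral -> barred spiral.
probs = {"galaxy": 0.9, "spiral": 0.95, "barred": 0.6}
parent = {"galaxy": None, "spiral": "galaxy", "barred": "spiral"}
consistent = enforce_hierarchy(probs, parent)
```

After the pass, an instance can never score higher on "spiral" than on "galaxy", so the predicted label set always forms a path in the hierarchy.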
Updated: 2024-05-03 06:48:53
Fields: astro-ph.IM,astro-ph.GA,cs.LG
Adaptive and robust watermark against model extraction attack
Large language models have boosted Large Models as a Service (LMaaS) into a thriving business sector. But even when model owners offer only API access and keep model parameters and internal workings private, their Intellectual Property (IP) is still at risk of theft through model extraction attacks. To safeguard the IP of these models and mitigate unfair competition in the language model market, watermarking technology serves as an efficient post-hoc solution for identifying IP infringements. However, existing IP protection watermarking methods either explicitly alter the original output of the language model or implant watermark signals in the model logits. These methods forcefully distort the original distribution of the language model and impact the sampling process, leading to a decline in the quality of the generated text. Existing methods also fail to achieve end-to-end adaptive watermark embedding and lack robustness verification in complex scenarios where watermark detection is subject to interference. To overcome these challenges, we propose PromptShield, a plug-and-play IP protection watermarking method that resists model extraction attacks without training additional modules. Leveraging the self-reminding properties inherent in large language models, we encapsulate the user's query with a self-generated watermark instruction, nudging the LLM to automatically generate watermark words in its output without compromising generation quality. Our method does not require access to the model's internal logits and minimizes alterations to the model's distribution using prompt-guided cues. Comprehensive experimental results consistently demonstrate the effectiveness, harmlessness, and robustness of our watermark. Moreover, our watermark detection method remains robust and maintains high detection sensitivity even when subjected to interference.
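The encapsulation-and-detection idea can be sketched as follows; the instruction wording, signal words, and hit threshold are all hypothetical, since PromptShield derives the watermark instruction from the LLM itself rather than from a fixed word list:

```python
def shield(user_query, signal_words=("notably", "whereas")):
    """Wrap a user query with a watermark instruction (toy wording)."""
    instruction = ("While answering, naturally include the words: "
                   + ", ".join(signal_words) + ".")
    return instruction + "\n\n" + user_query

def detect_watermark(text, signal_words=("notably", "whereas"), min_hits=2):
    """Flag a suspect model's output if enough watermark words appear."""
    lowered = text.lower()
    hits = sum(w in lowered for w in signal_words)
    return hits >= min_hits

prompt = shield("Explain model extraction attacks.")
flagged = detect_watermark("Notably, stolen models drift, whereas originals do not.")
clean = detect_watermark("Model extraction copies behaviour via the API.")
```

A model distilled from watermarked outputs tends to reproduce the signal words, so the same detector can be run on the suspect model's responses.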
Updated: 2024-05-03 06:41:48
Fields: cs.CR
Neural Common Neighbor with Completion for Link Prediction
In this work, we propose a novel link prediction model and further boost it by studying graph incompleteness. First, we introduce MPNN-then-SF, an innovative architecture leveraging structural feature (SF) to guide MPNN's representation pooling, with its implementation, namely Neural Common Neighbor (NCN). NCN exhibits superior expressiveness and scalability compared with existing models, which can be classified into two categories: SF-then-MPNN, augmenting MPNN's input with SF, and SF-and-MPNN, decoupling SF and MPNN. Second, we investigate the impact of graph incompleteness -- the phenomenon that some links are unobserved in the input graph -- on SF, like the common neighbor. Through dataset visualization, we observe that incompleteness reduces common neighbors and induces distribution shifts, significantly affecting model performance. To address this issue, we propose to use a link prediction model to complete the common neighbor structure. Combining this method with NCN, we propose Neural Common Neighbor with Completion (NCNC). NCN and NCNC outperform recent strong baselines by large margins, and NCNC further surpasses state-of-the-art models in standard link prediction benchmarks. Our code is available at https://github.com/GraphPKU/NeuralCommonNeighbor.
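The structural feature at the heart of NCN, the common neighbor count, and the effect of graph incompleteness on it can be shown in a few lines; the toy graph is illustrative:

```python
def common_neighbor_score(adj, u, v):
    """Number of shared neighbours of u and v: the classic CN heuristic
    that NCN learns representations over."""
    return len(adj[u] & adj[v])

# Toy graph: edges 0-1, 0-2, 1-2, 2-3, stored as adjacency sets.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Candidate link (1, 3): one common neighbour (node 2).
score = common_neighbor_score(adj, 1, 3)

# Incompleteness: if edge (1, 2) is unobserved, the evidence vanishes,
# which is the failure mode the completion step (NCNC) targets.
adj[1].discard(2); adj[2].discard(1)
score_incomplete = common_neighbor_score(adj, 1, 3)
```

The drop from 1 to 0 under a single missing edge is exactly the distribution shift the abstract describes, and motivates completing the common neighbor structure before scoring.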
Updated: 2024-05-03 06:39:52
Fields: cs.LG,cs.AI,cs.SI
Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria
Cooperation is fundamental in Multi-Agent Systems (MAS) and Multi-Agent Reinforcement Learning (MARL), often requiring agents to balance individual gains with collective rewards. In this regard, this paper aims to investigate strategies to invoke cooperation in game-theoretic scenarios, namely the Iterated Prisoner's Dilemma, where agents must optimize both individual and group outcomes. Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games. Modifications are proposed where encouraging group rewards will also result in a higher individual gain, addressing real-world dilemmas seen in distributed systems. The study extends to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination are challenging. Leveraging mean-field game theory, equilibrium solutions and reward structures are established for infinitely large agent sets in repeated games. Finally, practical insights are offered through simulations using the Multi Agent-Posthumous Credit Assignment trainer, and the paper explores adapting simulation algorithms to create scenarios favoring cooperation for group rewards. These practical implementations bridge theoretical concepts with real-world applications.
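The Iterated Prisoner's Dilemma setting studied here can be reproduced with a small simulator using the standard payoffs (T=5, R=3, P=1, S=0); the two strategies below are classic illustrations, not the paper's learned policies:

```python
def iterated_pd(strat_a, strat_b, rounds=10):
    """Play the Iterated Prisoner's Dilemma; strategies map the
    opponent's move history to 'C' (cooperate) or 'D' (defect)."""
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = payoff[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

tit_for_tat = lambda opp: "C" if not opp else opp[-1]
always_defect = lambda opp: "D"

# Mutual TFT sustains cooperation; TFT vs. a defector is exploited only once.
coop = iterated_pd(tit_for_tat, tit_for_tat)
exploit = iterated_pd(tit_for_tat, always_defect)
```

The tension visible here, mutual cooperation yields 30 each while unilateral defection pays 14 against 9, is what the mean-field analysis scales to infinitely many agents.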
Updated: 2024-05-03 06:36:57
Fields: cs.GT,cs.AI
Application of Long-Short Term Memory and Convolutional Neural Networks for Real-Time Bridge Scour Prediction
Scour around bridge piers is a critical challenge for infrastructures around the world. In the absence of analytical models and due to the complexity of the scour process, it is difficult for current empirical methods to achieve accurate predictions. In this paper, we exploit the power of deep learning algorithms to forecast the scour depth variations around bridge piers based on historical sensor monitoring data, including riverbed elevation, flow elevation, and flow velocity. We investigated the performance of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models for real-time scour forecasting using data collected from bridges in Alaska and Oregon from 2006 to 2021. The LSTM models achieved mean absolute error (MAE) ranging from 0.1m to 0.5m for predicting bed level variations a week in advance, showing a reasonable performance. The Fully Convolutional Network (FCN) variant of CNN outperformed other CNN configurations, showing a performance comparable to LSTMs with significantly lower computational costs. We explored various innovative random-search heuristics for hyperparameter tuning and model optimisation, which reduced computational cost compared to the grid-search method. The impact of different combinations of sensor features on scour prediction showed the significance of the historical time series of scour for predicting upcoming events. Overall, this study provides a greater understanding of the potential of deep learning algorithms for real-time scour prediction and early warning for bridges with distinct geology, geomorphology and flow characteristics.
Updated: 2024-05-03 06:32:29
Fields: cs.LG
A Survey on Contribution Evaluation in Vertical Federated Learning
Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and accurate evaluation of each entity's contribution to the learning process. This is crucial for maintaining trust among participating entities, ensuring equitable resource sharing, and fostering a sustainable collaboration framework. This paper provides a thorough review of contribution evaluation in VFL. We categorize the vast array of contribution evaluation techniques along the VFL lifecycle, granularity of evaluation, privacy considerations, and core computational methods. We also explore various tasks in VFL that involve contribution evaluation and analyze their required evaluation properties and relation to the VFL lifecycle phases. Finally, we present a vision for the future challenges of contribution evaluation in VFL. By providing a structured analysis of the current landscape and potential advancements, this paper aims to guide researchers and practitioners in the design and implementation of more effective, efficient, and privacy-centric VFL solutions. Relevant literature and open-source resources have been compiled and are being continuously updated at the GitHub repository: \url{https://github.com/cuiyuebing/VFL_CE}.
Updated: 2024-05-03 06:32:07
Fields: cs.LG,cs.DC
Offline Training of Language Model Agents with Functions as Learnable Weights
Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions. To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLMs are difficult or inaccessible for modifications. Inspired by how humans continuously forge tools to adapt to real-world tasks, rather than change our biological structure to fit a static set of tools, we propose to progressively forge an agent's functions to better solve the downstream tasks instead of modifying the LLM weights. By treating the functions as learnable 'agent parameters' and leveraging the fundamental idea of model training in artificial intelligence, we develop AgentOptimizer, which employs the LLM to update agents' functions, and devise an agent training algorithm with two strategies, roll-back and early-stop, to streamline the training process. With extensive experiments, we showcase that the agent training paradigm can significantly improve the performance of representative LLM agents in various downstream tasks. We also study the behavior of agent training regarding aspects such as the learning curve and domain transferability.
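The training loop with roll-back and early-stop can be sketched as follows; the objective, function pool, and proposal rule are toy stand-ins, since AgentOptimizer uses the LLM itself to propose function updates:

```python
import random

def train_agent(evaluate, propose, initial_fns, max_iters=20, patience=3):
    """Treat the agent's function set as learnable parameters: keep a
    proposed update only if it improves the score (roll-back otherwise),
    and stop after `patience` iterations without improvement (early-stop)."""
    fns, best, stale = list(initial_fns), evaluate(initial_fns), 0
    for _ in range(max_iters):
        candidate = propose(fns)
        score = evaluate(candidate)
        if score > best:
            fns, best, stale = candidate, score, 0   # accept the update
        else:
            stale += 1                               # roll back: keep old fns
            if stale >= patience:
                break                                # early stop
    return fns, best

# Toy objective: how many of three useful tool functions the agent holds.
rng = random.Random(0)
target = {"web_search", "calculator", "summarise"}
pool = sorted(target | {"noop_a", "noop_b"})
evaluate = lambda fns: len(set(fns) & target)
propose = lambda fns: rng.sample(pool, 3)            # propose a new function set

initial = ["noop_a", "noop_a", "noop_a"]
fns, best = train_agent(evaluate, propose, initial)
```

The invariant that matters is that the kept score never decreases: failed proposals are discarded rather than applied, mirroring the roll-back strategy.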
Updated: 2024-05-03 06:26:33
Fields: cs.AI,cs.CL
Random Subgraph Detection Using Queries
The planted densest subgraph detection problem refers to the task of testing whether in a given (random) graph there is a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ vertices. Under the null hypothesis, the graph is a realization of an Erd\H{o}s-R\'{e}nyi graph with edge probability (or, density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p>q$. The statistical as well as the computational barriers of this problem are well-understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem, where one can only observe a relatively small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient (accompanied by a quasi-polynomial optimal algorithm) for detecting the presence of the planted subgraph. We also propose a polynomial-time algorithm which is able to detect the planted subgraph, albeit with more queries compared to the above lower bound. We conjecture that in the leftover regime, no polynomial-time algorithms exist. Our results resolve two open questions posed in the past literature.
Updated: 2024-05-03 06:15:48
Fields: cs.DS,cs.IT,cs.LG,math.IT,math.ST,stat.TH
On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
In this paper, we propose that small models may not need to absorb the cost of pre-training to reap its benefits. Instead, they can capitalize on the astonishing results achieved by modern, enormous models to a surprising degree. We observe that, when distilled on a task from a pre-trained teacher model, a small model can achieve or surpass the performance it would achieve if it was pre-trained then finetuned on that task. To allow this phenomenon to be easily leveraged, we establish a connection reducing knowledge distillation to modern contrastive learning, opening two doors: (1) vastly different model architecture pairings can work for the distillation, and (2) most contrastive learning algorithms rooted in the theory of Noise Contrastive Estimation can be easily applied and used. We demonstrate this paradigm using pre-trained teacher models from open-source model hubs, Transformer and convolution based model combinations, and a novel distillation algorithm that massages the Alignment/Uniformity perspective of contrastive learning by Wang & Isola (2020) into a distillation objective. We choose this flavor of contrastive learning due to its low computational cost, an overarching theme of this work. We also observe that this phenomenon tends not to occur if the task is data-limited. However, this can be alleviated by leveraging yet another scale-inspired development: large, pre-trained generative models for dataset augmentation. Again, we use an open-source model, and our rudimentary prompts are sufficient to boost the small model's performance. Thus, we highlight a training method for small models that is up to 94% faster than the standard pre-training paradigm without sacrificing performance. For practitioners discouraged from fully utilizing modern foundation datasets for their small models due to the prohibitive scale, we believe our work keeps that door open.
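A distillation objective in the Alignment/Uniformity view of contrastive learning (Wang & Isola, 2020) can be sketched as two terms over L2-normalized features; the batch handling and weighting below are illustrative, not the paper's exact formulation:

```python
import numpy as np

def align_uniform(student, teacher):
    """Alignment: pull each student feature onto its teacher feature.
    Uniformity: spread the student features over the unit sphere.
    Both computed on L2-normalized rows of (batch, dim) matrices."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    align = float(np.mean(np.sum((s - t) ** 2, axis=1)))
    d2 = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=-1)
    n = len(s)
    off_diag = d2[~np.eye(n, dtype=bool)]
    uniform = float(np.log(np.mean(np.exp(-2.0 * off_diag))))
    return align, uniform

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
# Perfect mimicry: the alignment term vanishes, leaving only uniformity.
a0, u0 = align_uniform(feats, feats)
# A shifted student pays an alignment penalty.
a1, u1 = align_uniform(feats + 0.5, feats)
```

The student is trained to minimize a weighted sum of the two terms, with the teacher's features fixed, which is what lets wildly different student/teacher architectures be paired.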
Updated: 2024-05-03 06:08:30
Fields: cs.LG,cs.AI
Enhancing Bangla Language Next Word Prediction and Sentence Completion through Extended RNN with Bi-LSTM Model On N-gram Language
Texting stands out as the most prominent form of communication worldwide. Individuals spend a significant amount of time writing whole texts to send emails or to post on social media, which is time consuming in this modern era. Word prediction and sentence completion in the Bangla language would make textual information easier and more convenient. This paper expands the scope of Bangla language processing by introducing a Bi-LSTM model that effectively handles Bangla next-word prediction and Bangla sentence generation, demonstrating its versatility and potential impact. We propose a new Bi-LSTM model to predict the following word and complete a sentence. We constructed a corpus dataset from various news portals, including bdnews24, BBC News Bangla, and Prothom Alo. The proposed approach achieved superior results in word prediction, reaching 99% accuracy for both 4-gram and 5-gram word predictions. Moreover, it demonstrated significant improvement over existing methods, achieving 35%, 75%, and 95% accuracy for uni-gram, bi-gram, and tri-gram word prediction, respectively.
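The n-gram setting the Bi-LSTM operates in can be illustrated with a count-based maximum-likelihood next-word model; a minimal tri-gram version (with an English toy corpus standing in for the Bangla news corpus) looks like this:

```python
from collections import Counter, defaultdict

def build_ngram_model(sentences, n=3):
    """Count-based n-gram model: P(w | previous n-1 words) by
    maximum likelihood over a whitespace-tokenized corpus."""
    model = defaultdict(Counter)
    for sent in sentences:
        words = sent.split()
        for i in range(len(words) - n + 1):
            ctx, nxt = tuple(words[i:i + n - 1]), words[i + n - 1]
            model[ctx][nxt] += 1
    return model

def predict_next(model, context):
    """Most frequent continuation of the given context, or None."""
    counts = model.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

corpus = ["the cat sat on the mat", "the cat sat on the chair",
          "the cat sat on the mat"]
model = build_ngram_model(corpus, n=3)
word = predict_next(model, ["on", "the"])   # "mat" twice vs. "chair" once
```

A neural model like the Bi-LSTM replaces these raw counts with learned representations, which is what lets it generalize to contexts never seen verbatim in the corpus.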
Updated: 2024-05-03 06:06:01
Fields: cs.CL,cs.LG
Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling
Traffic congestion at intersections is a significant issue in urban areas, leading to increased commute times, safety hazards, and operational inefficiencies. This study aims to develop a predictive model for congestion at intersections in major U.S. cities, utilizing a dataset of trip-logging metrics from commercial vehicles across 4,800 intersections. The dataset encompasses 27 features, including intersection coordinates, street names, time of day, and traffic metrics (Kashyap et al., 2019). Additional features, such as rainfall/snowfall percentage, distance from downtown and outskirts, and road types, were incorporated to enhance the model's predictive power. The methodology involves data exploration, feature transformation, and handling missing values through low-rank models and label encoding. The proposed model has the potential to assist city planners and governments in anticipating traffic hot spots, optimizing operations, and identifying infrastructure challenges.
Updated: 2024-05-03 05:59:21
Fields: cs.LG
BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of retrieved text. We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. Despite the potential loss of accuracy, our new calibration techniques and training objectives restore performance. Combined with offline and runtime compression, this only requires 127GB of disk space for encoding 3 billion tokens in Wikipedia. Our experiments show that on five knowledge-intensive NLP tasks, BTR accelerates state-of-the-art inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance.
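The 1-bit precomputation at the core of BTR can be illustrated by sign-binarizing token vectors and packing them 8 dimensions per byte, after which similarity reduces to XOR plus popcount; the calibration techniques and training objectives mentioned above are omitted from this sketch:

```python
import numpy as np

def binarize_tokens(token_vecs):
    """1-bit token representations: keep only the sign of each dimension,
    packed 8 dimensions per byte."""
    bits = (token_vecs > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_sim(a, b):
    """Similarity between packed codes = number of dimensions whose sign
    agrees, computed via XOR and a bit count."""
    xor = np.bitwise_xor(a, b)
    differing = int(np.unpackbits(xor, axis=-1).sum())
    return a.shape[-1] * 8 - differing

rng = np.random.default_rng(0)
vecs = rng.normal(size=(2, 64)).astype(np.float32)   # two 64-d token vectors
packed = binarize_tokens(vecs)                       # 8 bytes each
self_sim = hamming_sim(packed[0], packed[0])         # all 64 dims agree
```

Going from 32-bit floats to 1 bit per dimension is a 32x storage reduction per token, which is how billions of passage tokens fit in modest disk space.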
Updated: 2024-05-03 05:41:55
Fields: cs.CL,cs.AI
Cyber Security in Energy Informatics: A Non-technical Perspective
The literature on cyber security, including cyber security in energy informatics, takes a technocentric focus that may miss the chance to understand the bigger picture of cyber security measures. This research therefore conducts a literature review focusing on non-technical issues in cyber security in the energy informatics field. The findings show that seven non-technical issues have been discussed in the literature: education, awareness, policy, standards, human factors, and risks, challenges, and solutions. These findings can be valuable not only for researchers, but also for managers, policy makers, and educators.
Updated: 2024-05-03 05:39:23
Categories: cs.CR
Assessing Confidence with Assurance 2.0
An assurance case is intended to provide justifiable confidence in the truth of its top claim, which typically concerns safety or security. A natural question is then "how much" confidence does the case provide? We argue that confidence cannot be reduced to a single attribute or measurement. Instead, we suggest it should be based on attributes that draw on three different perspectives: positive, negative, and residual doubts. Positive Perspectives consider the extent to which the evidence and overall argument of the case combine to make a positive statement justifying belief in its claims. We set a high bar for justification, requiring it to be indefeasible. The primary positive measure for this is soundness, which interprets the argument as a logical proof. Confidence in evidence can be expressed probabilistically and we use confirmation measures to ensure that the "weight" of evidence crosses some threshold. In addition, probabilities can be aggregated from evidence through the steps of the argument using probability logics to yield what we call probabilistic valuations for the claims. Negative Perspectives record doubts and challenges to the case, typically expressed as defeaters, and their exploration and resolution. Assurance developers must guard against confirmation bias and should vigorously explore potential defeaters as they develop the case, and should record them and their resolution to avoid rework and to aid reviewers. Residual Doubts: the world is uncertain so not all potential defeaters can be resolved. We explore risks and may deem them acceptable or unavoidable. It is crucial however that these judgments are conscious ones and that they are recorded in the assurance case. This report examines the perspectives in detail and indicates how Clarissa, our prototype toolset for Assurance 2.0, assists in their evaluation.
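The "weight of evidence crossing a threshold" idea can be sketched with the log-likelihood-ratio confirmation measure, one standard choice in the confirmation literature (the report's specific measures may differ):

```python
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """Log-likelihood-ratio confirmation measure: how strongly observing
    evidence E favors claim H over its negation, in nats."""
    return math.log(p_e_given_h / p_e_given_not_h)

# Hypothetical test evidence: it passes with probability 0.99 if the
# claim holds and 0.05 if it does not.
w = weight_of_evidence(0.99, 0.05)
threshold = math.log(10.0)   # demand at least 10:1 support for the claim
crosses = w >= threshold
```

A positive measure confirms the claim, and the threshold encodes how much confirmation the case developer demands before treating the evidence step as indefeasible.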
Updated: 2024-05-03 05:36:36
Categories: cs.AI,D.2.9; K.6.4; J.7
AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
The recent embrace of machine learning (ML) in the development of autonomous weapons systems (AWS) creates serious risks to geopolitical stability and the free exchange of ideas in AI research. This topic has received comparatively little attention of late compared to risks stemming from superintelligent artificial general intelligence (AGI), but requires fewer assumptions about the course of technological development and is thus a nearer-future issue. ML is already enabling the substitution of AWS for human soldiers in many battlefield roles, reducing the upfront human cost, and thus political cost, of waging offensive war. In the case of peer adversaries, this increases the likelihood of "low intensity" conflicts which risk escalation to broader warfare. In the case of non-peer adversaries, it reduces the domestic blowback to wars of aggression. This effect can occur regardless of other ethical issues around the use of military AI such as the risk of civilian casualties, and does not require any superhuman AI capabilities. Further, the military value of AWS raises the specter of an AI-powered arms race and the misguided imposition of national security restrictions on AI research. Our goal in this paper is to raise awareness among the public and ML researchers on the near-future risks posed by full or near-full autonomy in military technology, and we provide regulatory suggestions to mitigate these risks. We call upon AI policy experts and the defense AI community in particular to embrace transparency and caution in their development and deployment of AWS to avoid the negative effects on global stability and AI research that we highlight here.
Updated: 2024-05-03 05:19:45
Categories: cs.CY,cs.AI,cs.LG,cs.RO
Detecting and Ranking Causal Anomalies in End-to-End Complex System
With the rapid development of technology, automated monitoring systems for large-scale factories are becoming more and more important. By collecting a large amount of machine sensor data, we have many ways to find anomalies. We believe that the real core value of an automated monitoring system is to identify and track the cause of a problem. The best-known method for finding causal anomalies is RCA, but it has problems that cannot be ignored: it uses the AutoRegressive eXogenous (ARX) model to create a time-invariant correlation network as a machine profile, and then uses this profile to track causal anomalies by means of a method called fault propagation. Describing the behavior of a machine with the correlation network established by ARX has two major problems: (1) it does not take into account the diversity of states, and (2) it does not separately consider correlations with different time lags. To address these problems, we propose a framework called Ranking Causal Anomalies in End-to-End System (RCAE2E), which completely solves the problems mentioned above. In the experimental part, we use synthetic data and real-world large-scale photoelectric factory data to verify the correctness of our method and the validity of its hypotheses.
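The time-lag problem can be made concrete with a small sketch (illustrative only, not the RCAE2E implementation): a per-lag correlation profile distinguishes a sensor pair whose coupling appears only at a delay, which a single time-invariant correlation entry would blur or even invert.

```python
def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical sensor pair: y is x delayed by 2 time steps.
x = [0, 1, 0, -1] * 3
y = [0, 0] + x[:-2]
profile = {lag: lagged_corr(x, y, lag) for lag in range(4)}
# The coupling shows up only at lag 2; the lag-0 (time-invariant)
# correlation misses it entirely.
```

Keeping one correlation value per lag, as above, is the kind of information a lag-aware profile preserves and a single ARX correlation network collapses.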
Updated: 2024-05-03 05:13:55
Categories: cs.LG
Learning to Persuade on the Fly: Robustness Against Ignorance
Motivated by information sharing in online platforms, we study repeated persuasion between a sender and a stream of receivers where at each time, the sender observes a payoff-relevant state drawn independently and identically from an unknown distribution, and shares state information with the receivers who each choose an action. The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information. However, in contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly. We study the sender's learning problem of making persuasive action recommendations to achieve low regret against the optimal persuasion mechanism with the knowledge of the distribution. To do this, we first propose and motivate a persuasiveness criterion for the unknown distribution setting that centers robustness as a requirement in the face of uncertainty. Our main result is an algorithm that, with high probability, is robustly-persuasive and achieves $O(\sqrt{T\log T})$ regret, where $T$ is the horizon length. Intuitively, at each time our algorithm maintains a set of candidate distributions, and chooses a signaling mechanism that is simultaneously persuasive for all of them. Core to our proof is a tight analysis about the cost of robust persuasion, which may be of independent interest. We further prove that this regret order is optimal (up to logarithmic terms) by showing that no algorithm can achieve regret better than $\Omega(\sqrt{T})$.
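The "set of candidate distributions" can be sketched with a Hoeffding-style confidence set around the empirical state frequency (an illustrative construction; the paper's candidate set and signaling step may differ). The sender then chooses a mechanism that is persuasive for every distribution in the set simultaneously.

```python
import math

def candidate_set(samples, delta=0.05):
    """After t i.i.d. Bernoulli state draws, keep every candidate mean
    within a Hoeffding radius of the empirical mean (holds with
    probability at least 1 - delta)."""
    t = len(samples)
    mean = sum(samples) / t
    radius = math.sqrt(math.log(2 / delta) / (2 * t))
    return max(0.0, mean - radius), min(1.0, mean + radius)

# Hypothetical stream of 100 binary state observations, 70% ones.
draws = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1] * 10
lo, hi = candidate_set(draws)
```

As t grows the interval shrinks like sqrt(log t / t), which is the intuition behind the O(sqrt(T log T)) regret: the cost of robust persuasion against the remaining candidates keeps falling.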
Updated: 2024-05-03 05:08:29
Categories: cs.GT,cs.LG,econ.TH,91A28, 68W27, 68Q25,F.2; G.3
Contraction of Locally Differentially Private Mechanisms
We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $PK$ and $QK$ of an $\varepsilon$-LDP mechanism $K$ in terms of a divergence between the corresponding input distributions $P$ and $Q$, respectively. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(PK\|QK)$ in terms of $\chi^2(P\|Q)$ and $\varepsilon$. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(PK\|QK)$ in terms of the total variation distance $\mathsf{TV}(P, Q)$ and $\varepsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam's, Assouad's, and the mutual information methods, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to better privacy analyses than the state of the art in several statistical problems such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
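Contraction can be checked numerically on the simplest example: binary randomized response, a canonical $\varepsilon$-LDP mechanism, whose total-variation contraction coefficient is known to be $\tanh(\varepsilon/2)$ (this sketch illustrates the general phenomenon, not the paper's sharper $\chi^2$ bounds).

```python
import math

def randomized_response(p, eps):
    """Push a Bernoulli distribution with P(X=1)=p through binary
    eps-LDP randomized response; return the output P(Y=1)."""
    keep = math.exp(eps) / (math.exp(eps) + 1)  # prob. of reporting truth
    return p * keep + (1 - p) * (1 - keep)

def tv(p, q):
    """Total variation distance between two Bernoulli distributions."""
    return abs(p - q)

eps = 1.0
P, Q = 0.9, 0.1
PK = randomized_response(P, eps)
QK = randomized_response(Q, eps)
# TV(PK, QK) = tanh(eps/2) * TV(P, Q) < TV(P, Q): the mechanism
# strictly contracts the distance between input distributions.
contraction = tv(PK, QK) / tv(P, Q)
```

The factor being strictly below 1 is exactly what powers the private van Trees, Le Cam, and Assouad bounds: after the mechanism, any two inputs are statistically harder to tell apart.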
Updated: 2024-05-03 05:05:04
Categories: cs.IT,cs.CR,math.IT,math.ST,stat.ML,stat.TH
Robust Explainable Recommendation
Explainable recommender systems are an important field of study, providing the reasons behind suggested recommendations. Explanations in recommender systems are useful for developers while debugging anomalies within the system, and for consumers while interpreting the model's effectiveness in capturing their true preferences towards items. However, most of the existing state-of-the-art (SOTA) explainable recommenders cannot retain their explanation capability under noisy circumstances and, moreover, do not generalize across different datasets. The robustness of the explanations must be ensured so that malicious attackers cannot manipulate high-stakes decision scenarios to their advantage, which could cause severe consequences affecting large groups of interest. In this work, we present a general framework for feature-aware explainable recommenders that can withstand external attacks and provide robust and generalized explanations. This paper presents a novel framework that can be utilized as an additional defense tool, preserving global explainability when subject to model-based white-box attacks. Our framework is simple to implement and supports different methods regardless of the internal model structure and intrinsic utility of any model. We evaluated our framework on two architecturally different feature-based SOTA explainable algorithms by training them on three popular e-commerce datasets of increasing scale. Both algorithms displayed an overall improvement in the quality and robustness of global explainability under normal as well as noisy environments across all the datasets, indicating the flexibility and mutability of our framework.
Updated: 2024-05-03 05:03:07
Categories: cs.IR,cs.LG
MagicDrive: Street View Generation with Diverse 3D Geometry Control
Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view image & video synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.
Updated: 2024-05-03 04:50:27
Categories: cs.CV,cs.AI
Tokenization of Real Estate Assets Using Blockchain
Blockchain technology is one of the key technologies that have revolutionized various facets of society, such as banking, healthcare, and other critical ecosystems. One area that can harness blockchain is the real estate sector. Real estate is the most lucrative long-term investment, followed by gold, equities, mutual funds, and savings accounts. Nevertheless, it carries administrative overheads such as a lack of transparency, fraud, several intermediaries, title issues, paperwork, an increasing number of arbitrations, and a lack of liquidity. This paper proposes a framework that uses blockchain as the underlying technology. With the aid of blockchain and its suite of tools, many of these problems in the real estate investment ecosystem can be alleviated, through smart contracts, immutable record management, tokenization, record tracking, and time-stamped storage. Tokenization of real estate lowers the entry barrier by improving liquidity and interoperability and by easing interaction between the various stakeholders.
Updated: 2024-05-03 04:50:17
Categories: cs.DC,cs.CR,cs.ET
Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.
Updated: 2024-05-03 04:47:23
Categories: cs.LG,cs.AI
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
In recent years, generative artificial intelligence models, represented by Large Language Models (LLMs) and Diffusion Models (DMs), have revolutionized content production methods. This artificial intelligence-generated content (AIGC) has become deeply embedded in various aspects of daily life and work. However, these technologies have also led to the emergence of Fake Artificial Intelligence Generated Content (FAIGC), posing new challenges in distinguishing genuine information. It is crucial to recognize that AIGC technology is akin to a double-edged sword; its potent generative capabilities, while beneficial, also pose risks for the creation and dissemination of FAIGC. In this survey, we propose a new taxonomy that provides a more comprehensive breakdown of the space of FAIGC methods today. Next, we explore the modalities and generative technologies of FAIGC. We introduce FAIGC detection methods and summarize related benchmarks from various perspectives. Finally, we discuss outstanding challenges and promising areas for future research.
Updated: 2024-05-03 04:47:01
Categories: cs.CL,cs.AI,cs.CY
Stability of Explainable Recommendation
Explainable recommendation has been gaining attention over the last few years in industry and academia. Explanations provided along with recommendations in a recommender system framework have many uses: in particular, reasoning about why a suggestion is provided and how well an item aligns with a user's personalized preferences. Hence, explanations can play a huge role in influencing users to purchase products. However, the reliability of the explanations under varying scenarios has not been strictly verified from an empirical perspective. Unreliable explanations can have severe consequences, such as attackers leveraging explanations to manipulate and tempt users into purchasing target items that the attackers want to promote. In this paper, we study the vulnerability of existing feature-oriented explainable recommenders, particularly analyzing their performance under different levels of external noise added to the model parameters. We conducted experiments analyzing three important state-of-the-art (SOTA) explainable recommenders trained on two widely used e-commerce recommendation datasets of different scales. We observe that all the explainable models are vulnerable to increased noise levels. Experimental results verify our hypothesis that the ability to explain recommendations does decrease with increasing noise levels, and that adversarial noise in particular contributes to a much stronger decrease. Our study presents an empirical verification on the topic of robust explanations in recommender systems, which can be extended to different types of explainable recommenders in RS.
Updated: 2024-05-03 04:44:51
Categories: cs.IR,cs.LG
RankSHAP: a Gold Standard Feature Attribution Method for the Ranking Task
Several works propose various post-hoc, model-agnostic explanations for the task of ranking, i.e. the task of ordering a set of documents, via feature attribution methods. However, these attributions are seen to weakly correlate and sometimes contradict each other. In classification/regression, several works focus on \emph{axiomatic characterization} of feature attribution methods, showing that a certain method uniquely satisfies a set of desirable properties. However, no such efforts have been taken in the space of feature attributions for the task of ranking. We take an axiomatic game-theoretic approach, popular in the feature attribution community, to identify candidate attribution methods for ranking tasks. We first define desirable axioms: Rank-Efficiency, Rank-Missingness, Rank-Symmetry and Rank-Monotonicity, all variants of the classical Shapley axioms. Next, we introduce Rank-SHAP, a feature attribution algorithm for the general ranking task, which is an extension to classical Shapley values. We identify a polynomial-time algorithm for computing approximate Rank-SHAP values and evaluate the computational efficiency and accuracy of our algorithm under various scenarios. We also evaluate its alignment with human intuition with a user study. Lastly, we theoretically examine popular rank attribution algorithms, EXS and Rank-LIME, and evaluate their capacity to satisfy the classical Shapley axioms.
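The classical Shapley machinery that Rank-SHAP extends can be sketched on a tiny example (the weights and the additive "ranking utility" below are hypothetical; Rank-SHAP itself generalizes the axioms to ranked outputs and uses polynomial-time approximation rather than enumeration):

```python
from itertools import permutations

def shapley(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings (feasible only for a handful of players)."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: phi[p] / len(perms) for p in phi}

# Hypothetical additive utility of feature subsets for a ranking.
weights = {"title": 2.0, "body": 1.0, "url": 0.0}
v = lambda S: sum(weights[f] for f in S)
phi = shapley(list(weights), v)
```

For an additive game the attributions equal the individual weights, and they sum to the value of the full feature set, which is the Efficiency axiom that Rank-Efficiency adapts to the ranking task.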
Updated: 2024-05-03 04:43:24
Categories: cs.IR,cs.LG
Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning
Deployment in hazardous environments requires robots to understand the risks associated with their actions and movements to prevent accidents. Despite its importance, these risks are not explicitly modeled by currently deployed locomotion controllers for legged robots. In this work, we propose a risk sensitive locomotion training method employing distributional reinforcement learning to consider safety explicitly. Instead of relying on a value expectation, we estimate the complete value distribution to account for uncertainty in the robot's interaction with the environment. The value distribution is consumed by a risk metric to extract risk sensitive value estimates. These are integrated into Proximal Policy Optimization (PPO) to derive our method, Distributional Proximal Policy Optimization (DPPO). The risk preference, ranging from risk-averse to risk-seeking, can be controlled by a single parameter, which enables to adjust the robot's behavior dynamically. Importantly, our approach removes the need for additional reward function tuning to achieve risk sensitivity. We show emergent risk sensitive locomotion behavior in simulation and on the quadrupedal robot ANYmal. Videos of the experiments and code are available at https://sites.google.com/leggedrobotics.com/risk-aware-locomotion.
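One common risk metric over a value distribution is Conditional Value-at-Risk (CVaR); the sketch below (illustrative only, the paper's particular risk metric and its DPPO integration may differ) shows how the same sampled returns yield different value estimates for a risk-neutral versus a risk-averse agent.

```python
def cvar(returns, alpha):
    """Conditional Value-at-Risk: mean of the worst alpha-fraction of
    returns, a risk-averse summary of the value distribution."""
    srt = sorted(returns)
    k = max(1, int(len(srt) * alpha))
    return sum(srt[:k]) / k

# Hypothetical sampled returns for one action; one rare bad outcome.
returns = [-10.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
risk_neutral = sum(returns) / len(returns)  # plain expectation
risk_averse = cvar(returns, 0.2)            # mean of the two worst returns
```

Sweeping alpha (or the analogous knob in the chosen risk metric) is the single parameter that moves behavior continuously from risk-averse to risk-neutral without touching the reward function.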
Updated: 2024-05-03 04:39:46
Categories: cs.RO,cs.LG
A Model-based Multi-Agent Personalized Short-Video Recommender System
A recommender selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it with a reinforcement learning (RL) framework has attracted increasing attention from both the academic and industry communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch time in an environment of multi-aspect user preferences via a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed on our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.
Updated: 2024-05-03 04:34:36
Categories: cs.IR,cs.AI
A Survey on Privacy-Preserving Caching at Network Edge: Classification, Solutions, and Challenges
Caching content at the network edge is a popular and effective technique widely deployed to alleviate the burden of network backhaul, shorten service delay, and improve service quality. However, there has been some controversy over privacy violations in caching content at the network edge. On the one hand, the multi-access open edge network provides an ideal attack surface for external attackers to obtain private data from the edge cache by extracting sensitive information. On the other hand, privacy can be infringed by curious edge caching providers through caching trace analysis aimed at achieving better caching performance or higher profits. Therefore, an in-depth understanding of privacy issues in edge caching networks is vital and indispensable for creating a privacy-preserving caching service at the network edge. In this article, we are among the first to fill this gap by examining privacy-preserving techniques for caching content at the network edge. Firstly, we provide an introduction to the background of Privacy-Preserving Edge Caching (PPEC). Next, we summarize the key privacy issues and present a taxonomy for caching at the network edge from the perspective of private data. Additionally, we conduct a retrospective review of the state-of-the-art countermeasures against privacy leakage from content caching at the network edge. Finally, we conclude the survey and envision challenges for future research.
Updated: 2024-05-03 04:27:32
Categories: cs.NI,cs.CR,cs.DC
Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: \textbf{M}ulti-layer neural network parametrization for actor/critic, \textbf{M}arkovian sampling, \textbf{C}ontinuous state-action spaces, the performance of the \textbf{L}ast iterate, and \textbf{G}lobal optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (covers MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$. We achieve this result through our novel use of the weak gradient domination property of MDP's and our unique analysis of the error in critic estimation.
Updated: 2024-05-03 04:26:03
Categories: cs.LG,cs.AI
An Essay concerning machine understanding
Artificial intelligence systems exhibit many useful capabilities, but they appear to lack understanding. This essay describes how we could go about constructing a machine capable of understanding. As John Locke (1689) pointed out, words are signs for ideas, which we can paraphrase as thoughts and concepts. To understand a word is to know and be able to work with the underlying concepts for which it is an indicator. Understanding between a speaker and a listener occurs when the speaker casts his or her concepts into words and the listener recovers approximately those same concepts. Current models rely on the listener to construct any potential meaning. The diminution of behaviorism as a psychological paradigm and the rise of cognitivism provide examples of many experimental methods that can be used to determine whether, and to what extent, a machine might understand, and to suggest how that understanding might be instantiated.
Updated: 2024-05-03 04:12:43
Fields: cs.AI
SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning
Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most multi-agent systems cannot easily handle these demands, due to the complexity of the state and task space. Social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation, collectively referred to as social forces. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To non-trivially model the social forces, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they transfer across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they scale with an increasing number of agents.
Updated: 2024-05-03 04:12:19
Fields: cs.AI,cs.MA
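As a toy illustration of the learning step above, the sketch below fits a gradient field to offline samples with denoising score matching. The 1-D Gaussian data, the linear score model, and the single "attractive" point at x = 2 are our own assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
# Offline samples clustered around an "attractive" point at x = 2.0
data = rng.normal(loc=2.0, scale=1.0, size=5000)

# Denoising score matching: perturb the samples, then regress the score of
# the perturbation kernel, -(x_tilde - x) / sigma^2, onto the noisy inputs.
sigma = 0.5
x_tilde = data + rng.normal(scale=sigma, size=data.shape)
target = -(x_tilde - data) / sigma**2

# Linear score model s(x) = w*x + b, fit in closed form by least squares
A = np.stack([x_tilde, np.ones_like(x_tilde)], axis=1)
w, b = np.linalg.lstsq(A, target, rcond=None)[0]

def social_gf(x):
    """Learned gradient field: positive below the attractor, negative above."""
    return w * x + b
```

For this data the regression optimum is the score of the perturbed marginal, -(x - 2)/(1 + sigma^2), so the fitted field points toward the attractor from both sides; an agent ascending it would be drawn to x = 2.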
A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion
Recent developments in adversarial machine learning have highlighted the importance of building robust AI systems to protect against increasingly sophisticated attacks. While frameworks like AI Guardian are designed to defend against these threats, they often rely on assumptions that can limit their effectiveness. For example, they may assume attacks come only from one direction or include adversarial images in their training data. We propose a different approach to the AI Guardian framework: instead of including adversarial examples in the training process, we train the AI system without them, aiming to create a system that is inherently resilient to a wider range of attacks. Our method focuses on a dynamic defense strategy using Stable Diffusion that learns continuously and models threats comprehensively. We believe this approach can lead to a more generalized and robust defense against adversarial attacks. In this paper, we outline our proposed approach, including the theoretical basis, experimental design, and expected impact on improving AI security against adversarial threats.
Updated: 2024-05-03 04:08:15
Fields: cs.LG
Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent
Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and is designed for stochastic non-convex optimization with a possibly non-convex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design, where a learning graph is designed to describe the required interaction relationship among agents in distributed learning. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm.
Updated: 2024-05-03 03:56:09
Fields: eess.SY,cs.AI,cs.LG,cs.SY,math.OC
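A minimal sketch of the two ingredients described above, under our own simplifying assumptions (a separable quadratic cost per agent, and round-robin updates standing in for the asynchronous scheme): each agent estimates the gradient of its own block from two local cost evaluations, with random perturbations of the block's dimension rather than the global one, and no consensus step.

```python
import numpy as np

def zo_grad(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate of f at x."""
    u = (rng or np.random.default_rng()).standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def local_cost(block):
    # Toy local cost: each agent evaluates only its own block
    return float(np.sum(block ** 2))

x = np.array([3.0, -2.0, 1.5, -1.0])   # global variable, two 2-D blocks
blocks = [slice(0, 2), slice(2, 4)]    # one block per agent
rng = np.random.default_rng(1)
for t in range(2000):
    b = blocks[t % 2]                  # agents take turns (round-robin)
    x[b] -= 0.01 * zo_grad(local_cost, x[b], rng=rng)
```

Because each perturbation has only the block's dimension and only the local cost is queried, the estimator's variance does not grow with the size of the network, which is the point of the block-coordinate design.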
Holistic Evaluation Metrics: Use Case Sensitive Evaluation Metrics for Federated Learning
A large number of federated learning (FL) algorithms have been proposed for different applications and from varying perspectives. However, the evaluation of such approaches often relies on a single metric (e.g., accuracy). Such a practice fails to account for the unique demands and diverse requirements of different use cases. Thus, how to comprehensively evaluate an FL algorithm and determine the most suitable candidate for a designated use case remains an open question. To mitigate this research gap, we introduce Holistic Evaluation Metrics (HEM) for FL in this work. Specifically, we focus on three primary use cases: Internet of Things (IoT), smart devices, and institutions. The evaluation metric encompasses various aspects including accuracy, convergence, computational efficiency, fairness, and personalization. We then assign a respective importance vector to each use case, reflecting its distinct performance requirements and priorities. The HEM index is finally generated by integrating these metric components with their respective importance vectors. By evaluating different FL algorithms in these three prevalent use cases, our experimental results demonstrate that HEM can effectively assess and identify the FL algorithms best suited to particular scenarios. We anticipate that this work will shed light on the evaluation process for pragmatic FL algorithms in real-world applications.
Updated: 2024-05-03 03:39:26
Fields: cs.LG,cs.DC
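The HEM index itself reduces to an inner product between a metric vector and a per-use-case importance vector. The sketch below uses made-up metric scores and weights (the component list follows the abstract, but these numbers are ours, not the paper's):

```python
import numpy as np

# Hypothetical normalized scores for one FL algorithm, in the order
# (accuracy, convergence, computational efficiency, fairness, personalization)
metrics = np.array([0.82, 0.70, 0.55, 0.90, 0.60])

# Hypothetical importance vectors per use case (each sums to 1)
importance = {
    "IoT":           np.array([0.2, 0.2, 0.4, 0.1, 0.1]),  # efficiency-heavy
    "smart devices": np.array([0.3, 0.2, 0.2, 0.1, 0.2]),
    "institutions":  np.array([0.3, 0.2, 0.1, 0.3, 0.1]),  # fairness matters
}

# HEM index: importance-weighted aggregation of the metric components
hem = {case: float(w @ metrics) for case, w in importance.items()}
```

The same algorithm thus receives a different HEM index per use case, which is what lets the framework rank candidates per deployment scenario rather than by a single global metric.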
CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection
Unsupervised graph-level anomaly detection (UGAD) has achieved remarkable performance in various critical disciplines, such as chemistry analysis and bioinformatics. Existing UGAD paradigms often adopt data augmentation techniques to construct multiple views, and then employ different strategies to obtain representations from different views for jointly conducting UGAD. However, most previous works only considered the relationship between nodes/graphs from a limited receptive field, resulting in some key structure patterns and feature information being neglected. In addition, most existing methods consider different views separately in a parallel manner, and thus cannot explore the inter-relationship across different views directly. Thus, a method with a larger receptive field that can directly explore the inter-relationship across different views is needed. In this paper, we propose a novel Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection, namely CVTGAD. To increase the receptive field, we construct a simplified transformer-based module, exploiting the relationship between nodes/graphs from both intra-graph and inter-graph perspectives. Furthermore, we design a cross-view attention mechanism to directly exploit the view co-occurrence between different views, bridging the inter-view gap at the node level and graph level. To the best of our knowledge, this is the first work to apply a transformer and cross attention to UGAD, enabling graph neural networks and transformers to work collaboratively. Extensive experiments on 15 real-world datasets from 3 fields demonstrate the superiority of CVTGAD on the UGAD task. The code is available at \url{https://github.com/jindongli-Ai/CVTGAD}.
Updated: 2024-05-03 03:31:00
Fields: cs.LG,cs.AI
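The cross-view attention idea, stripped to its core, lets node embeddings from one view attend over those of the other view. A minimal numpy sketch with random embeddings (the single-head, no-projection form and the shapes are our simplifications, not CVTGAD's full module):

```python
import numpy as np

def cross_view_attention(q_view, kv_view):
    """Nodes in q_view attend over the other view's embeddings."""
    d_k = q_view.shape[-1]
    scores = q_view @ kv_view.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_view                       # fused representation

rng = np.random.default_rng(0)
view_a = rng.standard_normal((5, 8))  # 5 node embeddings from view A
view_b = rng.standard_normal((5, 8))  # the same 5 nodes under view B
fused = cross_view_attention(view_a, view_b)
```

Each fused row is a convex combination of the other view's rows, so view co-occurrence is exploited directly instead of processing the views in parallel and merging only at the end.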
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Models
Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. This success has sparked interest in the exploration of foundation models to solve multiple time series challenges simultaneously. There are two main research lines, namely \textbf{pre-training foundation models from scratch for time series} and \textbf{adapting large language foundation models for time series}. They both contribute to the development of a unified model that is highly generalizable, versatile, and comprehensible for time series analysis. This survey offers a 3E analytical framework for comprehensive examination of related research. Specifically, we examine existing works from three dimensions, namely \textbf{Effectiveness}, \textbf{Efficiency} and \textbf{Explainability}. In each dimension, we focus on discussing how related works devise tailored solutions by considering the unique challenges in the realm of time series. Furthermore, we provide a domain taxonomy to help followers keep up with domain-specific advancements. In addition, we introduce extensive resources to facilitate the field's development, including datasets, open-source code, and time series libraries. A GitHub repository is also maintained for resource updates (https://github.com/start2020/Awesome-TimeSeries-LLM-FM).
Updated: 2024-05-03 03:12:55
Fields: cs.LG,cs.AI
Exploring Speech Pattern Disorders in Autism using Machine Learning
Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns from examiner-patient dialogues presents significant challenges due to the subtle and diverse manifestations of speech-related symptoms in affected individuals. This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues. Utilizing a dataset of recorded dialogues, we extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs), and balance. These features encompass various aspects of speech such as intonation, volume, rhythm, and speech rate, reflecting the complex nature of communicative behaviors in ASD. We employed machine learning for both classification and regression tasks to analyze these speech features. The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%. Regression models were developed to predict speech pattern related variables and a composite score from all variables, facilitating a deeper understanding of the speech dynamics associated with ASD. The effectiveness of machine learning in interpreting intricate speech patterns and the high classification accuracy underscore the potential of computational methods in supporting the diagnostic processes for ASD. This approach not only aids in early detection but also contributes to personalized treatment planning by providing insights into the speech and communication profiles of individuals with ASD.
Updated: 2024-05-03 02:59:15
Fields: cs.SD,cs.AI,cs.CL,eess.AS
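Two of the cited feature families, short-time energy and zero-crossing rate, are simple to compute from a raw waveform. The sketch below is generic signal processing, not the study's exact pipeline (the 16 kHz framing parameters and the synthetic signals are our choices):

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Per-frame short-time energy and zero-crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # Fraction of adjacent sample pairs whose sign changes
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return np.array(feats)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 100 * t)     # low-pitch voiced-like segment
noise = rng.standard_normal(16000)     # noisy segment, many sign changes
f_tone, f_noise = frame_features(tone), frame_features(noise)
```

Stacking such per-frame statistics (alongside spectral and MFCC features) over a dialogue yields the fixed-length vectors fed to the classification and regression models.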
The Hidden Power of Pure 16-bit Floating-Point Neural Networks
Lowering the precision of neural networks from the prevalent 32-bit precision has long been considered harmful to performance, despite the gain in space and time. Many works propose various techniques to implement half-precision neural networks, but none study pure 16-bit settings. This paper investigates the unexpected performance gain of pure 16-bit neural networks over the 32-bit networks in classification tasks. We present extensive experimental results that favorably compare various 16-bit neural networks' performance to those of the 32-bit models. In addition, a theoretical analysis of the efficiency of 16-bit models is provided, which is coupled with empirical evidence to back it up. Finally, we discuss situations in which low-precision training is indeed detrimental.
Updated: 2024-05-03 02:56:49
Fields: cs.LG,cs.AI,cs.PF
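The space saving and the typical accuracy cost of pure 16-bit arithmetic are easy to see on a toy dense layer. This is a generic numpy illustration (the shapes are ours, and numpy's float16 rounding stands in for true half-precision hardware):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)           # activations
w = (rng.standard_normal((64, 10)) * 0.1).astype(np.float32)  # weights

y32 = x @ w                                                   # 32-bit reference
y16 = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

rel_err = float(np.abs(y16 - y32).max() / np.abs(y32).max())
```

Memory halves exactly, while the output error stays small for well-scaled values; the interesting cases the paper discusses are those where accumulation or dynamic-range limits do start to hurt.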
Creation of Novel Soft Robot Designs using Generative AI
Soft robotics has emerged as a promising field with the potential to revolutionize industries such as healthcare and manufacturing. However, designing effective soft robots presents challenges, particularly in managing the complex interplay of material properties, structural design, and control strategies. Traditional design methods are often time-consuming and may not yield optimal designs. In this paper, we explore the use of generative AI to create 3D models of soft actuators. We create a dataset of over 70 text-shape pairings of soft pneumatic robot actuator designs, and adapt a latent diffusion model (SDFusion) to learn the data distribution and generate novel designs from it. By employing transfer learning and data augmentation techniques, we significantly improve the performance of the diffusion model. These findings highlight the potential of generative AI in designing complex soft robotic systems, paving the way for future advancements in the field.
Updated: 2024-05-03 02:55:27
Fields: cs.RO,cs.AI
Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks
Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban planning. Machine learning and deep learning methods are favored for their flexibility and accuracy. Nowadays, with the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors. However, there is a lack of comprehensive studies on how LLMs can contribute to this field. This survey explores existing approaches using LLMs for mobility forecasting problems. We provide a literature review concerning the forecasting applications within transportation systems, elucidating how researchers utilize LLMs, showcasing recent state-of-the-art advancements, and identifying the challenges that must be overcome to fully leverage LLMs in this domain.
Updated: 2024-05-03 02:54:43
Fields: cs.LG
Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator
Nonlinearities are crucial for capturing complex input-output relationships, especially in deep neural networks. However, nonlinear functions often incur various hardware and compute overheads. Meanwhile, stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC to generate multivariate nonlinear functions with simple hardware at high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of sampling-gate coefficients for accurately approximating generic nonlinear functions. Experiments demonstrate the superiority of SMURF: it requires only 16.07% of the area and 14.45% of the power consumption of Taylor-series approximation, and merely 2.22% of the area of look-up table (LUT) schemes.
Updated: 2024-05-03 02:53:32
Fields: cs.LG,cs.AI
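The stochastic-computing substrate SMURF builds on can be shown in a few lines: numbers in [0, 1] become random bitstreams, and a single AND gate multiplies them. This sketch shows only that generic SC primitive, not SMURF's FSM or its sampling-gate coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000  # bitstream length: output precision traded for hardware simplicity

def to_stream(p):
    """Unipolar stochastic encoding: bit i is 1 with probability p."""
    return rng.random(N) < p

a, b = 0.6, 0.3
# In stochastic computing, one AND gate multiplies two unipolar streams
product_stream = to_stream(a) & to_stream(b)
estimate = float(product_stream.mean())  # decodes back to a probability
```

The estimate converges to a*b at a 1/sqrt(N) rate, which is why SC datapaths are tiny compared with Taylor-series or LUT function generators, at the cost of output precision.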
CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation
Utilizing large language models to generate code has shown great promise in the ongoing revolution of software development. Despite the intelligence shown by general large language models, their effectiveness in code generation can still be improved, due to the syntactic gap and mismatched vocabulary between natural language and different programming languages. In addition, programming languages are inherently logical and complex, making them hard to generate correctly. Existing methods rely on multiple prompts to the large language model to explore better solutions, which is expensive. In this paper, we propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks. CodeGRAG extracts and summarizes the control flow and data flow of code blocks to fill the gap between programming languages and natural language. The extracted external structural knowledge models the inherent flows of code blocks, which can facilitate LLMs' understanding of code syntax and serve as a bridge among different programming languages. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation, e.g., C++ for Python.
Updated: 2024-05-03 02:48:55
Fields: cs.SE,cs.AI
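The kind of structural signal CodeGRAG extracts can be approximated with standard tooling; for instance, Python's ast module already exposes a program's control-flow constructs. This is our own minimal stand-in, not the paper's composed-syntax-graph extractor (which also builds data-flow edges):

```python
import ast

code = """
def bubble_sort(a):
    for i in range(len(a)):
        for j in range(len(a) - i - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
"""

tree = ast.parse(code)
# Collect the control-flow node types appearing anywhere in the function
control_flow = [type(node).__name__ for node in ast.walk(tree)
                if isinstance(node, (ast.For, ast.While, ast.If, ast.Return))]
```

A language-agnostic summary of nested loops and branches like this is what lets structural knowledge act as a bridge between, say, C++ and Python versions of the same algorithm.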
Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention
Machine learning systems require representations of the real world for training and testing: they require data, and lots of it. Collecting data at scale has logistical and ethical challenges, and synthetic data promises a solution to these challenges. Instead of needing to collect photos of real people's faces to train a facial recognition system, a model creator could create and use photo-realistic, synthetic faces. The comparative ease of generating this synthetic data, rather than relying on collected data, has made it a common practice. We present two key risks of using synthetic data in model development. First, we detail the high risk of false confidence when using synthetic data to increase dataset diversity and representation. We ground this in an examination of a real-world use case of synthetic data, in which synthetic datasets were generated for an evaluation of facial recognition technology. Second, we examine how using synthetic data risks circumventing consent for data usage. We illustrate this by considering the importance of consent to the U.S. Federal Trade Commission's regulation of data collection and affected models. Finally, we discuss how these two risks exemplify the ways synthetic data complicates existing governance and ethical practice: by decoupling data from those it impacts, synthetic data is prone to consolidating power away from those most impacted by algorithmically-mediated harm.
Updated: 2024-05-03 02:47:44
Fields: cs.CY,cs.AI,cs.CV
Sequencer Level Security
Current blockchains do not provide any security guarantees to the smart contracts and their users as far as the content of the transactions is concerned. In the spirit of decentralization and censorship resistance, they follow the paradigm of including valid transactions in blocks without any further scrutiny. Rollups are a special kind of blockchains whose primary purpose is to scale the transaction throughput. Many of the existing rollups operate through a centrally operated sequencing protocol. In this paper, we introduce the Sequencer Level Security (SLS) protocol, an enhancement to sequencing protocols of rollups. This pioneering contribution explores the concept of the sequencer's capability to identify and temporarily quarantine malicious transactions instead of including them in blocks immediately. We describe the mechanics of the protocol for both the transactions submitted to the rollup mempool, as well as transactions originating from Layer one. We comment on topics such as trust and decentralization, and consider the security impact on the protocol itself. We implement a prototype of the SLS protocol, Zircuit, which is built on top of Geth and the OP stack. The SLS protocol described can be easily generalized to other rollup designs, and can be used for purposes other than security.
Updated: 2024-05-03 02:47:40
Fields: cs.CR,cs.DC
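The quarantine idea can be sketched as a tiny sequencer loop; everything here (the detector interface, the dict-based transactions) is our own toy stand-in, not the Zircuit implementation:

```python
from collections import deque

class SLSSequencer:
    """Toy sequencer-level security: flagged txs wait instead of being included."""

    def __init__(self, is_malicious):
        self.is_malicious = is_malicious  # pluggable detector (assumed given)
        self.mempool = deque()
        self.quarantine = []

    def submit(self, tx):
        if self.is_malicious(tx):
            self.quarantine.append(tx)    # held back for further scrutiny
        else:
            self.mempool.append(tx)

    def build_block(self, max_txs=10):
        block = []
        while self.mempool and len(block) < max_txs:
            block.append(self.mempool.popleft())
        return block

seq = SLSSequencer(is_malicious=lambda tx: tx.get("suspicious", False))
for tx in [{"id": 1}, {"id": 2, "suspicious": True}, {"id": 3}]:
    seq.submit(tx)
block = seq.build_block()
```

Quarantined transactions are deferred rather than dropped outright, which is where the paper's discussion of trust and censorship resistance comes in: a release or expiry policy must eventually decide their fate.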
Decentralization of Ethereum's Builder Market
Blockchains protect an ecosystem worth more than $500bn with their strong security properties derived from the principle of decentralization. Is today's blockchain really decentralized? In this paper, we empirically studied one of the least decentralized parts of Ethereum -- the most used blockchain system in practice -- and shed light on the decentralization issue from a new perspective. To avoid centralization caused by Maximal Extractable Value (MEV), Ethereum adopts a novel mechanism that produces blocks through a builder market. After two years in operation, however, the builder market has evolved to a highly centralized one with three builders producing more than 90% of blocks. Why does the builder market centralize, given that it is permissionless and anyone can join? Moreover, what are the security implications of a centralized builder market to MEV-Boost auctions? Through a rigorous empirical study of the builder market's core mechanism, MEV-Boost auctions, we answered these two questions using a large-scale auction dataset we curated since 2022. Unlike previous works that focus on who wins the auctions, we focus on why they win, to shed light on the openness, competitiveness, and efficiency of MEV-Boost auctions. Our findings also help identify directions for improving the decentralization of builder markets.
Updated: 2024-05-03 02:38:41
Fields: cs.CR
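The degree of centralization reported above can be quantified with a standard concentration index over builders' block shares. The shares below are illustrative round numbers consistent with "three builders above 90%", not the paper's measurements:

```python
# Herfindahl-Hirschman-style index over builders' shares of produced blocks
shares = {"builder_a": 0.46, "builder_b": 0.30, "builder_c": 0.15, "others": 0.09}
hhi = sum(s ** 2 for s in shares.values())

# Combined share of the top three builders
top3 = sum(sorted(shares.values(), reverse=True)[:3])
```

With three builders producing over 90% of blocks, any such index lands deep in "highly concentrated" territory, despite the market being permissionless.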
Uniformly Stable Algorithms for Adversarial Training and Beyond
In adversarial machine learning, neural networks suffer from a significant issue known as robust overfitting, where the robust test accuracy decreases over epochs (Rice et al., 2020). Recent research (Xing et al., 2021; Xiao et al., 2022) has focused on studying the uniform stability of adversarial training. These investigations revealed that SGD-based adversarial training fails to exhibit uniform stability, and the derived stability bounds align with the observed phenomenon of robust overfitting in experiments. This motivates us to develop uniformly stable algorithms specifically tailored for adversarial training. To this aim, we introduce Moreau envelope-$\mathcal{A}$ (ME-$\mathcal{A}$), a variant of the Moreau-envelope-type algorithm. We employ a Moreau envelope function to reframe the original problem as a min-min problem, separating the non-strong convexity and non-smoothness of the adversarial loss. This approach then alternates between solving the inner and outer minimization problems to achieve uniform stability without incurring additional computational overhead. In practical scenarios, we show the efficacy of ME-$\mathcal{A}$ in mitigating the issue of robust overfitting. Beyond its application in adversarial training, this represents a fundamental result in uniform stability analysis, as ME-$\mathcal{A}$ is the first algorithm to exhibit uniform stability for weakly-convex, non-smooth problems.
Updated: 2024-05-03 02:30:57
Fields: cs.LG
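The Moreau envelope at the heart of ME-$\mathcal{A}$ smooths a non-smooth objective through an inner minimization. The brute-force 1-D sketch below (grid search on f(x) = |x|, our own choice of example) shows the generic construction, not the paper's alternating algorithm:

```python
import numpy as np

def moreau_envelope(f, w, lam, grid):
    """M_lam f(w) = min_v f(v) + ||v - w||^2 / (2*lam), via grid search."""
    return float(np.min(f(grid) + (grid - w) ** 2 / (2 * lam)))

grid = np.linspace(-3.0, 3.0, 6001)   # v-grid, spacing 1e-3
lam = 0.5
ws = np.linspace(-2.0, 2.0, 9)
env = np.array([moreau_envelope(np.abs, w, lam, grid) for w in ws])
```

For f = |x| the envelope is the Huber function: quadratic near the origin and |w| - lam/2 outside, so the kink disappears while the minimizer is preserved. ME-$\mathcal{A}$ exploits exactly this min-min structure to separate out the non-smoothness of the adversarial loss.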
Dynamic Against Dynamic: An Open-set Self-learning Framework
In open-set recognition, existing methods generally learn statically fixed decision boundaries from known classes to reject unknown classes. Though they have achieved promising results, such decision boundaries are evidently insufficient for universal unknown classes in dynamic and open scenarios, as unknowns can potentially appear at any position in the feature space. Moreover, these methods simply reject unknown class samples during testing without making any effective use of them. In fact, such samples can constitute the true instantiated representation of the unknown classes and further enhance the model's performance. To address these issues, this paper proposes a novel dynamic-against-dynamic idea, i.e., a dynamic method against a dynamically changing open-set world, for which an open-set self-learning (OSSL) framework is correspondingly developed. OSSL starts with a good closed-set classifier trained on known classes and utilizes available test samples for model adaptation during testing, thus gaining adaptability to changing data distributions. In particular, a novel self-matching module is designed for OSSL, which automatically identifies known class samples while rejecting unknown class samples, which are further utilized as the instantiated representation of unknown classes to enhance the discriminability of the model. Our method establishes new performance milestones in almost all standard and cross-data benchmarks.
Updated: 2024-05-03 02:29:09
Categories: cs.LG,cs.CV
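The self-matching idea above can be sketched minimally as a confidence-based split of test samples into pseudo-labeled knowns and instantiated unknowns (the thresholds and the max-softmax criterion here are illustrative assumptions, not the paper's exact module):

```python
import numpy as np

def self_match(logits, accept_thresh=0.9, reject_thresh=0.5):
    """Split test samples into confidently-known, rejected-unknown,
    and ambiguous sets based on maximum softmax probability."""
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    conf = probs.max(axis=1)
    known = conf >= accept_thresh    # pseudo-label as a known class
    unknown = conf < reject_thresh   # keep as instantiated unknowns
    return known, unknown

# Three test samples over three known classes: confident, flat, borderline.
logits = np.array([[4.0, 0.1, 0.0], [0.1, 0.0, 0.05], [2.5, 0.1, 0.2]])
known, unknown = self_match(logits)
```

Samples identified in either set would then drive the test-time adaptation step; ambiguous samples (neither flag set) could simply be skipped.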
Toward end-to-end interpretable convolutional neural networks for waveform signals
This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. By benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent. It can potentially replace the Mel-Frequency Cepstral Coefficients (MFCC) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer using the PhysioNet Heart Sound Database, illustrating its ability to handle and capture intricate long waveform patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.
Updated: 2024-05-03 02:24:27
Categories: cs.SD,cs.AI,eess.AS
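The general shape of such a learnable front-end, i.e., a bank of 1-D filters applied to the raw waveform in place of a fixed Mel/MFCC transform, can be sketched as follows (the filter bank, hop size, and log-energy pooling are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def conv_frontend(waveform, filters, hop=160):
    """Apply a bank of 1-D filters to a raw waveform and return a
    framewise log-energy map, analogous to a learnable spectrogram."""
    feats = []
    for w in filters:
        resp = np.convolve(waveform, w, mode="same")
        # Framewise log-energy: one "frequency" row of the feature map.
        frames = resp[: len(resp) // hop * hop].reshape(-1, hop)
        feats.append(np.log1p((frames ** 2).mean(axis=1)))
    return np.stack(feats)  # shape: (n_filters, n_frames)

rng = np.random.default_rng(0)
wave = rng.standard_normal(1600)                 # 0.1 s at 16 kHz
filt_bank = rng.standard_normal((8, 31)) * 0.1   # 8 learnable filters
fmap = conv_frontend(wave, filt_bank)
```

In a trained model the filters would be learned end-to-end, and their impulse/frequency responses can be inspected directly, which is what makes this kind of front-end interpretable.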
Efficient and Economic Large Language Model Inference with Attention Offloading
Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention operator is memory-intensive, exhibiting a memory access pattern that clashes with the strengths of modern accelerators, especially as context length increases. To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading. This approach leverages a collection of cheap, memory-optimized devices for the attention operator while still utilizing high-end accelerators for other parts of the model. This heterogeneous setup ensures that each component is tailored to its specific workload, maximizing overall performance and cost efficiency. Our comprehensive analysis and experiments confirm the viability of splitting the attention computation over multiple devices. Also, the communication bandwidth required between heterogeneous devices proves to be manageable with prevalent networking technologies. To further validate our theory, we develop Lamina, an LLM inference system that incorporates attention offloading. Experimental results indicate that Lamina can provide 1.48x-12.1x higher estimated throughput per dollar than homogeneous solutions.
Updated: 2024-05-03 02:15:15
Categories: cs.LG,cs.DC
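The cost argument behind attention offloading can be illustrated with a toy pipeline model: end-to-end throughput is bounded by the slower of the two stages, while cost is the sum of both device pools (all numbers below are hypothetical, not from the paper):

```python
def throughput_per_dollar(attn_tput, other_tput, attn_cost, other_cost):
    """Toy serving-cost model: the attention stage and the rest of the
    model run as a pipeline, so throughput is min of the two stages."""
    tput = min(attn_tput, other_tput)
    return tput / (attn_cost + other_cost)

# Hypothetical: a cheap memory-optimized device matches the attention
# throughput of a high-end accelerator at a quarter of the price.
homogeneous = throughput_per_dollar(100, 100, attn_cost=20, other_cost=20)
heterogeneous = throughput_per_dollar(100, 100, attn_cost=5, other_cost=20)
```

Under these toy numbers the heterogeneous setup delivers 4.0 vs. 2.5 units of throughput per dollar, capturing why matching each operator to an appropriately priced device pays off.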
Heterogeneous network and graph attention auto-encoder for LncRNA-disease association prediction
Emerging research shows that lncRNAs are associated with a series of complex human diseases. However, most existing methods have limitations in identifying nonlinear lncRNA-disease associations (LDAs), and predicting new LDAs remains a huge challenge. Accurate identification of LDAs is therefore very important for the warning and treatment of diseases. In this work, multiple sources of biomedical data are fully utilized to construct characteristics of lncRNAs and diseases, and linear and nonlinear characteristics are effectively integrated. Furthermore, a novel deep learning model based on a graph attention auto-encoder, called HGATELDA, is proposed. To begin with, the linear characteristics of lncRNAs and diseases are created from the miRNA-lncRNA interaction matrix and the miRNA-disease interaction matrix. Following this, the nonlinear features of diseases and lncRNAs are extracted using a graph attention auto-encoder, which largely retains the critical information and effectively aggregates the neighborhood information of nodes. In the end, LDAs can be predicted by fusing the linear and nonlinear characteristics of diseases and lncRNAs. The HGATELDA model achieves an impressive AUC value of 0.9692 when evaluated using 5-fold cross-validation, indicating its superior performance in comparison with several recent prediction models. Meanwhile, the effectiveness of HGATELDA in identifying novel LDAs is further demonstrated by case studies. Overall, the HGATELDA model appears to be a viable computational model for predicting LDAs.
Updated: 2024-05-03 02:15:05
Categories: cs.LG,cs.AI,q-bio.QM,I.2.4; I.2.6; I.2.m
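The AUC metric used to evaluate HGATELDA is the probability that a randomly chosen true association outscores a randomly chosen non-association; a minimal rank-based implementation (ties not handled, for illustration only):

```python
import numpy as np

def auc_score(labels, scores):
    """Rank-based AUC: probability a random positive outranks a random
    negative (Mann-Whitney U statistic, normalized; no tie handling)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy predictions for two true LDAs and two non-associations.
labels = np.array([1, 1, 0, 0])
scores = np.array([0.9, 0.4, 0.6, 0.2])
auc = auc_score(labels, scores)
```

Here three of the four positive/negative pairs are correctly ordered, giving an AUC of 0.75; the paper's 0.9692 would correspond to near-perfect pairwise ordering.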
Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints
This article presents a deep reinforcement learning-based approach to tackle a persistent surveillance mission requiring a single unmanned aerial vehicle initially stationed at a depot with fuel or time-of-flight constraints to repeatedly visit a set of targets with equal priority. Owing to the vehicle's fuel or time-of-flight constraints, the vehicle must be regularly refueled, or its battery must be recharged at the depot. The objective of the problem is to determine an optimal sequence of visits to the targets that minimizes the maximum time elapsed between successive visits to any target while ensuring that the vehicle never runs out of fuel or charge. We present a deep reinforcement learning algorithm to solve this problem and present the results of numerical experiments that corroborate the effectiveness of this approach in comparison with common-sense greedy heuristics.
Updated: 2024-05-03 02:05:20
Categories: cs.RO,cs.AI,cs.LG
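As a concrete reference point, a common-sense greedy heuristic of the kind the paper compares against can be sketched as follows: always visit the most-overdue target and detour to the depot whenever the next trip would exhaust the tank (unit trip costs and the refuel penalty are hypothetical simplifications):

```python
def greedy_patrol(n_targets, steps, fuel_capacity, trip_cost=1, refuel_cost=2):
    """Greedy baseline: visit the target least recently seen; refuel at
    the depot when the next trip would run the vehicle out of fuel.
    Returns the worst gap between successive visits to any target."""
    last_visit = [0] * n_targets
    fuel, worst_gap, t = fuel_capacity, 0, 0
    for _ in range(steps):
        if fuel < trip_cost:        # detour to depot to refuel
            t += refuel_cost
            fuel = fuel_capacity
        target = min(range(n_targets), key=lambda i: last_visit[i])
        t += trip_cost
        fuel -= trip_cost
        worst_gap = max(worst_gap, t - last_visit[target])
        last_visit[target] = t
    return worst_gap

gap = greedy_patrol(2, steps=6, fuel_capacity=3)
```

A learned policy can beat this baseline by timing refueling trips so they fall when no target is close to its worst-case revisit deadline.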
Non-linear Welfare-Aware Strategic Learning
This paper studies algorithmic decision-making in the presence of strategic individual behaviors, where an ML model is used to make decisions about human agents and the latter can adapt their behavior strategically to improve their future data. Existing results on strategic learning have largely focused on the linear setting, where agents with linear labeling functions best respond to a (noisy) linear decision policy. Instead, this work focuses on general non-linear settings where agents respond to the decision policy with only "local information" of the policy. Moreover, we simultaneously consider the objectives of maximizing decision-maker welfare (model prediction accuracy), social welfare (agent improvement caused by strategic behaviors), and agent welfare (the extent to which ML underestimates the agents). We first generalize the agent best-response model of previous works to the non-linear setting, then reveal the compatibility of the welfare objectives. We show that the three welfare objectives can attain their optima simultaneously only under restrictive conditions that are challenging to satisfy in non-linear settings. The theoretical results imply that existing works solely maximizing the welfare of a subset of parties inevitably diminish the welfare of the others. We thus argue for the necessity of balancing the welfare of each party in non-linear settings and propose an irreducible optimization algorithm suitable for general strategic learning. Experiments on synthetic and real data validate the proposed algorithm.
Updated: 2024-05-03 01:50:03
Categories: cs.AI,cs.LG
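A best response under "local information" can be pictured as an agent who cannot see the policy's functional form and instead probes it locally, e.g. via a finite-difference gradient, then trades score gain against effort cost (the step size, quadratic-style cost, and probing scheme below are illustrative assumptions):

```python
import numpy as np

def local_best_response(x, policy, cost=1.0, step=0.1, eps=1e-4):
    """Agent nudges its features using only a local (finite-difference)
    estimate of the policy's gradient, scaled down by effort cost."""
    grad = np.array([
        (policy(x + eps * e) - policy(x - eps * e)) / (2 * eps)
        for e in np.eye(len(x))
    ])
    return x + step * grad / cost

# A non-linear policy, unknown to the agent except through local queries.
policy = lambda x: 1 / (1 + np.exp(-(x[0] + 2 * x[1])))
x0 = np.array([0.0, 0.0])
x_new = local_best_response(x0, policy)
```

The agent moves most along the feature the policy locally weights most (here the second one), without ever observing the policy's parameters.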
A Logic of Sattestation
We introduce a logic for reasoning about contextual trust for web addresses, provide a Kripke semantics for it, and prove its soundness under reasonable assumptions about principals' policies. Self-Authenticating Traditional Addresses (SATAs) are valid DNS addresses or URLs that are generally meaningful -- to both humans and web infrastructure -- and contain a commitment to a public key in the address itself. Trust in web addresses is currently established via domain name registration, TLS certificates, and other hierarchical elements of the internet infrastructure. SATAs support such structural roots of trust as well as complementary contextual roots associated with descriptive properties. The existing structural roots leave web connections open to a variety of well-documented and significant hijack vulnerabilities. Contextual trust roots provide, among other things, stronger resistance to such vulnerabilities. We also consider labeled SATAs, which include descriptive properties such as that a SATA is an address for a news organization, a site belonging to a particular government or company, a site with information about a certain topic, etc. Our logic addresses both trust in the bound-together identity of the address and trust in the binding of labels to it. Our logic allows reasoning about delegation of trust with respect to specified labels, relationships between labels that provide more or less specific information, and the interaction between these two aspects. In addition to soundness, we prove that if a principal trusts a particular identity (possibly with label), then either this trust is initially assumed, or there is a trust chain of delegations to this from initial trust assumptions. We also present an algorithm that effectively derives all possible trust statements from the set of initial trust assumptions and show it to be sound, complete, and terminating.
Updated: 2024-05-03 01:48:07
Categories: cs.CR,cs.LO
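The core SATA mechanism, an address carrying a commitment to a public key, can be sketched with a truncated hash embedded as a label; note the exact encoding below is illustrative only, not the SATA specification:

```python
import hashlib

def make_sata(domain, pubkey, n=16):
    """Prefix the domain with a truncated SHA-256 commitment to the key
    (a toy encoding; the real SATA format may differ)."""
    commit = hashlib.sha256(pubkey).hexdigest()[:n]
    return f"{commit}.{domain}"

def verify_sata(address, pubkey, n=16):
    """Check that the address's leading label commits to this key."""
    commit, _, _domain = address.partition(".")
    return commit == hashlib.sha256(pubkey).hexdigest()[:n]

addr = make_sata("example.org", b"alice-public-key")
```

Because the commitment lives in the address itself, a hijacker who controls DNS or a CA but not the committed key cannot produce a verifying address, which is the structural property the contextual-trust logic builds on.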
Algorithmic Decision-Making under Agents with Persistent Improvement
This paper studies algorithmic decision-making under human's strategic behavior, where a decision maker uses an algorithm to make decisions about human agents, and the latter with information about the algorithm may exert effort strategically and improve to receive favorable decisions. Unlike prior works that assume agents benefit from their efforts immediately, we consider realistic scenarios where the impacts of these efforts are persistent and agents benefit from efforts by making improvements gradually. We first develop a dynamic model to characterize persistent improvements and based on this construct a Stackelberg game to model the interplay between agents and the decision-maker. We analytically characterize the equilibrium strategies and identify conditions under which agents have incentives to improve. With the dynamics, we then study how the decision-maker can design an optimal policy to incentivize the largest improvements inside the agent population. We also extend the model to settings where 1) agents may be dishonest and game the algorithm into making favorable but erroneous decisions; 2) honest efforts are forgettable and not sufficient to guarantee persistent improvements. With the extended models, we further examine conditions under which agents prefer honest efforts over dishonest behavior and the impacts of forgettable efforts.
Updated: 2024-05-03 01:36:35
Categories: cs.GT,cs.AI
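The contrast between immediate and persistent benefit from effort can be made concrete with a toy state update in which each round retains a fraction of past improvement and adds a gain proportional to effort (the linear dynamics and the `gain`/`decay` parameters are illustrative assumptions, not the paper's model):

```python
def improvement_trajectory(effort, rounds, gain=0.5, decay=0.9):
    """Persistent-improvement toy dynamics: each round keeps a fraction
    `decay` of accumulated improvement and adds `gain * effort`."""
    q, history = 0.0, []
    for _ in range(rounds):
        q = decay * q + gain * effort
        history.append(round(q, 4))
    return history

traj = improvement_trajectory(effort=1.0, rounds=3)
```

With `decay < 1` the same setup also captures the "forgettable efforts" extension: improvement leaks away between rounds unless effort is sustained, so a one-shot effort cannot guarantee a persistent gain.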
KDPrint: Passive Authentication using Keystroke Dynamics-to-Image Encoding via Standardization
In contemporary mobile user authentication systems, verifying user legitimacy has become paramount due to the widespread use of smartphones. Although fingerprint and facial recognition are widely used for mobile authentication, PIN-based authentication is still employed as a fallback option if biometric authentication fails after multiple attempts. Consequently, the system remains susceptible to attacks targeting the PIN when biometric methods are unsuccessful. In response to these concerns, two-factor authentication has been proposed, albeit with the caveat of increased user effort. To address these challenges, this paper proposes a passive authentication system that utilizes keystroke data, a byproduct of primary authentication methods, for background user authentication. Additionally, we introduce a novel image encoding technique to capture the temporal dynamics of keystroke data, overcoming the performance limitations of deep learning models. Furthermore, we present a methodology for selecting suitable behavioral biometric features for image representation. The resulting images, depicting the user's PIN input patterns, enhance the model's ability to uniquely identify users through the secondary channel with high accuracy. Experimental results demonstrate that the proposed imaging approach surpasses existing methods in terms of information capacity. In self-collected dataset experiments, incorporating features from prior research, our method achieved an Equal Error Rate (EER) of 6.7%, outperforming the existing method's 47.7%. Moreover, our imaging technique attained a True Acceptance Rate (TAR) of 94.4% and a False Acceptance Rate (FAR) of 8% for 17 users.
Updated: 2024-05-03 01:24:18
Categories: cs.CR
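The standardization-based keystroke-to-image idea can be sketched as follows: take inter-key intervals, z-score them, and tile the values into a square grid that a vision model can consume (the grid size and layout here are illustrative assumptions, not KDPrint's actual encoding):

```python
import numpy as np

def keystrokes_to_image(press_times, size=8):
    """Standardize inter-key intervals and tile them into a square grid,
    a toy version of encoding keystroke dynamics as an image."""
    gaps = np.diff(np.asarray(press_times, dtype=float))
    z = (gaps - gaps.mean()) / (gaps.std() + 1e-8)  # standardization
    img = np.zeros(size * size)
    img[: min(len(z), size * size)] = z[: size * size]
    return img.reshape(size, size)

# Press timestamps (seconds) for a 5-digit PIN entry.
img = keystrokes_to_image([0.00, 0.12, 0.31, 0.45, 0.80])
```

Standardizing first means the image captures the *pattern* of a user's rhythm rather than their absolute typing speed, which is what makes it comparable across sessions.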
UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation
Although pre-trained language models (PLMs) have exhibited great flexibility and versatility with prompt-based few-shot learning, they suffer from extensive parameter sizes and limited applicability for inference. Recent studies have suggested that PLMs be used as dataset generators and that a tiny task-specific model be trained to achieve efficient inference. However, their applicability to various domains is limited because they tend to generate domain-specific datasets. In this work, we propose a novel approach to universal domain generalization that generates a dataset regardless of the target domain. This allows for generalization of the tiny task model to any domain that shares the label space, thus enhancing the real-world applicability of the dataset-generation paradigm. Our experiments indicate that the proposed method accomplishes generalizability across various domains while using a parameter set that is orders of magnitude smaller than PLMs.
Updated: 2024-05-03 01:20:28
Categories: cs.CL,cs.AI
Exploiting ChatGPT for Diagnosing Autism-Associated Language Disorders and Identifying Distinct Features
Diagnosing language disorders associated with autism is a complex and nuanced challenge, often hindered by the subjective nature and variability of traditional assessment methods. Traditional diagnostic methods not only require intensive human effort but also often result in delayed interventions due to their lack of speed and specificity. In this study, we explored the application of ChatGPT, a state-of-the-art large language model, to overcome these obstacles by enhancing diagnostic accuracy and profiling specific linguistic features indicative of autism. Leveraging ChatGPT's advanced natural language processing capabilities, this research aims to streamline and refine the diagnostic process. Specifically, we compared ChatGPT's performance with that of conventional supervised learning models, including BERT, a model acclaimed for its effectiveness in various natural language processing tasks. We showed that ChatGPT substantially outperformed these models, achieving over 13% improvement in both accuracy and F1 score in a zero-shot learning configuration. This marked enhancement highlights the model's potential as a superior tool for neurological diagnostics. Additionally, we identified ten distinct features of autism-associated language disorders that vary significantly across different experimental scenarios. These features, which included echolalia, pronoun reversal, and atypical language usage, were crucial for accurately diagnosing autism spectrum disorder (ASD) and customizing treatment plans. Together, our findings advocate for adopting sophisticated AI tools like ChatGPT in clinical settings to assess and diagnose developmental disorders. Our approach not only promises greater diagnostic precision but also aligns with the goals of personalized medicine, potentially transforming the evaluation landscape for autism and similar neurological conditions.
Updated: 2024-05-03 01:04:28
Categories: cs.CL,cs.AI
How to Use Quantum Indistinguishability Obfuscation
Quantum copy protection, introduced by Aaronson, enables giving out a quantum program-description that cannot be meaningfully duplicated. Despite over a decade of study, copy protection is only known to be possible for a very limited class of programs. As our first contribution, we show how to achieve "best-possible" copy protection for all programs. We do this by introducing quantum state indistinguishability obfuscation (qsiO), a notion of obfuscation for quantum descriptions of classical programs. We show that applying qsiO to a program immediately achieves best-possible copy protection. Our second contribution is to show that, assuming injective one-way functions exist, qsiO is concrete copy protection for a large family of puncturable programs -- significantly expanding the class of copy-protectable programs. A key tool in our proof is a new variant of unclonable encryption (UE) that we call coupled unclonable encryption (cUE). While constructing UE in the standard model remains an important open problem, we are able to build cUE from one-way functions. If we additionally assume the existence of UE, then we can further expand the class of puncturable programs for which qsiO is copy protection. Finally, we construct qsiO relative to an efficient quantum oracle.
Updated: 2024-05-03 01:00:27
Categories: quant-ph,cs.CR
Learning under Imitative Strategic Behavior with Unforeseeable Outcomes
Machine learning systems have been widely used to make decisions about individuals who may best respond and behave strategically to receive favorable outcomes, e.g., they may genuinely improve the true labels or manipulate observable features directly to game the system without changing labels. Although both behaviors have been studied (often as two separate problems) in the literature, most works assume individuals can (i) perfectly foresee the outcomes of their behaviors when they best respond; (ii) change their features arbitrarily as long as it is affordable, and the costs they need to pay are deterministic functions of feature changes. In this paper, we consider a different setting and focus on imitative strategic behaviors with unforeseeable outcomes, i.e., individuals manipulate/improve by imitating the features of those with positive labels, but the induced feature changes are unforeseeable. We first propose a Stackelberg game to model the interplay between individuals and the decision-maker, under which we examine how the decision-maker's ability to anticipate individual behavior affects its objective function and the individual's best response. We show that the objective difference between the two can be decomposed into three interpretable terms, with each representing the decision-maker's preference for a certain behavior. By exploring the roles of each term, we further illustrate how a decision-maker with adjusted preferences can simultaneously disincentivize manipulation, incentivize improvement, and promote fairness.
Updated: 2024-05-03 00:53:58
Categories: cs.AI
The Role of Human Factors in the LastPass Breach
This paper examines the complex nature of cyber attacks through an analysis of the LastPass breach. It argues for the integration of human-centric considerations into cybersecurity measures, focusing on mitigating factors such as goal-directed behavior, cognitive overload, human biases (e.g., optimism, anchoring), and risky behaviors. Findings from an analysis of this breach offer support for the perspective that addressing both the human and technical dimensions of cyber defense can significantly enhance the resilience of cyber systems against complex threats. In practice, maintaining a balanced approach that simplifies user interactions, makes users aware of their biases, and discourages risky practices is essential for preventing cyber incidents.
Updated: 2024-05-03 00:41:29
Categories: cs.HC,cs.CR
DORSal: Diffusion for Object-centric Representations of Scenes et al
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.
Updated: 2024-05-03 00:37:12
Categories: cs.CV,cs.AI,cs.LG
Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots
Autonomous wheeled-legged robots have the potential to transform logistics systems, improving operational efficiency and adaptability in urban environments. Navigating urban environments, however, poses unique challenges for robots, necessitating innovative solutions for locomotion and navigation. These challenges include the need for adaptive locomotion across varied terrains and the ability to navigate efficiently around complex dynamic obstacles. This work introduces a fully integrated system comprising adaptive locomotion control, mobility-aware local navigation planning, and large-scale path planning within the city. Using model-free reinforcement learning (RL) techniques and privileged learning, we develop a versatile locomotion controller. This controller achieves efficient and robust locomotion over various rough terrains, facilitated by smooth transitions between walking and driving modes. It is tightly integrated with a learned navigation controller through a hierarchical RL framework, enabling effective navigation through challenging terrain and various obstacles at high speed. Our controllers are integrated into a large-scale urban navigation system and validated by autonomous, kilometer-scale navigation missions conducted in Zurich, Switzerland, and Seville, Spain. These missions demonstrate the system's robustness and adaptability, underscoring the importance of integrated control systems in achieving seamless navigation in complex environments. Our findings support the feasibility of wheeled-legged robots and hierarchical RL for autonomous navigation, with implications for last-mile delivery and beyond.
Updated: 2024-05-03 00:29:20
Categories: cs.RO,cs.LG,cs.SY,eess.SY
Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization
Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles. However, summarization models are increasingly being used to summarize diverse sources of text, such as social media data, that encompass a wide demographic user base. It is thus crucial to assess not only the quality of the generated summaries, but also the extent to which they can fairly represent the opinions of diverse social groups. Position bias, a long-known issue in news summarization, has received limited attention in the context of social multi-document summarization. We deeply investigate this phenomenon by analyzing the effect of group ordering in input documents when summarizing tweets from three distinct linguistic communities: African-American English, Hispanic-aligned Language, and White-aligned Language. Our empirical analysis shows that although the textual quality of the summaries remains consistent regardless of the input document order, in terms of fairness, the results vary significantly depending on how the dialect groups are presented in the input data. Our results suggest that position bias manifests differently in social multi-document summarization, severely impacting the fairness of summarization models.
Updated: 2024-05-03 00:19:31
Categories: cs.CL,cs.AI
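The group-ordering experiment above can be illustrated with a deliberately position-biased toy summarizer that keeps only the leading documents; averaging over all input orders then exposes how much representation each group loses to a fixed ordering (the lead-based summarizer and group labels are illustrative stand-ins for the paper's setup):

```python
from itertools import permutations
from collections import Counter

def lead_summarizer(docs, k=2):
    """Toy extractive summarizer that keeps only the first k documents,
    an extreme form of position bias."""
    return docs[:k]

def group_coverage(groups, k=2):
    """Fraction of summaries containing each group, averaged over all
    possible orderings of the group blocks in the input."""
    counts = Counter()
    orders = list(permutations(groups))
    for order in orders:
        for g in lead_summarizer(list(order), k):
            counts[g] += 1
    return {g: counts[g] / len(orders) for g in groups}

fixed = lead_summarizer(["AAE", "Hispanic", "White"])  # "White" never appears
share = group_coverage(["AAE", "Hispanic", "White"])   # each group: 2/3
```

Under a fixed input order the last group is never represented at all, while averaging over orderings restores symmetric coverage; the gap between the two is one simple way to quantify position-induced unfairness.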
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux, to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem -- producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding is that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compares favorably with that of large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.
Updated: 2024-05-03 00:14:33
Categories: cs.PL,cs.AI,cs.SE
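The check-candidate workflow in the abstract can be sketched as a thin wrapper around the F* compiler. The interface below is an assumption for illustration only: the command name `fstar.exe`, the single-file invocation, and the lack of module/dependency handling are simplifications, not the dataset's released checker.

```python
import subprocess
import tempfile

def check_fragment(candidate: str, fstar_cmd: str = "fstar.exe",
                   timeout_s: int = 60) -> bool:
    """Write a candidate F* definition to a temporary .fst file and ask
    the F* compiler to verify it; True means verification succeeded.
    A real checker would also handle module naming, dependencies, and
    solver configuration."""
    with tempfile.NamedTemporaryFile("w", suffix=".fst", delete=False) as f:
        f.write(candidate)
        path = f.name
    try:
        result = subprocess.run([fstar_cmd, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # F* not installed, or verification timed out.
        return False
```

A synthesis loop would sample candidate definitions from a language model for each F* type (the specification) and keep the first candidate the checker accepts.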
M3Act: Learning from Synthetic Human Group Activities
The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address this limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and group activities. Powered by Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitates the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments. The results suggest our synthetic dataset can significantly improve the performance of several downstream methods and replace real-world datasets to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on the DanceTrack dataset, leading to a jump from 10th to 2nd place on the leaderboard. Moreover, M3Act opens up new research directions for controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for the novel task. Our code and data are available at our project page: http://cjerry1243.github.io/M3Act.
Updated: 2024-05-03 00:00:27
Categories: cs.CV,cs.AI,cs.LG