MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Understanding people's social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal -- we can watch people's actions, hear their conversations, and/or read about their past behaviors. For AI systems to successfully and safely interact with people in real-world environments, they also need to understand people's mental states as well as their inferences about each other's mental states based on multi-modal information about their interactions. For this, we introduce MuMA-ToM, a Multi-modal Multi-Agent Theory of Mind benchmark. MuMA-ToM is the first multi-modal Theory of Mind benchmark that evaluates mental reasoning in embodied multi-agent interactions. In MuMA-ToM, we provide video and text descriptions of people's multi-modal behavior in realistic household environments. Based on the context, we then ask questions about people's goals, beliefs, and beliefs about others' goals. We validate MuMA-ToM in a human experiment and provide a human baseline. We also propose a novel multi-modal, multi-agent ToM model, LIMP (Language model-based Inverse Multi-agent Planning). Our experimental results show that LIMP significantly outperforms state-of-the-art methods, including large multi-modal models (e.g., GPT-4o, Gemini-1.5 Pro) and a recent multi-modal ToM model, BIP-ALM.
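As a rough illustration of the inverse-planning principle behind LIMP, Theory of Mind inference can be framed as Bayesian inversion of a goal-conditioned policy; in the minimal sketch below, the goals, actions, and likelihood values are invented for the example and are not taken from the paper.

```python
import numpy as np

# Inverse planning in miniature: infer a posterior over an agent's goal
# from observed actions via P(goal | actions) ∝ P(actions | goal) P(goal).
goals = ["get_cup", "get_plate"]
prior = np.array([0.5, 0.5])

# Assumed goal-conditioned action likelihoods P(action | goal), e.g.
# produced by a planner or elicited from a language model.
likelihood = {
    "walk_to_kitchen": np.array([0.9, 0.8]),
    "open_cupboard":   np.array([0.7, 0.2]),
}

observed = ["walk_to_kitchen", "open_cupboard"]
posterior = prior.copy()
for action in observed:
    posterior *= likelihood[action]   # multiply in each action's likelihood
posterior /= posterior.sum()          # normalize
print(dict(zip(goals, posterior)))    # belief about the agent's goal
```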
Updated: 2024-08-25 23:58:25
Categories: cs.AI,cs.CL,cs.CV,cs.LG
Optimizing Luxury Vehicle Dealership Networks: A Graph Neural Network Approach to Site Selection
This study presents a novel application of Graph Neural Networks (GNNs) to optimize dealership network planning for a luxury car manufacturer in the U.S. By conducting a comprehensive literature review on dealership location determinants, the study identifies 65 county-level explanatory variables, augmented by two additional measures of regional interconnectedness derived from social and mobility data. An ablation study involving 34 variable combinations and ten state-of-the-art GNN operators reveals key insights into the predictive power of various variables, particularly highlighting the significance of competition, demographic factors, and mobility patterns in influencing dealership location decisions. The analysis pinpoints seven specific counties as promising targets for network expansion. This research not only illustrates the effectiveness of GNNs in solving complex geospatial decision-making problems but also provides actionable recommendations and valuable methodological insights for industry practitioners.
Updated: 2024-08-25 23:49:35
Categories: cs.LG,cs.SI
Sample Amplification: Increasing Dataset Size even when Learning is Impossible
Given data drawn from an unknown distribution, $D$, to what extent is it possible to ``amplify'' this dataset and output an even larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ amplification procedure takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples''. An amplification procedure is valid if no algorithm can distinguish the set of $m$ samples produced by the amplifier from a set of $m$ independent draws from $D$, with probability greater than $2/3$. Perhaps surprisingly, in many settings, a valid amplification procedure exists, even when the size of the input dataset, $n$, is significantly less than what would be necessary to learn $D$ to non-trivial accuracy. Specifically, we consider two fundamental settings: the case where $D$ is an arbitrary discrete distribution supported on $\le k$ elements, and the case where $D$ is a $d$-dimensional Gaussian with unknown mean, and fixed covariance. In the first case, we show that an $\left(n, n + \Theta(\frac{n}{\sqrt{k}})\right)$ amplifier exists. In particular, given $n=O(\sqrt{k})$ samples from $D$, one can output a set of $m=n+1$ datapoints, whose total variation distance from the distribution of $m$ i.i.d. draws from $D$ is a small constant, despite the fact that one would need quadratically more data, $n=\Theta(k)$, to learn $D$ up to small constant total variation distance. In the Gaussian case, we show that an $\left(n,n+\Theta(\frac{n}{\sqrt{d}} )\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples. In both the discrete and Gaussian settings, we show that these results are tight, to constant factors. Beyond these results, we formalize a number of curious directions for future research along this vein.
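One natural amplifier for the discrete case simply appends fresh draws from the empirical distribution of the input and shuffles; the minimal sketch below illustrates that scheme (the paper's contribution is the analysis of when such an output is indistinguishable from $m$ i.i.d. draws, not this code).

```python
import random

def amplify(samples, m):
    # (n, m) amplifier sketch: return the n inputs plus m - n draws from
    # their empirical distribution, shuffled so originals are not marked.
    extra = random.choices(samples, k=m - len(samples))
    out = samples + extra
    random.shuffle(out)
    return out

draws = [random.randrange(100) for _ in range(10)]  # n = 10, support k <= 100
print(amplify(draws, 11))                           # m = n + 1
```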
Updated: 2024-08-25 23:38:40
Categories: cs.LG,math.ST,stat.ML,stat.TH
The Over-Certainty Phenomenon in Modern UDA Algorithms
When neural networks are confronted with unfamiliar data that deviate from their training set, this signifies a domain shift. While these networks output predictions on their inputs, they typically fail to account for their level of familiarity with these novel observations. Prevailing works navigate unsupervised domain adaptation with the goal of curtailing model entropy, yet they unintentionally produce models that struggle with sub-optimal calibration, a dilemma we term the over-certainty phenomenon. In this paper, we uncover a concerning trend in unsupervised domain adaptation and propose a solution that not only maintains accuracy but also addresses calibration.
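Calibration can be made concrete with the standard expected calibration error (ECE), the kind of gap that over-certain models widen; the snippet below is the standard metric, not code from the paper.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    # Bin predictions by confidence; ECE is the weighted average gap
    # between mean confidence and empirical accuracy per bin.
    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

conf = np.array([0.9, 0.95, 0.8, 0.6])         # model confidences
correct = np.array([1, 0, 1, 1], dtype=float)  # 1 if prediction was right
print(expected_calibration_error(conf, correct))
```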
Updated: 2024-08-25 23:06:51
Categories: cs.LG,cs.AI,stat.ML
Better Not to Propagate: Understanding Edge Uncertainty and Over-smoothing in Signed Graph Neural Networks
Traditional Graph Neural Networks (GNNs) rely on network homophily, which can lead to performance degradation due to over-smoothing in many real-world heterophily scenarios. Recent studies analyze the smoothing effect (separability) after message passing (MP) as a function of the expectation of node features. Regarding separability gain, they provide theoretical background on over-smoothing caused by various propagation schemes, including positive, signed, and blocked MP. More recently, by extending these theorems, some works have suggested improvements in signed propagation under multiple classes. However, prior works assume that the error ratio of all propagation schemes is fixed, failing to investigate this phenomenon correctly. To solve this problem, we propose a novel method for estimating homophily and edge error ratio, integrated with dynamic selection between blocked and signed propagation during training. Our theoretical analysis, supported by extensive experiments, demonstrates that blocking MP can be more effective than signed propagation under high edge error ratios, improving performance in both homophilic and heterophilic graphs.
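For intuition, the three propagation schemes differ only in the sign pattern of the aggregation matrix; in the toy sketch below (hand-made matrices, purely illustrative), node 1 is a heterophilic neighbor of node 0, so signed MP flips its message while blocked MP drops it.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # positive MP: plain adjacency
S = np.array([[0, -1, 1],
              [-1, 0, 0],
              [1, 0, 0]], dtype=float)   # signed MP: flip heterophilic edge 0-1
B = np.array([[0, 0, 1],
              [0, 0, 0],
              [1, 0, 0]], dtype=float)   # blocked MP: drop uncertain edge 0-1

X = np.array([[1.0], [-1.0], [1.0]])     # node features (node 1 differs)
for name, M in [("positive", A), ("signed", S), ("blocked", B)]:
    print(name, (M @ X).ravel())         # one round of message passing
```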
Updated: 2024-08-25 22:30:42
Categories: cs.LG,cs.AI
A Spectral View of Adversarially Robust Features
Given the apparent difficulty of learning models that are robust to adversarial perturbations, we propose tackling the simpler problem of developing adversarially robust features. Specifically, given a dataset and metric of interest, the goal is to return a function (or multiple functions) that 1) is robust to adversarial perturbations, and 2) has significant variation across the datapoints. We establish strong connections between adversarially robust features and a natural spectral property of the geometry of the dataset and metric of interest. This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset. Finally, we provide empirical evidence that the adversarially robust features given by this spectral approach can be fruitfully leveraged to learn a robust (and accurate) model.
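The spectral connection can be sketched in miniature: connect datapoints that lie within a perturbation radius of one another and take a low eigenvector of the graph Laplacian as a feature that varies across the data yet changes slowly across edges. This is an illustrative reading of the abstract, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 0.5, (20, 2)),
                    rng.normal(3, 0.5, (20, 2))])  # two well-separated clusters

eps = 2.0                                  # assumed perturbation radius
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = (D2 < eps ** 2).astype(float)          # connect confusable points
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W             # graph Laplacian

vals, vecs = np.linalg.eigh(L)
feature = vecs[:, 1]                       # low eigenvector as a robust feature
print(feature[:3], feature[-3:])           # near-constant within each cluster
```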
Updated: 2024-08-25 22:14:33
Categories: cs.LG,stat.ML
Bridging the Gap between Real-world and Synthetic Images for Testing Autonomous Driving Systems
Deep Neural Networks (DNNs) for Autonomous Driving Systems (ADS) are typically trained on real-world images and tested using synthetic simulator images. This approach results in training and test datasets with dissimilar distributions, which can potentially lead to erroneously decreased test accuracy. To address this issue, the literature suggests applying domain-to-domain translators to test datasets to bring them closer to the training datasets. However, translating images used for testing may unpredictably affect the reliability, effectiveness and efficiency of the testing process. Hence, this paper investigates the following questions in the context of ADS: Could translators reduce the effectiveness of images used for ADS-DNN testing and their ability to reveal faults in ADS-DNNs? Can translators result in excessive time overhead during simulation-based testing? To address these questions, we consider three domain-to-domain translators: CycleGAN and neural style transfer, from the literature, and SAEVAE, our proposed translator. Our results for two critical ADS tasks -- lane keeping and object detection -- indicate that translators significantly narrow the gap in ADS test accuracy caused by distribution dissimilarities between training and test data, with SAEVAE outperforming the other two translators. We show that, based on the recent diversity, coverage, and fault-revealing ability metrics for testing deep-learning systems, translators do not compromise the diversity and the coverage of test data, nor do they lead to revealing fewer faults in ADS-DNNs. Further, among the translators considered, SAEVAE incurs a negligible overhead in simulation time and can be efficiently integrated into simulation-based testing. Finally, we show that translators increase the correlation between offline and simulation-based testing results, which can help reduce the cost of simulation-based testing.
Updated: 2024-08-25 22:07:41
Categories: cs.SE,cs.AI
Outlier-Insensitive Kalman Filtering: Theory and Applications
State estimation of dynamical systems from noisy observations is a fundamental task in many applications. It is commonly addressed using the linear Kalman filter (KF), whose performance can significantly degrade in the presence of outliers in the observations, due to the sensitivity of its convex quadratic objective function. To mitigate such behavior, outlier detection algorithms can be applied. In this work, we propose a parameter-free algorithm which mitigates the harmful effect of outliers while requiring only a short iterative process of the standard update step of the KF. To that end, we model each potential outlier as a normal process with unknown variance and apply online estimation through either expectation maximization or alternating maximization algorithms. Simulations and field experiment evaluations demonstrate competitive performance of our method, showcasing its robustness to outliers in filtering scenarios compared to alternative algorithms.
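A one-dimensional sketch of that update is below: alternate between the standard KF update and re-estimating an unknown extra noise variance for a potential outlier. This follows the spirit of the alternating-maximization variant; the paper's exact equations may differ.

```python
def oikf_update(x_pred, P_pred, y, H=1.0, r=1.0, iters=5):
    extra = 0.0                                  # unknown outlier variance
    for _ in range(iters):
        S = H * P_pred * H + r + extra           # innovation variance
        K = P_pred * H / S                       # Kalman gain
        x = x_pred + K * (y - H * x_pred)
        P = (1.0 - K * H) * P_pred
        resid = y - H * x
        # Re-estimate: explain residual energy beyond the nominal noise r.
        extra = max(resid ** 2 - r, 0.0)
    return x, P

print(oikf_update(0.0, 1.0, 0.1))   # nominal measurement: near-standard update
print(oikf_update(0.0, 1.0, 25.0))  # outlier: the gain shrinks, update damped
```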
Updated: 2024-08-25 21:59:43
Categories: eess.SP,cs.LG
Adelie: Detection and prevention of Byzantine behaviour in DAG-based consensus protocols
Recent developments in Byzantine Fault Tolerant consensus protocols have shown DAG-based protocols to be a very promising technique. While early implementations of DAG-based protocols such as Narwhal/Bullshark trade latency for high throughput, the latest DAG-based protocols such as Mysticeti and Shoal++ show that a latency comparable to that of traditional consensus protocols such as HotStuff can be achieved with DAG-based consensus while still maintaining high throughput. Mysticeti in particular achieves low latency by implementing a novel approach of using an uncertified DAG, a significant breakthrough compared to the certified DAG used in previous generations of the protocol. However, the uncertified DAG exposes the system to new vectors of attack by Byzantine validators that did not exist in certified DAG protocols. In this paper we describe those issues and present the Adelie protocol, which addresses the issues that come with an uncertified DAG. We also incorporate some of the techniques from Shoal++ to reduce latency even further. This paper also presents an implementation of the Adelie protocol, bftd, which demonstrates yet another breakthrough in maximum achieved TPS and low latency.
Updated: 2024-08-25 21:16:58
Categories: cs.DC,cs.CR
Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning
In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to a very restricted library of components, which in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further add a regularization term for adding inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.
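A minimal sketch of the construction: average an arbitrary policy network over a symmetry group (here the four rotations of a grid map), which yields an equivariant policy by construction, with no specialized layers. The toy linear policy and the action-permutation convention below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8 * 8))        # toy linear policy over an 8x8 map

def policy(state):
    return W @ state.ravel()               # logits over 4 move directions

def unrotate_actions(logits, k):
    return np.roll(logits, -k)             # rotating the map permutes actions

def equivariant_ensemble(state):
    # pi_ens(s) = (1/|G|) * sum_g g^{-1}(pi(g(s))) over the rotation group G.
    outs = [unrotate_actions(policy(np.rot90(state, k)), k) for k in range(4)]
    return np.mean(outs, axis=0)

s = rng.standard_normal((8, 8))
print(np.allclose(equivariant_ensemble(np.rot90(s)),
                  np.roll(equivariant_ensemble(s), 1)))  # True: equivariant
```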
Updated: 2024-08-25 20:59:10
Categories: cs.LG,cs.RO
Learning to Move Like Professional Counter-Strike Players
In multiplayer, first-person shooter games like Counter-Strike: Global Offensive (CS:GO), coordinated movement is a critical component of high-level strategic play. However, the complexity of team coordination and the variety of conditions present in popular game maps make it impractical to author hand-crafted movement policies for every scenario. We show that it is possible to take a data-driven approach to creating human-like movement controllers for CS:GO. We curate a team movement dataset comprising 123 hours of professional game play traces, and use this dataset to train a transformer-based movement model that generates human-like team movement for all players in a "Retakes" round of the game. Importantly, the movement prediction model is efficient. Performing inference for all players takes less than 0.5 ms per game step (amortized cost) on a single CPU core, making it plausible for use in commercial games today. Human evaluators assess that our model behaves more like humans than both commercially-available bots and procedural movement controllers scripted by experts (16% to 59% higher by TrueSkill rating of "human-like"). Using experiments involving in-game bot vs. bot self-play, we demonstrate that our model performs simple forms of teamwork, makes fewer common movement mistakes, and yields movement distributions, player lifetimes, and kill locations similar to those observed in professional CS:GO match play.
Updated: 2024-08-25 20:43:34
Categories: cs.LG,cs.AI,cs.GR
Network Level Spatial Temporal Traffic State Forecasting with Hierarchical Attention LSTM (HierAttnLSTM)
Traffic state data, such as speed, volume, and travel time collected from ubiquitous traffic monitoring sensors, require advanced network-level analytics for forecasting and identifying significant traffic patterns. This paper leverages diverse traffic state datasets from the Caltrans Performance Measurement System (PeMS), hosted on the open benchmark, and achieves promising performance compared to well-recognized spatial-temporal models. Drawing inspiration from the success of hierarchical architectures in various Artificial Intelligence (AI) tasks, we integrate cell and hidden states from low-level to high-level Long Short-Term Memory (LSTM) networks with an attention pooling mechanism, similar to human perception systems. The developed hierarchical structure is designed to account for dependencies across different time scales, capturing the spatial-temporal correlations of network-level traffic states and enabling the prediction of traffic states for all corridors rather than a single link or route. The efficiency of the designed attention-based LSTM is analyzed through an ablation study. Comparative results with baseline LSTM models demonstrate that the Hierarchical Attention LSTM (HierAttnLSTM) model not only provides higher prediction accuracy but also effectively forecasts unusual congestion patterns. Data and code are made publicly available to support reproducible scientific research.
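The attention-pooling step that connects the LSTM levels can be sketched as follows; the shapes, the learnable query vector, and the einsum layout are illustrative assumptions, not the released code.

```python
import torch

T, B, H = 12, 4, 32                          # time steps, batch, hidden size
lstm = torch.nn.LSTM(input_size=8, hidden_size=H)
outs, _ = lstm(torch.randn(T, B, 8))         # low-level hidden states (T, B, H)

query = torch.nn.Parameter(torch.randn(H))   # learned pooling query
scores = torch.einsum("tbh,h->tb", outs, query)     # one score per time step
weights = torch.softmax(scores, dim=0)              # attention over time
pooled = torch.einsum("tb,tbh->bh", weights, outs)  # summary for next level
print(pooled.shape)                          # torch.Size([4, 32])
```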
Updated: 2024-08-25 20:43:12
Categories: cs.LG,cs.AI
Efficient Shield Synthesis via State-Space Transformation
We consider the problem of synthesizing safety strategies for control systems, also known as shields. Since the state space is infinite, shields are typically computed over a finite-state abstraction, with the most common abstraction being a rectangular grid. However, for many systems, such a grid does not align well with the safety property or the system dynamics. That is why a coarse grid is rarely sufficient, while a fine grid is typically computationally infeasible to obtain. In this paper, we show that appropriate state-space transformations can still allow the use of a coarse grid at almost no computational overhead. We demonstrate in three case studies that our transformation-based synthesis outperforms a standard synthesis by several orders of magnitude. In the first two case studies, we use domain knowledge to select a suitable transformation. In the third case study, we instead report results on engineering a transformation without domain knowledge.
Updated: 2024-08-25 20:16:51
Categories: cs.LO,cs.AI,cs.LG,cs.SY,eess.SY
Boolean Matrix Logic Programming
We describe a datalog query evaluation approach based on efficient and composable boolean matrix manipulation modules. We first define an overarching problem, Boolean Matrix Logic Programming (BMLP), which uses boolean matrices as an alternative computation to evaluate datalog programs. We develop two novel BMLP modules for bottom-up inferences on linear dyadic recursive datalog programs, and show how additional modules can extend this capability to compute both linear and non-linear recursive datalog programs of arity two. Our empirical results demonstrate that these modules outperform general-purpose and specialised systems by factors of 30x and 9x, respectively, when evaluating large programs with millions of facts. This boolean matrix approach significantly enhances the efficiency of datalog querying to support logic programming techniques.
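To make the core idea concrete, the sketch below evaluates the linear recursive program path(X,Y) :- edge(X,Y). path(X,Y) :- edge(X,Z), path(Z,Y). bottom-up as a boolean-matrix fixpoint; it is a toy illustration of the approach, not the BMLP modules themselves.

```python
import numpy as np

# edge/2 over constants {0, 1, 2, 3} as a boolean adjacency matrix.
edge = np.array([[0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0]], dtype=bool)

path = edge.copy()
while True:
    # One bottom-up step: path(X,Y) :- edge(X,Z), path(Z,Y) as a product.
    new = path | ((edge.astype(int) @ path.astype(int)) > 0)
    if (new == path).all():  # least fixpoint reached
        break
    path = new
print(path.astype(int))      # transitive closure: all derivable path/2 facts
```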
Updated: 2024-08-25 20:06:45
Categories: cs.SC,cs.AI,cs.LO
FedGlu: A personalized federated learning-based glucose forecasting algorithm for improved performance in glycemic excursion regions
Continuous glucose monitoring (CGM) devices provide real-time glucose monitoring and timely alerts for glycemic excursions, improving glycemic control among patients with diabetes. However, identifying rare events like hypoglycemia and hyperglycemia remains challenging due to their infrequency. Moreover, limited access to sensitive patient data hampers the development of robust machine learning models. Our objective is to accurately predict glycemic excursions while addressing data privacy concerns. To tackle excursion prediction, we propose a novel Hypo-Hyper (HH) loss function, which significantly improves performance in the glycemic excursion regions. The HH loss function demonstrates a 46% improvement over mean-squared error (MSE) loss across 125 patients. To address privacy concerns, we propose FedGlu, a machine learning model trained in a federated learning (FL) framework. FL allows collaborative learning without sharing sensitive data by training models locally and sharing only model parameters across other patients. FedGlu achieves a 35% superior glycemic excursion detection rate compared to local models. This improvement translates to enhanced performance in predicting both hypoglycemia and hyperglycemia for 105 out of 125 patients. These results underscore the effectiveness of the proposed HH loss function in augmenting the predictive capabilities of glucose predictions. Moreover, implementing models within a federated learning framework not only ensures better predictive capabilities but also safeguards sensitive data.
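The asymmetric-weighting idea behind such a loss can be sketched as below; the thresholds (70/180 mg/dL), the weight, and the squared-error base are assumptions for illustration, not the paper's exact functional form.

```python
import numpy as np

def hh_loss(y_true, y_pred, low=70.0, high=180.0, w=4.0):
    # Penalize errors more heavily when the true glucose lies in an
    # excursion region (hypoglycemia below `low`, hyperglycemia above `high`).
    err = (y_true - y_pred) ** 2
    excursion = (y_true < low) | (y_true > high)
    return float(np.mean(np.where(excursion, w, 1.0) * err))

y_true = np.array([60.0, 100.0, 250.0])   # hypo, normal, hyper (mg/dL)
y_pred = np.array([80.0, 110.0, 200.0])
print(hh_loss(y_true, y_pred))            # excursion errors dominate the loss
```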
Updated: 2024-08-25 19:51:27
Categories: cs.LG,cs.AI
The Detection of KIC 1718360, A Rotating Variable with a Possible Companion, Using Machine Learning
This paper presents the detection of a periodic dimming event in the lightcurve of the G1.5IV-V type star KIC 1718360, based on visible-light observations conducted by both the TESS and Kepler space telescopes. Analysis of the data seems to point toward a high rotation rate in the star, with a rotational period of 2.938 days. The high variability seen within the star's lightcurve points toward classification as a rotating variable. The initial observation was made in Kepler Quarter 16 data using the One-Class SVM machine learning method. Subsequent observations by the TESS space telescope corroborated these findings. KIC 1718360 appears to be a nearby rotating variable that is listed in few, if any, major catalogs as such. A secondary periodic dip is also present, indicating a possible exoplanetary companion.
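A minimal sketch of the anomaly-detection step on a synthetic lightcurve, using scikit-learn's One-Class SVM as named above; the injected dip and all parameter values are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
flux = 1.0 + 0.001 * rng.standard_normal(1000)   # flat, noisy lightcurve
flux[500:505] -= 0.01                            # injected dimming event

X = flux.reshape(-1, 1)
svm = OneClassSVM(nu=0.01, gamma="scale").fit(X)
labels = svm.predict(X)                          # -1 marks outlying flux values
print(np.where(labels == -1)[0])                 # indices of candidate dips
```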
Updated: 2024-08-25 19:02:37
Categories: astro-ph.EP,astro-ph.IM,cs.LG
LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. However, understanding the subtleties of complex exchanges in language remains a challenge. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. The framework consists of three stages: suggestion, feedback collection, and modification. In the suggestion stage, a cost-effective language model generates initial predictions based on game state and dialogue. The feedback-collection stage involves a language model providing feedback on these predictions. In the modification stage, a more advanced language model refines the initial predictions using the auto-generated feedback. We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players. The LLM-generated feedback exhibits superior quality and significantly enhances the performance of the model. Our approach achieves a 39% improvement over the zero-shot baseline in lying-F1 without the need for any training data, rivaling state-of-the-art supervised learning results.
Updated: 2024-08-25 18:47:55
Categories: cs.CL,cs.AI
UAMM: Price-oracle based Automated Market Maker
Automated market makers (AMMs) are pricing mechanisms utilized by decentralized exchanges (DEX). Traditional AMM approaches are constrained by pricing solely based on their own liquidity pool, without consideration of external markets or risk management for liquidity providers. In this paper, we propose a new approach known as UBET AMM (UAMM), which calculates prices by considering external market prices and the impermanent loss of the liquidity pool. Despite relying on external market prices, our method maintains the desired properties of a constant product curve when computing slippages. The key element of UAMM is determining the appropriate slippage amount based on the desired target balance, which encourages the liquidity pool to minimize impermanent loss. We demonstrate that our approach eliminates arbitrage opportunities when external market prices are efficient.
Updated: 2024-08-25 18:04:21
Categories: cs.LG,cs.CE,q-fin.CP
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Hallucinations in Multimodal Large Language Models (MLLMs), where generated responses fail to accurately reflect the given image, pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By comparing the contrasting probability distributions produced by the original and reconstructed images, ConVis enables MLLMs to capture visual contrastive signals that penalize hallucination generation. Notably, this method operates purely within the decoding process, eliminating the need for additional data or model updates. Our extensive experiments on five popular benchmarks demonstrate that ConVis effectively reduces hallucinations across various MLLMs, highlighting its potential to enhance model reliability.
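The decoding-time contrast can be sketched as follows; the scoring rule and the weight alpha are assumptions for illustration, and ConVis's published formulation may differ.

```python
import numpy as np

def contrastive_logits(logits_orig, logits_recon, alpha=1.0):
    # Push next-token scores away from what the reconstruction-conditioned
    # view predicts: tokens favored only under the hallucinated view drop.
    return logits_orig - alpha * (logits_recon - logits_orig)

logits_orig = np.array([2.0, 1.0, 0.5])    # conditioned on the original image
logits_recon = np.array([2.0, 1.8, 0.2])   # conditioned on the reconstruction
print(contrastive_logits(logits_orig, logits_recon))
# [2.  0.2 0.8]: token 1, favored only by the hallucinated view, is penalized
```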
Updated: 2024-08-25 18:02:36
Categories: cs.CV,cs.AI,cs.LG
RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity, leading to poor isometric invariance and suboptimal segmentation. To tackle this challenge, our work introduces Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. Our RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize inherent LiDAR isotropic radiation and semantic categorization for enhanced local representation and computational efficiency, while incorporating a 4D distance metric that integrates geometric and surface material reflectivity for improved semantic segmentation. To effectively embed high-dimensional RAPiD features, we propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings. Additionally, we propose RAPiD-Seg which incorporates a channel-wise attention fusion and two effective RAPiD-Seg variants, further optimizing the embedding for enhanced performance and generalization. Our method outperforms contemporary LiDAR segmentation work in terms of mIoU on SemanticKITTI (76.1) and nuScenes (83.6) datasets.
Updated: 2024-08-25 17:59:22
Categories: cs.CV,cs.LG,cs.RO
TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training
3D point clouds are essential for perceiving outdoor scenes, especially within the realm of autonomous driving. Recent advances in 3D LiDAR Object Detection focus primarily on the spatial positioning and distribution of points to ensure accurate detection. However, despite their robust performance in variable conditions, these methods are hindered by their sole reliance on coordinates and point intensity, resulting in inadequate isometric invariance and suboptimal detection outcomes. To tackle this challenge, our work introduces Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture. Our TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize the inherent isotropic radiation of LiDAR to enhance local representation, improve computational efficiency, and boost detection performance. To effectively process the geometric relations among points within each proposal, we propose a Multi-head self-Attention Encoder (MAE) with asymmetric geometric features to encode high-dimensional TraIL features into manageable representations. Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI (67.8, 20% label, moderate) and Waymo (68.9, 20% label, moderate) datasets under various label ratios (20%, 50%, and 100%).
Updated: 2024-08-25 17:59:17
Categories: cs.CV,cs.LG,cs.RO
Active and Passive Causal Inference Learning
This paper serves as a starting point for machine learning researchers, engineers, and students who are interested in but not yet familiar with causal inference. We start by laying out an important set of assumptions that are collectively needed for causal identification, such as exchangeability, positivity, consistency, and the absence of interference. From these assumptions, we build out a set of important causal inference techniques, which we categorize into two buckets: active and passive approaches. We describe and discuss randomized controlled trials and bandit-based approaches from the active category. We then describe classical approaches, such as matching and inverse probability weighting, in the passive category, followed by more recent deep learning based algorithms. We finish the paper with some aspects of causal inference that it does not cover, such as collider biases, and we expect this paper to provide readers with a diverse set of starting points for further reading and research in causal inference and discovery.
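As a concrete instance of a passive approach, here is a worked inverse probability weighting (IPW) example on synthetic data with one binary confounder: the naive difference in means is biased, while the IPW estimate recovers the true effect under the assumptions laid out above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
X = rng.binomial(1, 0.5, n)                     # confounder
e = np.where(X == 1, 0.8, 0.2)                  # propensity P(T=1 | X)
T = rng.binomial(1, e)                          # treatment assignment
Y = 2.0 * T + 3.0 * X + rng.standard_normal(n)  # outcome; true ATE is 2

ate_naive = Y[T == 1].mean() - Y[T == 0].mean()                # confounded, ~3.8
ate_ipw = np.mean(T * Y / e) - np.mean((1 - T) * Y / (1 - e))  # ~2.0
print(round(ate_naive, 2), round(ate_ipw, 2))
```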
Updated: 2024-08-25 17:57:19
Categories: cs.LG,stat.ML
Quantum-Powered Personalized Learning
This paper explores the transformative potential of quantum computing in the realm of personalized learning. Traditional machine learning models and GPU-based approaches have long been utilized to tailor educational experiences to individual student needs. However, these methods face significant challenges in terms of scalability, computational efficiency, and real-time adaptation to the dynamic nature of educational data. This study proposes leveraging quantum computing to address these limitations. We review existing personalized learning systems, classical machine learning methods, and emerging quantum computing applications in education. We then outline a protocol for data collection, privacy preservation using quantum techniques, and preprocessing, followed by the development and implementation of quantum algorithms specifically designed for personalized learning. Our findings indicate that quantum algorithms offer substantial improvements in efficiency, scalability, and personalization quality compared to classical methods. This paper discusses the implications of integrating quantum computing into educational systems, highlighting the potential for enhanced teaching methodologies, curriculum design, and overall student experiences. We conclude by summarizing the advantages of quantum computing in education and suggesting future research directions.
Updated: 2024-08-25 17:45:48
Categories: quant-ph,cs.LG
QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs
Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users' information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization.
Updated: 2024-08-25 17:22:29
Categories: cs.CL,cs.AI
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks
Computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrations of significant merits such as improved robustness and resilience to unseen or out-of-distribution inputs over their non-Bayesian counterparts, their practical use has faded to near insignificance. In this study, we introduce an innovative framework to mitigate the computational burden of Bayesian neural networks (BNNs). Our approach follows the principle of Bayesian techniques based on deep ensembles, but significantly reduces their cost via multiple low-rank perturbations of parameters arising from a pre-trained neural network. Both the vanilla version of ensembles and more sophisticated schemes such as Bayesian learning with Stein Variational Gradient Descent (SVGD), previously deemed impractical for large models, can be seamlessly implemented within the proposed framework, called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a dramatic reduction in the number of trainable parameters required to approximate a Bayesian posterior; and ii) it not only maintains, but in some instances, surpasses the performance of conventional Bayesian learning methods and non-Bayesian baselines. Our results with large-scale tasks such as ImageNet, CAMELYON17, DomainNet, VQA with CLIP, and LLaVA demonstrate the effectiveness and versatility of Bella in building highly scalable and practical Bayesian deep models for real-world applications.
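A minimal sketch of the low-rank-perturbation idea: every ensemble member shares the frozen pretrained weight and trains only a rank-r correction, so the per-member cost scales with the rank rather than the full parameter count. Shapes and initialization are illustrative assumptions, not the Bella codebase.

```python
import torch

d_out, d_in, r, members = 256, 512, 4, 5
W0 = torch.randn(d_out, d_in)                 # frozen pretrained weight

# Each member trains only U (d_out x r) and V (r x d_in).
perturbs = [((0.01 * torch.randn(d_out, r)).requires_grad_(),
             (0.01 * torch.randn(r, d_in)).requires_grad_())
            for _ in range(members)]

x = torch.randn(8, d_in)
outs = [x @ (W0 + U @ V).T for U, V in perturbs]   # member predictions
print(torch.stack(outs).mean(0).shape)             # ensemble mean: (8, 256)
```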
Updated: 2024-08-25 17:07:49
Categories: cs.LG,cs.AI,cs.CV
SPICED: Syntactical Bug and Trojan Pattern Identification in A/MS Circuits using LLM-Enhanced Detection
Analog and mixed-signal (A/MS) integrated circuits (ICs) are crucial in modern electronics, playing key roles in signal processing, amplification, sensing, and power management. Many IC companies outsource manufacturing to third-party foundries, creating security risks such as stealthy analog Trojans. Traditional detection methods, including embedding circuit watermarks or conducting hardware-based monitoring, often impose significant area and power overheads, and may not effectively identify all types of Trojans. To address these shortcomings, we propose SPICED, a Large Language Model (LLM)-based framework that operates within the software domain, eliminating the need for hardware modifications for Trojan detection and localization. This is the first work using LLM-aided techniques for detecting and localizing syntactical bugs and analog Trojans in circuit netlists, requiring no explicit training and incurring zero area overhead. Our framework employs chain-of-thought reasoning and few-shot examples to teach anomaly detection rules to LLMs. With the proposed method, we achieve an average Trojan coverage of 93.32% and an average true positive rate of 93.4% in identifying Trojan-impacted nodes for the evaluated analog benchmark circuits. These experimental results validate the effectiveness of LLMs in detecting and locating both syntactical bugs and Trojans within analog netlists.
Updated: 2024-08-25 17:07:08
Categories: cs.CR,cs.AI,cs.LG
Hiding Backdoors within Event Sequence Data via Poisoning Attacks
The financial industry relies on deep learning models for making important decisions. This adoption brings new danger, as deep black-box models are known to be vulnerable to adversarial attacks. In computer vision, one can shape the output during inference by performing an adversarial attack called poisoning, which introduces a backdoor into the model during training. For sequences of financial transactions of a customer, insertion of a backdoor is harder to perform, as models operate over a more complex discrete space of sequences, and systematic checks for insecurities occur. We provide a method to introduce concealed backdoors, creating vulnerabilities without altering their functionality for uncontaminated data. To achieve this, we replace a clean model with a poisoned one that is aware of the availability of a backdoor and utilize this knowledge. Our attacks that are most difficult to uncover involve either an additional supervised detection step on poisoned data activated during testing or well-hidden modifications of model weights. The experimental study provides insights into how these effects vary across different datasets, architectures, and model components. Alternative methods and baselines, such as distillation-type regularization, are also explored but found to be less efficient. Conducted on three open transaction datasets and architectures, including LSTM, CNN, and Transformer, our findings not only illuminate the vulnerabilities in contemporary models but also can drive the construction of more robust systems.
Updated: 2024-08-25 16:47:57
Categories: cs.LG,cs.CR
Enhancing SQL Query Generation with Neurosymbolic Reasoning
Neurosymbolic approaches blend the effectiveness of symbolic reasoning with the flexibility of neural networks. In this work, we propose a neurosymbolic architecture for generating SQL queries that builds and explores a solution tree using Best-First Search, with the possibility of backtracking. For this purpose, it integrates a Language Model (LM) with symbolic modules that help catch and correct errors made by the LM on SQL queries, as well as guiding the exploration of the solution tree. We focus on improving the performance of smaller open-source LMs, and we find that our tool, Xander, increases accuracy by an average of 10.9% and reduces runtime by an average of 28% compared to the LM without Xander, enabling a smaller LM (with Xander) to outperform its four-times larger counterpart (without Xander).
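The control loop can be sketched generically as below; `expand`, `score`, and the toy usage stand in for the LM proposals and symbolic checks, so this is a sketch of the search, not the tool's implementation.

```python
import heapq

def best_first_search(initial, expand, score, is_complete, budget=100):
    # Best-first search over partial queries; lower-scored candidates stay
    # on the heap, so backtracking happens when better branches stall.
    heap = [(-score(initial), initial)]
    while heap and budget > 0:
        budget -= 1
        _, partial = heapq.heappop(heap)
        if is_complete(partial):
            return partial
        for child in expand(partial):
            heapq.heappush(heap, (-score(child), child))
    return None

target = "SELECT name FROM users"
toks = target.split()
result = best_first_search(
    "",
    lambda p: [(p + " " + t).strip() for t in toks],         # toy proposals
    lambda p: sum(a == b for a, b in zip(p.split(), toks)),  # toy checker score
    lambda p: p == target)
print(result)
```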
Updated: 2024-08-25 16:37:26
Categories: cs.DB,cs.AI,cs.SE,I.2
Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population
Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending inferences to target populations presents challenges due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, ensuring more precise treatment effect estimations. Notably, ROOT generates interpretable characteristics of the underrepresented population, aiding researchers in effective communication. Our approach demonstrates improved precision and interpretability compared to alternatives, as illustrated with synthetic data experiments. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- investigating the effectiveness of medication for opioid use disorder -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.
Updated: 2024-08-25 16:36:02
Categories: stat.ME,cs.CY,cs.LG,stat.AP
Neural Spacetimes for DAG Representation Learning
We propose a class of trainable deep learning-based geometries called Neural Spacetimes (NSTs), which can universally represent nodes in weighted directed acyclic graphs (DAGs) as events in a spacetime manifold. While most works in the literature focus on undirected graph representation learning or causality embedding separately, our differentiable geometry can encode both graph edge weights in its spatial dimensions and causality in the form of edge directionality in its temporal dimensions. We use a product manifold that combines a quasi-metric (for space) and a partial order (for time). NSTs are implemented as three neural networks trained in an end-to-end manner: an embedding network, which learns to optimize the location of nodes as events in the spacetime manifold, and two other networks that optimize the space and time geometries in parallel, which we call a neural (quasi-)metric and a neural partial order, respectively. The latter two networks leverage recent ideas at the intersection of fractal geometry and deep learning to shape the geometry of the representation space in a data-driven fashion, unlike other works in the literature that use fixed spacetime manifolds such as Minkowski space or De Sitter space to embed DAGs. Our main theoretical guarantee is a universal embedding theorem, showing that any $k$-point DAG can be embedded into an NST with $1+\mathcal{O}(\log(k))$ distortion while exactly preserving its causal structure. The total number of parameters defining the NST is sub-cubic in $k$ and linear in the width of the DAG. If the DAG has a planar Hasse diagram, this is improved to $\mathcal{O}(\log(k)) + 2$ spatial and 2 temporal dimensions. We validate our framework computationally with synthetic weighted DAGs and real-world network embeddings; in both cases, the NSTs achieve lower embedding distortions than their counterparts using fixed spacetime geometries.
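In miniature, the product encoding assigns each node spatial coordinates, whose quasi-metric encodes edge weights, and temporal coordinates, whose componentwise order encodes edge direction. The hand-picked coordinates below illustrate this for a three-node chain a -> b -> c; in NSTs all three components are learned networks.

```python
import numpy as np

space = {"a": np.array([0.0, 0.0]),
         "b": np.array([1.0, 0.0]),
         "c": np.array([2.0, 0.0])}
time = {"a": np.array([0.0]), "b": np.array([1.0]), "c": np.array([2.0])}

def precedes(u, v):
    # Stand-in for the neural partial order: u causally precedes v iff
    # every temporal coordinate strictly increases.
    return bool(np.all(time[v] > time[u]))

def quasi_dist(u, v):
    # Stand-in for the neural quasi-metric encoding edge weights.
    return float(np.abs(space[v] - space[u]).sum())

print(precedes("a", "c"), precedes("c", "a"))  # True False: direction preserved
print(quasi_dist("a", "b"))                    # 1.0: the encoded edge weight
```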
Updated: 2024-08-25 16:26:55
Categories: cs.LG,cs.DM,cs.NE,math.MG,stat.ML
Safe Policy Exploration Improvement via Subgoals
Reinforcement learning is a widely used approach to autonomous navigation, showing potential in various tasks and robotic setups. Still, it often struggles to reach distant goals when safety constraints are imposed (e.g., a wheeled robot may be prohibited from moving close to obstacles). One of the main reasons for poor performance in such setups, which are common in practice, is that the need to respect the safety constraints degrades the exploration capabilities of an RL agent. To this end, we introduce a novel learnable algorithm that, on the one hand, decomposes the initial problem into smaller sub-problems via intermediate goals and, on the other hand, respects the limit of the cumulative safety constraints: SPEIS (Safe Policy Exploration Improvement via Subgoals). It comprises two coupled policies trained end-to-end: a subgoal policy and a safe policy. The subgoal policy is trained to generate subgoals based on the transitions from the buffer of the safe (main) policy, helping the safe policy reach distant goals. Simultaneously, the safe policy maximizes its rewards while attempting not to violate the limit of the cumulative safety constraints, thus providing a certain level of safety. We evaluate SPEIS in a wide range of challenging (simulated) environments that involve different types of robots in two different environments: autonomous vehicles from the POLAMP environment and car, point, doggo, and sweep from the safety-gym environment. We demonstrate that our method consistently outperforms state-of-the-art competitors and can significantly reduce the collision rate while maintaining high success rates (higher by 80% compared to the best-performing methods).
Updated: 2024-08-25 16:12:49
Categories: cs.RO,cs.LG
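SPEIS's decomposition-plus-budget structure can be illustrated with a toy 1-D navigation problem. In the paper both policies are learned end-to-end; here hand-written stand-ins merely show how intermediate goals and a cumulative safety budget interact (world, budget, and policies are our own assumptions):

    import numpy as np

    def step(pos, action):
        """Toy 1-D world: cost 1 whenever the agent enters the unsafe zone [4, 5]."""
        nxt = float(np.clip(pos + action, 0.0, 10.0))
        return nxt, 1.0 if 4.0 <= nxt <= 5.0 else 0.0

    def subgoal_policy(pos, goal):
        """Stand-in for the learned subgoal policy: propose an intermediate goal."""
        return pos + 0.5 * (goal - pos)

    def safe_policy(pos, subgoal, budget_left):
        """Stand-in for the learned safe policy: move toward the subgoal, but
        refuse actions once the cumulative safety budget would be exceeded."""
        nxt, cost = step(pos, np.sign(subgoal - pos))
        if cost > 0 and budget_left < cost:
            return pos, 0.0
        return nxt, cost

    pos, goal, budget = 0.0, 10.0, 2.0
    for t in range(40):
        pos, cost = safe_policy(pos, subgoal_policy(pos, goal), budget)
        budget -= cost
        if pos == goal:
            break
    print(f"reached {pos:.0f} with remaining safety budget {budget:.0f}")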
Deconfounding Imitation Learning with Variational Inference
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. In previous work, to work around the confounding problem, policies have been trained using query access to the expert's policy or inverse reinforcement learning (IRL). However, both approaches have drawbacks as the expert's policy may not be available and IRL can be unstable in practice. Instead, we propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy. We prove that using this method, under strong assumptions, the identification of the correct imitation learning policy is theoretically possible from expert demonstrations alone. In practice, we focus on a setting with less strong assumptions where we use exploration data for learning the inference model. We show in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve an asymptotically optimal imitation performance.
Updated: 2024-08-25 15:45:08
Categories: cs.LG,stat.ML
Flexible game-playing AI with AlphaViT: adapting to multiple games and board sizes
This paper presents novel game AI agents based on the AlphaZero framework, enhanced with Vision Transformers (ViT): AlphaViT, AlphaViD, and AlphaVDA. These agents are designed to play various board games of different sizes using a single model, overcoming AlphaZero's limitation of being restricted to a fixed board size. AlphaViT uses only a transformer encoder, while AlphaViD and AlphaVDA contain both an encoder and a decoder. AlphaViD's decoder receives input from the encoder output, while AlphaVDA uses a learnable matrix as decoder input. Using the AlphaZero framework, the three proposed methods demonstrate their versatility in different game environments, including Connect4, Gomoku, and Othello. Experimental results show that these agents, whether trained on a single game or on multiple games simultaneously, consistently outperform traditional algorithms such as Minimax and Monte Carlo tree search using a single DNN with shared weights, while approaching the performance of AlphaZero. In particular, AlphaViT and AlphaViD show strong performance across games, with AlphaViD benefiting from an additional decoder layer that enhances its ability to adapt to different action spaces and board sizes. These results may suggest the potential of transformer-based architectures to develop more flexible and robust game AI agents capable of excelling in multiple games and dynamic environments.
Updated: 2024-08-25 15:40:21
Categories: cs.LG,cs.AI
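A minimal sketch of how a single transformer encoder can serve boards of different sizes: every cell becomes a token with factored row/column position embeddings, so the policy head (one logit per cell) scales with the board. Shapes, hyperparameters, and head design here are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class BoardEncoder(nn.Module):
        """One transformer encoder for boards of any size: each cell is a token."""
        def __init__(self, n_piece_types=3, dim=64, heads=4, layers=2, max_side=19):
            super().__init__()
            self.piece = nn.Embedding(n_piece_types, dim)   # empty / black / white
            self.row = nn.Embedding(max_side, dim)
            self.col = nn.Embedding(max_side, dim)
            self.cls = nn.Parameter(torch.zeros(1, 1, dim))
            enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc, layers)
            self.value = nn.Linear(dim, 1)
            self.policy = nn.Linear(dim, 1)   # one move logit per cell token

        def forward(self, board):             # board: (B, H, W) integer piece ids
            B, H, W = board.shape
            r = torch.arange(H, device=board.device)
            c = torch.arange(W, device=board.device)
            pos = self.row(r)[:, None, :] + self.col(c)[None, :, :]   # (H, W, dim)
            tok = (self.piece(board) + pos).flatten(1, 2)             # (B, H*W, dim)
            tok = torch.cat([self.cls.expand(B, -1, -1), tok], dim=1)
            h = self.encoder(tok)
            return self.policy(h[:, 1:]).squeeze(-1), torch.tanh(self.value(h[:, 0]))

    net = BoardEncoder()
    for side in (6, 9, 15):                   # Connect4-ish through Gomoku-ish sizes
        logits, v = net(torch.zeros(2, side, side, dtype=torch.long))
        print(side, logits.shape, v.shape)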
Optimizing Delegation in Collaborative Human-AI Hybrid Teams
When humans and autonomous systems operate together as what we refer to as a hybrid team, we naturally wish to ensure that the team operates successfully and effectively. We refer to team members as agents. In our proposed framework, we address the case of hybrid teams in which, at any time, only one team member (the control agent) is authorized to act as control for the team. To determine the best selection of a control agent, we propose the addition of an AI manager (via Reinforcement Learning) which learns as an outside observer of the team. The manager learns a model of behavior linking observations of agent performance and the environment/world the team is operating in, and from these observations makes the most desirable selection of a control agent. We restrict the manager task by introducing a set of constraints. The manager constraints indicate acceptable team operation, so a violation occurs if the team enters a condition which is unacceptable and requires manager intervention. To ensure minimal added complexity or potential inefficiency for the team, the manager should attempt to minimize the number of times the team reaches a constraint violation and requires subsequent manager intervention. Therefore, our manager is optimizing its selection of authorized agents to boost overall team performance while minimizing the frequency of manager intervention. We demonstrate our manager's performance in a simulated driving scenario representing a hybrid team composed of a human driver and an autonomous driving system. We perform experiments for our driving scenario with interfering vehicles, indicating the need for collision avoidance and proper speed control. Our results indicate a positive impact of our manager, with some cases increasing team performance to roughly 187% of the best solo agent's performance.
Updated: 2024-08-25 15:28:21
Categories: cs.AI,cs.HC,cs.LG
CodeGraph: Enhancing Graph Reasoning of LLMs with Code
With the increasing popularity of large language models (LLMs), reasoning on basic graph algorithm problems is an essential intermediate step in assessing their abilities to process and infer complex graph reasoning tasks. Existing methods usually convert graph-structured data to textual descriptions and then use LLMs for reasoning and computation. However, LLMs often produce computation errors on the arithmetic parts of basic graph algorithm problems, such as counting the number of edges. In addition, they struggle to control or understand the output of the reasoning process, raising concerns about whether LLMs are simply guessing. In this paper, we introduce CodeGraph, a method that encodes graph problem solutions as code. The method solves new graph problems by learning from exemplars, generating programs, and executing them via a program interpreter. Using the few-shot setting, we evaluate CodeGraph with GPT-3.5 Turbo, Llama3-70B Instruct, Mixtral-8x22B Instruct, and Mixtral-8x7B Instruct as base LLMs. Experimental results on six tasks with six graph encoding methods in the GraphQA dataset demonstrate that CodeGraph can boost performance on graph reasoning tasks inside LLMs by 1.3% to 58.6%, depending on the task. Compared to existing methods, CodeGraph demonstrates strong performance on arithmetic problems in graph tasks and offers a more controllable and interpretable approach to the reasoning process.
Updated: 2024-08-25 15:27:21
Categories: cs.CL,cs.AI
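The core pattern is easy to sketch: the LLM is shown exemplars and a textual graph and asked to reply with a program rather than a number, and executing the program takes the arithmetic out of the LLM. Below, the "generated" program is a hypothetical LLM output, hard-coded for illustration:

    import textwrap

    # Textual graph description, as it would appear in the few-shot prompt.
    graph_text = "In an undirected graph, the edges are: (0,1) (1,2) (2,3) (0,2)."

    # Hypothetical LLM completion: a program, not a final answer.
    generated_program = """
    edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
    answer = len(edges)   # exact counting is delegated to the interpreter
    """

    namespace = {}
    exec(textwrap.dedent(generated_program), namespace)   # the program-interpreter step
    print("edge count:", namespace["answer"])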
Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching
Recent advances in text-to-image diffusion models have demonstrated impressive capabilities in image quality. However, complex scene generation remains relatively unexplored, and even the definition of 'complex scene' itself remains unclear. In this paper, we address this gap by providing a precise definition of complex scenes and introducing a set of Complex Decomposition Criteria (CDC) based on this definition. Inspired by the artist's painting process, we propose a training-free diffusion framework called Complex Diffusion (CxD), which divides the process into three stages: composition, painting, and retouching. Our method leverages the powerful chain-of-thought capabilities of large language models (LLMs) to decompose complex prompts based on the CDC and to manage composition and layout. We then develop an attention modulation method that guides simple prompts to specific regions to complete the complex scene painting. Finally, we inject the detailed output of the LLM into a retouching model to enhance the image details, thus implementing the retouching stage. Extensive experiments demonstrate that our method outperforms previous SOTA approaches, significantly improving the generation of high-quality, semantically consistent, and visually diverse images for complex scenes, even with intricate prompts.
Updated: 2024-08-25 15:05:32
Categories: cs.CV,cs.LG
Tangram: A Challenging Benchmark for Geometric Element Recognizing
Significant advancements in Large Multimodal Models (LMMs) have enabled them to tackle complex problems involving visual-mathematical reasoning. However, their ability to identify geometric elements remains understudied. To bridge this gap, we introduce Tangram, a novel benchmark designed to evaluate the performance of LMMs on geometric element recognition. Tangram includes 1,080 diverse geometric diagrams sourced from primary and secondary school exams, competitions, and textbooks, ranging from simple basic geometric shapes to complex combinations. Each diagram is associated with four questions, resulting in a total of 4,320 visual-question-answer pairs. Unlike existing benchmarks that seek higher-level cognition and reasoning, Tangram focuses on the understanding of geometric elements, requiring models to perform a "simple but interesting" counting task. Systematic evaluation of 10 prominent LMMs, such as GPT-4o and Claude 3.5 Sonnet, shows that even on this seemingly simple task, these models still face significant challenges. Notably, the overall accuracy of the top performer across all tested models is only 56.8%, marking a significant gap when compared to human performance. These findings highlight the limitations of current multimodal artificial intelligence systems in handling basic perception tasks, and will inspire the development of the next generation of expert-level multimodal foundational models. The Tangram benchmark and evaluation code will be available soon.
Updated: 2024-08-25 14:47:25
Categories: cs.CV,cs.AI
Condensed Sample-Guided Model Inversion for Knowledge Distillation
Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model. KD relies on access to the training dataset, which may not always be fully available due to privacy concerns or logistical issues related to the size of the data. To address this, "data-free" KD methods use synthetic data, generated through model inversion, to mimic the target data distribution. However, conventional model inversion methods are not designed to utilize supplementary information from the target dataset, and thus, cannot leverage it to improve performance, even when it is available. In this paper, we consider condensed samples, as a form of supplementary information, and introduce a method for using them to better approximate the target data distribution, thereby enhancing the KD performance. Our approach is versatile, evidenced by improvements of up to 11.4% in KD accuracy across various datasets and model inversion-based methods. Importantly, it remains effective even when using as few as one condensed sample per class, and can also enhance performance in few-shot scenarios where only limited real data samples are available.
Updated: 2024-08-25 14:43:27
Categories: cs.LG,cs.AI
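A hedged sketch of the central idea: treat the condensed samples as per-class anchors and synthesize inversion data that stays near them while being confidently classified by the teacher; the result would then feed a standard KD loss. The toy teacher, loss weights, and step counts below are our own assumptions, not the paper's recipe.

    import torch
    import torch.nn.functional as F

    def invert_with_condensed_anchors(teacher, anchors, labels, steps=200, lam=1.0):
        """Synthesize inputs near per-class condensed samples that the teacher
        labels confidently (model-inversion objective + anchor regularizer)."""
        for p in teacher.parameters():
            p.requires_grad_(False)
        x = (anchors + 0.01 * torch.randn_like(anchors)).requires_grad_(True)
        opt = torch.optim.Adam([x], lr=0.05)
        for _ in range(steps):
            loss = F.cross_entropy(teacher(x), labels)      # match target classes
            loss = loss + lam * F.mse_loss(x, anchors)      # stay near condensed set
            opt.zero_grad(); loss.backward(); opt.step()
        return x.detach()

    teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    anchors = torch.randn(10, 1, 28, 28)      # as few as one condensed sample per class
    synthetic = invert_with_condensed_anchors(teacher, anchors, torch.arange(10))
    print(synthetic.shape)                    # torch.Size([10, 1, 28, 28])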
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Pre-trained language models (PLMs), for example BERT or RoBERTa, mark the state of the art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency, for example in terms of model size or latency, against generalization performance. We also show how we can utilize more recently developed two-stage weight-sharing NAS approaches in this setting to accelerate the search process. Unlike traditional pruning methods with fixed thresholds, we propose to adopt a multi-objective approach that identifies the Pareto optimal set of sub-networks, allowing for a more flexible and automated compression process.
Updated: 2024-08-25 14:41:32
Categories: cs.LG,cs.CL
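The multi-objective flavor is easy to convey with a toy random search: score sub-network configurations on a size proxy and an error proxy, then keep the non-dominated (Pareto-optimal) set. The search space and both proxies below are made up for illustration; a real setup would use weight-sharing NAS and validation metrics.

    import random

    def dominated(c, others):
        """c is dominated if some config is no worse on both objectives and
        strictly better on at least one."""
        return any(o["size"] <= c["size"] and o["error"] <= c["error"]
                   and (o["size"] < c["size"] or o["error"] < c["error"])
                   for o in others)

    random.seed(0)
    candidates = []
    for _ in range(50):                        # hypothetical sub-network configs
        heads = random.randint(2, 12)          # attention heads kept per layer
        width = random.choice([512, 1024, 2048, 3072])    # FFN width kept
        candidates.append({
            "heads": heads, "width": width,
            "size": heads * 64 * 12 + width * 12,         # crude parameter proxy
            "error": 0.10 + 0.5 / heads + 100.0 / width,  # crude error proxy
        })

    front = [c for c in candidates if not dominated(c, candidates)]
    for c in sorted(front, key=lambda c: c["size"]):
        print(c)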
Sample-Independent Federated Learning Backdoor Attack
In federated learning, backdoor attacks embed triggers in the adversarial client's data to inject a backdoor into the model. To evade detection through sample analysis, non-sample-modifying backdoor attack methods based on dropout have been developed. However, these methods struggle to covertly utilize dropout in evaluation mode, thus hindering their deployment in real-world scenarios. To address these issues, this paper introduces GhostB, a novel approach to federated learning backdoor attacks that neither alters samples nor relies on dropout. This method employs the behavior of neurons producing specific values as triggers. By mapping these neuronal values to categories specified by the adversary, the backdoor is implanted and activated when particular feature values are detected at designated neurons. Our experiments conducted on the TIMIT, LibriSpeech, and VoxCeleb2 databases in both Closed Set Identification (CSI) and Open Set Identification (OSI) scenarios demonstrate that GhostB achieves a 100% success rate upon activation, with this rate maintained across experiments involving 1 to 50 ghost neurons. This paper investigates how the dispersion of neurons and their depth within hidden layers affect the success rate, revealing that increasing the dispersion of neurons and placing them in deeper layers can significantly decrease effectiveness, potentially rendering the attack unsuccessful.
Updated: 2024-08-25 14:38:13
Categories: cs.CR
UniGraph: Learning a Unified Cross-Domain Foundation Model for Text-Attributed Graphs
Foundation models like ChatGPT and GPT-4 have revolutionized artificial intelligence, exhibiting remarkable abilities to generalize across a wide array of tasks and applications beyond their initial training objectives. However, graph learning has predominantly focused on single-graph models, tailored to specific tasks or datasets, lacking the ability to transfer learned knowledge to different domains. This limitation stems from the inherent complexity and diversity of graph structures, along with the different feature and label spaces specific to graph data. In this paper, we recognize text as an effective unifying medium and employ Text-Attributed Graphs (TAGs) to leverage this potential. We present our UniGraph framework, designed to learn a foundation model for TAGs, which is capable of generalizing to unseen graphs and tasks across diverse domains. Unlike single-graph models that use pre-computed node features of varying dimensions as input, our approach leverages textual features for unifying node representations, even for graphs such as molecular graphs that do not naturally have textual features. We propose a novel cascaded architecture of Language Models (LMs) and Graph Neural Networks (GNNs) as backbone networks. Additionally, we propose the first pre-training algorithm specifically designed for large-scale self-supervised learning on TAGs, based on Masked Graph Modeling. We introduce graph instruction tuning using Large Language Models (LLMs) to enable zero-shot prediction ability. Our comprehensive experiments across various graph learning tasks and domains demonstrate the model's effectiveness in self-supervised representation learning on unseen graphs, few-shot in-context transfer, and zero-shot transfer, even surpassing or matching the performance of GNNs that have undergone supervised training on target datasets.
Updated: 2024-08-25 14:37:35
Categories: cs.LG
Conformalized Answer Set Prediction for Knowledge Graph Embedding
Knowledge graph embeddings (KGE) apply machine learning methods on knowledge graphs (KGs) to provide non-classical reasoning capabilities based on similarities and analogies. The learned KG embeddings are typically used to answer queries by ranking all potential answers, but rankings often lack a meaningful probabilistic interpretation - lower-ranked answers do not necessarily have a lower probability of being true. This limitation makes it difficult to distinguish plausible from implausible answers, posing challenges for the application of KGE methods in high-stakes domains like medicine. We address this issue by applying the theory of conformal prediction that allows generating answer sets, which contain the correct answer with probabilistic guarantees. We explain how conformal prediction can be used to generate such answer sets for link prediction tasks. Our empirical evaluation on four benchmark datasets using six representative KGE methods validates that the generated answer sets satisfy the probabilistic guarantees given by the theory of conformal prediction. We also demonstrate that the generated answer sets often have a sensible size and that the size adapts well with respect to the difficulty of the query.
Updated: 2024-08-25 14:13:15
Categories: cs.AI
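Split conformal prediction makes the answer-set construction very short. A sketch, assuming we already have model scores: calibrate a threshold on the scores of known-correct answers, then include every entity that clears it. The synthetic scores below stand in for a real KGE model.

    import numpy as np

    def conformal_answer_sets(cal_scores_true, test_scores, alpha=0.1):
        """cal_scores_true: model score of the correct entity per calibration query.
        test_scores: (n_test, n_entities) scores per test query.
        Returns answer sets containing the true answer with probability >= 1 - alpha,
        assuming exchangeability (standard split conformal prediction)."""
        nonconf = np.sort(-np.asarray(cal_scores_true))    # low score = high nonconformity
        n = len(nonconf)
        idx = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
        q = nonconf[idx]                                   # calibrated threshold
        return [np.flatnonzero(-s <= q) for s in np.asarray(test_scores)]

    rng = np.random.default_rng(0)
    cal = rng.normal(2.0, 1.0, size=500)        # true answers tend to score high
    test = rng.normal(0.0, 1.0, size=(3, 1000)) # scores over all candidate entities
    for s in conformal_answer_sets(cal, test):
        print("answer set size:", len(s))

Set sizes shrink for easy queries and grow for hard ones, which is exactly the adaptivity the abstract reports.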
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +0.6 AP50 on CFC Kenai to Channel. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting.
Updated: 2024-08-25 14:05:18
Categories: cs.CV,cs.AI,cs.LG
PropSAM: A Propagation-Based Model for Segmenting Any 3D Objects in Multi-Modal Medical Images
Volumetric segmentation is crucial for medical imaging but is often constrained by labor-intensive manual annotations and the need for scenario-specific model training. Furthermore, existing general segmentation models are inefficient due to their design and inferential approaches. Addressing this clinical demand, we introduce PropSAM, a propagation-based segmentation model that optimizes the use of 3D medical structure information. PropSAM integrates a CNN-based UNet for intra-slice processing with a Transformer-based module for inter-slice propagation, focusing on structural and semantic continuities to enhance segmentation across various modalities. Distinctively, PropSAM operates on a one-view prompt, such as a 2D bounding box or sketch mask, unlike conventional models that require two-view prompts. It has demonstrated superior performance, significantly improving the Dice Similarity Coefficient (DSC) across 44 medical datasets and various imaging modalities, outperforming models like MedSAM and SegVol with an average DSC improvement of 18.1%. PropSAM also maintains stable predictions despite prompt deviations and varying propagation configurations, confirmed by one-way ANOVA tests with P>0.5985 and P>0.6131, respectively. Moreover, PropSAM's efficient architecture enables faster inference speeds (Wilcoxon rank-sum test, P<0.001) and reduces user interaction time by 37.8% compared to two-view prompt models. Its ability to handle irregular and complex objects with robust performance further demonstrates its potential in clinical settings, facilitating more automated and reliable medical imaging analyses with minimal retraining.
Updated: 2024-08-25 13:42:47
Categories: cs.CV,cs.AI
Understanding Hallucinations in Diffusion Models through Mode Interpolation
Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from? In this paper, we study a particular failure mode in diffusion models, which we term mode interpolation. Specifically, we find that diffusion models smoothly "interpolate" between nearby data modes in the training set to generate samples that are completely outside the support of the original training distribution; this phenomenon leads diffusion models to generate artifacts that never existed in real data (i.e., hallucinations). We systematically study the reasons for, and the manifestation of, this phenomenon. Through experiments on 1D and 2D Gaussians, we show how a discontinuous loss landscape in the diffusion model's decoder leads to a region where any smooth approximation will cause such hallucinations. Through experiments on artificial datasets with various shapes, we show how hallucination leads to the generation of combinations of shapes that never existed. Finally, we show that diffusion models in fact know when they go out of support and hallucinate. This is captured by the high variance in the trajectory of the generated sample during the final few backward sampling steps. Using a simple metric to capture this variance, we can remove over 95% of hallucinations at generation time while retaining 96% of in-support samples. We conclude our exploration by showing the implications of such hallucination (and its removal) on the collapse (and stabilization) of recursive training on synthetic data, with experiments on the MNIST and 2D Gaussians datasets. We release our code at https://github.com/locuslab/diffusion-model-hallucination.
Updated: 2024-08-25 13:41:50
Categories: cs.LG
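The variance-based filter can be sketched in a few lines on a toy bistable "denoiser": samples caught between the two modes keep oscillating in the last few reverse steps, so their tail variance is high. The dynamics, tail length, and threshold are our own stand-ins for the paper's sampler and metric.

    import numpy as np

    def sample_with_variance_metric(denoise_step, x_T, n_steps=50, tail=5):
        """Run reverse diffusion, recording variance over the last few states;
        high tail variance flags a likely out-of-support (hallucinated) sample."""
        x, tail_states = x_T, []
        for t in range(n_steps, 0, -1):
            x = denoise_step(x, t)
            if t <= tail:
                tail_states.append(x.copy())
        return x, float(np.var(np.stack(tail_states), axis=0).mean())

    rng = np.random.default_rng(0)

    def toy_denoiser(x, t):
        """Toy reverse step pulling samples toward modes at +/-2; samples stuck
        between the modes keep flip-flopping, inflating tail variance."""
        return x + 0.1 * (2.0 * np.sign(x) - x) + 0.01 * rng.normal(size=x.shape)

    flagged = 0
    for _ in range(200):
        x, v = sample_with_variance_metric(toy_denoiser, rng.normal(size=(2,)))
        flagged += v > 1e-3        # hypothetical threshold, tuned per model
    print("samples flagged as likely hallucinations:", flagged)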
Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
Annually, at the Conference of Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our findings, and shed light on and monitor the potential biases or inconsistencies in the rankings. We discover that the present meta-evaluation framework favors two categories of metrics: i) those explicitly trained to mimic human quality assessments, and ii) continuous metrics. Finally, we raise concerns regarding the evaluation capabilities of state-of-the-art metrics, emphasizing that they might be basing their assessments on spurious correlations found in their training data.
Updated: 2024-08-25 13:29:34
Categories: cs.CL,cs.AI
Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting
Traffic forecasting has emerged as a crucial research area in the development of smart cities. Although various neural networks with intricate architectures have been developed to address this problem, they still face two key challenges: i) recent advancements in network designs for modeling spatio-temporal correlations are starting to see diminishing returns in performance enhancements; ii) most models do not account for the spatio-temporal heterogeneity inherent in traffic data, i.e., traffic distribution varies significantly across different regions and traffic flow patterns fluctuate across various time slots. To tackle these challenges, we introduce the Spatio-Temporal Graph Transformer (STGormer), which effectively integrates the attribute and structure information inherent in traffic data for learning spatio-temporal correlations, together with a mixture-of-experts module for capturing heterogeneity along the spatial and temporal axes. Specifically, we design two straightforward yet effective spatial encoding methods based on the graph structure and integrate time position encoding into the vanilla transformer to capture spatio-temporal traffic patterns. Additionally, a mixture-of-experts enhanced feedforward neural network (FNN) module adaptively assigns suitable expert layers to distinct patterns via a spatio-temporal gating network, further improving overall prediction accuracy. Experiments on real-world traffic datasets demonstrate that STGormer achieves state-of-the-art performance.
Updated: 2024-08-25 13:29:28
Categories: cs.LG
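The mixture-of-experts feed-forward block is the most self-contained piece of the design. A simplified sketch with soft gating follows; the paper's gating network conditions on spatio-temporal structure, and all dimensions here are illustrative.

    import torch
    import torch.nn as nn

    class SpatioTemporalMoEFFN(nn.Module):
        """Feed-forward block whose expert mix is chosen per token by a gating
        network, letting different regions / time slots use different transforms."""
        def __init__(self, dim=64, n_experts=4, hidden=128):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
                for _ in range(n_experts))
            self.gate = nn.Linear(dim, n_experts)

        def forward(self, x):                                  # x: (B, tokens, dim)
            weights = torch.softmax(self.gate(x), dim=-1)      # (B, T, E)
            outs = torch.stack([e(x) for e in self.experts], -1)   # (B, T, dim, E)
            return (outs * weights.unsqueeze(2)).sum(-1)

    # Tokens could be (sensor, time-slot) pairs after spatial/temporal encoding:
    x = torch.randn(8, 12 * 20, 64)            # 12 time slots x 20 road sensors
    print(SpatioTemporalMoEFFN()(x).shape)     # torch.Size([8, 240, 64])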
AI-Powered Energy Algorithmic Trading: Integrating Hidden Markov Models with Neural Networks
In quantitative finance, machine learning methods are essential for alpha generation. This study introduces a new approach that combines Hidden Markov Models (HMMs) and neural networks, integrated with Black-Litterman portfolio optimization. During the COVID period (2019-2022), this dual-model approach achieved an 83% return with a Sharpe ratio of 0.77. It incorporates two risk models to enhance risk management, showing efficiency during volatile periods. The methodology was implemented on the QuantConnect platform, which was chosen for its robust framework and experimental reproducibility. The system, which predicts future price movements, includes a three-year warm-up to ensure proper algorithm function. It targets highly liquid, large-cap energy stocks to ensure stable and predictable performance while also considering broker payments. The dual-model alpha system utilizes log returns to select the optimal state based on historical performance. It combines state predictions with neural network outputs, which are based on historical data, to generate trading signals. This study examines the architecture of the trading system, data pre-processing, training, and performance. The full code and backtesting data are available under the QuantConnect terms.
Updated: 2024-08-25 13:01:32
Categories: q-fin.PM,cs.LG,q-fin.GN,stat.AP
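A minimal sketch of the regime-plus-signal pattern, assuming the hmmlearn library: fit an HMM on log returns, pick the historically best state, and trade only when the regime and a direction forecast agree. The neural network is replaced by a moving-average stand-in, the returns are synthetic, and none of this reproduces the paper's strategy.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    # Synthetic daily log returns with two regimes (calm, then volatile).
    returns = np.concatenate([rng.normal(0.0005, 0.01, 500),
                              rng.normal(-0.001, 0.03, 250)]).reshape(-1, 1)

    hmm = GaussianHMM(n_components=2, covariance_type="diag", random_state=0)
    hmm.fit(returns)
    states = hmm.predict(returns)

    # Pick the regime with the best historical mean return (the 'optimal state').
    best_state = int(np.argmax([returns[states == s].mean() for s in range(2)]))

    # Stand-in for the neural network's direction forecast.
    nn_signal = np.sign(np.convolve(returns.ravel(), np.ones(5) / 5, mode="same"))

    # Trade only when the HMM regime is favorable AND the forecast agrees.
    position = ((states == best_state) & (nn_signal > 0)).astype(float)
    strategy_returns = position[:-1] * returns.ravel()[1:]   # act on next-day return
    print("cumulative log return:", strategy_returns.sum().round(4))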
RoCP-GNN: Robust Conformal Prediction for Graph Neural Networks in Node-Classification
Graph Neural Networks (GNNs) have emerged as powerful tools for predicting outcomes in graph-structured data. However, a notable limitation of GNNs is their inability to provide robust uncertainty estimates, which undermines their reliability in contexts where errors are costly. One way to address this issue is by providing prediction sets that contain the true label with a predefined probability margin. Our approach builds upon conformal prediction (CP), a framework that promises to construct statistically robust prediction sets or intervals. There are two primary challenges: first, given dependent data like graphs, it is unclear whether the critical assumption in CP - exchangeability - still holds when applied to node classification. Second, even if the exchangeability assumption is valid for conformalized link prediction, we need to ensure high efficiency, i.e., the resulting prediction set or the interval length is small enough to provide useful information. In this article, we propose a novel approach termed Robust Conformal Prediction for GNNs (RoCP-GNN), which integrates conformal prediction (CP) directly into the GNN training process. This method generates prediction sets, instead of just point predictions, that are valid at a user-defined confidence level, assuming only exchangeability. Our approach robustly predicts outcomes with any predictive GNN model while quantifying the uncertainty in predictions within the realm of graph-based semi-supervised learning (SSL). Experimental results demonstrate that GNN models with size loss provide a statistically significant increase in performance. We validate our approach on standard graph benchmark datasets by coupling it with various state-of-the-art GNNs in node classification. The code will be made available after publication.
Updated: 2024-08-25 12:51:19
Categories: cs.LG,cs.AI,stat.ML
A proof of contribution in blockchain using game theoretical deep learning model
Building elastic and scalable edge resources is an inevitable prerequisite for providing platform-based smart city services. Smart city services are delivered through edge computing to provide low-latency applications. However, edge computing has always faced the challenge of limited resources. A single edge device cannot undertake the various intelligent computations in a smart city, and the large-scale deployment of edge devices from different service providers to build an edge resource platform has become a necessity. Selecting computing power from different service providers is a game-theoretic problem. To incentivize service providers to actively contribute their valuable resources and provide low-latency collaborative computing power, we introduce a game-theoretic deep learning model to reach a consensus among service providers on task scheduling and resource provisioning. Traditional centralized resource management approaches are inefficient and lack credibility, while the introduction of blockchain technology can enable decentralized resource trading and scheduling. We propose a contribution-based proof mechanism to provide the low-latency service of edge computing. The deep learning model consists of dual encoders and a single decoder, where the GNN (Graph Neural Network) encoder processes structured decision action data, and the RNN (Recurrent Neural Network) encoder handles time-series task scheduling data. Extensive experiments have demonstrated that our model reduces latency by 584% compared to the state-of-the-art.
Updated: 2024-08-25 12:40:19
Categories: cs.CR,cs.LG
Supersonic OT: Fast Unconditionally Secure Oblivious Transfer
Oblivious Transfer (OT) is a fundamental cryptographic protocol with applications in secure Multi-Party Computation, Federated Learning, and Private Set Intersection. With the advent of quantum computing, it is crucial to develop unconditionally secure core primitives like OT to ensure their continued security in the post-quantum era. Despite over four decades since OT's introduction, the literature has predominantly relied on computational assumptions, except in cases using unconventional methods like noisy channels or a fully trusted party. Introducing "Supersonic OT", a highly efficient and unconditionally secure OT scheme that avoids public-key-based primitives, we offer an alternative to traditional approaches. Supersonic OT enables a receiver to obtain a response of size O(1). Its simple (yet non-trivial) design facilitates easy security analysis and implementation. The protocol employs a basic secret-sharing scheme, controlled swaps, the one-time pad, and a third-party helper who may be corrupted by a semi-honest adversary. Our implementation and runtime analysis indicate that a single instance of Supersonic OT completes in 0.35 milliseconds, making it up to 2000 times faster than the state-of-the-art base OT.
Updated: 2024-08-25 12:39:05
Categories: cs.CR,cs.DB,cs.LG
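The flavor of unconditionally secure OT with a third-party helper can be shown with the classic precomputed-OT pattern: the helper deals correlated randomness offline, and the online phase uses only XOR (one-time pads). This is a textbook construction given for intuition; Supersonic OT itself differs in its use of secret sharing and controlled swaps.

    import secrets

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # --- Offline: the helper deals correlated randomness ----------------------
    n = 16
    r = [secrets.token_bytes(n), secrets.token_bytes(n)]   # given to the sender
    d = secrets.randbits(1)                                # random bit to receiver
    r_d = r[d]                                             # also to the receiver

    # --- Online: 1-out-of-2 OT using only XOR (information-theoretic) ---------
    m = [b"ATTACK AT DAWN!!", b"RETREAT AT DUSK!"]         # sender's two messages
    c = 1                                                  # receiver's choice bit

    e = c ^ d                                              # receiver -> sender; d masks c
    f = [xor(m[0], r[e]), xor(m[1], r[e ^ 1])]             # sender -> receiver

    recovered = xor(f[c], r_d)                             # receiver opens only m[c]
    assert recovered == m[c]
    print(recovered)
    # The sender never learns c (e looks uniformly random), and the receiver
    # holds only r_d, so the other message stays hidden under its one-time pad.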
CF-KAN: Kolmogorov-Arnold Network-based Collaborative Filtering to Mitigate Catastrophic Forgetting in Recommender Systems
Collaborative filtering (CF) remains essential in recommender systems, leveraging user-item interactions to provide personalized recommendations. Meanwhile, a number of CF techniques have evolved into sophisticated model architectures based on multi-layer perceptrons (MLPs). However, MLPs often suffer from catastrophic forgetting, and thus lose previously acquired knowledge when new information is learned, particularly in dynamic environments requiring continual learning. To tackle this problem, we propose CF-KAN, a new CF method utilizing Kolmogorov-Arnold networks (KANs). By learning nonlinear functions at the edge level, KANs are more robust to the catastrophic forgetting problem than MLPs. Built upon a KAN-based autoencoder, CF-KAN is designed to effectively capture the intricacies of sparse user-item interactions and to retain information from previous data instances. Despite its simplicity, our extensive experiments demonstrate 1) CF-KAN's superiority over state-of-the-art methods in recommendation accuracy, 2) CF-KAN's resilience to catastrophic forgetting, underscoring its effectiveness in both static and dynamic recommendation scenarios, and 3) CF-KAN's edge-level interpretation, which facilitates the explainability of recommendations.
Updated: 2024-08-25 12:12:08
Categories: cs.IR,cs.LG
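To ground the contrast with MLPs: a KAN layer places a learnable univariate function on every edge and sums them, rather than applying a fixed nonlinearity after a linear map. A minimal sketch with a radial-basis expansion standing in for the splines used in KAN papers; all sizes are illustrative.

    import torch
    import torch.nn as nn

    class KANLayer(nn.Module):
        """Each edge (i, j) carries its own learnable univariate function:
        out_j = sum_i f_ij(x_i), with f_ij a small radial-basis expansion."""
        def __init__(self, in_dim, out_dim, n_basis=8):
            super().__init__()
            self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis))
            self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, n_basis))

        def forward(self, x):                                   # x: (batch, in_dim)
            # Evaluate every basis function at every input coordinate.
            phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)   # (B, in, K)
            # Edge-wise functions, then sum over inputs.
            return torch.einsum("bik,iok->bo", phi, self.coef)

    # A KAN autoencoder over user interaction vectors (shapes are illustrative):
    n_items = 100
    model = nn.Sequential(KANLayer(n_items, 32), KANLayer(32, n_items))
    x = (torch.rand(4, n_items) < 0.05).float()     # sparse implicit feedback
    print(model(x).shape)                           # torch.Size([4, 100])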
Evidential Graph Contrastive Alignment for Source-Free Blending-Target Domain Adaptation
In this paper, we first tackle a more realistic Domain Adaptation (DA) setting: Source-Free Blending-Target Domain Adaptation (SF-BTDA), where we cannot access source-domain data while facing a mixture of multiple target domains without any prior domain labels. Compared to existing DA scenarios, SF-BTDA generally faces the co-existence of different label shifts in different targets, along with noisy target pseudo-labels generated from the source model. We propose a new method called Evidential Contrastive Alignment (ECA) to disentangle the blended target domains and alleviate the effect of noisy target pseudo-labels. First, to improve the quality of pseudo target labels, we propose a calibrated evidential learning module that iteratively improves both the accuracy and certainty of the resulting model and adaptively generates high-quality pseudo target labels. Second, we design a graph contrastive learning scheme with a domain distance matrix and a confidence-uncertainty criterion to minimize the distribution gap between samples of the same class in the blended target domains, which alleviates the co-existence of different label shifts in blended targets. We construct a new benchmark based on three standard DA datasets; ECA outperforms other methods by considerable margins and achieves results comparable to those of methods with prior access to domain labels or source data.
Updated: 2024-08-25 11:53:23
Categories: cs.CV,cs.AI
A Joint Learning Model with Variational Interaction for Multilingual Program Translation
Programs implemented in various programming languages form the foundation of software applications. To alleviate the burden of program migration and facilitate the development of software systems, automated program translation across languages has garnered significant attention. Previous approaches primarily focus on pairwise translation paradigms, learning translation between pairs of languages using bilingual parallel data. However, parallel data is difficult to collect for some language pairs, and the distribution of program semantics across languages can shift, posing challenges for pairwise program translation. In this paper, we argue that jointly learning a unified model to translate code across multiple programming languages is superior to separately learning from bilingual parallel data. We propose Variational Interaction for Multilingual Program Translation~(VIM-PT), a disentanglement-based generative approach that jointly trains a unified model for multilingual program translation across multiple languages. VIM-PT disentangles code into language-shared and language-specific features, using variational inference and interaction information with a novel lower bound, then achieves program translation through conditional generation. VIM-PT demonstrates four advantages: 1) captures language-shared information more accurately from various implementations and improves the quality of multilingual program translation, 2) mines and leverages the capability of non-parallel data, 3) addresses the distribution shift of program semantics across languages, 4) and serves as a unified model, reducing deployment complexity.
Updated: 2024-08-25 11:33:52
Categories: cs.SE,cs.AI,cs.LG,cs.PL
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Recent advancements in Large Language Models (LLMs) showcase advanced reasoning, yet NLP evaluations often depend on static benchmarks. Evaluating strategic reasoning instead requires environments that test it in dynamic, competitive scenarios demanding long-term planning. We introduce AucArena, a novel evaluation suite that simulates auctions, a setting chosen for being highly unpredictable and involving many skills related to resource and risk management, while also being easy to evaluate. We conduct controlled experiments using state-of-the-art LLMs to power bidding agents in order to benchmark their planning and execution skills. Our research demonstrates that LLMs, such as GPT-4, possess key skills for auction participation, such as budget management and goal adherence, which improve with adaptive strategies. This highlights LLMs' potential in modeling complex social interactions in competitive contexts. However, variability in LLM performance and occasional outperformance by simpler methods indicate opportunities for further advancements in LLM design, as well as the value of our simulation environment for ongoing testing and refinement.
Updated: 2024-08-25 11:19:33
Categories: cs.CL,cs.AI
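The arena's core loop is a simple ascending auction with budget-constrained agents. In a sketch like the one below, the `Bidder` class is a hand-written stand-in; a real AucArena-style agent would prompt an LLM with the auction state and parse its next bid. Names, increments, and valuations are our own assumptions.

    class Bidder:
        """Stand-in for an LLM-driven agent; a real bidder would query an LLM
        with the auction state and parse the next bid from its reply."""
        def __init__(self, name, budget, valuation):
            self.name, self.budget, self.valuation = name, budget, valuation

        def bid(self, item, price):
            nxt = price + 10                       # minimum increment
            if nxt <= min(self.budget, self.valuation[item]):
                return nxt
            return None                            # drop out

    bidders = [Bidder("A", 200, {"painting": 150}),
               Bidder("B", 300, {"painting": 120})]
    price, winner, active = 0, None, True
    while active:
        active = False
        for b in bidders:
            if b.name == winner:
                continue                           # current leader holds
            offer = b.bid("painting", price)
            if offer is not None:
                price, winner, active = offer, b.name, True
    print(winner, "wins at", price)                # A wins at 130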
Improving Nonlinear Projection Heads using Pretrained Autoencoder Embeddings
This empirical study aims at improving the effectiveness of the standard 2-layer MLP projection head $g(\cdot)$ featured in the SimCLR framework through the use of pretrained autoencoder embeddings. Given a contrastive learning task with a largely unlabeled image classification dataset, we first train a shallow autoencoder architecture and extract its compressed representations contained in the encoder's embedding layer. After freezing the weights within this pretrained layer, we use it as a drop-in replacement for the input layer of SimCLR's default projector. Additionally, we also apply further architectural changes to the projector by decreasing its width and changing its activation function. The different projection heads are then used to contrastively train and evaluate a feature extractor $f(\cdot)$ following the SimCLR protocol, while also examining the performance impact of Z-score normalized datasets. Our experiments indicate that using a pretrained autoencoder embedding in the projector can not only increase classification accuracy by up to 2.9% or 1.7% on average but can also significantly decrease the dimensionality of the projection space. Our results also suggest, that using the sigmoid and tanh activation functions within the projector can outperform ReLU in terms of peak and average classification accuracy. When applying our presented projectors, then not applying Z-score normalization to datasets often increases peak performance. In contrast, the default projection head can benefit more from normalization. All experiments involving our pretrained projectors are conducted with frozen embeddings, since our test results indicate an advantage compared to using their non-frozen counterparts.
Updated: 2024-08-25 11:10:33
Categories: cs.LG,cs.CV,I.2.10
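The recipe itself is compact: pretrain a shallow autoencoder, freeze its encoder, and use it as the input layer of the projection head, with narrower layers and sigmoid/tanh activations downstream. A sketch on random stand-in features; the dimensions and training loop are our own assumptions.

    import torch
    import torch.nn as nn

    feat_dim, bottleneck = 512, 128

    # 1) Pretrain a shallow autoencoder (on features here; images in practice).
    encoder = nn.Linear(feat_dim, bottleneck)
    decoder = nn.Linear(bottleneck, feat_dim)
    ae_opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
    for _ in range(100):
        h = torch.randn(64, feat_dim)               # stand-in for real data
        loss = nn.functional.mse_loss(decoder(encoder(h)), h)
        ae_opt.zero_grad(); loss.backward(); ae_opt.step()

    # 2) Freeze the embedding layer and drop it in as the projector's input layer.
    for p in encoder.parameters():
        p.requires_grad_(False)

    projector = nn.Sequential(
        encoder,                                     # frozen pretrained embedding
        nn.Linear(bottleneck, 64),
        nn.Sigmoid(),                                # sigmoid/tanh beat ReLU in the study
        nn.Linear(64, 32),
    )
    z = projector(torch.randn(8, feat_dim))          # feeds the contrastive loss
    print(z.shape)                                   # torch.Size([8, 32])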
On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective
Kolmogorov-Arnold Networks (KANs) have recently emerged as a novel approach to function approximation, demonstrating remarkable potential in various domains. Despite their theoretical promise, the robustness of KANs under adversarial conditions has yet to be thoroughly examined. In this paper, we explore the adversarial robustness of KANs, with a particular focus on image classification tasks. We assess the performance of KANs against standard white-box adversarial attacks, comparing their resilience to that of established neural network architectures. Further, we investigate the transferability of adversarial examples between KANs and Multilayer Perceptron (MLPs), deriving critical insights into the unique vulnerabilities of KANs. Our experiments use the MNIST, FashionMNIST, and KMNIST datasets, providing a comprehensive evaluation of KANs in adversarial scenarios. This work offers the first in-depth analysis of security in KANs, laying the groundwork for future research in this emerging field.
Updated: 2024-08-25 11:10:15
Categories: cs.CV,cs.CR
Prior Learning in Introspective VAEs
Variational Autoencoders (VAEs) are a popular framework for unsupervised learning and data generation. A plethora of methods have been proposed to improve VAEs, with the incorporation of adversarial objectives and the integration of prior learning mechanisms being prominent directions. Regarding the former, an indicative instance is the recently introduced family of Introspective VAEs, which aim to ensure that a low likelihood is assigned to unrealistic samples. In this study, we focus on the Soft-IntroVAE (S-IntroVAE) and investigate the implications of incorporating a multimodal and learnable prior into this framework. Namely, we formulate the prior as a third player and show that, when trained in cooperation with the decoder, it constitutes an effective way to learn the prior, sharing the Nash Equilibrium with the vanilla S-IntroVAE. Furthermore, based on a modified formulation of the optimal ELBO in S-IntroVAE, we develop theoretically motivated regularizations: (i) adaptive variance clipping to stabilize training when learning the prior, and (ii) responsibility regularization to discourage the formation of inactive prior modes. Finally, we perform a series of targeted experiments on a 2D density estimation benchmark and in an image generation setting comprising the (F)-MNIST and CIFAR-10 datasets, demonstrating the benefit of prior learning in S-IntroVAE for generation and representation learning.
Updated: 2024-08-25 10:54:25
Categories: cs.LG
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, the effective evaluation of alignment for emerging Chinese LLMs is still largely unexplored. To fill in this gap, we introduce AlignBench, a comprehensive multi-dimensional benchmark for evaluating LLMs' alignment in Chinese. We design a human-in-the-loop data curation pipeline containing eight main categories, 683 real-scenario-rooted queries, and corresponding human-verified references. To ensure the correctness of references, each knowledge-intensive query is accompanied by evidence collected from reliable web sources (including URLs and quotations) by our annotators. For automatic evaluation, our benchmark employs a rule-calibrated multi-dimensional LLM-as-Judge approach (Zheng et al., 2023) with Chain-of-Thought reasoning to generate explanations and final ratings, ensuring high reliability and interpretability. All evaluation code, data, and LLM generations are available at https://github.com/THUDM/AlignBench. Since its release, AlignBench has been adopted by top (Chinese) LLMs for evaluating their alignment capabilities in Chinese, including ChatGLM, Qwen, DeepSeek, Yi, Baichuan, and Abab.
Updated: 2024-08-25 09:58:57
Domains: cs.CL,cs.AI,cs.LG
Generalized Categories Discovery for Long-tailed Recognition
Generalized Class Discovery (GCD) plays a pivotal role in discerning both known and unknown categories from unlabeled datasets by harnessing the insights derived from a labeled set comprising recognized classes. A significant limitation in prevailing GCD methods is their presumption of an equitably distributed category occurrence in unlabeled data. Contrary to this assumption, visual classes in natural environments typically exhibit a long-tailed distribution, with known or prevalent categories surfacing more frequently than their rarer counterparts. Our research endeavors to bridge this disconnect by focusing on the long-tailed Generalized Category Discovery (Long-tailed GCD) paradigm, which echoes the innate imbalances of real-world unlabeled datasets. In response to the unique challenges posed by Long-tailed GCD, we present a robust methodology anchored in two strategic regularizations: (i) a reweighting mechanism that bolsters the prominence of less-represented, tail-end categories, and (ii) a class prior constraint that aligns with the anticipated class distribution. Comprehensive experiments reveal that our proposed method surpasses previous state-of-the-art GCD methods by achieving an improvement of approximately 6 - 9% on ImageNet100 and competitive performance on CIFAR100.
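As a rough sketch of the two regularizations (a reading of the abstract, not the authors' exact loss), the snippet below reweights rare classes by inverse estimated frequency and adds a KL term that pulls the batch-averaged prediction toward an anticipated class prior; class_freq and prior are assumed inputs.

import torch
import torch.nn.functional as F

def long_tailed_gcd_loss(logits, targets, class_freq, prior, alpha=1.0):
    # (i) reweighting: rarer (tail) classes receive proportionally larger weights
    weights = 1.0 / class_freq.float()
    weights = weights / weights.sum() * len(class_freq)
    ce = F.cross_entropy(logits, targets, weight=weights)
    # (ii) class prior constraint: KL(mean predicted distribution || prior)
    mean_pred = F.softmax(logits, dim=1).mean(dim=0)
    kl = torch.sum(mean_pred * (torch.log(mean_pred + 1e-8) - torch.log(prior + 1e-8)))
    return ce + alpha * kl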
Updated: 2024-08-25 09:58:25
Domains: cs.CV,cs.LG
Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning
This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios where resource-constrained devices are involved in large-scale model training. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates and diminish the generalization capabilities of the resulting models. Our theoretical analysis provides insights into how compression errors critically hinder SL performance, which previous methodologies underestimate. To address these challenges, we employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity. Supported by rigorous theoretical analysis, our framework significantly reduces compression errors and accelerates the convergence. Extensive experiments also verify that our method outperforms existing solutions regarding training efficiency and communication complexity.
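A rough sketch of the idea under loose assumptions: keep the top-k activations exactly and, instead of discarding the remainder (the source of the gradient bias discussed above), encode it with a narrow b-bit uniform quantizer so the receiver can partially compensate the sparsification error. The names and the exact encoding are illustrative, not the paper's implementation.

import numpy as np

def compress(x, k, bits=2):
    flat = x.ravel()
    top_idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k magnitudes
    top_val = flat[top_idx]
    residual = flat.copy()
    residual[top_idx] = 0.0  # what plain top-k sparsification would throw away
    lo, hi = residual.min(), residual.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    mask = np.round((residual - lo) / scale).astype(np.uint8)  # b-bit encoded mask
    return top_idx, top_val, mask, lo, scale, x.shape

def decompress(top_idx, top_val, mask, lo, scale, shape):
    flat = lo + mask.astype(np.float32) * scale  # compensated residual
    flat[top_idx] = top_val                      # exact top-k values restored
    return flat.reshape(shape)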
Updated: 2024-08-25 09:30:34
Domains: cs.LG,cs.DC
Localization of Synthetic Manipulations in Western Blot Images
Recent breakthroughs in deep learning and generative systems have significantly fostered the creation of synthetic media, as well as the local alteration of real content via the insertion of highly realistic synthetic manipulations. Local image manipulation, in particular, poses serious challenges to the integrity of digital content and societal trust. This problem is not only confined to multimedia data, but also extends to biological images included in scientific publications, like images depicting Western blots. In this work, we address the task of localizing synthetic manipulations in Western blot images. To discriminate between pristine and synthetic pixels of an analyzed image, we propose a synthetic detector that operates on small patches extracted from the image. We aggregate patch contributions to estimate a tampering heatmap, highlighting synthetic pixels out of pristine ones. Our methodology proves effective when tested over two manipulated Western blot image datasets, one altered automatically and the other manually by exploiting advanced AI-based image manipulation tools that are unknown at our training stage. We also explore the robustness of our method over an external dataset of other scientific images depicting different semantics, manipulated through unseen generation techniques.
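A minimal sketch of the patch-based localization pipeline described above, assuming a hypothetical per-patch scorer score_patch that returns a synthetic-probability in [0, 1] (in the paper this is a trained detector); overlapping patch scores are averaged into a per-pixel tampering heatmap.

import numpy as np

def tampering_heatmap(image, score_patch, patch=64, stride=32):
    H, W = image.shape[:2]
    heat = np.zeros((H, W), dtype=np.float32)
    count = np.zeros((H, W), dtype=np.float32)
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            p = score_patch(image[y:y + patch, x:x + patch])
            heat[y:y + patch, x:x + patch] += p
            count[y:y + patch, x:x + patch] += 1.0
    return heat / np.maximum(count, 1.0)  # average overlapping contributions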
Updated: 2024-08-25 09:29:20
Domains: cs.CV,cs.AI,cs.MM
Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals
Speech deepfake detection has recently gained significant attention within the multimedia forensics community. Related issues have also been explored, such as the identification of partially fake signals, i.e., tracks that include both real and fake speech segments. However, generating high-quality spliced audio is not as straightforward as it may appear. Spliced signals are typically created through basic signal concatenation. This process could introduce noticeable artifacts that can make the generated data easier to detect. We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts and assess whether such artifacts introduce any bias in existing datasets. Our findings reveal that by analyzing splicing artifacts, we can achieve a detection EER of 6.16% and 7.36% on PartialSpoof and HAD datasets, respectively, without needing to train any detector. These results underscore the complexities of generating reliable spliced audio data and lead to discussions that can help improve future research in this area.
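The EER figures above come directly from raw splicing-artifact scores, with no trained detector; a standard way to compute an EER from such scores is sketched below (assuming scores are higher for spliced tracks and labels mark spliced tracks as 1).

import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # operating point where FPR is closest to FNR
    return (fpr[idx] + fnr[idx]) / 2.0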
Updated: 2024-08-25 09:28:04
Domains: cs.SD,cs.AI,cs.MM,eess.AS
Variational autoencoder-based neural network model compression
Variational Autoencoders (VAEs), as a form of deep generative model, have been widely used in recent years and have shown great performance in a number of different domains, including image generation and anomaly detection. This paper explores a neural network model compression method based on VAEs. The experiments use different neural network models trained for MNIST recognition as compression targets, including a Feedforward Neural Network (FNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) and a Long Short-Term Memory (LSTM) network. These are the most basic models in deep learning; more complex and advanced models build on them or inherit and evolve their features. In the experiment, the first step is to train the models mentioned above; each trained model ends up with a different accuracy and total parameter count. The parameters of each model are then processed as training data for a separate VAE, and the trained VAEs are tested against the true model parameters. The experimental results show that using the latent space as a representation for model compression can improve the compression rate compared to some traditional methods such as pruning and quantization, while accuracy is not greatly affected when the model parameters are reconstructed from the latent space. As a growing variety of large-scale deep learning models comes into wider use, exploring ways to save time and space when storing or transferring models will become necessary, and the use of VAEs in this paper can provide a basis for such further explorations.
Updated: 2024-08-25 09:06:22
Domains: cs.LG,cs.AI
SAB: A Stealing and Robust Backdoor Attack based on Steganographic Algorithm against Federated Learning
Federated learning, an innovative network architecture designed to safeguard user privacy, is gaining widespread adoption in the realm of technology. However, given the existence of backdoor attacks in federated learning, exploring the security of federated learning is of clear significance. Nevertheless, the backdoors investigated in current federated learning research can be readily detected by human inspection or resisted by detection algorithms. Accordingly, a new goal has been set: to develop stealthy and robust federated learning backdoor attacks. In this paper, we introduce a novel approach, SAB, tailored specifically for backdoor attacks in federated learning, presenting an alternative gradient updating mechanism. SAB is based on a steganographic algorithm: it uses image steganography to build a full-size trigger that improves the accuracy of backdoors, and computes multiple losses jointly to produce the triggers. SAB exhibits smaller distances to benign samples and greater imperceptibility to the human eye. As such, our triggers are capable of mitigating or evading specific backdoor defense methods. In SAB, a bottom-95% method is applied to extend the lifespan of backdoor attacks: the gradient is updated only on minor-value coordinates to reduce the probability of the backdoor being cleaned. Finally, the generalization of the backdoor is enhanced with sparse updates to improve backdoor accuracy.
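As a loose illustration of the bottom-95% idea (an interpretation of the abstract, not the authors' code), the sketch below applies a malicious update only on the 95% of coordinates with the smallest gradient magnitudes, leaving the salient top 5% untouched so the backdoor is harder to detect and clean.

import torch

def bottom_p_update(param, grad, lr=0.01, p=0.95):
    flat = grad.abs().flatten()
    k = max(1, int(p * flat.numel()))
    threshold = flat.kthvalue(k).values        # magnitude at the p-th quantile
    mask = (grad.abs() <= threshold).float()   # keep only minor-value coordinates
    param.data -= lr * grad * mask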
Updated: 2024-08-25 08:54:08
Domains: cs.CR,cs.AI
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
These notes are based on a lecture delivered by NC in March 2021, as part of an advanced course at Princeton University on the mathematical understanding of deep learning. They present a theory (developed by NC, NR and collaborators) of linear neural networks -- a fundamental model in the study of optimization and generalization in deep learning. Practical applications born from the presented theory are also discussed. The theory is based on mathematical tools that are dynamical in nature. It showcases the potential of such tools to push the envelope of our understanding of optimization and generalization in deep learning. The text assumes familiarity with the basics of statistical learning theory. Exercises (without solutions) are included.
Updated: 2024-08-25 08:24:48
Domains: cs.LG,cs.AI,stat.ML
On-device Learning of EEGNet-based Network For Wearable Motor Imagery Brain-Computer Interface
Electroencephalogram (EEG)-based Brain-Computer Interfaces (BCIs) have garnered significant interest across various domains, including rehabilitation and robotics. Despite advancements in neural network-based EEG decoding, maintaining performance across diverse user populations remains challenging due to feature distribution drift. This paper presents an effective approach to address this challenge by implementing a lightweight and efficient on-device learning engine for wearable motor imagery recognition. The proposed approach, applied to the well-established EEGNet architecture, enables real-time and accurate adaptation to EEG signals from unregistered users. Leveraging the newly released low-power parallel RISC-V-based processor, GAP9 from GreenWaves, and the Physionet EEG Motor Imagery dataset, we demonstrate a remarkable accuracy gain of up to 7.31% with respect to the baseline with a memory footprint of 15.6 KByte. Furthermore, by optimizing the input stream, we achieve enhanced real-time performance without compromising inference accuracy. Our tailored approach exhibits an inference time of 14.9 ms and an energy cost of 0.76 mJ per single inference, and 20 μs and 0.83 μJ per single update during online training. These findings highlight the feasibility of our method for edge EEG devices as well as other battery-powered wearable AI systems suffering from subject-dependent feature distribution drift.
Updated: 2024-08-25 08:23:51
Domains: eess.SP,cs.AI,cs.HC,cs.LG
Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples
Developmental dysgraphia is a neurological disorder that hinders children's writing skills. In recent years, researchers have increasingly explored machine learning methods to support the diagnosis of dysgraphia based on offline and online handwriting. In most previous studies, the two types of handwriting have been analysed separately, which does not necessarily lead to promising results. In this way, the relationship between online and offline data cannot be explored. To address this limitation, we propose a novel multimodal machine learning approach utilizing both online and offline handwriting data. We created a new dataset by transforming an existing online handwritten dataset, generating corresponding offline handwriting images. We considered only different types of word data (simple word, pseudoword & difficult word) in our multimodal analysis. We trained SVM and XGBoost classifiers separately on online and offline features as well as implemented multimodal feature fusion and soft-voted ensemble. Furthermore, we proposed a novel ensemble with conditional feature fusion method which intelligently combines predictions from online and offline classifiers, selectively incorporating feature fusion when confidence scores fall below a threshold. Our novel approach achieves an accuracy of 88.8%, outperforming SVMs for single modalities by 12-14%, existing methods by 8-9%, and traditional multimodal approaches (soft-vote ensemble and feature fusion) by 3% and 5%, respectively. Our methodology contributes to the development of accurate and efficient dysgraphia diagnosis tools, requiring only a single instance of multimodal word/pseudoword data to determine the handwriting impairment. This work highlights the potential of multimodal learning in enhancing dysgraphia diagnosis, paving the way for accessible and practical diagnostic tools.
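A minimal sketch of the conditional-fusion ensemble, with hypothetical fitted sklearn-style classifiers and integer class labels: soft-vote the online and offline predictions, but fall back to a feature-fusion classifier whenever the ensemble's confidence drops below a threshold.

import numpy as np

def conditional_fusion_predict(clf_on, clf_off, clf_fused, X_on, X_off, threshold=0.7):
    proba = (clf_on.predict_proba(X_on) + clf_off.predict_proba(X_off)) / 2.0
    preds = proba.argmax(axis=1)
    low_conf = proba.max(axis=1) < threshold          # uncertain samples only
    if low_conf.any():
        X_fused = np.hstack([X_on[low_conf], X_off[low_conf]])  # feature fusion
        preds[low_conf] = clf_fused.predict(X_fused)
    return preds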
Updated: 2024-08-25 07:42:54
Domains: cs.CV,cs.AI,I.2.6; I.2.10; I.4.9; I.5.1; I.5.4
Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective
Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouse. However, most literature only addresses one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from a perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouse to cooperative multi-agent deep RL, and the first to simultaneously address TAPF based on multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents. In this study, the physical dynamics of the agents is considered. Experimental results show that our method performs well in various task settings, which means that the target assignment is solved reasonably well and the planned path is almost shortest. Moreover, our method is more time-efficient than baselines.
Updated: 2024-08-25 07:32:58
Domains: cs.AI,cs.MA
Quartered Spectral Envelope and 1D-CNN-based Classification of Normally Phonated and Whispered Speech
Whisper, as a form of speech, is not sufficiently addressed by mainstream speech applications. This is due to the fact that systems built for normal speech do not work as expected for whispered speech. A first step to building a speech application that is inclusive of whispered speech, is the successful classification of whispered speech and normal speech. Such a front-end classification system is expected to have high accuracy and low computational overhead, which is the scope of this paper. One of the characteristics of whispered speech is the absence of the fundamental frequency (or pitch), and hence the pitch harmonics as well. The presence of the pitch and pitch harmonics in normal speech, and its absence in whispered speech, is evident in the spectral envelope of the Fourier transform. We observe that this characteristic is predominant in the first quarter of the spectrum, and exploit the same as a feature. We propose the use of one dimensional convolutional neural networks (1D-CNN) to capture these features from the quartered spectral envelope (QSE). The system yields an accuracy of 99.31% when trained and tested on the wTIMIT dataset, and 100% on the CHAINS dataset. The proposed feature is compared with Mel frequency cepstral coefficients (MFCC), a staple in the speech domain. The proposed classification system is also compared with the state-of-the-art system based on log-filterbank energy (LFBE) features trained on long short-term memory (LSTM) network. The proposed system based on 1D-CNN performs better than, or as good as, the state-of-the-art across multiple experiments. It also converges sooner, with lesser computational overhead. Finally, the proposed system is evaluated under the presence of white noise at various signal-to-noise ratios and found to be robust.
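A minimal sketch of the quartered-spectral-envelope (QSE) feature, assuming framed audio; the FFT size and the crude moving-average envelope are illustrative choices, not the paper's exact configuration.

import numpy as np

def quartered_spectral_envelope(frame, n_fft=512):
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft))                 # magnitude spectrum
    envelope = np.convolve(spectrum, np.ones(8) / 8, mode="same")  # smoothed envelope
    quarter = len(envelope) // 4
    return envelope[:quarter]  # the first quarter carries the pitch-harmonic cue

Stacking this feature over consecutive frames yields the (n_frames, quarter) map that the 1D-CNN consumes.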
Updated: 2024-08-25 07:17:11
Domains: eess.AS,cs.LG
CAMH: Advancing Model Hijacking Attack in Machine Learning
In the burgeoning domain of machine learning, the reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance introduces vulnerabilities to model hijacking attacks, where adversaries manipulate models to perform unintended tasks, leading to significant security and ethical concerns, like turning an ordinary image classifier into a tool for detecting faces in pornographic content, all without the model owner's knowledge. This paper introduces Category-Agnostic Model Hijacking (CAMH), a novel model hijacking attack method capable of addressing the challenges of class number mismatch, data distribution divergence, and performance balance between the original and hijacking tasks. CAMH incorporates synchronized training layers, random noise optimization, and a dual-loop optimization approach to ensure minimal impact on the original task's performance while effectively executing the hijacking task. We evaluate CAMH across multiple benchmark datasets and network architectures, demonstrating its potent attack effectiveness while ensuring minimal degradation in the performance of the original task.
Updated: 2024-08-25 07:03:01
Domains: cs.CR
Literary and Colloquial Tamil Dialect Identification
Culture and language evolve together. The old literary form of Tamil is commonly used for writing, and the contemporary colloquial Tamil is used for speaking. Human-computer interaction applications require Colloquial Tamil (CT) to make them more accessible and easy for the everyday user, and they require Literary Tamil (LT) when information is needed in a formal written format. Continuing to use LT alongside CT in computer-aided language learning applications will both preserve LT and provide ease of use via CT at the same time. Hence there is a need for conversion between the LT and CT dialects, which demands, as a first step, dialect identification. Dialect Identification (DID) of LT and CT is an unexplored area of research. In the current work, keeping the nuances of both these dialects in mind, five methods are explored: two implicit methods, Gaussian Mixture Model (GMM) and Convolutional Neural Network (CNN); two explicit methods, Parallel Phone Recognition (PPR) and Parallel Large Vocabulary Continuous Speech Recognition (P-LVCSR); and two versions of the proposed explicit Unified Phone Recognition method (UPR-1 and UPR-2). These methods vary based on: the need for annotated data, the size of the unit, the way in which modelling is carried out, and the way in which the final decision is made. Even though the average duration of the test utterances is short (4.9 s for LT and 2.5 s for CT), the systems performed well, offering the following identification accuracies: 87.72% (GMM), 93.97% (CNN), 89.24% (PPR), 94.21% (P-LVCSR), 88.57% (UPR-1), 93.53% (UPR-1 with P-LVCSR), 94.55% (UPR-2), and 95.61% (UPR-2 with P-LVCSR).
Updated: 2024-08-25 06:52:48
Domains: eess.AS,cs.CL,cs.HC,cs.LG,cs.SD
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training. We develop a novel error-bounded lossy compression algorithm, informed by an in-depth analysis of embedding data features, to achieve high compression ratios. Moreover, we introduce a dual-level adaptive strategy for error-bound adjustment, spanning both table-wise and iteration-wise aspects, to balance the compression benefits with the potential impacts on accuracy. We further optimize our compressor for PyTorch tensors on GPUs, minimizing compression overhead. Evaluation shows that our method achieves a 1.38$\times$ training speedup with a minimal accuracy impact.
Updated: 2024-08-25 06:47:44
Domains: cs.LG,cs.DC
Differentiable Logic Programming for Distant Supervision
We introduce a new method for integrating neural networks with logic programming in Neural-Symbolic AI (NeSy), aimed at learning with distant supervision, in which direct labels are unavailable. Unlike prior methods, our approach does not depend on symbolic solvers for reasoning about missing labels. Instead, it evaluates logical implications and constraints in a differentiable manner by embedding both neural network outputs and logic programs into matrices. This method facilitates more efficient learning under distant supervision. We evaluated our approach against existing methods while maintaining a constant volume of training data. The findings indicate that our method not only matches or exceeds the accuracy of other methods across various tasks but also speeds up the learning process. These results highlight the potential of our approach to enhance both accuracy and learning efficiency in NeSy applications.
Updated: 2024-08-25 06:40:06
Domains: cs.AI
From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts
Supervised named entity recognition (NER) in the biomedical domain depends on large sets of annotated texts with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation tasks and retraining the model. To address these challenges, this paper proposes a method for zero- and few-shot NER in the biomedical domain. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large amount of datasets and biomedical entities, which allow the model to learn semantic relations between the given and potentially novel named entity labels. We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities with fine-tuned PubMedBERT-based model. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no or limited number of examples, outperforming previous transformer-based methods, and being comparable to GPT3-based models using models with over 1000 times fewer parameters. We make models and developed code publicly available.
Updated: 2024-08-25 06:22:00
Domains: cs.CL,cs.AI,cs.IR
Tackling the Local Bias in Federated Graph Learning
Federated graph learning (FGL) has become an important research topic in response to the increasing scale and the distributed nature of graph-structured data in the real world. In FGL, a global graph is distributed across different clients, where each client holds a subgraph. Existing FGL methods often fail to effectively utilize cross-client edges, losing structural information during training; additionally, local graphs often exhibit significant distribution divergence. These two issues make local models in FGL less accurate than their counterparts in centralized graph learning, which we refer to as the local bias problem in this paper. To solve this problem, we propose a novel FGL framework that makes the local models similar to the model trained in a centralized setting. Specifically, we design a distributed learning scheme that fully leverages cross-client edges to aggregate information from other clients. In addition, we propose a label-guided sampling approach that alleviates the imbalance of local data while markedly reducing the training overhead. Extensive experiments demonstrate that local bias can compromise model performance and slow down convergence during training. Experimental results also verify that our framework successfully mitigates local bias, achieving better performance than other baselines with lower time and memory overhead.
Updated: 2024-08-25 06:19:22
Domains: cs.LG,cs.AI,cs.CR
UniHENN: Designing Faster and More Versatile Homomorphic Encryption-based CNNs without im2col
Homomorphic encryption (HE) enables privacy-preserving deep learning by allowing computations on encrypted data without decryption. However, deploying convolutional neural networks (CNNs) with HE is challenging due to the need to convert input data into a two-dimensional matrix for convolution using the im2col technique, which rearranges the input for efficient computation. This restricts the types of CNN models that can be used since the encrypted data structure must be compatible with the specific model. UniHENN is a novel HE-based CNN architecture that eliminates the need for im2col, enhancing its versatility and compatibility with a broader range of CNN models. UniHENN flattens input data to one dimension without using im2col. The kernel performs convolutions by traversing the image, using incremental rotations and structured multiplication on the flattened input, with results spaced by the stride interval. Experimental results show that UniHENN significantly outperforms the state-of-the-art 2D CNN inference architecture named PyCrCNN in terms of inference time. For example, on the LeNet-1 model, UniHENN achieves an average inference time of 30.089 seconds, about 26.6 times faster than PyCrCNN's 800.591 seconds. Furthermore, UniHENN outperforms TenSEAL, an im2col-optimized CNN model, in concurrent image processing. For ten samples, UniHENN (16.247 seconds) was about 3.9 times faster than TenSEAL (63.706 seconds), owing to its support for batch processing of up to 10 samples. We demonstrate UniHENN's adaptability to various CNN architectures, including a 1D CNN and six 2D CNNs, highlighting its flexibility and efficiency for privacy-preserving cloud-based CNN services.
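The rotation-based convolution can be emulated in plaintext NumPy, with np.roll standing in for homomorphic ciphertext rotations; this sketches the general idea of convolving a flattened image without im2col, not UniHENN's actual HE implementation.

import numpy as np

def rotation_conv2d(img, kernel, stride=1):
    H, W = img.shape
    K = kernel.shape[0]
    flat = img.reshape(-1).astype(float)   # the one-dimensional, 'ciphertext-like' layout
    acc = np.zeros_like(flat)
    for r in range(K):
        for c in range(K):
            # a rotation by r*W + c aligns pixel (i+r, j+c) with output slot (i, j)
            acc += kernel[r, c] * np.roll(flat, -(r * W + c))
    rows = np.arange(0, H - K + 1, stride)
    cols = np.arange(0, W - K + 1, stride)
    idx = (rows[:, None] * W + cols[None, :]).reshape(-1)  # results spaced by the stride
    return acc[idx].reshape(len(rows), len(cols))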
Updated: 2024-08-25 06:12:41
Domains: cs.CR
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models
Logs are ubiquitous digital footprints, playing an indispensable role in system diagnostics, security analysis, and performance optimization. The extraction of actionable insights from logs is critically dependent on the log parsing process, which converts raw logs into structured formats for downstream analysis. Yet, the complexities of contemporary systems and the dynamic nature of logs pose significant challenges to existing automatic parsing techniques. The emergence of Large Language Models (LLM) offers new horizons. With their expansive knowledge and contextual prowess, LLMs have been transformative across diverse applications. Building on this, we introduce LogParser-LLM, a novel log parser integrated with LLM capabilities. This union seamlessly blends semantic insights with statistical nuances, obviating the need for hyper-parameter tuning and labeled training data, while ensuring rapid adaptability through online parsing. Further deepening our exploration, we address the intricate challenge of parsing granularity, proposing a new metric and integrating human interactions to allow users to calibrate granularity to their specific needs. Our method's efficacy is empirically demonstrated through evaluations on the Loghub-2k and the large-scale LogPub benchmark. In evaluations on the LogPub benchmark, involving an average of 3.6 million logs per dataset across 14 datasets, our LogParser-LLM requires only 272.5 LLM invocations on average, achieving a 90.6% F1 score for grouping accuracy and an 81.1% for parsing accuracy. These results demonstrate the method's high efficiency and accuracy, outperforming current state-of-the-art log parsers, including pattern-based, neural network-based, and existing LLM-enhanced approaches.
Updated: 2024-08-25 05:34:24
Domains: cs.SE,cs.AI
Relaxed Rotational Equivariance via $G$-Biases in Vision
Group Equivariant Convolution (GConv) can effectively handle data with rotational symmetry, under the assumption of uniform and strict rotational symmetry across all features, i.e., that all transformations come from a specific group. However, real-world data rarely conform to strict rotational symmetry, a phenomenon commonly referred to as rotational symmetry-breaking in the system or dataset, so GConv cannot adapt effectively to it. Motivated by this, we propose a simple but highly effective method to address the problem, which utilizes a set of learnable biases under the group order, called the $G$-Biases, to break strict group constraints and achieve Relaxed Rotational Equivariant Convolution (RREConv). We conduct extensive experiments to validate relaxed rotational equivariance on rotational symmetry groups $\mathcal{C}_n$ (e.g., the $\mathcal{C}_2$, $\mathcal{C}_4$, and $\mathcal{C}_6$ groups). Further experiments demonstrate that our proposed RREConv-based methods achieve excellent performance compared to existing GConv-based methods in classification and detection tasks on natural image datasets.
Updated: 2024-08-25 05:18:26
Domains: cs.CV,cs.AI
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Audio-Visual Source Localization (AVSL) aims to localize the source of sound within a video. In this paper, we identify a significant issue in existing benchmarks: the sounding objects are often easily recognized based solely on visual cues, which we refer to as visual bias. Such biases hinder these benchmarks from effectively evaluating AVSL models. To further validate our hypothesis regarding visual biases, we examine two representative AVSL benchmarks, VGG-SS and EpicSounding-Object, where the vision-only models outperform all audiovisual baselines. Our findings suggest that existing AVSL benchmarks need further refinement to facilitate audio-visual learning.
Updated: 2024-08-25 04:56:08
Domains: cs.MM,cs.AI,cs.SD,eess.AS
Interpretable and Robust AI in EEG Systems: A Survey
The close coupling of artificial intelligence (AI) and electroencephalography (EEG) has substantially advanced human-computer interaction (HCI) technologies in the AI era. Different from traditional EEG systems, the interpretability and robustness of AI-based EEG systems are becoming particularly crucial. The interpretability clarifies the inner working mechanisms of AI models and thus can gain the trust of users. The robustness reflects the AI's reliability against attacks and perturbations, which is essential for sensitive and fragile EEG signals. Thus the interpretability and robustness of AI in EEG systems have attracted increasing attention, and their research has achieved great progress recently. However, there is still no survey covering recent advances in this field. In this paper, we present the first comprehensive survey and summarize the interpretable and robust AI techniques for EEG systems. Specifically, we first propose a taxonomy of interpretability by characterizing it into three types: backpropagation, perturbation, and inherently interpretable methods. Then we classify the robustness mechanisms into four classes: noise and artifacts, human variability, data acquisition instability, and adversarial attacks. Finally, we identify several critical and unresolved challenges for interpretable and robust AI in EEG systems and further discuss their future directions.
Updated: 2024-08-25 04:41:36
Domains: eess.SP,cs.AI
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings
Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learning and graph prompt learning have been extensively explored, they often rely on fine-tuning with task-specific labels, limiting their effectiveness in zero-shot scenarios. Inspired by the zero-shot capabilities of instruction-fine-tuned large language models (LLMs), we introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM) that leverages LLMs as cross-dataset and cross-task zero-shot learners for graph machine learning. Concretely, we pretrain a GNN, aligning its representations with token embeddings of an LLM. We then train a linear projector that transforms the GNN's representations into a fixed number of graph token embeddings without tuning the LLM. A unified instruction is designed for various graph tasks at different levels, such as node classification (node-level) and link prediction (edge-level). These design choices collectively enhance our method's effectiveness in zero-shot learning, setting it apart from existing methods. Experiments show that our graph token embeddings help the LLM predictor achieve state-of-the-art performance on unseen datasets and tasks compared to other methods using LLMs as predictors.
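A minimal sketch of the linear projector described above (dimensions are illustrative): a graph-level representation from the pretrained GNN is mapped, without touching the LLM's weights, to a fixed number of pseudo token embeddings in the LLM's embedding space.

import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    def __init__(self, gnn_dim=256, llm_dim=4096, num_graph_tokens=8):
        super().__init__()
        self.num_graph_tokens = num_graph_tokens
        self.proj = nn.Linear(gnn_dim, num_graph_tokens * llm_dim)

    def forward(self, graph_repr):            # graph_repr: (batch, gnn_dim)
        out = self.proj(graph_repr)           # (batch, k * llm_dim)
        # k graph token embeddings, ready to prepend to the frozen LLM's input tokens
        return out.view(graph_repr.size(0), self.num_graph_tokens, -1)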
Updated: 2024-08-25 04:32:45
Domains: cs.LG,cs.AI,cs.CL
A prototype-based model for set classification
Classification of sets of inputs (e.g., images and texts) is an active area of research within both computer vision (CV) and natural language processing (NLP). A common way to represent a set of vectors is to model them as linear subspaces. In this contribution, we present a prototype-based approach for learning on the manifold formed from such linear subspaces, the Grassmann manifold. Our proposed method learns a set of subspace prototypes capturing the representative characteristics of classes and a set of relevance factors automating the selection of the dimensionality of the subspaces. This leads to a transparent classifier model which presents the computed impact of each input vector on its decision. Through experiments on benchmark image and text datasets, we have demonstrated the efficiency of our proposed classifier, compared to the transformer-based models in terms of not only performance and explainability but also computational resource requirements.
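A minimal sketch of nearest-prototype classification on the Grassmann manifold: each set of vectors is summarized as an orthonormal basis of a d-dimensional subspace and compared to class prototypes via principal angles; the fixed d here stands in for the learned relevance factors described above.

import numpy as np

def subspace_basis(X, d):
    # Orthonormal basis of the d-dim subspace spanned by the set X (n_vectors, dim).
    U, _, _ = np.linalg.svd(X.T, full_matrices=False)
    return U[:, :d]

def grassmann_distance(U, V):
    # Geodesic distance from the principal angles between two subspaces.
    s = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def classify(X, prototypes, d=3):
    # prototypes: dict mapping class label -> orthonormal basis matrix.
    U = subspace_basis(X, d)
    return min(prototypes, key=lambda c: grassmann_distance(U, prototypes[c]))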
Updated: 2024-08-25 04:29:18
Domains: cs.LG,cs.CV
Count-based Novelty Exploration in Classical Planning
Count-based exploration methods are widely employed to improve the exploratory behavior of learning agents over sequential decision problems. Meanwhile, Novelty search has achieved success in Classical Planning through recording of the first, but not successive, occurrences of tuples. In order to structure the exploration, however, the number of tuples considered needs to grow exponentially as the search progresses. We propose a new novelty technique, classical count-based novelty, which aims to explore the state space with a constant number of tuples, by leveraging the frequency of each tuple's appearance in a search tree. We then justify the mechanisms through which lower tuple counts lead the search towards novel tuples. We also introduce algorithmic contributions in the form of a trimmed open list that maintains a constant size by pruning nodes with bad novelty values. These techniques are shown to complement existing novelty heuristics when integrated in a classical solver, achieving competitive results in challenging benchmarks from recent International Planning Competitions. Moreover, adapting our solver as the frontend planner in dual configurations that utilize both memory and time thresholds demonstrates a significant increase in instance coverage, surpassing current state-of-the-art solvers.
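In spirit, classical count-based novelty can be sketched as follows (a simplification assuming width-1 tuples, i.e. single atoms, and ignoring tie-breaking); lower values mark more novel states, and the table stays constant-size regardless of search depth.

from collections import Counter

counts = Counter()

def novelty_value(state_atoms):
    # Lower is more novel: the least-seen atom in the state dominates.
    return min(counts[a] for a in state_atoms)

def expand(state_atoms):
    for a in state_atoms:
        counts[a] += 1

A best-first search would order its open list by novelty_value and, as in the trimmed open list above, prune nodes whose novelty values are poor.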
Updated: 2024-08-25 04:25:10
Domains: cs.AI
On the Effects of Data Scale on Computer Control Agents
Autonomous agents that control computer interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless fine-tuned on human-collected task demonstrations, performance is still relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world computer control agents. In particular, we investigate how performance on both high- and low-level tasks, in and out of domain, scales as more training data is collected. To this end we collect and release a new dataset, AndroidControl, consisting of 15,283 demonstrations of everyday tasks with Android apps. Compared to existing datasets, each AndroidControl task instance includes both high- and low-level human-generated instructions, allowing us to explore the level of task complexity an agent can handle. Moreover, AndroidControl is the most diverse computer control dataset to date, including 15,283 unique tasks over 833 Android apps, thus allowing us to conduct in-depth analysis of model performance in and out of the domain of the training data. Using the dataset, we find that, when tested in domain, fine-tuned models outperform zero- and few-shot baselines and scale in such a way that robust performance might feasibly be obtained simply by collecting more data. Out of domain, performance scales significantly more slowly, suggesting that, particularly for high-level tasks, fine-tuning on more data alone may be insufficient for achieving robust out-of-domain performance.
Updated: 2024-08-25 03:53:10
Domains: cs.AI,cs.LG
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks
Backdoor attacks have become a significant threat to the pre-training and deployment of deep neural networks (DNNs). Although numerous methods for detecting and mitigating backdoor attacks have been proposed, most rely on identifying and eliminating the "shortcut" created by the backdoor, which links a specific source class to a target class. However, these approaches can be easily circumvented by designing multiple backdoor triggers that create shortcuts everywhere and therefore nowhere specific. In this study, we explore the concept of Multi-Trigger Backdoor Attacks (MTBAs), where multiple adversaries leverage different types of triggers to poison the same dataset. By proposing and investigating three types of multi-trigger attacks, including parallel, sequential, and hybrid attacks, we demonstrate that 1) multiple triggers can coexist, overwrite, or cross-activate one another, and 2) MTBAs easily break the prevalent shortcut assumption underlying most existing backdoor detection/removal methods, rendering them ineffective. Given the security risk posed by MTBAs, we have created a multi-trigger backdoor poisoning dataset to facilitate future research on detecting and mitigating these attacks, and we also discuss potential defense strategies against MTBAs.
Updated: 2024-08-25 03:25:16
Domains: cs.LG,cs.CR
Causal Estimation of Exposure Shifts with Neural Networks
A fundamental task in causal inference is estimating the effect of distribution shift in the treatment variable. We refer to this problem as shift-response function (SRF) estimation. Existing neural network methods for causal inference lack theoretical guarantees and practical implementations for SRF estimation. In this paper, we introduce Targeted Regularization for Exposure Shifts with Neural Networks (TRESNET), a method to estimate SRFs with robustness and efficiency guarantees. Our contributions are twofold. First, we propose a targeted regularization loss for neural networks with theoretical properties that ensure double robustness and asymptotic efficiency specific to SRF estimation. Second, we extend targeted regularization to support loss functions from the exponential family to accommodate non-continuous outcome distributions (e.g., discrete counts). We conduct benchmark experiments demonstrating TRESNET's broad applicability and competitiveness. We then apply our method to a key policy question in public health to estimate the causal effect of revising the US National Ambient Air Quality Standards (NAAQS) for PM 2.5 from 12 ${\mu}g/m^3$ to 9 ${\mu}g/m^3$. This change has been recently proposed by the US Environmental Protection Agency (EPA). Our goal is to estimate the reduction in deaths that would result from this anticipated revision using data consisting of 68 million individuals across the U.S.
Updated: 2024-08-25 02:46:35
Domains: cs.LG,stat.ME,stat.ML
InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
Indoor monocular depth estimation helps home automation, including robot navigation and AR/VR for surrounding perception. Most previous methods primarily experiment with the NYUv2 dataset and concentrate on overall performance in their evaluation. However, their robustness and generalization to diverse unseen types or categories of indoor spaces (space types) remain unexplored. Researchers may empirically find degraded performance of a released pretrained model on custom data or less-frequent types. This paper studies a common but easily overlooked factor, space type, and examines a model's performance variance across spaces. We present the InSpaceType dataset, a high-quality RGBD dataset for general indoor scenes, and benchmark 13 recent state-of-the-art methods on InSpaceType. Our examination shows that most of them suffer from performance imbalance between head and tail types, and some top methods are affected even more severely. The work reveals and analyzes the underlying bias in detail for transparency and robustness. We extend the analysis to a total of 4 datasets and discuss best practice in synthetic data curation for training indoor monocular depth. Further, a dataset ablation is conducted to find out the key factor in generalization. This work marks the first in-depth investigation of performance variance across space types and, more importantly, releases useful tools, including datasets and code, to closely examine your pretrained depth models. Data and code: https://depthcomputation.github.io/DepthPublic/
Updated: 2024-08-25 02:39:55
Domains: cs.CV,cs.LG
DHP Benchmark: Are LLMs Good NLG Evaluators?
Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain inadequately explored. Current studies depend on human assessments and simple metrics that fail to capture the discernment of LLMs across diverse NLG tasks. To address this gap, we propose the Discernment of Hierarchical Perturbation (DHP) benchmarking framework, which provides quantitative discernment scores for LLMs utilizing hierarchically perturbed text data and statistical tests to measure the NLG evaluation capabilities of LLMs systematically. We have re-established six evaluation datasets for this benchmark, covering four NLG tasks: Summarization, Story Completion, Question Answering, and Translation. Our comprehensive benchmarking of five major LLM series provides critical insight into their strengths and limitations as NLG evaluators.
Updated: 2024-08-25 02:01:38
Title: DHP Benchmark: Are LLMs Good NLG Evaluators?
Abstract: Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain inadequately explored. Current studies depend on human assessments and simple metrics that fail to capture the discernment of LLMs across diverse NLG tasks. To address this gap, we propose the Discernment of Hierarchical Perturbation (DHP) benchmarking framework, which provides quantitative discernment scores for LLMs by using hierarchically perturbed text data and statistical tests to systematically measure the NLG evaluation capabilities of LLMs. We have re-established six evaluation datasets for this benchmark, covering four NLG tasks: Summarization, Story Completion, Question Answering, and Translation. Our comprehensive benchmarking of five major LLM series provides critical insight into their strengths and limitations as NLG evaluators.
Updated: 2024-08-25 02:01:38
Subjects: cs.CL,cs.AI
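A schematic sketch of the hierarchical-perturbation idea described above: degrade reference texts at increasing severity levels, score each variant with the LLM under test, and check whether scores fall as damage grows. The perturb and score callables and the choice of Spearman correlation as the statistical test are assumptions for illustration.

from scipy.stats import spearmanr

def discernment_score(reference_texts, perturb, score, levels=(0, 1, 2, 3)):
    """perturb(text, level) -> text degraded to the given severity level;
    score(text) -> quality rating from the LLM evaluator under test."""
    xs, ys = [], []
    for text in reference_texts:
        for level in levels:
            xs.append(level)
            ys.append(score(perturb(text, level)))
    rho, p_value = spearmanr(xs, ys)
    # A discerning evaluator gives strongly negative rho: more damage, lower score.
    return -rho, p_value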
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
To enhance the reasoning capabilities of large language models (LLMs), self-consistency has gained significant popularity by combining multiple sampling with majority voting. However, the state-of-the-art self-consistency approaches consume substantial computational resources and lead to significant additional time costs due to the multiple sampling. This prevents its full potential from being realized in scenarios where computational resources are critical. To improve the inference efficiency, this paper introduces path-consistency, a method that leverages the confidence of answers generated in earlier branches to identify the prefix of the most promising path. By dynamically guiding the generation of subsequent branches based on this prefix, path-consistency mitigates both the errors and redundancies from random or less useful sampling in self-consistency. As a result, it can significantly accelerate the inference process by reducing the number of tokens generated. Our extensive empirical evaluation shows that path-consistency achieves significant acceleration in inference latency ranging from 7.8% to 40.5%, while maintaining or even improving task accuracy across different datasets, including mathematical reasoning, common sense reasoning, symbolic reasoning, and code generation.
Updated: 2024-08-25 01:45:53
Title: Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
Abstract: To enhance the reasoning capabilities of large language models (LLMs), self-consistency has gained significant popularity by combining multiple sampling with majority voting. However, state-of-the-art self-consistency approaches consume substantial computational resources and incur significant additional time costs due to the multiple sampling, which prevents their full potential from being realized in scenarios where computational resources are critical. To improve inference efficiency, this paper introduces path-consistency, a method that leverages the confidence of answers generated in earlier branches to identify the prefix of the most promising path. By dynamically guiding the generation of subsequent branches based on this prefix, path-consistency mitigates both the errors and redundancies from random or less useful sampling in self-consistency. As a result, it can significantly accelerate the inference process by reducing the number of generated tokens. Our extensive empirical evaluation shows that path-consistency achieves significant acceleration in inference latency, ranging from 7.8% to 40.5%, while maintaining or even improving task accuracy across different datasets, including mathematical reasoning, common sense reasoning, symbolic reasoning, and code generation.
Updated: 2024-08-25 01:45:53
Subjects: cs.CL,cs.AI
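A minimal sketch of the prefix-reuse idea behind path-consistency, assuming hypothetical generate and confidence interfaces; the split point, the confidence measure, and the final majority vote are illustrative choices, not the paper's implementation.

from collections import Counter

def path_consistent_sample(generate, confidence, question, n_paths=8, prefix_frac=0.5):
    """generate(prompt) -> (reasoning_tokens, answer), with reasoning_tokens a
    list of strings; confidence(answer) -> float in [0, 1]."""
    tokens, answer = generate(question)
    best = (confidence(answer), tokens, answer)
    answers = [answer]
    for _ in range(n_paths - 1):
        # Reuse the most confident reasoning prefix to steer later branches,
        # so each new branch only generates the remaining tokens.
        prefix = best[1][: int(len(best[1]) * prefix_frac)]
        tokens, answer = generate(question + "".join(prefix))
        answers.append(answer)
        candidate = (confidence(answer), prefix + tokens, answer)
        best = max(best, candidate, key=lambda c: c[0])
    return Counter(answers).most_common(1)[0][0]  # majority vote over branches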
Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
Remote sensing imagery has attracted significant attention in recent years due to its instrumental role in global environmental monitoring, land usage monitoring, and more. As image databases grow each year, performing automatic segmentation with deep learning models has gradually become the standard approach for processing the data. Despite the improved performance of current models, certain limitations remain unresolved. Firstly, training deep learning models for segmentation requires per-pixel annotations. Given the large size of datasets, only a small portion is fully annotated and ready for training. Additionally, the high intra-dataset variance in remote sensing data limits the transfer learning ability of such models. Although recently proposed generic segmentation models like SAM have shown promising results in zero-shot instance-level segmentation, adapting them to semantic segmentation is a non-trivial task. To tackle these challenges, we propose a novel method named Text2Seg for remote sensing semantic segmentation. Text2Seg overcomes the dependency on extensive annotations by employing an automatic prompt generation process using different visual foundation models (VFMs), which are trained to understand semantic information in various ways. This approach not only reduces the need for fully annotated datasets but also enhances the model's ability to generalize across diverse datasets. Evaluations on four widely adopted remote sensing datasets demonstrate that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model, with relative improvements ranging from 31% to 225%. Our code is available at https://github.com/Douglas2Code/Text2Seg.
Updated: 2024-08-25 01:30:47
Title: Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
Abstract: Remote sensing imagery has attracted significant attention in recent years due to its instrumental role in global environmental monitoring, land usage monitoring, and more. As image databases grow each year, performing automatic segmentation with deep learning models has gradually become the standard approach for processing the data. Despite the improved performance of current models, certain limitations remain unresolved. First, training deep learning models for segmentation requires per-pixel annotations, and given the large size of the datasets, only a small portion is fully annotated and ready for training. In addition, the high intra-dataset variance in remote sensing data limits the transfer-learning ability of such models. Although recently proposed generic segmentation models like SAM have shown promising results in zero-shot instance-level segmentation, adapting them to semantic segmentation is a non-trivial task. To tackle these challenges, we propose a novel method named Text2Seg for remote sensing semantic segmentation. Text2Seg overcomes the dependency on extensive annotations by employing an automatic prompt-generation process using different visual foundation models (VFMs), which are trained to understand semantic information in various ways. This approach not only reduces the need for fully annotated datasets but also enhances the model's ability to generalize across diverse datasets. Evaluations on four widely adopted remote sensing datasets show that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model, with relative improvements ranging from 31% to 225%. Our code is available at https://github.com/Douglas2Code/Text2Seg.
Updated: 2024-08-25 01:30:47
Subjects: cs.CV,cs.AI
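A schematic version of a text-guided prompt-generation pipeline in the spirit of Text2Seg, assuming hypothetical detect_boxes and segment_with_box stand-ins for the text-grounded detector and the promptable segmenter (e.g., SAM); the actual repository wires up specific VFMs with their own interfaces.

import numpy as np

def text2seg_sketch(image, class_names, detect_boxes, segment_with_box, threshold=0.3):
    """detect_boxes(image, text) -> list of (box, score) groundings for `text`;
    segment_with_box(image, box) -> binary mask of shape (H, W)."""
    h, w = image.shape[:2]
    semantic = np.zeros((h, w), dtype=np.int32)  # 0 = background
    for class_id, name in enumerate(class_names, start=1):
        for box, score in detect_boxes(image, name):
            if score < threshold:
                continue  # drop low-confidence text groundings
            mask = segment_with_box(image, box)  # box acts as the SAM prompt
            semantic[mask.astype(bool)] = class_id
    return semantic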
A Note On Deterministic Submodular Maximization With Bounded Curvature
We show that the recent breakthrough result of [Buchbinder and Feldman, FOCS'24] could further lead to a deterministic $(1-\kappa_{f}/e-\varepsilon)$-approximate algorithm for maximizing a submodular function with curvature $\kappa_{f}$ under matroid constraint.
Updated: 2024-08-25 01:16:30
Title: A Note On Deterministic Submodular Maximization With Bounded Curvature
Abstract: We show that the recent breakthrough result of [Buchbinder and Feldman, FOCS'24] can further lead to a deterministic $(1-\kappa_{f}/e-\varepsilon)$-approximate algorithm for maximizing a submodular function with curvature $\kappa_{f}$ under a matroid constraint.
Updated: 2024-08-25 01:16:30
Subjects: cs.DS,cs.LG
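For reference, the curvature notion in the bound above, assuming the standard total-curvature definition for a normalized ($f(\emptyset)=0$) monotone submodular $f$ on ground set $E$:
\[
\kappa_{f} \;=\; 1 - \min_{e \in E} \frac{f(E) - f(E \setminus \{e\})}{f(\{e\})} \;\in\; [0, 1].
\]
Under this definition, a deterministic $(1-\kappa_{f}/e-\varepsilon)$-approximation interpolates between the near-optimal $(1-\varepsilon)$ guarantee for modular objectives ($\kappa_{f}=0$) and the familiar $(1-1/e-\varepsilon)$ guarantee for general monotone submodular objectives ($\kappa_{f}=1$).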
Revisiting DNN Training for Intermittently Powered Energy Harvesting Micro Computers
The deployment of Deep Neural Networks in energy-constrained environments, such as Energy Harvesting Wireless Sensor Networks, presents unique challenges, primarily due to the intermittent nature of power availability. To address these challenges, this study introduces and evaluates a novel training methodology tailored for DNNs operating within such contexts. In particular, we propose a dynamic dropout technique that adapts to both the architecture of the device and the variability in energy availability inherent in energy harvesting scenarios. Our proposed approach leverages a device model that incorporates specific parameters of the network architecture and the energy harvesting profile to optimize dropout rates dynamically during the training phase. By modulating the network's training process based on predicted energy availability, our method not only conserves energy but also ensures sustained learning and inference capabilities under power constraints. Our preliminary results demonstrate that this strategy provides 6 to 22 percent accuracy improvements compared to the state of the art with less than 5 percent additional compute. This paper details the development of the device model, describes the integration of energy profiles with intermittency aware dropout and quantization algorithms, and presents a comprehensive evaluation of the proposed approach using real-world energy harvesting data.
Updated: 2024-08-25 01:13:00
Title: Revisiting DNN Training for Intermittently Powered Energy Harvesting Micro Computers
Abstract: Deploying deep neural networks in energy-constrained environments, such as energy-harvesting wireless sensor networks, presents unique challenges, primarily due to the intermittent nature of power availability. To address these challenges, this study introduces and evaluates a novel training methodology tailored for DNNs operating in such contexts. In particular, we propose a dynamic dropout technique that adapts to both the device architecture and the variability in energy availability inherent in energy-harvesting scenarios. Our approach leverages a device model that incorporates specific parameters of the network architecture and the energy-harvesting profile to dynamically optimize dropout rates during the training phase. By modulating the network's training process based on predicted energy availability, our method not only conserves energy but also ensures sustained learning and inference capabilities under power constraints. Our preliminary results show that this strategy yields accuracy improvements of 6 to 22 percent over the state of the art with less than 5 percent additional compute. This paper details the development of the device model, describes the integration of energy profiles with intermittency-aware dropout and quantization algorithms, and presents a comprehensive evaluation of the proposed approach using real-world energy-harvesting data.
Updated: 2024-08-25 01:13:00
Subjects: cs.LG
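A minimal sketch of energy-adaptive dropout; the linear map from predicted energy availability to a dropout probability and its bounds are illustrative assumptions, not the paper's device model.

def energy_aware_dropout_rate(energy_ratio, p_min=0.1, p_max=0.6):
    """Map predicted energy availability in [0, 1] to a dropout probability.

    Low predicted energy -> drop more units, so the surviving subnetwork
    fits the power budget; high energy -> train closer to full capacity.
    """
    energy_ratio = min(max(energy_ratio, 0.0), 1.0)
    return p_max - (p_max - p_min) * energy_ratio

# Usage sketch with PyTorch (assumed framework): update the rate each step
# as the harvesting forecast changes, e.g.
#   layer = torch.nn.Dropout(p=energy_aware_dropout_rate(predicted_energy))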
Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning
In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, where the super-zero level set, i.e., the set of states where the value function is evaluated to be non-negative, recovers the reach-avoid set. Building upon this, we prove that the proposed method can be adapted to compute the viability kernel, or the set of states which could be controlled to satisfy given constraints, and the backward reachable set, or the set of states that could be driven towards a given target set. Finally, we propose to alleviate the curse of dimensionality issue in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation to the reach-avoid set. Our theoretical and empirical results suggest that the proposed method could learn reliably the reach-avoid set and the optimal control policy even with neural network approximation.
Updated: 2024-08-25 01:00:49
Title: Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning
Abstract: In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that a system starting at a state therein can be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, whose super-zero level set, i.e., the set of states where the value function is non-negative, recovers the reach-avoid set. Building on this, we prove that the proposed method can be adapted to compute the viability kernel, i.e., the set of states that can be controlled to satisfy given constraints, and the backward reachable set, i.e., the set of states that can be driven toward a given target set. Finally, we propose to alleviate the curse of dimensionality in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation of the reach-avoid set. Our theoretical and empirical results suggest that the proposed method can reliably learn the reach-avoid set and the optimal control policy even with neural-network approximation.
Updated: 2024-08-25 01:00:49
Subjects: eess.SY,cs.AI,cs.LG,cs.SY,math.OC
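A schematic tabular sketch of a contracting reach-avoid backup in the spirit described above, following one common convention from the reach-avoid RL literature (l(s) >= 0 iff s is in the target set, g(s) >= 0 iff s satisfies the constraints); the exact operator and sign conventions in the paper may differ, so treat this as an illustrative assumption.

import numpy as np

def reach_avoid_value_iteration(l, g, next_states, gamma=0.99, iters=1000):
    """l, g: margin arrays of shape (S,); next_states: int array of shape
    (S, A, D) giving the successor for state s, control a, disturbance d."""
    v = np.minimum(l, g).astype(float)
    for _ in range(iters):
        v_next = v[next_states]                       # shape (S, A, D)
        # Control maximizes, worst-case disturbance minimizes the value.
        cont = v_next.min(axis=2).max(axis=1)
        # Discounted backup; the (1 - gamma) annealing term keeps it a contraction.
        v = (1 - gamma) * np.minimum(l, g) + gamma * np.minimum(g, np.maximum(l, cont))
    return v  # {s : v[s] >= 0} approximates the reach-avoid set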